We introduce \textsc{PoliteRewrite}, a dataset for polite language rewrite, which is a novel sentence rewrite task. Compared with previous text style transfer tasks that can mostly be addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits over an offensive and impolite sentence to deliver the same message euphemistically and politely. This makes it more challenging, not only for NLP models but also for human annotators, who must put real effort into each rewrite. To reduce the human effort and make annotation efficient, we propose a novel annotation paradigm in which human annotators and GPT-3.5 collaborate to annotate \textsc{PoliteRewrite}. The released dataset has 10K polite sentence rewrites annotated collaboratively by GPT-3.5 and humans, which can be used as a gold standard for training, validation and test, and 100K high-quality polite sentence rewrites by GPT-3.5 without human review. We hope this work (the dataset (10K+100K) will be released soon) can contribute to research on more challenging sentence rewrite tasks and provoke more thought on resource annotation paradigms that use large-scale pretrained models.
translated by Google Translate
The input and output of most text generation tasks can be transformed into two sequences of tokens, and they can be modeled using sequence-to-sequence modeling tools such as Transformers. These models are usually trained by maximizing the likelihood of the output text sequence and assume that the input sequence and all gold preceding tokens are given during training, while during inference the model suffers from the exposure bias problem (i.e., it only has access to its previously predicted tokens rather than gold tokens during beam search). In this paper, we propose MoCa ({\bf Mo}mentum {\bf Ca}libration) for text generation. MoCa is an online method that dynamically generates slowly evolving (but consistent) samples using a momentum moving average generator with beam search, and MoCa learns to align its model scores of these samples with their actual qualities. Experiments on four text generation datasets (i.e., CNN/DailyMail, XSum, SAMSum and Gigaword) show that MoCa consistently improves strong pre-trained transformers trained with vanilla fine-tuning, and we achieve state-of-the-art results on the CNN/DailyMail and SAMSum datasets.
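As a rough illustration of what a "momentum moving average generator" could look like, the sketch below keeps a slowly evolving copy of a model's parameters as an exponential moving average. This is a minimal sketch and not the MoCa implementation: the function name `momentum_update`, the dictionary-of-floats parameter format, and the momentum value are all assumptions made for illustration.

```python
def momentum_update(generator_params, model_params, momentum=0.99):
    """Return an exponential moving average of two parameter sets.

    Hypothetical helper: parameters are plain floats keyed by name.
    A momentum close to 1 makes the generator change slowly.
    """
    return {
        name: momentum * generator_params[name]
        + (1.0 - momentum) * model_params[name]
        for name in generator_params
    }


# Toy usage: the generator drifts slowly toward the trained model.
generator = {"w": 0.0}
model = {"w": 1.0}
for _ in range(3):
    generator = momentum_update(generator, model, momentum=0.9)
```

With a momentum of 0.9, three updates move the generator weight only part of the way toward the model weight, which is the "slowly evolving but consistent" behavior the abstract describes.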
This paper targets unsupervised skeleton-based action representation learning and proposes a new Hierarchical Contrast (HiCo) framework. Unlike existing contrastive solutions, which typically encode an input skeleton sequence as instance-level features and perform contrast holistically, our proposed HiCo encodes the input as multiple-level features and performs contrast in a hierarchical manner. Specifically, given a human skeleton sequence, we represent it as multiple feature vectors of different granularities from both the temporal and spatial domains via sequence-to-sequence (S2S) encoders and unified downsampling modules. The hierarchical contrast is conducted at four levels: instance level, domain level, clip level, and part level. Moreover, HiCo is orthogonal to the S2S encoder, which allows us to flexibly adopt state-of-the-art S2S encoders. Extensive experiments on four datasets, i.e., NTU-60, NTU-120, PKU-MMD I and II, show that HiCo achieves a new state of the art for unsupervised skeleton-based action representation learning in two downstream tasks, action recognition and retrieval, and that its learned action representation transfers well. We also show that our framework is effective for semi-supervised skeleton-based action recognition. Our code is available at https://github.com/HuiGuanLab/HiCo.
We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL). X-Prompt instructs an LLM with not only NL but also an extensible vocabulary of imaginary words that are introduced to help represent what NL words can hardly describe, allowing a prompt to be more descriptive. Like NL prompts, X-Prompt is out-of-distribution (OOD) robust, for which we propose context-guided learning with prompt augmentation to learn its imaginary words for general usability, enabling them to be used in different prompt contexts for fine-grained specifications. The promising results of X-Prompt demonstrate its potential for enabling advanced interaction between humans and LLMs and bridging their communication gap.
The role of mobile cameras has increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline that replaces the standard mobile ISPs and can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon 8 Gen 1 GPU, which provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs and are able to process Full HD photos in less than 20-50 milliseconds while achieving high-fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
Prompts with different control signals (e.g., length, keywords, etc.) can be used to control text summarization. When control signals are available, they can control the properties of generated summaries and potentially improve summarization quality (since more information is given). Unfortunately, control signals are usually not available at inference time. In this paper, we propose Lotus (shorthand for Latent Prompt Tuning for Summarization), a single model that can be applied in both controlled and uncontrolled (without control signals) modes. During training, Lotus learns latent prompt representations from prompts with gold control signals using a contrastive learning objective. Experiments show that Lotus in uncontrolled mode consistently improves upon strong (uncontrollable) summarization models across four different summarization datasets. We also demonstrate that generated summaries can be controlled using prompts with user-specified control tokens.
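Magnetic resonance imaging (MRI) with high resolution (HR) provides more detailed information for accurate diagnosis and quantitative image analysis. Despite significant progress, most existing super-resolution (SR) reconstruction networks for medical images have two flaws: 1) They are all designed on a black-box principle, so they lack sufficient interpretability, which further limits their practical application. Interpretable neural network models are of significant interest because they enhance the trustworthiness required in clinical practice when dealing with medical images. 2) Most existing SR reconstruction methods use only a single contrast or a simple multi-contrast fusion mechanism, neglecting the complex relationships between different contrasts that are critical for SR improvement. To address these issues, this paper proposes a novel model-guided interpretable deep unfolding network (MGDUN) for medical image SR reconstruction. The model-guided image SR reconstruction approach solves a manually designed objective function to reconstruct HR MRI. We show how to unfold the iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix and an explicit multi-contrast relationship matrix into account during end-to-end optimization. Extensive experiments on the multi-contrast IXI dataset and the BraTS 2019 dataset demonstrate the superiority of our proposed model.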
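Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning-oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is that videos are assumed to be temporally pre-trimmed with short duration, while the provided captions describe the gist of the video content well. Consequently, for a given paired video and caption, the video is supposed to be fully relevant to the caption. In reality, however, because queries are not known in advance, pre-trimmed video clips may not contain sufficient content to fully meet the query. This suggests a gap between the literature and the real world. To fill the gap, we propose in this paper a novel T2VR subtask termed Partially Relevant Video Retrieval (PRVR). An untrimmed video is considered partially relevant w.r.t. a given textual query if it contains a moment relevant to the query. PRVR aims to retrieve such partially relevant videos from a large collection of untrimmed videos. PRVR differs from single video moment retrieval and video corpus moment retrieval, as the latter two retrieve moments rather than untrimmed videos. We formulate PRVR as a multiple instance learning (MIL) problem, where a video is simultaneously viewed as a bag of video clips and a bag of video frames. Clips and frames represent video content at different time scales. We propose a Multi-Scale Similarity Learning (MS-SL) network that jointly learns clip-scale and frame-scale similarities for PRVR. Extensive experiments on three datasets (TVR, ActivityNet Captions, and Charades-STA) demonstrate the viability of the proposed method. We also show that our method can be used to improve video corpus moment retrieval.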
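The multiple-instance view of a video as a bag of clips and a bag of frames can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the MS-SL network: it assumes precomputed embedding vectors, scores each bag by its best-matching instance (max pooling), and mixes the two scales with a hypothetical `clip_weight`. All function names are invented for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bag_similarity(query_vec, instance_vecs):
    """Score a bag by its best-matching instance (assumed max pooling)."""
    return max(cosine(query_vec, inst) for inst in instance_vecs)


def video_score(query_vec, clip_vecs, frame_vecs, clip_weight=0.5):
    """Combine clip-scale and frame-scale bag scores (assumed weighted sum)."""
    clip_sim = bag_similarity(query_vec, clip_vecs)
    frame_sim = bag_similarity(query_vec, frame_vecs)
    return clip_weight * clip_sim + (1.0 - clip_weight) * frame_sim


# Toy usage: one clip matches the query, so the video is partially relevant.
query = [1.0, 0.0]
clips = [[0.0, 1.0], [1.0, 0.0]]
frames = [[0.0, 1.0]]
score = video_score(query, clips, frames)
```

Scoring a bag by its best instance captures the idea that an untrimmed video is relevant as soon as one moment matches the query, even when most of the video does not.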
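Existing fake audio detection systems often rely on expert experience to design the acoustic features or to manually tune the hyperparameters of the network structure. However, manual adjustment of the parameters can have a relatively noticeable influence on the results, and it is almost impossible to set the best set of parameters by hand. This paper therefore proposes a fully automated end-to-end fake audio detection method. We first use a wav2vec pre-trained model to obtain a high-level representation of the speech. For the network structure, we use a modified version of differentiable architecture search (DARTS) named light-DARTS. It learns deep speech representations while automatically learning and optimizing complex neural structures composed of convolutional operations and residual blocks. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system achieves an equal error rate (EER) of 1.08%, which outperforms the state-of-the-art single system.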
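For readers unfamiliar with the equal error rate (EER), it is the operating point at which the false acceptance rate and the false rejection rate are equal. The sketch below approximates it by sweeping thresholds over the observed scores. It is a coarse illustration and not an official ASVspoof scoring tool: the function name and the convention that higher scores mean "more likely genuine" are assumptions, and practical toolkits typically interpolate between thresholds.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over observed scores.

    Assumes higher scores mean "more likely genuine". Returns the mean
    of the two error rates at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: genuine audio scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy usage with overlapping score distributions.
genuine = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
eer = equal_error_rate(genuine, spoof)
```

In this toy example one genuine score falls below one spoof score, so no threshold separates the two classes perfectly and the EER is above zero.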
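Housing quality is an important proxy for regional wealth, security and health. Understanding the distribution of housing quality is crucial for revealing rural development status and informing policy recommendations. However, current rural housing quality data depends heavily on top-down, time-consuming surveys at the national or provincial level, which fail to capture housing quality at the village level. To close the gap between the need to describe rural housing quality accurately and the lack of data, we collect a large number of rural images and invite users to assess their housing quality at scale. Furthermore, we propose a deep learning framework to predict housing quality automatically and efficiently from the crowd-sourced rural images.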