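Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially since dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representations at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms the baselines. For example, it achieves a 13% average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide analyses of the benefits and limitations of our model.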
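The pairing rule described above (consecutive utterances of the same dialogue form a positive pair) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the plain list-of-strings dialogue format are assumptions made for the example.

```python
from typing import List, Tuple


def consecutive_utterance_pairs(dialogue: List[str]) -> List[Tuple[str, str]]:
    """Pair each utterance with the utterance that immediately follows it.

    `dialogue` is assumed to be an ordered list of utterance strings
    (a hypothetical input format chosen for this sketch).
    """
    return list(zip(dialogue, dialogue[1:]))


def build_positive_pairs(dialogues: List[List[str]]) -> List[Tuple[str, str]]:
    """Collect positive pairs from every dialogue; pairs never cross dialogues."""
    pairs: List[Tuple[str, str]] = []
    for dialogue in dialogues:
        pairs.extend(consecutive_utterance_pairs(dialogue))
    return pairs
```

In a contrastive setup, each pair would be treated as a positive, with the other utterances in the same batch typically serving as negatives.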
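Retrieval-based dialogue response selection aims to find the proper response from a candidate set given a multi-turn context. Methods based on pre-trained language models (PLMs) have yielded significant improvements on this task. The sequence representation plays a key role in learning the matching degree between the dialogue context and the response. However, we observe that different context-response pairs sharing the same context always have a greater similarity in the sequence representations computed by PLMs, which makes it hard to distinguish positive responses from negative ones. Motivated by this, we propose a novel Fine-Grained Contrastive (FGC) learning method for the response selection task based on PLMs. This FGC learning strategy helps PLMs generate more distinguishable matching representations of each dialogue at a fine granularity, and further improves predictions when choosing positive responses. Empirical studies on two benchmark datasets demonstrate that the proposed FGC learning method can generally improve the performance of existing PLM-based matching models.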
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to fall into the so-called 'likelihood trap', resulting in generated responses which are dull, repetitive, and often inconsistent with dialogue history. Comparing ranked lists of multiple generated responses against the 'gold response' (from training data) reveals a wide diversity in response quality, with many good responses placed lower in the ranked list. The main challenge, addressed in this work, is then how to reach beyond greedily generated system responses, that is, how to obtain and select such high-quality responses from the list of overgenerated responses at inference without availability of the gold response. To this end, we propose a simple yet effective reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system. The idea is to use any sequence-level (similarity) scoring function to divide the semantic space of responses into high-scoring versus low-scoring partitions. At training, the high-scoring partition comprises all generated responses whose similarity to the gold response is higher than the similarity of the greedy response to the gold response. At inference, the aim is to estimate the probability that each overgenerated response belongs to the high-scoring partition, given only previous dialogue history. We validate the robustness and versatility of our proposed method on the standard MultiWOZ dataset: our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR scores, achieving new peak results. Additional experiments on the BiTOD dataset and human evaluation further ascertain the generalisability and effectiveness of the proposed framework.
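The training-time partitioning described above can be sketched as follows: a candidate belongs to the high-scoring partition when its similarity to the gold response exceeds that of the greedy response. This is a minimal sketch under stated assumptions. The abstract allows any sequence-level scoring function; the token-overlap F1 used here (`token_f1`) is a stand-in chosen for illustration, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two strings (an illustrative similarity score)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def label_partitions(
    candidates: List[str],
    greedy: str,
    gold: str,
    score: Callable[[str, str], float] = token_f1,
) -> List[bool]:
    """Return True for candidates that score strictly higher than the greedy response."""
    threshold = score(greedy, gold)
    return [score(candidate, gold) > threshold for candidate in candidates]
```

Labels produced this way could serve as training targets for a reranker that, at inference, sees only the dialogue history and the candidate, since the gold response is unavailable then.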
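Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE-3, a novel unified semi-supervised pre-trained conversation model that learns from large-scale dialog corpora with limited annotations and can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE-3 consists of four successive components in a single transformer to maintain the task flow in TOD systems: (i) a dialog encoding module to encode the dialog history, (ii) a dialog understanding module to extract semantic vectors from either user queries or system responses, (iii) a dialog policy module to generate a policy vector that contains the high-level semantics of the response, and (iv) a dialog generation module to produce appropriate responses. We design a dedicated pre-training objective for each component. Concretely, we pre-train the dialog encoding module with span mask language modeling to learn contextualized dialog information. To capture structured dialog semantics, we pre-train the dialog understanding module via a novel tree-induced semi-supervised contrastive learning objective with the help of extra dialog annotations. In addition, we pre-train the dialog policy module by minimizing the L2 distance between its output policy vector and the semantic vector of the response for policy optimization. Finally, the dialog generation module is pre-trained by language modeling. Results show that SPACE-3 achieves state-of-the-art performance on eight downstream dialog benchmarks, including intent prediction, dialog state tracking, and end-to-end dialog modeling. We also show that SPACE-3 has a stronger few-shot ability than existing models under low-resource settings.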
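Pre-trained language models have made great progress on dialogue tasks. However, these models are typically trained on surface dialogue text and have therefore proven weak at understanding the main semantic meaning of a dialogue context. We investigate Abstract Meaning Representation (AMR) as explicit semantic knowledge for pre-training models to capture the core semantic information in dialogues during pre-training. In particular, we propose a semantic-based pre-training framework that extends the standard pre-training framework (Devlin et al., 2019) with three tasks defined according to AMR graph representations. Experiments on the understanding of both chit-chat and task-oriented dialogues show the superiority of our model. To the best of our knowledge, we are the first to leverage a deep semantic representation for dialogue pre-training.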
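Pre-training methods with contrastive learning objectives have shown remarkable success in dialog understanding tasks. However, current contrastive learning considers only self-augmented dialog samples as positive samples and treats all other dialog samples as negative ones, which enforces dissimilar representations even for dialogs that are semantically related. In this paper, we propose SPACE-2, a tree-structured pre-trained conversation model that learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised contrastive pre-training. Concretely, we first define a general semantic tree structure (STS) to unify the annotation schemas of different dialog datasets, so that the rich structural information stored in all labeled data can be exploited. We then propose a novel multi-view score function to increase the relevance of all possible dialogs that share similar STSs and to push away only other, completely different dialogs during supervised contrastive pre-training. To fully exploit unlabeled dialogs, a basic self-supervised contrastive loss is also added to refine the learned representations. Experiments show that our method achieves new state-of-the-art results on the DialoGLUE benchmark, which consists of seven datasets and four popular dialog understanding tasks. For reproducibility, we release the code and data at https://github.com/alibabaresearch/damo-convai/tree/main/main/space-2.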
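Recent advances in transfer learning techniques and in the pre-training of large contextualized encoders foster innovation in real-life applications, including dialog assistants. The practical needs of intent recognition require effective data usage and the ability to constantly update the supported intents, adopting new ones and abandoning outdated ones. In particular, the generalized zero-shot paradigm, in which the model is trained on seen intents and tested on both seen and unseen intents, is taking on new importance. In this paper, we explore the generalized zero-shot setup for intent recognition. Following best practices for zero-shot text classification, we treat the task with a sentence-pair modeling approach. For unseen intents, using intent labels and user utterances and without accessing external resources (such as knowledge bases), we outperform the previous state-of-the-art F1 measure by up to 16%. A further enhancement, the lexicalization of intent labels, improves performance by up to 7%. By using task transfer from other sentence-pair tasks, such as natural language inference, we gain additional improvements.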
This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings. We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. This simple method works surprisingly well, performing on par with previous supervised counterparts. We find that dropout acts as minimal data augmentation, and removing it leads to a representation collapse. Then, we propose a supervised approach, which incorporates annotated pairs from natural language inference datasets into our contrastive learning framework by using "entailment" pairs as positives and "contradiction" pairs as hard negatives. We evaluate SimCSE on standard semantic textual similarity (STS) tasks, and our unsupervised and supervised models using BERT base achieve an average of 76.3% and 81.6% Spearman's correlation respectively, a 4.2% and 2.2% improvement compared to the previous best results. We also show, both theoretically and empirically, that the contrastive learning objective regularizes pre-trained embeddings' anisotropic space to be more uniform, and that it better aligns positive pairs when supervised signals are available. We randomly sample 10^6 sentences from English Wikipedia and fine-tune BERT base with learning rate = 3e-5, N = 64. In all our experiments, no STS training sets are used.
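The contrastive objective described above, where each sentence must pick out its own second view among the other sentences in the batch, can be sketched in plain Python. This is an illustrative sketch, not the authors' code: embeddings are assumed to be non-zero lists of floats produced elsewhere (for example, two encoder passes over the same sentences with different dropout masks), and the temperature default of 0.05 is an assumption for the example.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(
    view_a: List[Sequence[float]],
    view_b: List[Sequence[float]],
    temperature: float = 0.05,  # assumed value, for illustration only
) -> float:
    """In-batch contrastive loss: view_a[i] should match view_b[i], not view_b[j]."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss is near zero when each anchor is most similar to its own second view and grows when another sentence in the batch is closer.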
Contrastive learning has become a new paradigm for unsupervised sentence embeddings. Previous studies focus on instance-wise contrastive learning, attempting to construct positive pairs with textual data augmentation. In this paper, we propose a novel Contrastive learning method with Prompt-derived Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts, we construct virtual semantic prototypes to each instance, and derive negative prototypes by using the negative form of the prompts. Using a prototypical contrastive loss, we enforce the anchor sentence embedding to be close to its corresponding semantic prototypes, and far apart from the negative prototypes as well as the prototypes of other sentences. Extensive experimental results on semantic textual similarity, transfer, and clustering tasks demonstrate the effectiveness of our proposed model compared to strong baselines. Code is available at https://github.com/lemon0830/promptCSE.
Personalized chatbots focus on endowing chatbots with a consistent personality so that they behave like real users and can further act as personal assistants. Previous studies have explored generating implicit user profiles from the user's dialogue history for building personalized chatbots. However, these studies use only the response generation loss to train the entire model, which makes them prone to the problem of data sparsity. Besides, they overemphasize the quality of the final generated response while ignoring the correlations and fusions within the user's dialogue history, leading to coarse data representations and performance degradation. To tackle these problems, we propose a self-supervised learning framework, MCP, for capturing better representations from users' dialogue history for personalized chatbots. Specifically, we apply contrastive sampling methods to leverage the supervised signals hidden in user dialogue history and generate pre-training samples for enhancing the model. We design three pre-training tasks based on three types of contrastive pairs from user dialogue history, namely response pairs, sequence augmentation pairs, and user pairs. We pre-train the utterance encoder and the history encoder towards the contrastive objectives and use these pre-trained encoders to generate user profiles during personalized response generation. Experimental results on two real-world datasets show a significant improvement of our proposed model MCP over existing methods.
Chatbots are expected to be knowledgeable across multiple domains, e.g. for daily chit-chat, exchange of information, and grounding in emotional situations. To effectively measure the quality of such conversational agents, a model-based automatic dialogue evaluation metric (ADEM) is expected to perform well across multiple domains. Despite significant progress, an ADEM that works well in one domain does not necessarily generalize to another. This calls for a dedicated network architecture for domain generalization. To tackle the multi-domain dialogue evaluation task, we propose a Panel of Experts (PoE), a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters. The shared encoder captures the general knowledge of dialogues across domains, while each adapter specializes in one specific domain and serves as a domain expert. To validate the idea, we construct a high-quality multi-domain dialogue dataset leveraging data augmentation and pseudo-labeling. The PoE network is comprehensively assessed on 16 dialogue evaluation datasets spanning a wide range of dialogue domains. It achieves state-of-the-art performance in terms of mean Spearman correlation over all the evaluation datasets. It exhibits better zero-shot generalization than existing state-of-the-art ADEMs and the ability to easily adapt to new domains with few-shot transfer learning.
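Contrastive learning has been attracting much attention for learning unsupervised sentence embeddings. The current state-of-the-art unsupervised method is unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE takes dropout as a minimal data augmentation method and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain two corresponding embeddings that form a positive pair. Because the length of a sentence is generally encoded into its embedding through the position embeddings used in the Transformer, each positive pair in unsup-SimCSE actually contains the same length information. Unsup-SimCSE trained with these positive pairs is therefore probably biased: it tends to consider sentences of the same or similar length to be more similar in semantics. Through statistical observations, we find that unsup-SimCSE does have this problem. To alleviate it, we apply a simple repetition operation to modify the input sentence, and then pass the input sentence and its modified counterpart to the pre-trained Transformer encoder separately to obtain the positive pair. Additionally, we draw inspiration from the computer vision community and introduce momentum contrast, which enlarges the number of negative pairs without additional computation. The two proposed modifications are applied to positive and negative pairs respectively, and together they form a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE). We evaluate the proposed ESimCSE on several benchmark datasets with respect to the semantic textual similarity (STS) task. Experimental results show that ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 2.02% on BERT-base.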
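The repetition operation mentioned above changes a sentence's length while leaving its meaning largely intact, so the two members of a positive pair no longer share the same length. The sketch below is one plausible way to do this and is not taken from the paper: the function name, the whitespace-token input, and the `max_repeat_fraction` parameter are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def repeat_words(
    tokens: List[str],
    max_repeat_fraction: float = 0.3,  # hypothetical cap on duplicated tokens
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a copy of `tokens` in which a few randomly chosen tokens are duplicated."""
    if not tokens:
        return []
    if rng is None:
        rng = random.Random()
    # Duplicate at least one token and at most a fraction of the sentence.
    max_repeats = min(len(tokens), max(1, int(len(tokens) * max_repeat_fraction)))
    n_repeats = rng.randint(1, max_repeats)
    positions = set(rng.sample(range(len(tokens)), n_repeats))
    augmented: List[str] = []
    for index, token in enumerate(tokens):
        augmented.append(token)
        if index in positions:
            augmented.append(token)  # repeat the token in place
    return augmented
```

The original sentence and its augmented copy would then be encoded separately to form the positive pair.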
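Pre-trained language models (PLMs) have achieved remarkable performance gains across numerous downstream tasks in natural language understanding. Various Chinese PLMs have been proposed to learn better Chinese language representations. However, most current models use Chinese characters as inputs and cannot encode the semantic information contained in Chinese words. While recent pre-trained models incorporate both words and characters simultaneously, they usually suffer from deficient semantic interactions and fail to capture the semantic relation between words and characters. To address these issues, we propose a simple yet effective PLM, CLOWER, which adopts contrastive learning over word and character representations. In particular, CLOWER implicitly encodes coarse-grained information (i.e., words) into fine-grained representations (i.e., characters) through contrastive learning on multi-grained information. CLOWER is of great value in realistic scenarios, since it can be easily incorporated into any existing fine-grained PLM without modifying the production pipeline. Extensive experiments on a range of downstream tasks demonstrate the superior performance of CLOWER over several state-of-the-art baselines.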
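Dialogue bots have been widely applied in customer service scenarios to provide a timely and user-friendly experience. These bots must classify the appropriate domain of a dialogue, understand the user's intent, and generate proper responses. Existing dialogue pre-training models are designed only for several dialogue tasks and ignore the weakly supervised expert knowledge in customer service dialogues. In this paper, we propose a novel unified knowledge prompt pre-training framework, UFA (Unified Model For All Tasks), for customer service dialogues. We formulate all the tasks of customer service dialogues as a unified text-to-text generation task and introduce a knowledge-driven prompt strategy to learn jointly from a mixture of distinct dialogue tasks. We pre-train UFA on a large-scale Chinese customer service corpus collected from practical scenarios and obtain significant improvements on both natural language understanding (NLU) and natural language generation (NLG) benchmarks.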
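Product matching is a fundamental step for the global understanding of consumer behavior in e-commerce. In practice, product matching refers to the task of deciding whether two product offers from different data sources (e.g., retailers) represent the same product. Standard pipelines use a preceding stage called blocking, in which, for a given product offer, a set of potential matching candidates is retrieved based on similar characteristics (e.g., same brand, category, flavor). Among these similar candidates, those that are not a match can be considered hard negatives. We present Block-SCL, a strategy that uses the blocking output to make the most of supervised contrastive learning (SCL). Concretely, Block-SCL builds enriched batches using the hard-negative samples obtained in the blocking stage. These batches provide a strong training signal, leading the model to learn more meaningful sentence embeddings for product matching. Experimental results on several public datasets demonstrate that Block-SCL achieves state-of-the-art results despite using only short product titles as input, no data augmentation, and a lighter transformer backbone than competing methods.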
We present Relational Sentence Embedding (RSE), a new paradigm to further discover the potential of sentence embeddings. Prior work mainly models the similarity between sentences based on their embedding distance. Because of the complex semantic meanings conveyed, sentence pairs can have various relation types, including but not limited to entailment, paraphrasing, and question-answer. It poses challenges to existing embedding methods to capture such relational information. We handle the problem by learning associated relational embeddings. Specifically, a relation-wise translation operation is applied to the source sentence to infer the corresponding target sentence with a pre-trained Siamese-based encoder. The fine-grained relational similarity scores can be computed from learned embeddings. We benchmark our method on 19 datasets covering a wide range of tasks, including semantic textual similarity, transfer, and domain-specific tasks. Experimental results show that our method is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art sentence embedding methods. https://github.com/BinWang28/RSE
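This work combines information about the dialogue history encoded by a pre-trained model with a meaning representation of the current system utterance to realize contextual language generation in task-oriented dialogues. We utilize the pre-trained multi-context ConveRT model for context representation in a model trained from scratch, and we leverage the immediately preceding user utterance for context generation in a model adapted from the pre-trained GPT-2. Both experiments with the MultiWOZ dataset show that contextual information encoded by a pre-trained model improves the performance of response generation in both automatic metrics and human evaluation. Our contextual generator enables a higher variety of generated responses that fit better into the ongoing dialogue. An analysis of context size shows that a longer context does not automatically lead to better performance, but that the immediately preceding user utterance plays an essential role in contextual generation. In addition, we propose a re-ranker for the GPT-based generation model. The experiments show that the response selected by the re-ranker yields a significant improvement on automatic metrics.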
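Existing pre-trained models have achieved state-of-the-art performance on various text classification tasks. These models have proven useful for learning universal language representations. However, advanced pre-trained models cannot effectively distinguish the semantic discrepancy between similar texts, which has a great influence on the performance of hard-to-distinguish classes. To address this problem, we propose a novel Contrastive Learning with Label Distance (CLLD) in this work. Inspired by recent advances in contrastive learning, we specifically design a classification method with label distance for learning contrastive classes. CLLD preserves flexibility with respect to the subtle differences that lead to different label assignments, while generating distinct representations for classes that are similar to one another. Extensive experiments on public benchmarks and internal datasets demonstrate that our method improves the performance of pre-trained models on classification tasks. Importantly, our experiments suggest that the learned label distance relieves the adversarial nature between classes.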
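Spoken language understanding (SLU) is an essential task for machines to understand human speech for better interaction. However, errors from the automatic speech recognizer (ASR) usually hurt understanding performance. In practice, ASR systems may not be easy to adjust for the target scenarios. Therefore, this paper focuses on learning utterance representations that are robust to ASR errors using a contrastive objective, and further strengthens generalization ability by combining supervised contrastive learning and self-distillation in model fine-tuning. Experiments on three benchmark datasets demonstrate the effectiveness of our proposed approach.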
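Contrastive learning has emerged as a powerful representation learning method that facilitates various downstream tasks, especially when supervised data is limited. How to construct effective contrastive samples through data augmentation is key to its success. Unlike in vision tasks, data augmentation methods for contrastive learning have not been sufficiently investigated in language tasks. In this paper, we propose a novel approach to constructing contrastive samples for language tasks using text summarization. We use these samples for supervised contrastive learning to obtain better text representations, which greatly benefit text classification tasks with limited annotations. To further improve the method, we mix up samples from different classes and add an extra regularization, named Mixsum, in addition to the cross-entropy loss. Experiments on real-world text classification datasets (Amazon-5, Yelp-5, AG News, and IMDb) demonstrate the effectiveness of the proposed contrastive learning framework with summarization-based data augmentation and Mixsum regularization.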