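As an essential component of dialogue systems, response selection aims to pick the best response among a set of candidates to continue the dialogue. In existing studies, this task is usually treated as a binary classification problem in which every candidate is scored separately for appropriateness. To improve performance, we reformulate the task as a multiple-choice problem, which allows the best selection to be made in a single inference pass. This new view motivates an architecture called Panoramic-Encoder with a novel Candidates Attention Mechanism (CAM), which allows context-wise attention between responses and leads to fine-grained comparisons. (Our work will be open-sourced for reproducibility and future research.) In addition, we investigate and incorporate several techniques that have been proven effective for improving response selection. Experiments on three benchmarks show that our method advances the state of the art while achieving roughly 3x faster inference.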
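The difference between the binary and the multiple-choice formulation described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the Panoramic-Encoder itself: the logits are hypothetical numbers standing in for model outputs, and all function names are made up for this example. The binary view scores each candidate with an independent sigmoid, while the multiple-choice view normalizes all candidates for one context with a single softmax and trains with cross-entropy over the candidate set.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_binary(logits):
    """Binary view: each candidate gets its own independent probability."""
    return [sigmoid(z) for z in logits]


def score_multiple_choice(logits):
    """Multiple-choice view: candidates compete inside one distribution."""
    return softmax(logits)


def multiple_choice_loss(logits, gold_index):
    """Cross-entropy over the candidate set of a single context."""
    return -math.log(softmax(logits)[gold_index])


def pick_best(probabilities):
    """Index of the highest-scoring candidate."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Hypothetical logits for four candidate responses to one context.
candidate_logits = [2.1, -0.3, 1.9, -1.2]
binary_probs = score_binary(candidate_logits)
choice_probs = score_multiple_choice(candidate_logits)
best = pick_best(choice_probs)
```

In the binary view the probabilities need not sum to one, so two strong candidates can both look acceptable; the softmax forces them to be compared against each other directly.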
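Retrieval-based dialogue response selection aims to find the proper response in a candidate set given a multi-turn context. Methods based on pre-trained language models (PLMs) have yielded significant improvements on this task. The sequence representation plays a key role in learning the degree of matching between the dialogue context and the response. However, we observe that different context-response pairs sharing the same context always have greater similarity in the sequence representations computed by PLMs, which makes it hard to distinguish positive responses from negative ones. Motivated by this, we propose a novel Fine-Grained Contrastive (FGC) learning method for the response selection task based on PLMs. The FGC learning strategy helps PLMs generate more distinguishable matching representations of each dialogue at a fine granularity, and in turn make better predictions when choosing positive responses. Empirical studies on two benchmark datasets show that the proposed FGC learning method can generally improve the performance of existing PLM-based matching models.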
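Personalizing dialogue agents is important for dialogue systems to generate more specific, consistent, and engaging responses. However, most current approaches to dialogue personalization rely on explicit persona descriptions during inference, which severely limits their application. In this paper, we propose a novel approach that learns to predict persona information from the dialogue history, so that the dialogue agent can be personalized without relying on any explicit persona description during inference. Experimental results on the PersonaChat dataset show that the proposed method improves the consistency of generated responses when conditioning on the predicted profile of the dialogue agent (i.e., "self persona"), and improves the engagingness of generated responses when conditioning on the predicted persona of the dialogue partner (i.e., "their persona"). We also find that a trained persona prediction model can be successfully transferred to other datasets and helps generate more relevant responses.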
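Personalized response selection systems are generally grounded on persona. However, there is a correlation between persona and empathy that is not well explored in these systems. This paper attempts to address this gap by proposing a suite of fusion strategies that capture the interaction between persona, emotion, and entailment information of the utterances. Ablation studies on the Persona-Chat dataset show that incorporating emotion and entailment improves the accuracy of response selection. We combine our fusion strategies with concept-flow encoding to train a BERT-based model that outperforms previous methods by margins larger than 2.3% on original personas and 1.9% on revised personas in terms of hits@1 (top-1 accuracy), achieving new state-of-the-art performance on the Persona-Chat dataset.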
Personalized chatbots focus on endowing the chatbots with a consistent personality to behave like real users and further act as personal assistants. Previous studies have explored generating implicit user profiles from the user's dialogue history for building personalized chatbots. However, these studies only use the response generation loss to train the entire model, which makes them prone to data sparsity. Besides, they overemphasize the quality of the final generated response while ignoring the correlations and fusions within the user's dialogue history, leading to rough data representations and performance degradation. To tackle these problems, we propose a self-supervised learning framework, MCP, for capturing better representations from users' dialogue history for personalized chatbots. Specifically, we apply contrastive sampling methods to leverage the supervised signals hidden in user dialogue history, and generate the pre-training samples for enhancing the model. We design three pre-training tasks based on three types of contrastive pairs from user dialogue history, namely response pairs, sequence augmentation pairs, and user pairs. We pre-train the utterance encoder and the history encoder towards the contrastive objectives and use these pre-trained encoders to generate user profiles during personalized response generation. Experimental results on two real-world datasets show a significant improvement of our proposed model MCP over existing methods.
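Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially considering that dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation models and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representations at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms the baselines. For example, it achieves a 13% average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide an analysis of the benefits and limitations of our model.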
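The pairing idea in the abstract above (consecutive utterances of one dialogue form a positive pair) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact loss, so an in-batch InfoNCE-style objective with cosine similarity is assumed here, and the temperature value, the function names, and the toy embeddings are all hypothetical.

```python
import math


def consecutive_pairs(dialogue):
    """Turn a list of utterances into (utterance, next utterance) pairs."""
    return [(dialogue[i], dialogue[i + 1]) for i in range(len(dialogue) - 1)]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Assumed InfoNCE-style loss: anchors[i] should match positives[i];
    every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


# Toy dialogue; the embeddings below are hand-made stand-ins for encoder output.
dialogue = ["hi, i need a taxi", "sure, where to?", "the train station"]
pairs = consecutive_pairs(dialogue)

aligned_loss = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                         [[1.0, 0.0], [0.0, 1.0]])
misaligned_loss = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                            [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when each anchor is most similar to its own positive and grows when the batch is misaligned, which is the signal an encoder would be trained on.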
Conversational AI has become an increasingly prominent and practical application of machine learning. However, existing conversational AI techniques still suffer from various limitations. One such limitation is a lack of well-developed methods for incorporating auxiliary information that could help a model understand conversational context better. In this paper, we explore how persona-based information could help improve the quality of response generation in conversations. First, we provide a literature review focusing on the current state-of-the-art methods that utilize persona information. We evaluate two strong baseline methods, the Ranking Profile Memory Network and the Poly-Encoder, on the NeurIPS ConvAI2 benchmark dataset. Our analysis elucidates the importance of incorporating persona information into conversational systems. Additionally, our study highlights several limitations with current state-of-the-art methods and outlines challenges and future research directions for advancing personalized conversational AI technology.
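A lack of external knowledge makes it difficult for empathetic dialogue systems to perceive implicit emotions and to learn emotional interactions from a limited dialogue history. To address these problems, we propose to leverage external knowledge, including commonsense knowledge and emotional lexical knowledge, to explicitly understand and express emotions in empathetic dialogue. We first enrich the dialogue history by jointly interacting with external knowledge and construct an emotional context graph. We then learn emotional context representations from the knowledge-enriched emotional context graph and distill emotional signals, which are prerequisites for predicting the emotions expressed in responses. Finally, to generate the empathetic response, we propose an emotional cross-attention mechanism that learns emotional dependencies from the emotional context graph. Extensive experiments on a benchmark dataset verify the effectiveness of the proposed method. In addition, we find that the performance of our method can be further improved by integrating it with a pre-trained model that works orthogonally.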
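Recent advances in pre-trained language models have significantly improved neural response generation. However, existing methods usually view the dialogue context as a linear sequence of tokens and learn to generate the next word through token-level self-attention. Such token-level encoding hinders the exploration of discourse-level coherence among utterances. This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models. DialogBERT employs a hierarchical Transformer architecture. To efficiently capture the discourse-level coherence among utterances, we propose two training objectives, masked utterance regression and distributed utterance order ranking, in analogy to the original BERT training. Experiments on three multi-turn conversation datasets show that our approach substantially outperforms baselines such as BART and DialoGPT in quantitative evaluation. Human evaluation suggests that DialogBERT generates more coherent, informative, and human-like responses than the baselines by significant margins.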
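Pre-trained language models have made great progress on dialogue tasks. However, these models are typically trained on surface dialogue text and have therefore been shown to be weak at understanding the main semantic meaning of a dialogue context. We investigate Abstract Meaning Representation (AMR) as explicit semantic knowledge for pre-training models, so that they capture the core semantic information of dialogues during pre-training. In particular, we propose a semantic-based pre-training framework that extends the standard pre-training framework (Devlin et al., 2019) with three tasks defined over AMR graph representations. Experiments on the understanding of both chit-chat and task-oriented dialogues show the superiority of our model. To the best of our knowledge, we are the first to leverage a deep semantic representation for dialogue pre-training.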
Chatbots are expected to be knowledgeable across multiple domains, e.g. for daily chit-chat, exchange of information, and grounding in emotional situations. To effectively measure the quality of such conversational agents, a model-based automatic dialogue evaluation metric (ADEM) is expected to perform well across multiple domains. Despite significant progress, an ADEM that works well in one domain does not necessarily generalize to another. This calls for a dedicated network architecture for domain generalization. To tackle the multi-domain dialogue evaluation task, we propose a Panel of Experts (PoE), a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters. The shared encoder captures the general knowledge of dialogues across domains, while each adapter specializes in one specific domain and serves as a domain expert. To validate the idea, we construct a high-quality multi-domain dialogue dataset leveraging data augmentation and pseudo-labeling. The PoE network is comprehensively assessed on 16 dialogue evaluation datasets spanning a wide range of dialogue domains. It achieves state-of-the-art performance in terms of mean Spearman correlation over all the evaluation datasets. It exhibits better zero-shot generalization than existing state-of-the-art ADEMs and the ability to easily adapt to new domains with few-shot transfer learning.
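The headline metric in the abstract above, mean Spearman correlation over evaluation datasets, can be computed as sketched below. This is a plain-Python illustration rather than the authors' evaluation code; the function names and the toy scores are hypothetical. Spearman correlation is taken as the Pearson correlation of average ranks, and the per-dataset values are then averaged with equal weight.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(metric_scores, human_scores):
    """Spearman correlation as Pearson correlation of the rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


def mean_spearman(datasets):
    """Unweighted mean over (metric_scores, human_scores) pairs."""
    return sum(spearman(m, h) for m, h in datasets) / len(datasets)


# Toy data: one dataset where the metric agrees with humans, one where it does not.
agree = ([0.1, 0.4, 0.9], [1, 3, 5])
disagree = ([0.1, 0.4, 0.9], [5, 3, 1])
overall = mean_spearman([agree, disagree])
```

Averaging per-dataset correlations, instead of pooling all samples, keeps a large dataset from dominating the score, which matters when the datasets span different domains.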
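With the increasing popularity of online chatting, stickers are becoming more and more important in our online communication. Selecting appropriate stickers in open-domain dialogue requires a comprehensive understanding of both dialogues and stickers, as well as of the relationship between the two modalities. To tackle these challenges, we propose a multitask learning method comprising three auxiliary tasks that enhance the understanding of dialogue history, emotion, and the semantic meaning of stickers. Extensive experiments on a recent challenging dataset show that our model can better combine multimodal information and achieves higher accuracy than strong baselines. An ablation study further verifies the effectiveness of each auxiliary task. Our code is available at https://github.com/nonstopfor/sticker-selection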
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to fall into the so-called 'likelihood trap', resulting in generated responses which are dull, repetitive, and often inconsistent with dialogue history. Comparing ranked lists of multiple generated responses against the 'gold response' (from training data) reveals a wide diversity in response quality, with many good responses placed lower in the ranked list. The main challenge, addressed in this work, is then how to reach beyond greedily generated system responses, that is, how to obtain and select such high-quality responses from the list of overgenerated responses at inference without availability of the gold response. To this end, we propose a simple yet effective reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system. The idea is to use any sequence-level (similarity) scoring function to divide the semantic space of responses into high-scoring versus low-scoring partitions. At training, the high-scoring partition comprises all generated responses whose similarity to the gold response is higher than the similarity of the greedy response to the gold response. At inference, the aim is to estimate the probability that each overgenerated response belongs to the high-scoring partition, given only previous dialogue history. We validate the robustness and versatility of our proposed method on the standard MultiWOZ dataset: our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR scores, achieving new peak results. Additional experiments on the BiTOD dataset and human evaluation further ascertain the generalisability and effectiveness of the proposed framework.
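The training-time partition described in the abstract above can be sketched directly: a generated response is labeled high-scoring when its similarity to the gold response exceeds the similarity of the greedy response to the gold response. The abstract allows any sequence-level similarity function, so the token-overlap F1 used here is only a stand-in, and the function names, example strings, and the `high_partition_prob` callable are hypothetical.

```python
from collections import Counter


def token_f1(candidate, reference):
    """Stand-in similarity: F1 over whitespace tokens (an assumption)."""
    cand, ref = candidate.split(), reference.split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def label_partition(candidates, greedy, gold, similarity=token_f1):
    """1 if a candidate is closer to the gold response than the greedy
    response is, else 0. Only usable at training time, when gold exists."""
    threshold = similarity(greedy, gold)
    return [1 if similarity(c, gold) > threshold else 0 for c in candidates]


def rerank(candidates, high_partition_prob):
    """Inference-time reranking by a predicted probability of belonging to
    the high-scoring partition; high_partition_prob is a hypothetical
    callable standing in for a trained classifier."""
    return sorted(candidates, key=high_partition_prob, reverse=True)


gold = "the hotel is in the north"
greedy = "there is a hotel"
candidates = [
    "the hotel is in the north part",
    "sorry i do not know",
    "there is a hotel",
]
labels = label_partition(candidates, greedy, gold)
```

Using the greedy response as the threshold makes the labels relative to what the system would have said anyway, so the classifier learns to spot responses that improve on the default.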
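Dialogue is an essential part of human communication and cooperation. Existing research mainly focuses on short dialogue scenarios in a one-on-one fashion. However, multi-person interactions in the real world, such as meetings or interviews, frequently run to more than a few thousand words. There is still a lack of corresponding research and powerful tools for understanding and processing such long dialogues. Therefore, in this work, we present a pre-training framework for long dialogue understanding and summarization. Considering the nature of long conversations, we propose a window-based denoising approach for generative pre-training. For a dialogue, it corrupts a window of text with dialogue-inspired noise and guides the model to reconstruct this window based on the content of the remaining conversation. Furthermore, to process longer inputs, we augment the model with sparse attention, which is combined with conventional attention in a hybrid manner. We conduct extensive experiments on five datasets of long dialogues, covering the tasks of dialogue summarization, abstractive question answering, and topic segmentation. Experimentally, we show that our pre-trained model DialogLM significantly surpasses state-of-the-art models across datasets and tasks. Source code and all pre-trained models are available in our GitHub repository (https://github.com/microsoft/dialoglm).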
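The window-based denoising setup in the abstract above can be illustrated with a small data-preparation sketch. The abstract does not list the noise types, so the two used here (masking the speaker and shuffling turn order inside the window) are assumptions chosen for illustration, as are the function name, the mask token, and the toy dialogue. The model input is the dialogue with one corrupted window; the reconstruction target is the original window.

```python
import random


def corrupt_window(turns, window_size, rng):
    """Pick a window of consecutive turns, add noise to it, and return
    (corrupted dialogue, original window as target, (start, end))."""
    start = rng.randrange(0, len(turns) - window_size + 1)
    end = start + window_size
    window = turns[start:end]
    # Assumed noise 1: hide who is speaking inside the window.
    noisy = [("[MASK_SPEAKER]", text) for _, text in window]
    # Assumed noise 2: shuffle the order of turns inside the window.
    rng.shuffle(noisy)
    corrupted = turns[:start] + noisy + turns[end:]
    return corrupted, window, (start, end)


# Toy meeting transcript as (speaker, text) tuples.
turns = [
    ("A", "shall we start with the budget?"),
    ("B", "yes, the numbers are in the shared sheet"),
    ("C", "i have a question about travel costs"),
    ("A", "go ahead"),
    ("C", "are conferences included?"),
]
corrupted, target, (start, end) = corrupt_window(turns, 2, random.Random(0))
```

Everything outside the window is left intact, so a model trained on such pairs has to use the rest of the conversation to recover both the speakers and the turn order.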
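In this paper, we propose to leverage a unique characteristic of dialogues, the commonsense knowledge shared among participants, to resolve the difficulties in summarizing them. We present SICK, a framework that uses commonsense inferences as additional context. Compared with previous work that relies solely on the input dialogue, SICK uses an external knowledge model to generate a rich set of commonsense inferences and selects the most probable one with a similarity-based selection method. Built upon SICK, SICK++ uses commonsense as supervision: the task of generating commonsense inferences is added to dialogue summarization in a multi-task learning setting. Experimental results show that, by injecting commonsense knowledge, our framework generates more informative and consistent summaries than existing methods.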
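In dialogue systems, utterances with similar semantics may carry distinct emotions in different contexts. Therefore, modeling long-range contextual emotional relationships together with speaker dependencies plays an important role in dialogue emotion recognition. Meanwhile, distinguishing between emotion categories is non-trivial, since they often have semantically similar sentiments. To this end, we adopt supervised contrastive learning to make different emotions mutually exclusive so that similar emotions can be identified better. We also use an auxiliary response generation task to enhance the model's ability to handle context information, forcing the model to recognize emotions with similar semantics in diverse contexts. To achieve these objectives, we use a pre-trained encoder-decoder model as our backbone, since it is well suited to both understanding and generation tasks. Experiments on four datasets show that our proposed model obtains more favorable results than state-of-the-art models in dialogue emotion recognition. An ablation study further demonstrates the effectiveness of the supervised contrastive loss and the generative loss.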
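A good empathetic dialogue system should first track and understand the user's emotion and then reply with an appropriate emotion. However, current approaches to this task focus either on improving the understanding of the user's emotion or on proposing better response strategies, and very few works consider both at the same time. Our work attempts to fill this gap. Inspired by task-oriented dialogue systems, we propose a novel empathetic response generation model with emotion-aware dialogue management. The emotion-aware dialogue management contains two parts: (1) emotion state tracking, which maintains the current emotion state of the user, and (2) empathetic dialogue policy selection, which predicts a target emotion and the user's intent based on the results of emotion state tracking. The predicted information is then used to guide response generation. Experimental results show that dynamically managing these different kinds of information helps the model generate more empathetic responses than several baselines under both automatic and human evaluation.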
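Chatbots are designed to carry out human-like conversations across different domains, such as general chit-chat, knowledge exchange, and persona-grounded conversations. To measure the quality of such conversational agents, a dialogue evaluator is expected to conduct assessment across domains as well. However, most state-of-the-art automatic dialogue evaluation metrics (ADMs) are not designed for multi-domain evaluation. We are motivated to design a general and robust framework, MDD-Eval, to address this problem. Specifically, we first train a teacher evaluator on human-annotated data so that it acquires the rating skill to tell good dialogue responses from bad ones in a particular domain. We then adopt a self-training strategy to train a new evaluator on teacher-annotated multi-domain data, which helps the new evaluator generalize across multiple domains. MDD-Eval is extensively assessed on six dialogue evaluation benchmarks. Empirical results show that the MDD-Eval framework achieves strong performance, with an absolute improvement of 7% over state-of-the-art ADMs in terms of mean Spearman correlation scores across all evaluation benchmarks.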
The goal of building dialogue agents that can converse with humans naturally has been a long-standing dream of researchers since the early days of artificial intelligence. The well-known Turing Test proposed to judge the ultimate validity of an artificial intelligence agent on the indistinguishability of its dialogues from humans'. It should come as no surprise that human-level dialogue systems are very challenging to build. But while early efforts on rule-based systems found limited success, the emergence of deep learning enabled great advances on this topic. In this thesis, we focus on methods that address the numerous issues that have kept a gap between artificial conversational agents and human-level interlocutors. These methods were proposed and tested in ways inspired by general state-of-the-art AI methodologies, but they also target the specific characteristics of dialogue systems.
Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored the possibility of whether abstractive dialogue summarization can also be used as a means to boost an NLP system's performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization - that can help pre-trained language models to better understand dialogues and improve their performance on important dialogue comprehension tasks. We further collect human annotations of STRUDEL summaries over 400 dialogues and introduce a new STRUDEL dialogue comprehension modeling framework that integrates STRUDEL into a graph-neural-network-based dialogue reasoning module over transformer encoder language models to improve their dialogue comprehension abilities. In our empirical experiments on two important downstream dialogue comprehension tasks - dialogue question answering and dialogue response prediction - we show that our STRUDEL dialogue comprehension model can significantly improve the dialogue comprehension performance of transformer encoder language models.