Dialog systems must be able to incorporate new skills via updates over time in order to reflect new use cases or deployment scenarios. Similarly, developers of such ML-driven systems need to be able to add new training data to an already-existing dataset to support these new skills. In intent classification systems, problems can arise if the training data for a new skill's intent overlaps with intents that already exist. We call such cases collisions. This paper introduces the task of intent collision detection between multiple datasets for the purpose of growing a system's skillset. We introduce several methods for detecting collisions, and evaluate our methods on real datasets that exhibit collisions. To highlight the need for intent collision detection, we show that model performance suffers if new data is added in a way that creates collisions. Finally, we use collision detection to construct and benchmark a new dataset, Redwood, which is composed of 451 intent categories from 13 original intent classification datasets, making it the largest publicly available intent classification benchmark.
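To make the collision notion concrete, here is a minimal sketch of one plausible detection approach: embed each intent's utterances, average them into a centroid, and flag cross-dataset intent pairs whose centroids are nearly identical. The embedding model, the 0.9 threshold, and the helper names are illustrative assumptions, not the paper's actual method.

```python
# Sketch: flag cross-dataset intent pairs with near-identical centroids.
import numpy as np
from sentence_transformers import SentenceTransformer

def intent_centroids(model, dataset):
    """dataset maps intent name -> list of utterance strings."""
    centroids = {}
    for intent, utterances in dataset.items():
        emb = model.encode(utterances, normalize_embeddings=True)
        centroid = emb.mean(axis=0)
        centroids[intent] = centroid / np.linalg.norm(centroid)  # unit length
    return centroids

def find_collisions(dataset_a, dataset_b, threshold=0.9):
    model = SentenceTransformer("all-MiniLM-L6-v2")
    cents_a = intent_centroids(model, dataset_a)
    cents_b = intent_centroids(model, dataset_b)
    return [
        (a, b, float(np.dot(va, vb)))        # cosine similarity
        for a, va in cents_a.items()
        for b, vb in cents_b.items()
        if np.dot(va, vb) >= threshold       # likely collision
    ]
```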
Interest in dialog systems has grown substantially over the past decade. By extension, so too has interest in developing and improving intent classification and slot-filling models, which are two components commonly used in task-oriented dialog systems. Moreover, good evaluation benchmarks are important for helping to compare and analyze systems that incorporate such models. Unfortunately, much of the literature in this field is limited to analyses of relatively few benchmark datasets. In an effort to promote more robust analyses of task-oriented dialog systems, we have conducted a survey of publicly available datasets for the tasks of intent classification and slot filling. We catalog the important characteristics of each dataset, and offer a discussion of each dataset's applicability, strengths, and weaknesses. Our goal is for this survey to increase the accessibility of these datasets, and we hope they will be used in future evaluations of intent classification and slot-filling models for task-oriented dialog systems.
User queries for a real dialog system may sometimes fall outside the scope of the system's capabilities, but an appropriate response will enable smooth processing throughout the human-computer interaction. This paper is concerned with the user's intent, and focuses on out-of-scope intent classification in dialog systems. Although user intents are highly correlated with the application domain, few studies have exploited such correlations for intent classification. Rather than developing a two-stage approach that first classifies the domain and then the intent, we propose a hierarchical multi-task learning approach based on a joint model that classifies domain and intent simultaneously. Novelties in the proposed approach include: (1) sharing supervised out-of-scope signals in the joint modeling of domain and intent classification to replace the two-stage pipeline; and (2) introducing a hierarchical model that learns the intent and domain representations in higher and lower layers, respectively. Experiments show that the model outperforms existing methods in terms of accuracy, out-of-scope recall, and F1. Additionally, threshold-based post-processing further improves performance by balancing precision and recall in intent classification.
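The hierarchical layering described above can be illustrated with a small PyTorch sketch: a shared encoder whose lower layer feeds a domain head and whose top layer feeds an intent head, trained jointly. The base checkpoint and layer indices are assumptions for illustration, not the authors' configuration.

```python
import torch.nn as nn
from transformers import AutoModel

class HierarchicalJointModel(nn.Module):
    def __init__(self, n_domains, n_intents, base="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base, output_hidden_states=True)
        hidden = self.encoder.config.hidden_size
        self.domain_head = nn.Linear(hidden, n_domains)  # lower-layer head
        self.intent_head = nn.Linear(hidden, n_intents)  # top-layer head

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        lower = out.hidden_states[6][:, 0]   # [CLS] from an intermediate layer
        upper = out.hidden_states[-1][:, 0]  # [CLS] from the final layer
        return self.domain_head(lower), self.intent_head(upper)

# Joint training sums the two cross-entropy losses:
#   loss = ce(domain_logits, domain_y) + ce(intent_logits, intent_y)
```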
Recent advances in transfer learning techniques and the pre-training of large contextualized encoders have fostered innovation in real-world applications, including dialog assistants. The practical needs of intent recognition require effective data usage and the ability to constantly update the supported intents, adopting new ones and abandoning outdated ones. In particular, the generalized zero-shot paradigm, in which a model is trained on seen intents and tested on both seen and unseen intents, is taking on new importance. In this paper, we explore the generalized zero-shot setup for intent recognition. Following best practices for zero-shot text classification, we treat the task with a sentence-pair modeling approach. For unseen intents, using only intent labels and user utterances, without access to external resources (such as knowledge bases), we outperform the previous state-of-the-art F1 measure by up to 16%. Further enhancements include lexicalization of intent labels, which improves performance by up to 7%. By using task transfer from other sentence-pair tasks, such as natural language inference, we gain additional improvements.
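A schematic of the sentence-pair treatment follows: score each (utterance, intent label) pair with a cross-encoder and pick the best-scoring intent. The checkpoint is a placeholder (an untrained pair-scoring head), and the label text is used verbatim rather than with the paper's lexicalization.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)  # single relevance logit per pair
model.eval()

def rank_intents(utterance, intent_labels):
    # Encode (utterance, label) as sentence pairs, one per candidate intent.
    batch = tok([utterance] * len(intent_labels), intent_labels,
                padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**batch).logits.squeeze(-1)
    return intent_labels[int(scores.argmax())]
```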
We present BotSIM, a data-efficient end-to-end Bot SIMulation toolkit for commercial text-based task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that can infer semantic-level dialog acts and entities from bot definitions and generate user queries via model-based paraphrasing; 2) an agenda-based dialog user Simulator (ABUS) to simulate conversations with the dialog agents; 3) a Remediator to analyze the simulated conversations, visualize the bot health reports and provide actionable remediation suggestions for bot troubleshooting and improvement. We demonstrate BotSIM's effectiveness in end-to-end evaluation, remediation and multi-intent dialog generation via case studies on two commercial bot platforms. BotSIM's "generation-simulation-remediation" paradigm accelerates the end-to-end bot evaluation and iteration process by: 1) reducing manual test case creation efforts; 2) enabling a holistic gauge of the bot in terms of NLU and end-to-end performance via extensive dialog simulation; 3) improving the bot troubleshooting process with actionable suggestions. A demo of our system can be found at https://tinyurl.com/mryu74cd and a demo video at https://youtu.be/qLi5iSoly30. We have open-sourced the toolkit at https://github.com/salesforce/botsim
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs. Despite their prevalence, there is currently no accurate survey of dialogue noise, nor is there a clear sense of the impact of each noise type on task performance. This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems. In addition, we run a series of experiments to show how different models behave when subjected to varying levels of noise and types of noise. Our results reveal that models are quite robust to label errors commonly tackled by existing denoising algorithms, but that performance suffers from dialogue-specific noise. Driven by these observations, we design a data cleaning algorithm specialized for conversational settings and apply it as a proof-of-concept for targeted dialogue denoising.
Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially considering that dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation models and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representation at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms baselines by a significant margin; for example, it achieves a 13% average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide analyses of the benefits and limitations of our model.
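The positive-pair construction lends itself to a standard InfoNCE-style objective, sketched below; the temperature value and the use of in-batch negatives are illustrative assumptions rather than taken from the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def dse_contrastive_loss(u_emb, v_emb, temperature=0.05):
    """u_emb[i] and v_emb[i] embed consecutive utterances from dialogue i;
    both tensors have shape (batch, dim)."""
    u = F.normalize(u_emb, dim=-1)
    v = F.normalize(v_emb, dim=-1)
    logits = u @ v.t() / temperature                    # pairwise similarities
    targets = torch.arange(u.size(0), device=u.device)  # positives on diagonal
    return F.cross_entropy(logits, targets)
```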
Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.
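An illustrative, hypothetical service schema in the spirit of this paradigm might look as follows; the field names mirror the general shape of SGD schemas, but the concrete service, intents, and slots are invented for this example.

```python
# Hypothetical schema: intents and slots come with natural language
# descriptions, so a model can generalize to services unseen in training.
restaurant_schema = {
    "service_name": "Restaurants",
    "description": "A service for finding and booking restaurants",
    "intents": [
        {"name": "FindRestaurants",
         "description": "Search for restaurants by location and cuisine"},
        {"name": "ReserveRestaurant",
         "description": "Book a table at a restaurant"},
    ],
    "slots": [
        {"name": "city", "description": "City where the restaurant is located"},
        {"name": "cuisine", "description": "Type of food served"},
        {"name": "party_size", "description": "Number of people for the booking"},
    ],
}
```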
Pre-training methods with contrastive learning objectives have shown remarkable success in dialog understanding tasks. However, current contrastive learning solely considers self-augmented dialog samples as positive samples and treats all other dialog samples as negative ones, which enforces dissimilar representations even for dialogs that are semantically related. In this paper, we propose SPACE-2, a tree-structured pre-trained conversation model that learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised contrastive pre-training. Concretely, we first define a general semantic tree structure (STS) to unify the inconsistent annotation schemes across different dialog datasets, so that the rich structural information stored in all labeled data can be exploited. Then we propose a novel multi-view score function to increase the relevance of all possible dialogs that share similar STSs, and only push away other completely different dialogs during supervised contrastive pre-training. To fully exploit unlabeled dialogs, a basic self-supervised contrastive loss is also added to refine the learned representations. Experiments show that our method can achieve new state-of-the-art results on the DialoGLUE benchmark, which consists of seven datasets and four popular dialog understanding tasks. For reproducibility, we release the code and data at https://github.com/alibabaresearch/damo-convai/tree/main/space-2.
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
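Zero-shot classification with the released model follows the usage pattern from the repository above: encode an image and a set of natural-language class prompts, then take the softmax over image-text similarities. The image path and class list below are placeholders.

```python
import clip  # https://github.com/OpenAI/CLIP
import torch
from PIL import Image

model, preprocess = clip.load("ViT-B/32")
image = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # placeholder path
classes = ["a cat", "a dog", "a truck"]                   # placeholder labels
text = clip.tokenize([f"a photo of {c}" for c in classes])

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)
print(classes[int(probs.argmax())])  # predicted class, no task-specific training
```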
One of the core components of goal-oriented dialog systems is the task of intent detection. Few-shot learning for intent detection is challenging due to the scarcity of available annotated utterances. Although recent works have proposed metric-based and optimization-based methods, the task remains challenging in large label spaces with much smaller shot numbers. Generalized few-shot learning is even more difficult, due to the presence of both novel and seen classes during the testing phase. In this work, we propose a simple and effective method based on natural language inference that not only tackles the few-shot intent detection problem, but also proves useful in zero-shot and generalized few-shot learning problems. Our extensive experiments on a number of natural language understanding (NLU) and spoken language understanding (SLU) datasets demonstrate the effectiveness of our approach. Moreover, we highlight the settings in which our NLI-based method outperforms baselines by huge margins.
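As a point of reference, the general NLI recipe for intent detection is available off the shelf in the Hugging Face transformers zero-shot classification pipeline, which scores each candidate intent as an entailment hypothesis; the sketch below shows the general recipe, not the authors' specific model or training.

```python
from transformers import pipeline

# NLI-based zero-shot classification: each candidate label is turned into a
# hypothesis and scored for entailment against the utterance.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "I need to move money from savings to checking",
    candidate_labels=["transfer money", "check balance", "report lost card"],
    hypothesis_template="The user wants to {}.",
)
print(result["labels"][0])  # highest-scoring candidate intent
```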
Intent discovery is a fundamental task in NLP, and it is increasingly relevant for a variety of industrial applications (Quarteroni 2018). The main challenge resides in the need to identify novel, unseen intents from input utterances. Herein, we propose Z-BERT-A, a two-stage method for intent discovery relying on a Transformer architecture (Vaswani et al. 2017; Devlin et al. 2018) fine-tuned with Adapters (Pfeiffer et al. 2020), initially trained for Natural Language Inference (NLI) and later applied to unknown intent classification in a zero-shot setting. In our evaluation, we first analyze the quality of the model after adaptive fine-tuning on known classes. Second, we evaluate its performance when casting intent classification as an NLI task. Lastly, we test the zero-shot performance of the model on unseen classes, showing how Z-BERT-A can effectively perform intent discovery by generating intents that are semantically similar, if not equal, to the ground-truth ones. Our experiments show how Z-BERT-A outperforms a variety of baselines in two zero-shot settings: known intent classification and unseen intent discovery. The proposed pipeline holds the potential to be widely applied in a variety of customer-care applications. It enables automated dynamic triage using a lightweight model that, unlike large language models, can be easily deployed and scaled in diverse business scenarios, especially in settings with limited hardware availability and performance, where on-premise or low-resource cloud deployments are imperative. Z-BERT-A, predicting novel intents from a single utterance, represents an innovative approach to intent discovery, enabling the online generation of novel intents. The pipeline is available as an installable Python package at the following link: https://github.com/gt4sd/zberta.
Conversation designers continue to face significant obstacles when creating production quality task-oriented dialogue systems. The complexity and cost involved in schema development and data collection is often a major barrier for such designers, limiting their ability to create natural, user-friendly experiences. We frame the classification of user intent as the generation of a canonical form, a lightweight semantic representation using natural language. We show that canonical forms offer a promising alternative to traditional methods for intent classification. By tuning soft prompts for a frozen large language model, we show that canonical forms generalize very well to new, unseen domains in a zero- or few-shot setting. The method is also sample-efficient, reducing the complexity and effort of developing new task-oriented dialogue domains.
Intent understanding plays an important role in dialog systems, and is typically formulated as a supervised learning problem. However, it is challenging and time-consuming to design the intents for a new domain from scratch, which usually requires a lot of manual effort from human domain experts. This paper presents an unsupervised two-stage approach to discover intents and automatically generate meaningful intent labels from a collection of unlabeled utterances in a domain. In the first stage, we aim to generate a set of semantically coherent clusters, where the utterances within each cluster convey the same intent. We obtain utterance representations from various pre-trained sentence embeddings, and present a balanced-score metric for determining the optimal number of clusters in K-means clustering on balanced datasets. In the second stage, the objective is to automatically generate an intent label for each cluster. We extract the ACTION-OBJECT pair from each utterance using a dependency parser, and take the most frequent pair within each cluster, e.g., book-restaurant, as the generated intent label. We empirically show that the proposed unsupervised approach can automatically generate meaningful intent labels and achieve high precision and recall in utterance clustering and intent discovery.
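A minimal sketch of both stages is shown below, assuming sentence-transformers for utterance embeddings, scikit-learn for K-means, and spaCy for dependency parsing; the balanced-score model selection is omitted, and k is fixed for brevity.

```python
from collections import Counter
import spacy
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

nlp = spacy.load("en_core_web_sm")

def discover_intents(utterances, k=10):
    # Stage 1: cluster utterance embeddings into k intent candidates.
    emb = SentenceTransformer("all-MiniLM-L6-v2").encode(utterances)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(emb)
    # Stage 2: name each cluster by its most frequent ACTION-OBJECT pair.
    names = {}
    for c in range(k):
        pairs = Counter()
        for utt, lab in zip(utterances, labels):
            if lab != c:
                continue
            for token in nlp(utt):
                if token.dep_ == "dobj":  # verb -> direct object
                    pairs[f"{token.head.lemma_}-{token.lemma_}"] += 1
        if pairs:
            names[c] = pairs.most_common(1)[0][0]  # e.g. "book-restaurant"
    return labels, names
```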
A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in-distribution and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we develop ViT-Plex and T5-Plex, pretrained large-model extensions for the vision and language modalities, respectively. Plex greatly improves the state of the art across reliability tasks, and simplifies the traditional protocol, as it improves out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes of up to 1B parameters and pretraining dataset sizes of up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.
In natural language understanding (NLU) production systems, users' evolving needs necessitate the addition of new features over time, indexed by new symbols added to the meaning representation space. This requires additional training data and results in ever-growing datasets. We present the first systematic investigation of this incremental symbol learning scenario. Our analysis reveals a troubling quirk in building broad-coverage NLU systems: as the training dataset grows, performance on the new symbol often decreases if we do not accordingly increase its training data. This suggests that it becomes more difficult to learn new symbols with a larger training dataset. We show that this trend holds for multiple mainstream models on two common NLU tasks: intent recognition and semantic parsing. Rejecting class imbalance as the sole culprit, we reveal that the trend is closely associated with an effect we call source signal dilution, where strong lexical cues for the new symbol become diluted as the training dataset grows. Selectively dropping training examples to prevent dilution often reverses the trend, showing the over-reliance of mainstream neural NLU models on simple lexical cues. Code, models, and data are available at https://aka.ms/nlu-incremental-symbol-learning
Traditional intent classification models are based on a pre-defined intent set and can only recognize a limited set of in-domain (IND) intent classes. However, users may input out-of-domain (OOD) queries in a practical dialogue system, and such OOD queries can provide directions for future improvement. In this paper, we define a new task, Generalized Intent Discovery (GID), which aims to extend an IND intent classifier to an open-world intent set that includes both IND and OOD intents. We aim to simultaneously classify a set of labeled IND intent classes while discovering and recognizing new unlabeled OOD types. We construct three public datasets for different application scenarios and propose two kinds of frameworks, pipeline-based and end-to-end, to facilitate future work. Furthermore, we conduct exhaustive experiments and qualitative analysis to comprehend the key challenges and provide new guidance for future GID research.
Many efforts have been made to construct dialog systems for different types of conversations, such as task-oriented dialog (TOD) and open-domain dialog (ODD). To better mimic human-level conversations that usually fuse various dialog modes, it is essential to build a system that can effectively handle both TOD and ODD and access different knowledge sources. To address the lack of available data for the fused task, we propose a framework for automatically generating dialogues that combine knowledge-grounded ODDs and TODs in various settings. Additionally, we introduce a unified model PivotBot that is capable of appropriately adopting TOD and ODD modes and accessing different knowledge sources in order to effectively tackle the fused task. Evaluation results demonstrate the superior ability of the proposed model to switch seamlessly between TOD and ODD tasks.
Detecting social bias in text is challenging due to nuance, subjectivity, and the difficulty of obtaining good-quality labeled datasets at scale, especially given the evolving nature of social biases and society itself. To address these challenges, we propose a few-shot instruction-based method for prompting pre-trained language models (LMs). We select a few label-balanced exemplars from a small support repository that are closest, in the embedding space, to the query to be labeled. We then provide the LM with an instruction consisting of this subset of labeled exemplars, the query text to be classified, and a definition of bias, and prompt it to make a decision. We demonstrate that large LMs used in a few-shot context can detect different types of fine-grained biases with accuracy similar to, and sometimes superior to, that of fine-tuned models. We observe that the largest 530B-parameter model is significantly more effective at detecting social bias than smaller models (achieving at least a 20% improvement in the AUC metric compared to other models). It also maintains a high AUC (with a drop of less than 5%) in a few-shot setting where the labeled repository is reduced to as few as 100 samples. Large pre-trained language models thus make it easier and faster to build new bias detectors.
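The exemplar-selection step might be sketched as follows: pick the nearest label-balanced exemplars from the support set and assemble them, together with a bias definition, into an instruction prompt. The embedding model and the prompt wording are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def build_prompt(query, support, definition, k_per_label=2):
    """support is a list of (text, label) pairs, e.g. 'biased' / 'not biased'."""
    q = model.encode([query], normalize_embeddings=True)[0]
    shots = []
    for label in sorted({lab for _, lab in support}):  # label-balanced selection
        cands = [(t, lab) for t, lab in support if lab == label]
        embs = model.encode([t for t, _ in cands], normalize_embeddings=True)
        nearest = np.argsort(-(embs @ q))[:k_per_label]  # most similar first
        shots.extend(cands[i] for i in nearest)
    examples = "\n".join(f"Text: {t}\nAnswer: {lab}" for t, lab in shots)
    return f"{definition}\n\n{examples}\n\nText: {query}\nAnswer:"
```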
We present LINGUIST, a method for generating annotated data for Intent Classification and Slot Tagging (IC+ST) by fine-tuning AlexaTM 5B, a 5-billion-parameter multilingual sequence-to-sequence (seq2seq) model, on a flexible instruction prompt. In a 10-shot novel-intent setting for the SNIPS dataset, LINGUIST surpasses state-of-the-art approaches (Back-Translation and Example Extrapolation) by a wide margin, showing absolute improvements for the target intents of +1.9 points on IC Recall and +2.5 points on ST F1 score. In the zero-shot cross-lingual setting of the mATIS++ dataset, LINGUIST outperforms a strong baseline of machine translation with slot alignment by +4.14 points absolute on ST F1 score across 6 languages, while matching its performance on IC. Finally, we verify our results on an internal large-scale multilingual dataset for conversational-agent IC+ST, and show significant improvements over a baseline that uses Back-Translation, Paraphrasing, and Slot Catalog Resampling. To our knowledge, we are the first to demonstrate instruction fine-tuning of a large-scale seq2seq model to control the output of multilingual intent- and slot-labeled data generation.
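Purely as an illustration of instruction-prompted data generation, a hypothetical prompt in this spirit is sketched below; the bracketed slot markup, intent name, and wording are invented here and do not reproduce the paper's actual LINGUIST prompt format.

```python
# Hypothetical instruction prompt (invented format, for illustration only).
prompt = (
    "Intent: BookFlight\n"
    "Slots: origin, destination, date\n"
    "Example: book a flight from [origin Boston] to [destination Denver] "
    "on [date Friday]\n"
    "Generate 3 new annotated utterances for this intent in German:"
)
# A fine-tuned seq2seq model (AlexaTM 5B in the paper) would decode
# slot-annotated utterances conditioned on an instruction like this one.
```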