We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed if-then relations with variables (e.g., "if X pays Y a compliment, then Y will likely return the compliment"). We propose nine if-then relation types to distinguish causes vs. effects, agents vs. themes, voluntary vs. involuntary events, and actions vs. mental states. By generatively training on the rich inferential knowledge described in ATOMIC, we show that neural models can acquire simple commonsense capabilities and reason about previously unseen events. Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation.
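As a rough illustration of what a typed if-then relation with variables looks like as data, the sketch below stores one tuple per inference. The nine relation names follow the public ATOMIC release; the `IfThenTuple` class, its `about_agent` property, and the choice of `oWant` for the compliment example are assumptions made for this sketch, not part of the resource itself.

```python
from dataclasses import dataclass

# Relation names as used in the public ATOMIC release (nine in total).
# Splitting them into "agent" (x*) and "other" (o*) groups is a simplification.
AGENT_RELATIONS = {"xIntent", "xNeed", "xAttr", "xEffect", "xReact", "xWant"}
OTHER_RELATIONS = {"oEffect", "oReact", "oWant"}


@dataclass(frozen=True)
class IfThenTuple:
    """One if-then inference: event --relation--> inference (hypothetical container)."""
    event: str
    relation: str
    inference: str

    def __post_init__(self):
        # Reject relation types outside the nine listed above.
        if self.relation not in AGENT_RELATIONS | OTHER_RELATIONS:
            raise ValueError(f"unknown relation type: {self.relation}")

    @property
    def about_agent(self) -> bool:
        # True when the inference concerns PersonX, False when it concerns others.
        return self.relation in AGENT_RELATIONS


# Illustrative encoding of the abstract's example; the relation label is assumed.
example = IfThenTuple(
    event="PersonX pays PersonY a compliment",
    relation="oWant",
    inference="to return the compliment",
)
print(example.about_agent)
```

Keeping the relation as an explicit field is what would let a multitask model group or share parameters across relation types.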
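Recent years have brought renewed interest in commonsense representation and reasoning in the field of natural language understanding. The development of new commonsense knowledge graphs (CSKGs) has been central to these advances, as their diverse facts can be used by machine learning models to tackle new and challenging tasks. At the same time, questions remain about the quality and coverage of these resources, given the massive scale required to cover general commonsense knowledge comprehensively. In this work, we posit that manually constructed CSKGs will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents. We therefore propose a new evaluation framework for testing the utility of KGs based on how effectively implicit knowledge representations can be learned from them. With this new goal, we propose a new CSKG containing knowledge that is not readily available in pretrained language models. We evaluate its properties in comparison with other leading CSKGs, performing the first large-scale pairwise study of commonsense knowledge resources. Next, we show that ATOMIC 2020 is better suited for training knowledge models that can generate accurate, representative knowledge for new, unseen entities and events. Finally, through human evaluation, we show that the few-shot performance of GPT-3 (175B parameters), while impressive, remains lower than that of a BART-based knowledge model trained on ATOMIC 2020, even though GPT-3 uses over 430x more parameters.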
We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). Contrary to many conventional KBs that store knowledge with canonical templates, commonsense KBs only store loosely structured open-text descriptions of knowledge. We posit that an important step toward automatic commonsense completion is the development of generative models of commonsense knowledge, and propose COMmonsEnse Transformers (COMET) that learn to generate rich and diverse commonsense descriptions in natural language. Despite the challenges of commonsense modeling, our investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs. Empirical results demonstrate that COMET is able to generate novel knowledge that humans rate as high quality, with up to 77.5% (ATOMIC) and 91.7% (ConceptNet) precision at top 1, which approaches human performance for these resources. Our findings suggest that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.
The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically, as text, in addition to the neural model. We also distill only one aspect, the commonsense of a general language model teacher, allowing the student to be a different type, a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general language model. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite its 100x smaller size. We apply this to the ATOMIC resource, and share our new symbolic knowledge graph and commonsense models.
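Prediction over event sequences is critical for many real-world applications in information retrieval and natural language processing. Within event sequence prediction, future event generation (FEG) is a challenging task because it requires not only fluent text generation but also commonsense reasoning to maintain the logical coherence of the entire event story. In this paper, we propose COEP, a novel explainable FEG framework. It highlights and integrates two types of event knowledge: sequential knowledge of direct event-event relations, and inferential knowledge that reflects the intermediate character psychology between events (e.g., intents, causes, reactions), which intrinsically pushes the story forward. To alleviate the knowledge forgetting issue, we design two modules, IM and GM, one for each type of knowledge, which are combined via prompt tuning. First, IM focuses on understanding inferential knowledge in order to generate commonsense explanations and provide a soft prompt vector for GM. We also design a contrastive discriminator to improve generalization. Second, GM generates future events by modeling direct sequential knowledge under the guidance of IM. Automatic and human evaluations show that our approach can generate more coherent, specific, and logical future events.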
Storytelling and narrative are fundamental to human experience, intertwined with our social and cultural engagement. As such, researchers have long attempted to create systems that can generate stories automatically. In recent years, powered by deep learning and massive data resources, automatic story generation has shown significant advances. However, considerable challenges, like the need for global coherence in generated stories, still hamper generative models from reaching the same storytelling ability as human narrators. To tackle these challenges, many studies seek to inject structured knowledge into the generation process, which is referred to as structured knowledge-enhanced story generation. Incorporating external knowledge can enhance the logical coherence among story events, achieve better knowledge grounding, and alleviate over-generalization and repetition problems in stories. This survey provides an up-to-date and comprehensive review of this research field: (i) we present a systematic taxonomy of how existing methods integrate structured knowledge into story generation; (ii) we summarize the story corpora, structured knowledge datasets, and evaluation metrics involved; (iii) we give multidimensional insights into the challenges of knowledge-enhanced story generation and cast light on promising directions for future study.
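Automated storytelling has long captured the attention of researchers because of the ubiquity of narratives in everyday life. However, when generating narratives with neural language models, it is challenging to maintain coherence and stay on topic toward a specific ending. In this paper, we introduce Story generation with Reader Models (StoRM), a framework in which a reader model is used to reason about how the story should progress. A reader model infers what a human reader believes about the concepts, entities, and relations of the fictional story world. We show how an explicit reader model represented as a knowledge graph affords story coherence and provides controllability in the form of achieving a given story world state goal. Experiments show that our model produces significantly more coherent and on-topic stories, outperforming baselines in dimensions including plot plausibility and staying on topic. Our system also outperforms outline-guided story generation baselines in composing given concepts without ordering.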
When answering a question, people often draw upon their rich world knowledge in addition to the particular context. Recent work has focused primarily on answering questions given some relevant document or context, and required very little general background. To investigate question answering with prior knowledge, we present COMMONSENSEQA: a challenging new dataset for commonsense question answering. To capture common sense beyond associations, we extract from CONCEPTNET (Speer et al., 2017) multiple target concepts that have the same semantic relation to a single source concept. Crowd-workers are asked to author multiple-choice questions that mention the source concept and discriminate in turn between each of the target concepts. This encourages workers to create questions with complex semantics that often require prior knowledge. We create 12,247 questions through this procedure and demonstrate the difficulty of our task with a large number of strong baselines. Our best baseline is based on BERT-large (Devlin et al., 2018) and obtains 56% accuracy, well below human performance, which is 89%.
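The extraction step described above (several target concepts sharing one relation to a single source concept) can be sketched as a simple grouping over triples. This is a minimal illustration under stated assumptions: the function name `question_sets`, the `min_targets` threshold, and the toy triples are all hypothetical and are not taken from the dataset's actual pipeline.

```python
from collections import defaultdict


def question_sets(triples, min_targets=3):
    """Group (source, relation, target) triples by (source, relation).

    Keeps only groups with at least `min_targets` distinct targets, so that
    a question about the source concept has several candidates to discriminate.
    The threshold is an assumption chosen for this sketch.
    """
    grouped = defaultdict(list)
    for source, relation, target in triples:
        targets = grouped[(source, relation)]
        if target not in targets:  # skip duplicate targets, keep first-seen order
            targets.append(target)
    return {key: targets for key, targets in grouped.items()
            if len(targets) >= min_targets}


# Toy triples written for illustration only.
toy_triples = [
    ("river", "AtLocation", "waterfall"),
    ("river", "AtLocation", "bridge"),
    ("river", "AtLocation", "valley"),
    ("river", "RelatedTo", "water"),
]
print(question_sets(toy_triples))
```

Each surviving group would then be handed to a crowd-worker, who writes one question per target concept.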
Knowledge about outcomes is critical for complex event understanding but is hard to acquire. We show that by pre-identifying a participant in a complex event, crowd workers are able to (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground the outcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high quality annotated dataset of 8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa). Our dataset, POQue (Participant Outcome Questions), enables the exploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human performance in subtle ways through our task formulations that target abstract and specific comprehension of a complex event, its outcome, and a participant's influence over the event culmination.
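Events in a narrative can be understood as a coherent whole via the underlying states of its participants. Often, these participant states are not explicitly mentioned in the narrative and are left to be filled in through common sense or inference. A model that understands narratives should be able to infer these implicit participant states and reason about the impact of changes to these states on the narrative. To facilitate this goal, we introduce PASTA, a new crowdsourced participant states dataset. The dataset contains valid, inferable participant states; a counterfactual perturbation to each state; and the changes to the story that would be necessary if the counterfactual were true. We introduce three state-based reasoning tasks that test the ability to infer when a state is entailed by a story, to revise a story for a counterfactual state, and to explain the most likely state change given a revised story. Our benchmarking experiments show that while today's LLMs are able to reason about states to some degree, there is still large room for improvement, suggesting potential avenues for future research.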
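The most prominent tasks in emotion analysis are to assign emotions to texts and to understand how emotions manifest in language. An important observation for natural language processing is that emotions can be communicated implicitly by referring to events alone, even without explicitly mentioning an emotion name. In psychology, the class of emotion theories known as appraisal theories aims to explain the link between events and emotions. Appraisals can be formalized as variables that measure the cognitive evaluation made by people living through an event that they consider relevant. They include assessing whether an event is novel, whether the person considers themselves responsible, whether it is in line with their own goals, and many others. Such appraisals explain which emotions develop from an event, e.g., that a novel situation can induce surprise or that one with uncertain consequences could evoke fear. We analyze the suitability of appraisal theories for emotion analysis in text, with the goal of understanding whether appraisal concepts can be reliably reconstructed by annotators, whether they can be predicted by text classifiers, and whether appraisal concepts help to identify emotion categories. To achieve this, we compile a corpus by asking people to describe in text events that triggered particular emotions and to disclose their appraisals. We then ask readers to reconstruct the emotions and appraisals from the text. This setup allows us to measure whether emotions and appraisals can be recovered purely from text, and provides a human baseline for judging model performance. Our comparison of text classification methods with human annotators shows that both can reliably detect emotions and appraisals with similar performance. We further show that appraisal concepts improve the categorization of emotions in text.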
We present SODA: the first publicly available, million-scale high-quality social dialogue dataset. Using SODA, we train COSMO: a generalizable conversation agent outperforming previous best-performing agents on both in- and out-of-domain datasets. In contrast to most existing crowdsourced, small-scale dialogue corpora, we distill 1.5M socially-grounded dialogues from a pre-trained language model (InstructGPT; Ouyang et al., 2022). Dialogues are distilled by contextualizing social commonsense knowledge from a knowledge graph (Atomic10x; West et al., 2022). Human evaluation shows that dialogues in SODA are more consistent, specific, and (surprisingly) natural than prior human-authored datasets - e.g., DailyDialog (Li et al., 2017), BlendedSkillTalk (Smith et al., 2020). In addition, extensive evaluations show that COSMO is significantly more natural and consistent on unseen datasets than best-performing dialogue models - e.g., GODEL (Peng et al., 2022), BlenderBot (Roller et al., 2021), DialoGPT (Zhang et al., 2020). Furthermore, it is sometimes even preferred to the original human-written gold responses. We make our data, models, and code public.
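The possible consequences of the same context may vary depending on the situation we refer to. However, current studies in natural language processing do not focus on commonsense reasoning over multiple possible scenarios. This study frames this task by asking, for a short story text, multiple questions that share the same set of possible endings as candidate answers. Our resulting dataset, Possible Stories, consists of more than 4.5K questions over more than 1.3K story texts. We find that even current strong pretrained language models struggle to answer the questions consistently: the highest accuracy in an unsupervised setting (60.2%) is far behind human accuracy (92.5%). Through a comparison with existing datasets, we observe that the questions in our dataset contain minimal annotation artifacts in the answer options. In addition, our dataset includes examples that require counterfactual reasoning, as well as examples that require readers' reactions and fictional information, suggesting that it can serve as a challenging testbed for future studies on commonsense reasoning.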
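Commonsense knowledge can be leveraged to identify causal relations in text. In this work, we verbalize triples in ATOMIC2020, a wide-coverage commonsense reasoning knowledge graph, into natural language text and continually pretrain a BERT pretrained language model. We evaluate the resulting model on answering commonsense reasoning questions. Our results show that a continually pretrained language model augmented with commonsense reasoning knowledge outperforms our baseline on two commonsense causal reasoning benchmarks, COPA and BCOPA-CE, without additional improvements to the base model or the use of quality-enhanced data for fine-tuning.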
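To make the verbalization step concrete, a minimal sketch is given below: each knowledge-graph triple is turned into a sentence through a per-relation template, and the resulting sentences could then serve as text for continued pretraining. The templates, the `verbalize` function, and the example triple are assumptions written for illustration; the abstract does not specify the wording actually used.

```python
# Hypothetical per-relation templates; the wording is an assumption of this sketch.
TEMPLATES = {
    "xWant": "{head}. As a result, PersonX wants {tail}.",
    "oWant": "{head}. As a result, others want {tail}.",
    "xEffect": "{head}. As a result, PersonX {tail}.",
    "xNeed": "{head}. Before that, PersonX needed {tail}.",
}


def verbalize(head: str, relation: str, tail: str) -> str:
    """Turn one (head, relation, tail) triple into a natural language sentence."""
    if relation not in TEMPLATES:
        raise KeyError(f"no template for relation: {relation}")
    return TEMPLATES[relation].format(head=head, tail=tail)


# Illustrative triple, not quoted from the knowledge graph.
sentence = verbalize("PersonX pays PersonY a compliment", "oWant",
                     "to return the compliment")
print(sentence)
```

Template-based verbalization keeps the causal direction of each relation explicit in the text, which is the signal the continued pretraining is meant to pick up.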
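A key trait of everyday conversations between individuals is the ability to express empathy toward others, and exploring ways to implement empathy is a crucial step toward human-like dialogue systems. Previous approaches to this topic mainly focus on detecting and utilizing the user's emotion to generate empathetic responses. However, since empathy includes both affective and cognitive aspects, we argue that, in addition to identifying the user's emotion, cognitive understanding of the user's situation should also be considered. To this end, we propose a novel approach to empathetic response generation, which leverages commonsense to draw more information about the user's situation and uses this additional information to further enhance the expression of empathy in generated responses. We evaluate our approach on EmpatheticDialogues, a widely used benchmark dataset for empathetic response generation. Empirical results demonstrate that our approach outperforms the baseline models in both automatic and human evaluations and can generate more informative and empathetic responses.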
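The human ability to reason about one domain by using knowledge from other domains has been studied for more than 50 years, but models that are formally sound and predict cognitive processes are sparse. We propose a formally sound method that models associative reasoning by adapting logical reasoning mechanisms. In particular, it is shown that combining it with large commonsense knowledge within a single reasoning system demands an efficient and powerful association technique. This approach is also used for modeling mind-wandering and the Remote Associates Test (RAT). In a general discussion, we show the implications of the model for a broad variety of cognitive phenomena, including consciousness.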
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, Knowledge-Enhanced Pre-trained Language Models (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMs, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
Expressing empathy is important in everyday conversations, and exploring how empathy arises is crucial in automatic response generation. Most previous approaches consider only a single factor that affects empathy. However, in practice, empathy generation and expression is a very complex and dynamic psychological process. A listener needs to find out events which cause a speaker's emotions (emotion cause extraction), project the events into some experience (knowledge extension), and express empathy in the most appropriate way (communication mechanism). To this end, we propose a novel approach, which integrates the three components - emotion cause, knowledge graph, and communication mechanism for empathetic response generation. Experimental results on the benchmark dataset demonstrate the effectiveness of our method and show that incorporating the key components generates more informative and empathetic responses.
Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world. We formalize this task as Visual Commonsense Reasoning. Given a challenging question about an image, a machine must answer correctly and then provide a rationale justifying its answer. Next, we introduce a new dataset, VCR, consisting of 290k multiple choice QA problems derived from 110k movie scenes. The key recipe for generating non-trivial and high-quality problems at scale is Adversarial Matching, a new approach to transform rich annotations into multiple choice questions with minimal bias. Experimental results show that while humans find VCR easy (over 90% accuracy), state-of-the-art vision models struggle (∼45%). To move towards cognition-level understanding, we present a new reasoning engine, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning. R2C helps narrow the gap between humans and machines (∼65%); still, the challenge is far from solved, and we provide analysis that suggests avenues for future work.