The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI datasets and models, textual entailment relations are typically defined at the sentence or paragraph level. However, even a simple sentence often contains multiple propositions, i.e. distinct units of meaning conveyed by the sentence. As these propositions can carry different truth values in the context of a given premise, we argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 35K propositions annotated by expert human raters. Our dataset structure resembles the tasks of (1) segmenting sentences within a document into a set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document, i.e. documents describing the same event or entity. We establish strong baselines for the segmentation and entailment tasks. Through case studies on summary hallucination detection and document-level NLI, we demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.
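Many users turn to document retrieval systems (e.g., search engines) to seek answers to controversial questions. Answering such queries usually requires identifying responses within web documents and aggregating those responses according to their different perspectives. Classical document retrieval systems fall short at delivering a set of direct and diverse responses to users. Identifying such responses within a document is, naturally, a natural language understanding task. In this paper, we examine the challenges of combining such language understanding objectives with document retrieval, and study a new perspective-oriented document retrieval paradigm. We discuss and assess the inherent natural language understanding challenges involved in achieving this goal. Following the design challenges and principles, we demonstrate and evaluate a practical prototype pipeline system. We use the prototype system to conduct a user survey in order to assess the utility of our paradigm and to understand users' information needs for controversial queries.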
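In the past few years, short videos have witnessed rapid growth on e-commerce platforms such as Taobao. To ensure content freshness, platforms need to release a large number of new videos every day, which makes conventional click-through rate (CTR) prediction methods suffer from the item cold-start problem. In this paper, we propose GIFT, an efficient graph-guided feature transfer system, to fully exploit the rich information of warmed-up videos to compensate for cold-start ones. Specifically, we build a heterogeneous graph that contains physical and semantic linkages to guide the feature transfer process from warmed-up videos to cold-start videos. The physical linkages represent explicit relationships, while the semantic linkages measure the proximity of the multimodal representations of two videos. We carefully design the feature transfer function so that different types of transferred features (e.g., ID representations and historical statistics) are aware of the different metapaths on the graph. We conduct extensive experiments on a large real-world dataset, and the results show that our GIFT system significantly outperforms SOTA methods and brings a 6.82% lift in CTR on the homepage of the Taobao App.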
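Extracting information from stochastic fields or textures is a ubiquitous task in science, from exploratory data analysis to classification and parameter estimation. From physics to biology, it tends to be done either through power spectrum analysis, which is often too limited, or with convolutional neural networks (CNNs), which require large training sets and lack interpretability. In this paper, we advocate the use of the scattering transform (Mallat 2012), a powerful statistic that borrows mathematical ideas from CNNs but requires no training and is interpretable. We show that it provides a relatively compact set of summary statistics with a visual interpretation, and that it carries most of the relevant information in a wide range of scientific applications. We give a non-technical introduction to this estimator and argue that it can benefit data analysis as well as model and parameter inference across many fields of science. Interestingly, understanding the core operations of the scattering transform allows one to decipher many key aspects of the inner workings of CNNs.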
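Agile quadrotor flight in challenging environments has the potential to revolutionize shipping, transportation, and search and rescue applications. Nonlinear model predictive control (NMPC) has recently shown promising results for agile quadrotor control, but it relies on highly accurate models for maximum performance. Hence, model uncertainties in the form of unmodeled complex aerodynamic effects, varying payloads, and parameter mismatch will degrade overall system performance. This paper proposes L1-NMPC, a novel hybrid adaptive NMPC that learns model uncertainties online and immediately compensates for them, drastically improving performance over the non-adaptive baseline with minimal computational overhead. Our proposed architecture generalizes to many different environments, which we evaluate under wind, unknown payloads, and highly agile flight conditions. The proposed method demonstrates great flexibility and robustness, with more than 90% tracking error reduction over non-adaptive NMPC under large unknown disturbances and without any gain tuning. In addition, the same controller with identical gains can accurately fly highly agile racing trajectories with top speeds of 70 km/h, offering a tracking performance improvement of around 50% relative to the non-adaptive NMPC baseline.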
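Accurate trajectory tracking control for quadrotors is essential for safe navigation in cluttered environments. However, this is challenging in agile flight due to nonlinear dynamics, complex aerodynamic effects, and actuation constraints. In this paper, we empirically compare two state-of-the-art control frameworks, the nonlinear model predictive controller (NMPC) and the differential-flatness-based controller (DFBC), by tracking a wide variety of agile trajectories at speeds of up to 20 m/s (i.e., 72 km/h). The comparisons are performed in both simulation and real-world environments to systematically evaluate both methods in terms of tracking accuracy, robustness, and computational efficiency. We show the superiority of NMPC in tracking dynamically infeasible trajectories, at the cost of higher computation time and a risk of numerical convergence issues. For both methods, we also quantitatively study the effect of adding an inner-loop controller using the incremental nonlinear dynamic inversion (INDI) method, and the effect of adding an aerodynamic drag model. Our real-world experiments, performed in one of the world's largest motion capture systems, demonstrate a tracking error reduction of more than 78% for both NMPC and DFBC, indicating the necessity of using an inner-loop controller and an aerodynamic drag model for agile trajectory tracking.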
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
As one of the prevalent methods for building automated systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment returns, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions concerning the equality of frames' importance, the effectiveness of the importance map, and the connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.