Convincing people to get vaccinated against COVID-19 is a key societal challenge today. As a first step towards this goal, many prior works have relied on social media analysis to understand the specific concerns that people have about these vaccines, such as potential side-effects, ineffectiveness, political factors, and so on. Although there are datasets that broadly classify social media posts as anti-vax or pro-vax, there is no dataset (to our knowledge) that labels social media posts according to the specific anti-vaccine concerns they mention. In this paper, we have curated CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled with various specific anti-vaccine concerns in a multi-label setting. This is also the first multi-label classification dataset that provides explanations for each of the labels. The dataset also provides class-wise summaries of all the tweets. We also perform preliminary experiments on the dataset and show that it is very challenging for multi-label explainable classification and tweet summarization, as evidenced by the moderate scores achieved by some state-of-the-art models. Our dataset and code are available at: https://github.com/sohampoddar26/caves-data
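A single tweet can carry several concern labels at once, so evaluation is usually done per label. The following is a minimal sketch of multi-hot encoding and macro-averaged F1. It is not the authors' evaluation code. The label names are an illustrative subset drawn from the concerns named above, and the zero-denominator convention is an assumption.

```python
# Illustrative subset of concern labels (not the full CAVES label set).
LABELS = ["side-effects", "ineffective", "political"]


def multi_hot(tags, labels=LABELS):
    """Encode a set of label names as a 0/1 vector in a fixed label order."""
    return [1 if label in tags else 0 for label in labels]


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over multi-hot gold/pred vectors."""
    n_labels = len(gold[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for g, p in zip(gold, pred) if g[j] and p[j])
        fp = sum(1 for g, p in zip(gold, pred) if not g[j] and p[j])
        fn = sum(1 for g, p in zip(gold, pred) if g[j] and not p[j])
        denom = 2 * tp + fp + fn
        # Assumed convention: a label absent from both gold and pred scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


gold = [multi_hot({"side-effects", "political"}), multi_hot({"ineffective"})]
pred = [multi_hot({"side-effects"}), multi_hot({"ineffective"})]
print(macro_f1(gold, pred))  # the missed "political" label pulls the average down
```

Macro averaging weights every concern equally. This keeps rare concerns from being drowned out by frequent ones.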
Recently it has been shown that state-of-the-art NLP models are vulnerable to adversarial attacks, in which a model's predictions can be drastically altered by slight modifications to the input (such as synonym substitutions). While several defense techniques have been proposed for, or adapted to, the discrete nature of text adversarial attacks, the benefits of general-purpose regularization methods such as label smoothing for language models have not been studied. In this paper, we study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks in both in-domain and out-of-domain settings. Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks. We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
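The standard (uniform) form of label smoothing replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution over the K classes. The sketch below shows only this uniform variant, written in plain Python. The paper compares several label smoothing strategies, and the smoothing factor `alpha` used here is a hypothetical choice.

```python
import math


def smooth_labels(target, num_classes, alpha):
    """Uniform label smoothing: (1 - alpha) * one_hot + alpha / K."""
    return [
        (1.0 - alpha) * (1.0 if k == target else 0.0) + alpha / num_classes
        for k in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def smoothed_cross_entropy(logits, target, alpha=0.1):
    """Cross-entropy against the smoothed target distribution."""
    q = smooth_labels(target, len(logits), alpha)
    return -sum(qk * lp for qk, lp in zip(q, log_softmax(logits)))


confident = [10.0, 0.0, 0.0]
print(smoothed_cross_entropy(confident, 0, alpha=0.0))  # plain cross-entropy
print(smoothed_cross_entropy(confident, 0, alpha=0.1))  # larger: penalizes overconfidence
```

With smoothing, even a correct but extremely confident prediction is penalized. This gives a small, concrete view of why smoothing discourages over-confident outputs.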
Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback. As a principled way of incorporating market constraints and user incentives in the design, we consider our objectives to be two-fold: maximal social welfare with minimal instability. To maximize social welfare, our proposed framework enhances the quality of recommendations by exploring allocations that optimistically maximize the rewards. To minimize instability, a measure of users' incentives to deviate from recommended allocations, the algorithm prices the items based on a scheme derived from the Walrasian equilibria. Though it is known that these equilibria yield stable prices for markets with known user preferences, our approach accounts for the inherent uncertainty in the preferences and further ensures that the users accept their recommendations under offered prices. To the best of our knowledge, our approach is the first to integrate techniques from combinatorial bandits, optimal resource allocation, and collaborative filtering to obtain an algorithm that achieves sub-linear social welfare regret as well as sub-linear instability. Empirical studies on synthetic and real-world data also demonstrate the efficacy of our strategy compared to approaches that do not fully incorporate all these aspects.
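To make the two objectives concrete, here is a toy sketch in Python. It rests on the following assumptions:

- Each user receives at most one item.
- A UCB-style bonus stands in for optimism (one common form of it).
- The welfare-maximizing allocation is found by brute force.
- Instability is the total utility users could gain by switching to their preferred item at the posted prices.

All function names and this instability definition are illustrative assumptions. They are not the paper's algorithm or its Walrasian pricing scheme.

```python
import itertools
import math


def optimistic_scores(means, counts, t, c=1.0):
    """UCB-style optimistic estimate per (user, item): mean + exploration bonus."""
    def bonus(n):
        return c * math.sqrt(math.log(max(t, 2)) / max(n, 1))
    return [[m + bonus(n) for m, n in zip(mr, nr)] for mr, nr in zip(means, counts)]


def max_welfare_allocation(values):
    """Brute force: give each user one distinct item, maximizing total value.
    Exponential in problem size, so only suitable for toy instances."""
    n_users, n_items = len(values), len(values[0])
    best_alloc, best_welfare = None, -math.inf
    for alloc in itertools.permutations(range(n_items), n_users):
        welfare = sum(values[i][j] for i, j in enumerate(alloc))
        if welfare > best_welfare:
            best_alloc, best_welfare = alloc, welfare
    return best_alloc, best_welfare


def instability(values, alloc, prices):
    """Sum over users of the utility gained by deviating to their favourite
    item at the posted prices (or to taking nothing)."""
    total = 0.0
    for i, j in enumerate(alloc):
        own = values[i][j] - prices[j]
        best = max([0.0] + [values[i][k] - prices[k] for k in range(len(prices))])
        total += max(0.0, best - own)
    return total


values = [[3.0, 1.0], [1.0, 2.0]]
alloc, welfare = max_welfare_allocation(values)
print(alloc, welfare)                          # (0, 1) 5.0
print(instability(values, alloc, [1.0, 0.5]))  # 0.0: no user wants to deviate
print(instability(values, alloc, [0.0, 2.0]))  # 1.0: user 1 prefers item 0
```

In a learning loop, one would allocate on `optimistic_scores` rather than on the true values, which are unknown. The instability of the chosen prices is then what the pricing scheme tries to drive toward zero.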
This work explores an efficient approach to establishing a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. We present VideoCoCa, which reuses a pretrained image-text contrastive captioner (CoCa) model and adapts it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, a cross-frame attention layer or a perceiver resampler) and finetune the modified architecture on video-text data, we find, surprisingly, that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to "flattened frame embeddings", yielding a strong zero-shot transfer baseline for many video-text tasks. Specifically, the frozen image encoder of a pretrained image-text CoCa takes each video frame as input and generates \(N\) token embeddings per frame, for a total of \(T\) video frames. We flatten the \(N \times T\) token embeddings into a long sequence of frozen video representations and apply CoCa's generative attentional pooling and contrastive attentional pooling on top. All model weights, including the pooling layers, are loaded directly from a pretrained image-text CoCa model. Without any video or video-text data, VideoCoCa's zero-shot transfer baseline already achieves state-of-the-art results on zero-shot video classification on Kinetics 400/600/700, UCF101, HMDB51, and Charades, as well as zero-shot text-to-video retrieval on MSR-VTT and ActivityNet Captions. We also explore lightweight finetuning on top of VideoCoCa and achieve strong results on video question-answering (iVQA, MSRVTT-QA, MSVD-QA) and video captioning (MSR-VTT, ActivityNet, Youcook2). Our approach establishes a simple and effective video-text baseline for future research.
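The flatten-then-pool step can be sketched in plain Python. This is a simplified illustration under stated assumptions:

- `flatten_frames` and `attentional_pool` are hypothetical names.
- The pooling uses a single query with no learned projections or multiple heads.
- The embeddings are toy values rather than outputs of a frozen CoCa encoder.

```python
import math


def flatten_frames(frame_tokens):
    """[T][N][D] per-frame token embeddings -> [T*N][D] sequence (frame-major)."""
    return [token for frame in frame_tokens for token in frame]


def attentional_pool(tokens, query):
    """Single-query dot-product attention pooling over a token sequence.
    Real CoCa pooling uses learned multi-head attention; this is simplified."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(d)]


# Toy sizes: T = 2 frames, N = 3 tokens per frame, D = 4 dimensions.
T, N, D = 2, 3, 4
frames = [[[float(t + n + j) for j in range(D)] for n in range(N)] for t in range(T)]
sequence = flatten_frames(frames)
pooled = attentional_pool(sequence, query=[0.1] * D)
print(len(sequence), len(pooled))  # 6 4
```

The pooling layer never needs to know that the \(N \times T\) tokens came from several frames. That is why image-trained pooling weights can be applied to the flattened video sequence unchanged.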
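Audio-Text retrieval takes a natural language query to retrieve relevant audio files from a database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant natural language descriptions. Most of the literature trains retrieval systems with a single audio captioning dataset, and the benefit of training with multiple datasets is underexplored. Moreover, retrieval systems have to learn the alignment between elaborate sentences and the audio content they describe, which varies in length from a few seconds to several minutes. In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval. First, we provide a new collection of about five thousand web audio-text pairs that we call WavText5K. When used to train our retrieval system, WavText5K improved performance more than other audio captioning datasets. Second, our framework learns to connect language and audio content by using a text encoder, two audio encoders, and a contrastive learning objective. Combining both audio encoders helps to process variable-length audio. Together, the two contributions beat state-of-the-art performance on AudioCaps and Clotho for Text-Audio retrieval by a relative 2% and 16%, and for Audio-Text retrieval by 6% and 23%.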
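A common way to realize a contrastive objective between paired text and audio embeddings is a CLIP-style symmetric cross-entropy over in-batch similarities. The sketch below assumes that form, with cosine similarity and a hypothetical temperature of 0.07. It treats the encoders as black boxes (the embeddings are toy vectors) and does not reproduce how the paper combines its two audio encoders.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def _mean_cross_entropy(rows):
    """Mean cross-entropy where row i's correct column is i."""
    loss = 0.0
    for i, row in enumerate(rows):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_z - row[i]
    return loss / len(rows)


def symmetric_contrastive_loss(text_emb, audio_emb, temperature=0.07):
    """Matched pairs share an index; all other pairs in the batch are negatives."""
    logits = [[cosine(t, a) / temperature for a in audio_emb] for t in text_emb]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_cross_entropy(logits) + _mean_cross_entropy(transposed))


def retrieve(query, candidates):
    """Rank candidate indices by cosine similarity to the query (best first)."""
    return sorted(range(len(candidates)), key=lambda i: -cosine(query, candidates[i]))


text = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(symmetric_contrastive_loss(text, audio_aligned) < symmetric_contrastive_loss(text, audio_swapped))
print(retrieve(text[0], audio_aligned))  # [0, 1]
```

The same similarity function serves both retrieval directions. Ranking audio for a text query and ranking text for an audio query differ only in which side plays the query.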
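Intent detection is a key part of any natural language understanding (NLU) system for conversational assistants. For email conversations, where multiple directives and intents are present, detecting the correct intent is essential but difficult. In such settings, conversation context can become a key disambiguating factor for detecting the user's request to the assistant. One prominent way of incorporating context is to model past conversation history, as task-oriented dialogue models do. However, the long-form nature of email conversations restricts the direct use of the latest advances in task-oriented dialogue models. Therefore, in this paper, we provide an effective transfer learning framework (EMTOD) that allows the latest developments in dialogue models to be used for long-form conversations. We show that the proposed EMTOD framework improves intent detection performance over pre-trained language models by 45% and over pre-trained dialogue models by 30% for task-oriented email conversations. Additionally, the modular nature of the proposed framework allows any future developments in pre-trained language and task-oriented dialogue models to be plugged in.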
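Comprehensive global cooperation is essential to limit global temperature increases while continuing economic development, for example by reducing severe inequality or achieving long-term economic growth. Achieving long-term cooperation among N strategic agents to mitigate climate change presents a complex game-theoretic problem. For example, agents may negotiate and reach climate agreements, but there is no central authority to enforce adherence to those agreements. Hence, it is critical to design negotiation and agreement frameworks that foster cooperation, allow all agents to meet their individual policy objectives, and incentivize long-term adherence. This is an interdisciplinary challenge that calls for collaboration between researchers in machine learning, economics, climate science, law, policy, ethics, and other fields. In particular, we argue that machine learning is a critical tool for addressing the complexity of this domain. To facilitate this research, we introduce Rice-N, a multi-region integrated assessment model that simulates the global climate and economy and can be used to design and evaluate the strategic outcomes of different negotiation and agreement frameworks. We also describe how to use multi-agent reinforcement learning to train rational agents using Rice-N. This framework underpins AI for Global Climate Cooperation, a working-group collaboration and competition on climate negotiation and agreement design. Here, we invite the scientific community to design and evaluate their solutions using Rice-N, machine learning, economic intuition, and other domain knowledge. More information can be found at www.ai4climatecoop.org.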
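A much simpler model than Rice-N can illustrate why agreements without enforcement are fragile: a toy N-player public goods game. This is a swapped-in textbook example, not Rice-N or its negotiation protocol, and `public_goods_payoffs` and its parameters are hypothetical. When 1 < multiplier < n, everyone is better off if all agents contribute, yet each agent gains by unilaterally withholding its contribution.

```python
def public_goods_payoffs(contributions, multiplier, endowment=1.0):
    """Each agent keeps endowment - c_i; the pooled benefit multiplier * sum(c)
    is shared equally among all n agents."""
    n = len(contributions)
    shared = multiplier * sum(contributions) / n
    return [endowment - c + shared for c in contributions]


n, multiplier = 4, 2.0  # 1 < multiplier < n: cooperation pays collectively
all_cooperate = public_goods_payoffs([1.0] * n, multiplier)
all_defect = public_goods_payoffs([0.0] * n, multiplier)
one_defects = public_goods_payoffs([0.0] + [1.0] * (n - 1), multiplier)
print(all_cooperate[0], all_defect[0], one_defects[0])  # 2.0 1.0 2.5
```

The lone defector earns 2.5 against 2.0 for cooperating. This gap is the free-rider incentive that negotiation and agreement frameworks must counteract without a central enforcer.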
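In search and rescue missions, unmanned aerial vehicles (UAVs) are widely used to distribute first-aid kits and food packages. It is important that these UAVs be able to identify and distinguish markers for effective distribution. One common way to mark locations is to use characters superimposed on shapes of various colors, which yields a wide variety of markers based on combinations of different shapes, characters, and their respective colors. In this paper, we propose an object detection and classification pipeline that prevents false positives and minimizes the misclassification of alphanumeric characters and shapes in aerial images. Our method uses traditional computer vision techniques and unsupervised machine learning methods to identify region proposals, segment image targets, and remove false positives. We use a computationally lightweight model for classification, making it easy to deploy on any aerial vehicle.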
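One unsupervised way to segment a cropped target into background, shape, and character regions is to cluster pixel colors with k-means. The sketch below assumes that approach, with k = 3 and toy RGB pixels. The paper's exact unsupervised method, features, and parameters are not specified here, so treat `kmeans_colors` as a hypothetical illustration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans_colors(pixels, k, iters=20):
    """Cluster RGB pixels into k color groups (e.g. background, shape, character).
    Farthest-point initialization keeps this deterministic."""
    centers = [pixels[0]]
    while len(centers) < k:
        centers.append(max(pixels, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean color of its group (keep it if empty).
        new_centers = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in pixels]


red, white, blue = (220, 30, 30), (250, 250, 250), (30, 30, 200)
pixels = [red] * 5 + [white] * 5 + [blue] * 5
centers, labels = kmeans_colors(pixels, k=3)
print(sorted(labels.count(c) for c in range(3)))  # [5, 5, 5]
```

Each resulting cluster can be turned into a binary mask. The mask for the shape or character region would then be passed to a lightweight classifier.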
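The incidence of skin cancer has been rising steadily worldwide, making it a serious problem. Early diagnosis has the potential to greatly reduce the harm caused by the disease; however, traditional biopsy is a labor-intensive and invasive procedure. In addition, many rural communities do not have easy access to hospitals and are reluctant to visit one for what they perceive may be a minor issue. Using machine learning and deep learning for skin cancer classification can increase accessibility and reduce the uncomfortable procedures involved in the traditional lesion detection process. These models can be wrapped in web or mobile applications and serve a larger population. In this paper, two such models are tested on the benchmark HAM10000 dataset of common skin lesions: Random Forest with Stratified K-Fold, and MobileNetV2 (referred to as MobileNet in the rest of the paper). The MobileNet model was trained separately using the TensorFlow and PyTorch frameworks. We provide a side-by-side comparison of the deep learning and machine learning models, as well as a comparison of the same deep learning model implemented in different frameworks, for skin lesion diagnosis in a resource-constrained mobile environment. The results show that each of these models performs better on a different classification task. For greater overall recall, accuracy, and detection of malignant melanoma, TensorFlow MobileNet was the better choice. However, for detecting noncancerous skin lesions, PyTorch MobileNet proved better. Random Forest was the better algorithm when low computational cost with moderate correctness was required.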
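Stratified k-fold splitting keeps each fold's class mix close to that of the whole dataset, which matters for an imbalanced lesion dataset. Below is a minimal pure-Python sketch of the splitting scheme and of per-class recall. It is not the paper's pipeline, and the class codes `nv` and `mel` serve only as illustrative labels.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_idx, test_idx) pairs in which every fold keeps roughly the
    same class proportions as the full dataset."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for y in sorted(by_class):
        for pos, idx in enumerate(by_class[y]):
            folds[(offset + pos) % k].append(idx)
        offset += len(by_class[y])
    for i in range(k):
        train = sorted(j for f in range(k) if f != i for j in folds[f])
        yield train, sorted(folds[i])


def recall(y_true, y_pred, positive):
    """Share of true `positive` cases the model recovered (TP / (TP + FN))."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


labels = ["nv"] * 8 + ["mel"] * 4
for train, test in stratified_k_fold(labels, k=4):
    print([labels[i] for i in test])  # each fold: one "mel", two "nv"
```

Per-class recall on the melanoma class is the kind of metric that separates "better at detecting melanoma" from "better overall accuracy" in the comparison above.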
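We treat human-machine collaborative problem solving as a planning task coupled with natural language communication. Our framework consists of three components: a natural language engine that parses language utterances into a formal representation and vice versa, a concept learner that induces generalized concepts for plans from limited interactions with the user, and an HTN planner that solves the task based on human interaction. We illustrate the framework's ability to address the key challenges of collaborative problem solving by demonstrating it on a collaborative building task in a Minecraft-based Blocksworld domain. An accompanying demo video is available at https://youtu.be/q1pwe4aahf0.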
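To illustrate the planning component, here is a toy hierarchical task network (HTN) planner in Python for a one-column Blocksworld tower. It is a generic textbook-style sketch, not the paper's planner. The task names `build_tower` and `place` and the support rule are hypothetical, and the sketch omits the language engine and concept learner entirely.

```python
def htn_plan(state, tasks, methods, operators):
    """Depth-first HTN planning: primitive tasks are applied by operators,
    compound tasks are decomposed by methods. Returns a list of primitive
    tasks, or None if no decomposition works."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    name, args = head[0], head[1:]
    if name in operators:
        new_state = operators[name](state, *args)
        if new_state is None:
            return None
        tail = htn_plan(new_state, rest, methods, operators)
        return None if tail is None else [head] + tail
    for method in methods.get(name, []):
        subtasks = method(state, *args)
        if subtasks is None:
            continue
        plan = htn_plan(state, list(subtasks) + rest, methods, operators)
        if plan is not None:
            return plan
    return None


def place(state, x, y):
    """Primitive: put a block at (x, y) if the cell is free and supported."""
    if (x, y) in state or (y > 0 and (x, y - 1) not in state):
        return None
    return state | {(x, y)}


def build_tower(state, x, height):
    """Compound: a tower of height h is a tower of h - 1 plus one block on top."""
    if height == 0:
        return []
    return [("build_tower", x, height - 1), ("place", x, height - 1)]


plan = htn_plan(frozenset(), [("build_tower", 0, 3)],
                methods={"build_tower": [build_tower]},
                operators={"place": place})
print(plan)  # [('place', 0, 0), ('place', 0, 1), ('place', 0, 2)]
```

In a collaborative setting, a learned concept such as "tower" would plausibly enter the planner as a new method like `build_tower`. The primitive operators stay fixed.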