Current abstractive summarization models either lack clear interpretability or provide only incomplete rationales by highlighting parts of the source document. To this end, we propose Summarization Programs (SPs), an interpretable, modular framework consisting of an (ordered) list of binary trees, each encoding the step-by-step generative process of an abstractive summary sentence from the source document. A Summarization Program contains one root node per summary sentence; a distinct tree connects each summary sentence (root node) to the document sentences it is derived from (leaf nodes), with intermediate generated sentences at the internal nodes. Edges represent the different modular operations involved in summarization, such as sentence fusion, compression, and paraphrasing. We first propose an efficient best-first search method, SP-Search, that identifies SPs for human summaries via neural modules by directly optimizing ROUGE scores. Next, using these programs as automatic supervision, we train seq2seq models that generate Summarization Programs, which are then executed to obtain the final summaries. We demonstrate that SP-Search effectively represents the generative process behind human summaries using modules that are typically faithful to their intended behavior. We also conduct a simulation study showing that Summarization Programs improve the interpretability of summarization models by allowing humans to better simulate model reasoning. Summarization Programs constitute a promising step toward interpretable and modular abstractive summarization, a complex task previously addressed primarily by black-box end-to-end neural systems. Our code is available at https://github.com/swarnaHub/SummarizationPrograms
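To make the program structure concrete, here is a minimal sketch of an SP tree and a best-first search loop in the spirit of SP-Search. The node layout, the toy unigram-recall score standing in for ROUGE, and the single-input module stubs are our assumptions for illustration, not the authors' implementation (see their repository for that); in the real system the modules are neural models and fusion combines two trees.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SPNode:
    text: str                                # sentence held at this node
    op: Optional[str] = None                 # module that produced it, e.g. "compression"
    children: List["SPNode"] = field(default_factory=list)


def unigram_recall(candidate: str, reference: str) -> float:
    # Toy stand-in for ROUGE: fraction of reference unigrams covered.
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    return len(cand & ref) / max(len(ref), 1)


def sp_search(doc_sents: List[str], ref_sent: str,
              apply_module: Callable[[str, List[str]], str],
              budget: int = 50) -> Optional[SPNode]:
    # Best-first search: repeatedly expand the tree whose root scores highest
    # against the target summary sentence (pairwise fusion omitted for brevity).
    frontier, tie = [], 0
    for s in doc_sents:
        heapq.heappush(frontier, (-unigram_recall(s, ref_sent), tie, SPNode(s)))
        tie += 1
    best_score, best_tree = -1.0, None
    for _ in range(budget):
        if not frontier:
            break
        neg, _, node = heapq.heappop(frontier)
        if -neg > best_score:
            best_score, best_tree = -neg, node
        for op in ("compression", "paraphrase"):
            child = SPNode(apply_module(op, [node.text]), op=op, children=[node])
            heapq.heappush(frontier,
                           (-unigram_recall(child.text, ref_sent), tie, child))
            tie += 1
    return best_tree
```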
The problem of unfaithful summaries has been widely discussed in the context of abstractive summarization. Though extractive summarization is less prone to the common unfaithfulness issues of abstractive summaries, does that mean extractive equals faithful? It turns out the answer is no. In this work, we define a typology with five types of broad unfaithfulness problems (including and beyond non-entailment) that can appear in extractive summaries: incorrect coreference, incomplete coreference, incorrect discourse, incomplete discourse, and other misleading information. We ask humans to label these problems in 1500 English summaries produced by 15 diverse extractive systems. We find that 33% of these summaries have at least one of the five issues. To automatically detect these problems, we find that 5 existing faithfulness evaluation metrics correlate poorly with human judgments. To remedy this, we propose a new metric designed to detect unfaithful extractive summaries and show that it achieves the best performance. We hope our work raises awareness of unfaithfulness problems in extractive summarization and helps future work to evaluate and resolve these issues. Our data and code are publicly available at https://github.com/zhangshiyue/extractive_is_not_faithful
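As a concrete picture of the meta-evaluation step (checking whether a faithfulness metric tracks the human labels), here is a minimal sketch; the labels and scores below are invented, and the setup is our simplification, not the paper's exact protocol.

```python
from scipy.stats import pearsonr, spearmanr

# Invented example data: 1 = annotators flagged an unfaithfulness problem,
# and higher metric scores mean the metric judges the summary more faithful.
human_flags = [1, 0, 0, 1, 1, 0]
metric_scores = [0.2, 0.9, 0.8, 0.4, 0.1, 0.7]

# A useful metric should score flagged summaries lower, i.e. correlate
# negatively with the problem labels.
r, p = pearsonr(human_flags, metric_scores)
rho, _ = spearmanr(human_flags, metric_scores)
print(f"Pearson r = {r:.2f} (p = {p:.3f}), Spearman rho = {rho:.2f}")
```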
Student evaluations of teaching (SETs) are widely used in universities. SET results are typically summarized for instructors in a static PDF report, which often includes summary statistics of the quantitative ratings and an unsorted list of open-ended student comments. The poor organization and summarization of the raw comments hinders those interpreting the reports from fully utilizing the informative feedback, making accurate inferences, and designing appropriate instructional improvements. In this work, we introduce a novel system, SETSUM, that leverages sentiment analysis, aspect extraction, summarization, and visualization techniques to provide organized illustrations of SET findings to instructors and other reviewers. Ten university professors from diverse departments served as evaluators of the system; all agreed that SETSUM helps them interpret SET results more efficiently, and six out of ten instructors preferred our system over the standard static PDF report (while the remaining four would like to have both). This shows that our work has the potential to reform SET reporting conventions in the future. Our code is available at https://github.com/evahuyn/setsum
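As a minimal, hypothetical sketch of the kind of organization step such a system performs, the snippet below groups open-ended comments by aspect and sentiment before display; the stand-in classifiers are trivial placeholders, not SETSUM's actual models.

```python
from collections import defaultdict

def organize_comments(comments, get_aspect, get_sentiment):
    # Return {aspect: {sentiment: [comments]}} for rendering in a report view.
    grouped = defaultdict(lambda: defaultdict(list))
    for c in comments:
        grouped[get_aspect(c)][get_sentiment(c)].append(c)
    return {aspect: dict(by_sent) for aspect, by_sent in grouped.items()}

comments = ["Great lectures!", "Homework load was heavy.", "Lectures were too fast."]
print(organize_comments(
    comments,
    get_aspect=lambda c: "lectures" if "ecture" in c else "assignments",
    get_sentiment=lambda c: "positive" if c.startswith("Great") else "negative",
))
```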
Previous part-of-speech (POS) induction models usually make certain independence assumptions (e.g., Markov, unidirectional, local dependency) that do not hold in real languages. For example, subject-verb agreement can be both long-term and bidirectional. To facilitate flexible dependency modeling, we propose a Masked Part-of-Speech Model (MPoSM), inspired by the recent success of masked language models (MLMs). MPoSM can model arbitrary tag dependencies and perform POS induction through the objective of masked POS reconstruction. We achieve competitive results on both the English Penn WSJ dataset and the universal treebank containing 10 diverse languages. Though modeling long-term dependencies should ideally help this task, our ablation study shows different trends across languages. To better understand this phenomenon, we design a novel synthetic experiment that can specifically diagnose the model's ability to learn tag agreement. Surprisingly, we find that even strong baselines fail to solve this problem consistently in a very simplified setting: agreement between adjacent words. Nonetheless, MPoSM achieves better performance. Finally, we conduct a detailed error analysis to shed light on other remaining challenges. Our code is available at https://github.com/owenzx/mposm
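To make the masked tag-reconstruction objective concrete, here is a minimal PyTorch sketch under our own assumptions (toy dimensions, a generic Transformer encoder, tags treated as given rather than latent); it illustrates the objective only and is not the MPoSM architecture.

```python
import torch
import torch.nn as nn

vocab_size, n_tags, d = 1000, 45, 64           # toy sizes; tag index 45 is [MASK]
word_emb = nn.Embedding(vocab_size, d)
tag_emb = nn.Embedding(n_tags + 1, d)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
tag_head = nn.Linear(d, n_tags)

words = torch.randint(0, vocab_size, (2, 10))  # batch of word ids
tags = torch.randint(0, n_tags, (2, 10))       # current (induced) tag assignments
mask = torch.rand(2, 10) < 0.3                 # positions whose tags get masked

hidden = encoder(word_emb(words) + tag_emb(tags.masked_fill(mask, n_tags)))
logits = tag_head(hidden)

# Reconstruct only the masked tags from the full sentence plus the unmasked
# tags; nothing restricts which other positions a prediction can depend on,
# which is the point of dropping Markov/unidirectional assumptions.
loss = nn.functional.cross_entropy(logits[mask], tags[mask])
loss.backward()
```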
A multilingual tokenizer is a fundamental component of multilingual neural machine translation. It is trained from a multilingual corpus. Since a skewed data distribution is considered harmful, a sampling strategy is usually used to balance the languages in the corpus. However, few works have systematically answered how language imbalance in tokenizer training affects downstream performance. In this work, we analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus. We find that, while relatively better performance is often observed when languages are sampled more equally, downstream performance is more robust to language imbalance than we usually expect. Two features, UNK rate and closeness to the character level, can warn of poor downstream performance before performing the task. We also distinguish language sampling for tokenizer training from sampling for model training, and show that the model is more sensitive to the latter.
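The standard way to implement such a sampling strategy is temperature (exponent) scaling of each language's data share; a minimal sketch with made-up corpus sizes follows. We are illustrating the common technique here, not claiming it is this paper's exact recipe.

```python
def sampling_probs(sizes, alpha=0.3):
    # p_i proportional to q_i ** alpha, where q_i is language i's share of the
    # corpus. alpha = 1 keeps the raw (skewed) distribution; alpha -> 0
    # approaches uniform sampling over languages.
    total = sum(sizes)
    scaled = [(s / total) ** alpha for s in sizes]
    norm = sum(scaled)
    return [x / norm for x in scaled]

sizes = {"en": 1_000_000, "fr": 100_000, "sw": 10_000}   # made-up corpus sizes
print(dict(zip(sizes, sampling_probs(list(sizes.values()), alpha=0.3))))
```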
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS-token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
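As a rough illustration of what distilling token relations can look like, here is a hypothetical sketch that matches the student's token-affinity map to the teacher's via KL divergence; TinyMIM's actual relation targets (e.g., which attention relations are matched) may differ from this simplification.

```python
import torch
import torch.nn.functional as F

def relation_distill_loss(student_tokens, teacher_tokens, tau=1.0):
    # tokens: (batch, num_tokens, dim); the feature dims may differ, since the
    # token-to-token affinity matrices, not the features, are matched.
    s = F.normalize(student_tokens, dim=-1)
    t = F.normalize(teacher_tokens, dim=-1)
    s_rel = F.log_softmax(s @ s.transpose(1, 2) / tau, dim=-1)
    t_rel = F.softmax(t @ t.transpose(1, 2) / tau, dim=-1)
    return F.kl_div(s_rel, t_rel, reduction="batchmean")

loss = relation_distill_loss(torch.randn(2, 196, 384), torch.randn(2, 196, 768))
```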
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
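A hypothetical sketch of the implicit-alignment idea: derive position embeddings for both modalities from shared 3D coordinates so a single Transformer can attend over the concatenated token sequence. The shapes, the MLP coordinate encoder, and the decoder setup are our assumptions, not CMT's actual design.

```python
import torch
import torch.nn as nn

d = 256
coord_enc = nn.Sequential(nn.Linear(3, d), nn.ReLU(), nn.Linear(d, d))

img_tokens = torch.randn(1, 600, d)   # camera feature tokens
pts_tokens = torch.randn(1, 400, d)   # LiDAR feature tokens
img_coords = torch.randn(1, 600, 3)   # 3D points associated with image tokens
pts_coords = torch.randn(1, 400, 3)   # 3D centers of the point-cloud tokens

# Shared 3D position encoding aligns the two modalities without any explicit
# view transformation; object queries then attend over the joint sequence.
memory = torch.cat([img_tokens + coord_enc(img_coords),
                    pts_tokens + coord_enc(pts_coords)], dim=1)
queries = torch.randn(1, 900, d)      # object queries (position-aware in CMT)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d, nhead=8, batch_first=True), num_layers=1)
out = decoder(queries, memory)        # (1, 900, 256), one feature per query
```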
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
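The contrast between the two attacks can be sketched as follows, with placeholder distillation and trigger-update functions; this illustrates only where the poisoning happens in the pipeline and is not the paper's code.

```python
import torch

def stamp(images, trigger, size=4):
    # Overwrite the bottom-right corner with the trigger patch (NCHW layout).
    images = images.clone()
    images[:, :, -size:, -size:] = trigger
    return images

def naive_attack(raw, trigger, distill_step, synthetic, steps=10):
    poisoned = stamp(raw, trigger)                     # poison once, up front
    for _ in range(steps):
        synthetic = distill_step(synthetic, poisoned)
    return synthetic

def doorping(raw, trigger, distill_step, update_trigger, synthetic, steps=10):
    for _ in range(steps):
        trigger = update_trigger(trigger, synthetic)   # keep trigger effective
        synthetic = distill_step(synthetic, stamp(raw, trigger))
    return synthetic
```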
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression problem, mirroring the human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the comparison approaches, and that both the MS and PMT modules improve the model's performance.
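One plausible reading of "progressive multi-task" is a loss schedule that shifts weight from an easier auxiliary task to the harder score regression over training; the sketch below is our guess at such a schedule, not the paper's exact module.

```python
import torch
import torch.nn as nn

features = torch.randn(8, 512)        # pooled multi-scale features (MS module)
cls_head = nn.Linear(512, 5)          # easy task: coarse quality bins
reg_head = nn.Linear(512, 1)          # hard task: continuous quality score
bins = torch.randint(0, 5, (8,))
scores = torch.rand(8, 1)

step, total_steps = 300, 1000
lam = min(step / total_steps, 1.0)    # ramps from 0 to 1 over training
loss = (1 - lam) * nn.functional.cross_entropy(cls_head(features), bins) \
       + lam * nn.functional.mse_loss(reg_head(features), scores)
loss.backward()
```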
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, even more so for drum grooves, which have little precedent in the literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
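Finetuning a text-only LLM on MIDI presupposes some serialization of drum events into text; here is a minimal, invented encoding for illustration. The paper's actual tokenization may differ.

```python
def drums_to_text(events, steps_per_bar=16):
    # Render (time_step, drum_name) events as one text token per grid step.
    grid = {}
    for step, drum in events:
        grid.setdefault(step, []).append(drum)
    tokens = ["+".join(sorted(grid[s])) if s in grid else "rest"
              for s in range(steps_per_bar)]
    return " ".join(tokens)

groove = [(0, "kick"), (4, "snare"), (8, "kick"), (10, "kick"), (12, "snare")]
print(drums_to_text(groove))
# -> kick rest rest rest snare rest rest rest kick rest kick rest snare rest rest rest
```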