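We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which helps users complete multi-step cooking and home improvement tasks. TacoBot is designed around a user-centered principle and aims to deliver a collaborative and accessible dialogue experience. To that end, it is equipped with accurate language understanding, flexible dialogue management, and engaging response generation. TacoBot is also backed by a strong search engine and an automated end-to-end test suite. In bootstrapping TacoBot's development, we explore a series of data augmentation strategies to train advanced neural language processing models, and we continuously improve the dialogue experience using collected real conversations. At the end of the semifinals, TacoBot achieved an average rating of 3.55/5.0.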
translated by Google Translate
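Image-based salient object detection (ISOD) in 360° scenes is important for understanding and applying panoramic information. However, research on 360° ISOD has not been widely explored because large, complex, high-resolution, and well-labeled datasets are lacking. To this end, we construct a large-scale 360° ISOD dataset with object-level pixel-wise annotation on the equirectangular projection (ERP). It contains rich panoramic scenes at no less than 2K resolution and is, to the best of our knowledge, the largest dataset for 360° ISOD. By observing the data, we find that current methods face three significant challenges in panoramic scenes: diverse degrees of distortion, discontinuous edge effects, and changeable object scales. Inspired by the human observation process, we propose a view-aware salient object detection method based on a Sample Adaptive View Transformer (SAVT) module with two sub-modules to mitigate these issues. Specifically, the View Transformer (VT) sub-module applies different kinds of transformations to learn various features under different views, which increases the model's feature tolerance to distortion, edge effects, and object scales. In addition, the Sample Adaptive Fusion (SAF) sub-module adjusts the weights of the different transform branches according to the features of each sample, so that the transformed, enhanced features are fused more appropriately. Benchmark results for 20 state-of-the-art ISOD methods show that the constructed dataset is very challenging. Moreover, exhaustive experiments verify that the proposed approach is practical and outperforms state-of-the-art methods.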
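The literature contains many different methods for explaining machine learning results. However, these methods differ in their approaches and often do not provide the same explanations. In this paper, we consider two recent methods: Integrated Gradients (Sundararajan, Taly, and Yan, 2017) and Baseline Shapley (Sundararajan and Najmi, 2020). The original authors have already studied the axiomatic properties of the two methods and provided some comparisons. Our work provides additional insights into their comparative behavior for tabular data. We discuss common situations in which the two provide identical explanations and situations in which they differ. We also use simulation studies to examine the differences when neural networks with ReLU activation functions are used to fit the models.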
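To make the comparison concrete, the following is a minimal sketch (not the paper's code) that computes both attributions for a toy two-feature ReLU model. The model `f`, the input `x`, the all-zero baseline, and the numerical settings (`steps`, `eps`) are assumptions chosen for illustration. Integrated Gradients is approximated with a midpoint Riemann sum and finite-difference gradients; Baseline Shapley is computed exactly by averaging marginal contributions over all feature orderings.

```python
from itertools import permutations


def relu(z):
    return max(0.0, z)


def f(x):
    # Hypothetical toy model: a single ReLU unit over two features.
    return relu(x[0] + x[1] - 1.0)


def integrated_gradients(model, x, baseline, steps=2000, eps=1e-6):
    """Approximate Integrated Gradients along the straight path baseline -> x."""
    n = len(x)
    attributions = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = list(point)
            down = list(point)
            up[i] += eps
            down[i] -= eps
            grad_i = (model(up) - model(down)) / (2 * eps)  # central difference
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions


def baseline_shapley(model, x, baseline):
    """Exact Baseline Shapley: absent features are set to their baseline value."""
    n = len(x)
    attributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_value = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to its actual value
            value = model(current)
            attributions[i] += (value - previous_value) / len(orderings)
            previous_value = value
    return attributions


x = [1.0, 2.0]
baseline = [0.0, 0.0]
ig = integrated_gradients(f, x, baseline)
bs = baseline_shapley(f, x, baseline)
print("Integrated Gradients:", ig)
print("Baseline Shapley:    ", bs)
```

For this toy model the two methods split the same total, f(x) - f(baseline) = 2, differently: Integrated Gradients gives roughly 0.67 and 1.33, while Baseline Shapley gives 0.5 and 1.5. For a purely additive model, by contrast, both methods assign each feature its own term's change from the baseline and therefore agree.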
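Healthcare offers several opportunities for automation that can improve clinician throughput. One example is assistive tools that record diagnosis codes while clinicians write notes. We study the automation of medical code prediction using curriculum learning, a training strategy for machine learning models that gradually increases the difficulty of the learning tasks from easy to hard. One of the challenges in curriculum learning is the design of the curriculum, that is, the sequential design of tasks of gradually increasing difficulty. We propose Hierarchical Curriculum Learning (HiCu), an algorithm that uses graph structure in the output space to design curricula for multi-label classification. We create curricula for multi-label classification models that predict ICD diagnosis and procedure codes from natural language descriptions of patients. By leveraging the hierarchy of ICD codes, which groups diagnosis codes according to the organ systems of the human body, we find that our proposed curricula improve the generalization of neural network-based predictive models across recurrent, convolutional, and transformer-based architectures. Our code is available at https://github.com/wren93/hicu-icd.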
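The idea of deriving a curriculum from a label hierarchy can be sketched as follows. This is only an illustration of the general principle, not the HiCu algorithm itself: the label paths are hypothetical placeholders rather than real ICD codes, and the helper names `labels_at_depth` and `build_curriculum` are invented for this example. Each fine-grained label is represented as a path from the root of the hierarchy, and earlier curriculum stages train on coarser ancestors of the final labels.

```python
def labels_at_depth(label_paths, depth):
    """Coarsen each fine-grained label to its ancestor at the given depth."""
    return sorted({path[:depth] for path in label_paths})


def build_curriculum(label_paths, max_depth):
    """Return one multi-label target set per stage, from coarse to fine."""
    return [labels_at_depth(label_paths, d) for d in range(1, max_depth + 1)]


# Hypothetical label paths for one patient note (not real ICD codes).
patient_labels = [
    ("circulatory", "heart_failure", "unspecified"),
    ("circulatory", "hypertension", "essential"),
    ("respiratory", "pneumonia", "organism_unspecified"),
]

stages = build_curriculum(patient_labels, max_depth=3)
for stage_index, targets in enumerate(stages, start=1):
    print(f"stage {stage_index}: {len(targets)} target labels")
```

In this sketch the first stage has only two targets (the two organ systems), while the final stage has the three original fine-grained labels, so the task becomes harder as training moves down the hierarchy.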
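This paper investigates the estimation and inference of the average treatment effect (ATE) using deep neural networks (DNNs) in the potential outcomes framework. Under some regularity conditions, the observed response can be formulated as the response of a mean regression problem with both the confounding variables and the treatment indicator as independent variables. Using this formulation, we investigate two methods for ATE estimation and inference that are based on the mean regression function estimated by DNN regression with a specific network architecture. We show that both DNN estimates of the ATE are consistent, with dimension-free consistency rates, under some assumptions on the underlying true mean regression model. Our model assumptions accommodate a potentially complex dependence structure on the observed covariates, including latent factors and nonlinear interactions between the treatment indicator and the confounding variables. We also establish the asymptotic normality of our estimators based on the idea of sample splitting, which ensures precise inference and uncertainty quantification. Simulation studies and a real data application support our theoretical findings and our DNN estimation and inference methods.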
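Image resolution, which is closely related to both accuracy and computational cost, plays a pivotal role in network training. In this paper, we observe that a downscaled image retains relatively complete shape semantics but loses extensive texture information. Inspired by the consistency of shape semantics and the fragility of texture information, we propose a novel training strategy named Temporally Resolution Decrement, in which we randomly reduce training images to a smaller resolution in the time domain. During alternating training on reduced and original images, the unstable texture information leads to a weaker correlation between texture-related patterns and the correct label, which naturally pushes the model to rely more on robust shape properties, in line with human decision rules. Surprisingly, our approach also greatly improves the computational efficiency of convolutional neural networks. On ImageNet classification, using only 33% of the computation (randomly reducing training images to 112 x 112) still improves ResNet-50 from 76.32% to 77.71%, and using 63% of the computation (randomly reducing training images to 112 x 112 in 50% of the epochs) improves ResNet-50 to 78.18%.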
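The 63% figure can be checked with a short calculation. The sketch below rests on two assumptions that the abstract does not state: the full training resolution is 224 x 224, and the per-image cost of a convolutional network scales with the number of input pixels. The function name `relative_compute` is invented for this example.

```python
def relative_compute(low_res_epoch_fraction, low_res=112, full_res=224):
    """Training compute relative to always training at full resolution.

    Assumes per-image cost is proportional to pixel count, and that
    full_res is 224 (an assumption; the abstract gives only 112 x 112).
    """
    per_image_ratio = (low_res / full_res) ** 2  # 112 x 112 costs 1/4 of 224 x 224
    full_res_share = 1.0 - low_res_epoch_fraction
    return full_res_share + low_res_epoch_fraction * per_image_ratio


half_reduced = relative_compute(0.5)
print(f"{half_reduced:.3f}")
```

With half of the epochs at the reduced resolution, the estimate is 0.5 + 0.5 x 0.25 = 0.625, which is consistent with the reported 63%.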
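We propose SkullEngine, a multi-stage coarse-to-fine CNN-based framework for high-resolution segmentation and large-scale landmark detection, built from a collaborative, integrated, and scalable JSD model and three refinement models for segmentation and landmark detection. We evaluated the framework on a clinical dataset of 170 CBCT/CT images for the tasks of segmenting 2 bones (midface and mandible) and detecting 175 clinically common landmarks on bones, teeth, and soft tissues.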
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, and sequential distillation, revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
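As a rough illustration of what distilling token relations can look like, the sketch below compares pairwise token-similarity matrices of a teacher and a student. It is an assumption-laden toy and not the TinyMIM implementation: the relation is taken to be a row-wise softmax over scaled dot products of token features, the loss is taken to be the mean KL divergence from teacher rows to student rows, and the function names and token values are invented for this example.

```python
import math


def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(tokens):
    """Row-wise softmax of scaled dot products between all token pairs."""
    scale = 1.0 / math.sqrt(len(tokens[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(ti, tj)) for tj in tokens])
        for ti in tokens
    ]


def relation_distillation_loss(teacher_tokens, student_tokens):
    """Mean KL(teacher || student) over the rows of the two relation matrices."""
    teacher_rel = token_relations(teacher_tokens)
    student_rel = token_relations(student_tokens)
    total = 0.0
    for t_row, s_row in zip(teacher_rel, student_rel):
        total += sum(p * math.log(p / q) for p, q in zip(t_row, s_row) if p > 0)
    return total / len(teacher_rel)


# Toy features: 3 tokens each; teacher width 4, student width 2 (hypothetical values).
teacher = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
student = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

loss = relation_distillation_loss(teacher, student)
print(f"relation distillation loss: {loss:.4f}")
```

One property this sketch makes visible is that relation matrices have shape (number of tokens) x (number of tokens), so the teacher and student can be compared directly even when their feature widths differ, without an extra projection layer.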
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.