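Because quality annotations are scarce in the medical imaging community, semi-supervised learning methods are highly valued for image semantic segmentation. This paper presents an advanced consistency-aware, pseudo-label-based self-ensembling approach that fully exploits the power of the Vision Transformer (ViT) and the Convolutional Neural Network (CNN). The proposed framework consists of a feature-learning module, in which ViT and CNN enhance each other, and a guidance module that is robust for consistency-aware purposes. Pseudo labels are inferred and used recurrently and separately by the CNN and ViT views in the feature-learning module to expand the dataset, and the two views benefit each other. Meanwhile, a perturbation scheme is designed for the feature-learning module, and averaged network weights are used to build the guidance module. In this way, the framework combines the feature-learning strengths of CNN and ViT, improves performance through dual-view co-training, and enables consistency-aware supervision in a semi-supervised manner. A topological exploration of all alternative supervision modes with CNN and ViT is validated in detail, demonstrating the most promising performance and the specific setting of our method on semi-supervised medical image segmentation tasks. Experimental results show that the proposed method achieves state-of-the-art performance on a public benchmark dataset across a variety of metrics. The code is publicly available.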
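The abstract above builds its guidance module from averaged network weights. A minimal sketch of one way to do this, assuming an exponential moving average (the abstract does not name the averaging scheme); the function `ema_update`, the `decay` parameter, and the flat weight lists are hypothetical and purely illustrative:

```python
def ema_update(guide_weights, learner_weights, decay=0.99):
    """Move the guidance weights toward the learner weights (assumed EMA rule)."""
    if len(guide_weights) != len(learner_weights):
        raise ValueError("weight lists must have the same length")
    return [decay * g + (1.0 - decay) * w
            for g, w in zip(guide_weights, learner_weights)]


# Toy weights (illustrative values, not from the paper).
guide = [0.0, 0.0]
learner = [1.0, -2.0]
for _ in range(3):
    # In training this would follow each optimizer step of the learner.
    guide = ema_update(guide, learner, decay=0.5)
```

With a decay close to 1, the guidance weights change slowly and smooth out step-to-step noise in the learner, which is the usual motivation for weight averaging.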
translated by 谷歌翻译
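In this work, we study unsupervised domain adaptation (UDA) in a challenging self-supervised setting. One of the difficulties is how to learn task discrimination without target labels. Unlike previous literature, which directly aligns cross-domain distributions or relies on reversed gradients, we propose Domain Confused Contrastive Learning (DCCL) to bridge the source and target domains via domain puzzles and to retain discriminative representations after adaptation. Technically, DCCL searches for the most domain-challenging direction and carefully crafts domain-confused augmentations as positive pairs; it then contrastively encourages the model to pull representations toward the other domain, thereby learning more stable and effective domain invariances. We also investigate whether contrastive learning necessarily helps UDA when other data augmentations are applied. Extensive experiments show that DCCL significantly outperforms the baselines.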
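Transferring multiple objects between bins is a common task in many applications. In robotics, the standard approach is to pick up and transfer one object at a time. However, grasping several objects and transferring them together is more efficient. This paper presents a set of novel strategies for efficiently grasping multiple objects in one bin and transferring them to another. The strategies enable a robotic hand to identify an optimal ready hand configuration (pre-grasp) and to compute a flexion synergy based on the desired number of objects to grasp. The paper also presents an approach that uses a Markov decision process (MDP) to model the pick-transfer routine when the required quantity exceeds the capacity of a single grasp. Using the MDP model, the proposed approach can generate an optimal pick-transfer routine that minimizes the number of transfers, which serves as the measure of efficiency. The proposed approach has been evaluated both in a simulation environment and on a real robotic system. The results show that, compared with an optimal single-object pick-transfer solution, the approach reduces the number of transfers by 59% and the number of lifts by 58%.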
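The MDP formulation above can be illustrated with a small dynamic-programming solve. This is a sketch under stated assumptions, not the authors' model: the state is the number of objects still required, an action is the targeted grasp quantity, each transfer costs 1, and the outcome probabilities in `outcome_probs` are invented for illustration. Picking more than needed is simply clamped to the requirement.

```python
def plan_transfers(required, outcome_probs):
    """Minimise the expected number of transfers in a toy pick-transfer MDP.

    outcome_probs[k][j] is the (hypothetical) probability of actually
    transferring j objects when the hand aims for k objects.
    Returns (value, policy) indexed by the number of objects still required.
    """
    value = [0.0] * (required + 1)    # expected transfers still to make
    policy = [None] * (required + 1)  # best targeted quantity per state
    for n in range(1, required + 1):
        best_cost, best_k = float("inf"), None
        for k, probs in outcome_probs.items():
            if k > n:
                continue  # never aim for more than is still needed
            p_stay = probs.get(0, 0.0)  # failed grasp: state unchanged
            if p_stay >= 1.0:
                continue
            future = sum(p * value[max(n - j, 0)]
                         for j, p in probs.items() if j > 0)
            # Solve V = 1 + p_stay * V + future for V (self-loop on failure).
            cost = (1.0 + future) / (1.0 - p_stay)
            if cost < best_cost:
                best_cost, best_k = cost, k
        value[n], policy[n] = best_cost, best_k
    return value, policy


# Invented outcome model: larger targets are less reliable.
outcome_probs = {
    1: {1: 1.0},
    2: {2: 0.8, 1: 0.2},
    3: {3: 0.5, 2: 0.3, 0: 0.2},
}
value, policy = plan_transfers(5, outcome_probs)
```

Because every successful transfer strictly reduces the remaining requirement, the states can be solved in increasing order, so no iterative value-iteration loop is needed for this toy model.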
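A human hand can grasp a target number of objects from a pile using tactile sensing alone. To do the same, a robot needs to grasp within the pile, sense the number of objects in the grasp before lifting, and predict how many objects will remain in the grasp after lifting. This is a challenging problem because, when the prediction is made, the robotic hand is still in the pile and the grasped objects are not observable to the vision system. Moreover, some objects held by the hand before lifting may fall out of the grasp as the hand is lifted, because they were supported by other objects in the pile rather than by the fingers. A robotic hand should therefore use its tactile sensors to sense the number of grasped objects before lifting. This paper presents novel multi-object grasp analysis methods for solving this problem. They include grasp volume calculation, tactile force analysis, and a data-driven deep learning approach. The methods have been implemented on a Barrett hand and evaluated both in simulation and in a real setup with a robotic system. The evaluation shows that, once the Barrett hand has grasped multiple objects in the pile, the data-driven model can predict before lifting how many objects will remain in the hand after lifting. The root-mean-square errors of our approach are 30.74 for balls and 0.58 for cubes in simulation, and 1.06 for balls and 1.45 for cubes on the real system.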
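The evaluation above is reported as a root-mean-square error between predicted and actual object counts. A minimal sketch of that metric follows; the function name `rmse` and the sample counts are made up for illustration and are not data from the paper:

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example: predicted vs. actual number of objects left after lifting.
predicted_counts = [3, 2, 4, 1]
actual_counts = [3, 3, 4, 2]
error = rmse(predicted_counts, actual_counts)
```

An RMSE below 1 means the prediction is, in the root-mean-square sense, off by less than one object per grasp.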
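Molecular dynamics (MD) simulation predicts the trajectories of atoms by solving Newton's equations of motion with a numerical integrator. Due to physical constraints, the time step of the integrator must be small to maintain sufficient accuracy, which limits simulation efficiency. To address this, we introduce MDNet, a model based on graph neural networks (GNNs), to predict the evolution of coordinates and momenta with large time steps. In addition, because its complexity is linear in the system size, MDNet scales easily to larger systems. We demonstrate the performance of MDNet on a 4000-atom system with large time steps, and show that MDNet predicts equilibrium and transport properties that agree well with standard MD simulations.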
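To make the time-step constraint concrete, here is a minimal sketch of a classical integrator. It assumes velocity Verlet (the abstract does not say which integrator is used) and a one-dimensional harmonic oscillator standing in for an atomic system; all names and parameter values are illustrative.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate Newton's equation of motion with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt  # position update
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # velocity update with averaged force
        a = a_new
    return x, v


def spring_force(x):
    """Harmonic spring with unit stiffness (toy stand-in for atomic forces)."""
    return -x


# Same total simulated time (t = 10) with a small and a large time step.
x_small, _ = velocity_verlet(1.0, 0.0, spring_force, 1.0, dt=0.01, steps=1000)
x_large, _ = velocity_verlet(1.0, 0.0, spring_force, 1.0, dt=0.5, steps=20)

exact = math.cos(10.0)  # analytic solution for x(0) = 1, v(0) = 0
error_small = abs(x_small - exact)
error_large = abs(x_large - exact)
```

The larger step needs 50 times fewer force evaluations but drifts further from the analytic trajectory, which is the accuracy-versus-efficiency trade-off that motivates learning a large-step propagator.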
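Most publicly available datasets for image classification are single-labeled, whereas images in daily life are inherently multi-labeled. This annotation gap causes many pre-trained single-label classification models to fail in practical scenarios. The issue is even more pressing for aerial images: aerial data collected from sensors naturally cover a relatively large land area with multiple labels, while the widely available annotated aerial datasets (e.g., UCM, AID) are single-labeled. Because manually annotating multi-label aerial images would be time- and labor-intensive, we propose a novel self-correction integrated domain adaptation (SCIDA) method for automatic multi-label learning. SCIDA is weakly supervised, i.e., it automatically learns a multi-label image classification model from massive, publicly available single-label images. To achieve this, we propose a novel label-wise self-correction (LWC) module to better explore the underlying label correlations. This module also makes unsupervised domain adaptation (UDA) from single-label to multi-label data possible. For training, the proposed model uses only single-label information and requires no prior knowledge of multi-labeled data; it then predicts labels for multi-label aerial images. In our experiments, the proposed model is trained on the single-labeled MAI-AID-S and MAI-UCM-S datasets and tested directly on our collected Multi-scene Aerial Image (MAI) dataset.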
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves is even more so, given how little precedent exists in the literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.