Bimanual activities like coffee stirring, which require coordination of dual arms, are common in daily life and intractable to learn by robots. Adopting reinforcement learning to learn these tasks is a promising topic since it enables the robot to explore how dual arms coordinate together to accomplish the same task. However, this field has two main challenges: coordination mechanism and long-horizon task decomposition. Therefore, we propose the Mixline method to learn sub-tasks separately via the online algorithm and then compose them together based on the generated data through the offline algorithm. We constructed a learning environment based on the GPU-accelerated Isaac Gym. In our work, the bimanual robot successfully learned to grasp, hold and lift the spoon and cup, insert them together and stir the coffee. The proposed method has the potential to be extended to other long-horizon bimanual tasks.
translated by 谷歌翻译
深度神经网络(DNN)已在脑病变检测和分割中广泛采用。但是,在2D MRI切片中定位小病变是具有挑战性的,需要在3D上下文聚集的粒度和计算复杂性之间取得平衡。在本文中,我们提出了一种新型的视角变压器,以增强MRI特征的提取,以进行更准确的肿瘤检测。首先,所提出的变压器在3D脑扫描中收获了不同位置之间的远程相关性。其次,变压器将一堆切片功能堆叠为多个2D视图,并增强这些特征的视图,该功能大致以有效的方式实现了3D相关计算。第三,我们将提出的变压器模块部署在变压器主链中,该模块可以有效地检测到脑损伤周围的2D区域。实验结果表明,我们提出的观看式变压器在具有挑战性的大脑MRI数据集上对大脑病变检测表现良好。
translated by 谷歌翻译
因果发现旨在从观察数据中学习因果图。迄今为止,大多数因果发现方法需要将数据存储在中央服务器中。但是,数据所有者逐渐拒绝分享他们的个性化数据以避免隐私泄漏,使这项任务通过切断第一步来更加麻烦。出现拼图:$ \ texit {如何从分散数据的原因关系推断出来自分散数据的因果关系?} $本文,具有数据的添加性噪声模型假设,我们参加了开发基于渐变的学习框架命名为DAG共享的渐变学习框架联邦因果发现(DS-FCD),可以在不直接触摸本地数据的情况下学习因果图,并自然地处理数据异质性。 DS-FCD受益于每个本地模型的两级结构。第一级别学习因果图并与服务器通信以获取来自其他客户端的模型信息,而第二级别近似于因果机制,并且从其自身的数据逐步更新以适应数据异质性。此外,DS-FCD通过利用平等的非循环性约束,将整体学习任务制定为连续优化问题,这可以通过梯度下降方法自然地解决。对合成和现实世界数据集的广泛实验验证了所提出的方法的功效。
translated by 谷歌翻译
背景:基于其可变的历史视觉记录,对青少年的球形等效物进行定量预测。方法:从2019年10月到2022年3月,我们检查了来自中国成都成都6-20岁的37,586名青少年的双眼未校正视力,轴向长度,角膜曲率和轴向75,172眼。 80 \%样品由训练集和剩余的20 \%组成测试集。时间感知的长期短期记忆被用来定量预测青少年在两年半内的球形当量。结果:球形当量的测试集的平均绝对预测误差为0.273-0.257,如果我们考虑不同的历史记录和不同的预测持续时间,则从0.189-0.160到0.596-0.473。结论:时间感知时间长的短期记忆被应用于不规则采样时间序列中的时间特征,这更符合实际数据的特征,因此具有更高的适用性,并有助于较早地识别近视的进展。总体误差0.273远小于临床上可接受预测的标准,例如0.75。
translated by 谷歌翻译
基于图像补丁重建的自我监督学习方法在培训自动编码器方面取得了巨大的成功,其预训练的权重可以转移到微调图像理解的其他下游任务。但是,现有方法很少研究重建斑块的各种重要性和解剖结构的对称性,当它们应用于3D医学图像时。在本文中,我们提出了一种基于3D脑MRI分割任务的视觉变压器(VIT)的新颖的对称自动编码器(ASA)。我们猜想,强迫自动编码器恢复信息性图像区域可以收获更多的判别性表示,而不是恢复光滑的图像贴片。然后,我们采用基于梯度的指标来估计每个图像补丁的重要性。在预训练阶段,提议的自动编码器更多地注意根据梯度指标重建信息贴片。此外,我们求助于大脑结构的先验,并开发一种对称位置编码(SPE)方法,以更好地利用远距离但空间对称区域之间的相关性以获得有效的特征。实验结果表明,我们提出的细心对称自动编码器的表现优于三个大脑MRI分割基准的最先进的自我监督学习方法和医学图像分割模型。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
translated by 谷歌翻译