Thanks to its color independence, illumination invariance, and location discrimination, the depth map can provide important supplementary information for extracting salient objects in complex environments. However, high-quality depth sensors are expensive and cannot be widely deployed, while commodity depth sensors produce noisy and sparse depth maps that introduce irreversible interference into depth-based networks. In this paper, we propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD). Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks; in this way, the depth information can be completed and purified. Moreover, we introduce a multi-modal filtered transformer (MFT) module, which is equipped with three modality-specific filters to generate a transformer-enhanced feature for each modality. The proposed model works in a depth-free style during the testing phase. Experiments show that it not only significantly surpasses depth-based RGB-D SOD methods on multiple datasets, but also simultaneously predicts a high-quality depth map and salient contours. Moreover, the resulting depth map can help existing RGB-D SOD methods obtain significant performance gains. The source code will be publicly available at https://github.com/Xiaoqi-Zhao-DLUT/MMFT.
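The three unified tasks imply a joint training objective over saliency, depth, and contour heads. A minimal sketch of such an objective (the loss choices and the weights `w_depth`/`w_contour` are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def multitask_loss(pred_sal, gt_sal, pred_depth, gt_depth,
                   pred_contour, gt_contour, w_depth=0.5, w_contour=0.5):
    """Joint objective over saliency, depth, and contour predictions.

    Binary cross-entropy for the two binary maps, L1 for depth regression;
    w_depth and w_contour are illustrative auxiliary-task weights.
    """
    eps = 1e-7
    bce = lambda p, g: -np.mean(g * np.log(p + eps) + (1 - g) * np.log(1 - p + eps))
    l_sal = bce(pred_sal, gt_sal)                     # main task
    l_depth = np.mean(np.abs(pred_depth - gt_depth))  # auxiliary: depth (L1)
    l_contour = bce(pred_contour, gt_contour)         # auxiliary: contour
    return l_sal + w_depth * l_depth + w_contour * l_contour
```

With perfect predictions the loss goes to (numerically) zero, so the auxiliary terms only penalize depth and contour errors on top of the saliency objective.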
translated by 谷歌翻译
Most existing RGB-D salient object detection methods utilize convolution operations and construct complex interweaved fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation constrains the performance of convolution-based methods to a ceiling. In this work, we rethink this task from the perspective of global information alignment and transformation. Specifically, the proposed method (TransCMD) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path (TIPP). TransCMD treats multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built upon the transformer. In addition, considering the quadratic complexity w.r.t. the number of input tokens, we design a patch-wise token re-embedding strategy (PTRE) with acceptable computational cost. Experimental results on seven RGB-D SOD benchmark datasets demonstrate that a simple two-stream encoder-decoder framework can surpass state-of-the-art CNN-based methods when equipped with the TIPP.
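The quadratic cost that motivates patch-wise token re-embedding can be seen by counting pairwise attention interactions. A toy sketch (the average-pooling re-embedding and the pooling ratio are assumptions for illustration, not PTRE's exact design):

```python
import numpy as np

def self_attention(tokens):
    """Plain scaled dot-product self-attention over (N, d) tokens: O(N^2 * d) cost."""
    n, d = tokens.shape
    scores = tokens @ tokens.T / np.sqrt(d)       # (N, N) pairwise interactions
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

def reembed(tokens, ratio=4):
    """Patch-wise re-embedding sketch: average-pool every `ratio` tokens into one,
    shrinking the N^2 attention term by a factor of ratio^2."""
    n, d = tokens.shape
    return tokens[: n - n % ratio].reshape(-1, ratio, d).mean(axis=1)

tokens = np.random.rand(64, 32)
full = self_attention(tokens)             # 64 x 64 = 4096 interactions
coarse = self_attention(reembed(tokens))  # 16 x 16 = 256 interactions
```

Reducing 64 tokens to 16 cuts the pairwise-interaction count by 16x, which is the kind of saving that makes transformer-based cross-modal integration affordable.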
Existing CNN-based RGB-D salient object detection (SOD) networks all require pre-training on ImageNet to learn the hierarchical features that provide a good initialization. However, the collection and annotation of large-scale datasets are time-consuming and expensive. In this paper, we leverage self-supervised representation learning (SSL) to design two pretext tasks: cross-modal auto-encoding and depth-contour estimation. Our pretext tasks require only a small number of unlabeled RGB-D samples for pre-training, which enables the network to capture rich semantic context and reduce the gap between the two modalities, thereby providing an effective initialization for the downstream task. In addition, to address the inherent problem of cross-modal fusion in RGB-D SOD, we propose a consistency-difference aggregation (CDA) module, which splits a single feature fusion into multi-path fusion to achieve an adequate perception of both consistent and differential information. The CDA module is general and suitable for both cross-modal and cross-level fusion. Extensive experiments on six benchmark datasets show that our self-supervised pre-trained model performs favorably against most state-of-the-art methods pre-trained on ImageNet. The source code will be publicly available at https://github.com/Xiaoqi-Zhao-DLUT/SSLSOD.
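One plausible instantiation of splitting a single fusion into consistency and difference paths is an element-wise product for modality agreement and an absolute difference for disagreement. This is a hedged sketch of the idea, not the CDA module's actual design:

```python
import numpy as np

def cda_fusion(feat_rgb, feat_depth):
    """Consistency-difference aggregation sketch: fuse two modality features
    along a 'consistent' path (where both modalities respond) and a
    'differential' path (where they disagree), then aggregate both paths
    instead of producing a single fused tensor."""
    consistent = feat_rgb * feat_depth         # high where both modalities agree
    different = np.abs(feat_rgb - feat_depth)  # high where modalities disagree
    return consistent + different              # simple aggregation of the two paths
```

The point of the two-path split is that complementary (differential) cues survive the fusion instead of being averaged away.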
Vision-Centric Bird-Eye-View (BEV) perception has shown promising potential and attracted increasing attention in autonomous driving. Recent works mainly focus on improving efficiency or accuracy but neglect the domain shift problem, resulting in severe degradation of transfer performance. Through extensive observations, we identify the significant domain gaps that exist in scene, weather, and day-night changing scenarios, and make the first attempt to solve the domain adaptation problem for multi-view 3D object detection. Since BEV perception approaches are usually complicated and contain several components, the domain shift accumulated across multiple latent spaces makes BEV domain adaptation challenging. In this paper, we propose a novel Multi-level Multi-space Alignment Teacher-Student ($M^{2}ATS$) framework to ease the domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Multi-space Feature Aligned (MFA) student model. Specifically, the DAT model adopts uncertainty guidance to sample reliable depth information in the target domain. After constructing domain-invariant BEV perception, it transfers pixel- and instance-level knowledge to the student model. To further alleviate the domain shift at the global level, the MFA student model is introduced to align task-relevant multi-space features of the two domains. To verify the effectiveness of $M^{2}ATS$, we conduct BEV 3D object detection experiments on four cross-domain scenarios and achieve state-of-the-art performance (e.g., +12.6% NDS and +9.1% mAP on Day-Night). Code and dataset will be released.
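The uncertainty-guided depth sampling in the DAT model can be sketched as keeping only the depth predictions whose estimated uncertainty falls below a threshold; the threshold value and the masking convention here are assumptions for illustration:

```python
import numpy as np

def sample_reliable_depth(depth_pred, uncertainty, threshold=0.2):
    """Keep pseudo-depth only at pixels the teacher is confident about;
    unreliable pixels are masked out (NaN) and excluded from supervision."""
    reliable = uncertainty < threshold
    sampled = np.where(reliable, depth_pred, np.nan)
    return sampled, reliable
```

The student is then supervised only on the reliable mask, so noisy target-domain depth does not accumulate into the BEV representation.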
Most semantic communication systems leverage deep learning models to provide end-to-end transmission performance surpassing established source and channel coding approaches. So far, research has mainly focused on architecture and model improvements, but a model trained over a full dataset and ergodic channel responses is unlikely to be optimal for every test instance. Due to limitations on model capacity and imperfect optimization and generalization, such learned models will be suboptimal, especially when the test data distribution or channel response differs from that of the training phase, as is likely in practice. To tackle this, we propose a novel semantic communication paradigm that leverages the deep learning model's overfitting property. Our model, for instance, can be updated after deployment, which can lead to substantial gains in transmission rate-distortion (RD) performance. This new system is named adaptive semantic communication (ASC). In our ASC system, the wirelessly transmitted stream includes both the semantic representations of the source data and the adapted decoder model parameters. Specifically, we take the overfitting concept to the extreme, proposing a series of methods to adapt the semantic codec or representations to an individual data sample or channel state instance. The whole ASC system design is formulated as an optimization problem whose goal is to minimize a loss function that is a tripartite tradeoff among the data rate, model rate, and distortion terms. Experiments (including a user study) verify the effectiveness and efficiency of our ASC system. Notably, the substantial gain of our overfitted coding paradigm can catalyze the upgrade of semantic communication to a new era.
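The tripartite tradeoff can be written as L = R_data + R_model + λ·D: transmitting adapted decoder parameters costs extra model-rate bits but may reduce distortion enough to pay for itself. A toy selector over candidate adaptations (the λ value and the candidate set are illustrative, not the paper's formulation):

```python
def asc_objective(rate_data, rate_model, distortion, lam=10.0):
    """Rate-distortion-model-rate tradeoff minimized by an ASC-style system."""
    return rate_data + rate_model + lam * distortion

def pick_adaptation(candidates, lam=10.0):
    """candidates: list of (rate_data, rate_model, distortion) tuples,
    e.g. one entry per degree of decoder overfitting; returns the tuple
    with the lowest tripartite objective."""
    return min(candidates, key=lambda c: asc_objective(*c, lam=lam))
```

For example, paying 20 extra bits of model rate to cut distortion from 5.0 to 1.0 wins at λ = 10, since 100 + 20 + 10 < 100 + 0 + 50.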
Recently, graph neural networks (GNNs) have significantly improved the performance of machine learning tasks on graphs. However, this technological breakthrough makes people wonder: how does a GNN make its decisions, and can we trust its predictions with high confidence? In critical domains such as biomedicine, where making a wrong decision can have severe consequences, interpreting the inner working mechanisms of GNNs before applying them is crucial. In this paper, we propose GNNInterpreter, a novel model-level explanation method for different GNNs that follow the message-passing scheme, to explain the high-level decision-making process of GNN models. More specifically, by means of a continuous relaxation of graphs and the reparameterization trick, GNNInterpreter learns a probabilistic generative graph distribution that produces the most representative graph for the target prediction in the eyes of the GNN model. Compared with the only existing work of this kind, GNNInterpreter is more effective and more flexible in generating explanation graphs with different types of node and edge features, without introducing another black box to explain the GNN and without requiring domain-specific knowledge. Furthermore, experimental studies on four different datasets demonstrate that the explanation graphs generated by GNNInterpreter can match the desired graph pattern when the model is ideal, and reveal potential model pitfalls if they exist.
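The continuous graph relaxation with the reparameterization trick can be sketched with a binary-concrete (Gumbel-sigmoid) sample per edge, so that gradients flow back to learnable edge logits. The temperature and logit parameterization here are assumptions for illustration, not GNNInterpreter's exact design:

```python
import numpy as np

def sample_soft_adjacency(edge_logits, temperature=0.5, rng=None):
    """Reparameterized sample of a soft adjacency matrix: each entry is a
    relaxed Bernoulli in (0, 1), so a gradient-based optimizer can update
    edge_logits through the sampling step."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1 - 1e-6, size=edge_logits.shape)
    logistic_noise = np.log(u) - np.log(1 - u)  # reparameterized noise
    return 1.0 / (1.0 + np.exp(-(edge_logits + logistic_noise) / temperature))
```

As the logits grow large (and/or the temperature shrinks), the soft entries approach hard 0/1 edges, recovering a discrete explanation graph.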
This paper presents a study of a smart differential, examined at the level of its three outputs. When the outputs operate under fluctuating (unequal) loads, the differential distributes speed and force unevenly as the principal difference among the three outputs, whereas under nearly equal loads the motions and forces of the outputs are equalized. The identified kinematics and dynamics are examined in case studies under three different load cases. In addition, the motions for the three load cases are reconstructed, and the benefits of the mechanism for its current and potential applications are discussed.
The Robot Operating System (ROS) brings great automation potential to various fields involving production tasks, improving productivity and simplifying human operations. However, ROS relies heavily on communication yet lacks a secure data-sharing mechanism, and securing confidential data exchange among multiple robots poses a significant challenge in multi-robot interactions. In this paper, we introduce AuthROS, a secure and convenient authorization framework for ROS nodes with strong security and high availability, based on a private Ethereum network and the SM cryptographic algorithms. To the best of our knowledge, AuthROS is the first secure data-sharing framework for ROS-equipped robots. The framework meets the requirements for immutability and security of confidential data exchanged between ROS nodes. In addition, mechanisms for authorization and authentication are proposed that execute atomically without a third party, ensuring trustworthy data exchange. An SM2 key-exchange mechanism and an SM4 authorization-encryption mechanism are both proposed for data-transmission security. A data-digest uploading scheme is also implemented to improve the efficiency of data querying and uploading on the Ethereum network. Experimental results show that the system can generate a digest from 800KB of encrypted data in 6.34ms. Security analysis shows that AuthROS achieves secure data exchange, data-operation detection, and protection against node-forgery attacks.
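The digest-uploading step can be sketched as hashing the encrypted payload and storing only the fixed-size digest on-chain. SHA-256 stands in here for the SM3 hash, since the SM algorithm family is not in the Python standard library; this is an illustrative sketch, not AuthROS's implementation:

```python
import hashlib

def digest_for_chain(encrypted_payload: bytes) -> str:
    """Compute a fixed-size digest of the encrypted data; only this digest
    is written to the Ethereum network, keeping on-chain storage small while
    still allowing later integrity checks against the off-chain ciphertext."""
    return hashlib.sha256(encrypted_payload).hexdigest()

# 800 KB of ciphertext-like bytes hashes to a 64-hex-character digest
digest = digest_for_chain(b"\x17" * 800 * 1024)
```

Because only the digest goes on-chain, verifying a later download reduces to recomputing the hash and comparing it with the stored value.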
This work designs an in-pipe climbing robot that operates using a novel gear transmission to negotiate complex pipe networks. Traditional wheeled/tracked in-pipe climbing robots tend to slip while traversing pipe bends. The proposed transmission extends the capability of a standard two-output differential to a third output. The mechanism helps realize a well-defined speed-modulation sequence in which the robot avoids slipping and dragging while moving forward: it accounts for the forces exerted on each track within the pipe network and deliberately modulates the robot's track speeds, eliminating the need for manual fine-tuning. The result is a three-output transmission. The robot's traversal of pipe networks with bends of various orientations, without slip, demonstrates the soundness of the proposed design.
This paper introduces a new, highly consequential setting for the use of computer vision for environmental sustainability. Concentrated animal feeding operations (CAFOs), a.k.a. intensive livestock farms or "factory farms," produce enormous amounts of manure and pollution. Dumping manure in the winter months poses significant environmental risks and violates environmental law in many states. Yet the federal Environmental Protection Agency (EPA) and state agencies rely primarily on self-reporting to monitor such "land application." Our paper makes four contributions. First, we introduce the environmental, policy, and agricultural setting of CAFOs and land application. Second, we provide a new dataset of high-cadence (daily to weekly) 3m/pixel satellite imagery from 2018-20 for 330 CAFOs in Wisconsin, with hand-labeled instances of land application (n = 57,697). Third, we develop an object-detection model to predict land application and a system to perform inference in near real time. We show that the system appears effective at detecting land application (PR AUC = 0.93), and we uncover several anomalous facilities that appear to apply regularly. Last, we estimate the population prevalence of land-application events in the winter of 2021/22 and show that it is much higher than facility self-reports indicate. The system can be used by environmental regulators and interest groups, and a pilot site visit was conducted last winter based on its detections. Overall, our application demonstrates the potential of AI-based computer-vision systems to address major problems in environmental compliance using near-daily imagery.
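The reported PR AUC can be computed from detection scores and binary labels by integrating precision over recall at descending score thresholds. A minimal sketch (an illustrative implementation using rectangle integration, not the paper's evaluation code):

```python
def pr_auc(scores, labels):
    """Area under the precision-recall curve, accumulated stepwise as the
    score threshold is lowered past each detection."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    area, prev_recall = 0.0, 0.0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        area += precision * (recall - prev_recall)  # rectangle step
        prev_recall = recall
    return area
```

A perfect ranking (all positives scored above all negatives) yields an area of 1.0; a PR AUC of 0.93, as reported, indicates detections that remain precise across most of the recall range.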