Smart Paper Notes

SARAS-Net: Scale and Relation Aware Siamese Network for Change Detection

Chao-Peng Chen , Jun-Wei Hsieh , Ping-Yang Chen , Yi-Kuan Hsieh , Bor-Shiun Wang

Categories: Computer Vision | Artificial Intelligence

2022-12-02

Change detection (CD) aims to find the differences between two images taken at different times and outputs a change map that indicates whether each region has changed. To generate better change maps, many state-of-the-art (SoTA) methods design a deep learning model with powerful discriminative ability. However, these methods still underperform because they ignore spatial information and scale changes between objects, giving rise to blurry or wrong boundaries. They also neglect the interactive information between the two images. To alleviate these problems, we propose the Scale and Relation-Aware Siamese Network (SARAS-Net). Three modules are proposed, namely relation-aware, scale-aware, and cross-transformer, to tackle scene change detection more effectively. To verify our model, we tested it on three public datasets, LEVIR-CD, WHU-CD, and DSFIN, and obtained SoTA accuracy. Our code is available at https://github.com/f64051041/SARAS-Net.

translated by Google Translate

RCDT: Relational Remote Sensing Change Detection with Transformer

Kaixuan Lu , Xiao Huang

Categories: Computer Vision

2022-12-09

Deep-learning-based change detection methods have received wide attention, thanks to their strong capability to obtain rich features from images. However, existing AI-based CD methods largely rely on three functionality-enhancing modules, i.e., semantic enhancement, attention mechanisms, and correspondence enhancement. The stacking of these modules leads to great model complexity. To unify these three modules into a simple pipeline, we introduce the Relational Change Detection Transformer (RCDT), a novel and simple framework for remote sensing change detection tasks. The proposed RCDT consists of three major components: a weight-sharing Siamese backbone to obtain bi-temporal features, a Relational Cross Attention Module (RCAM) that implements offset cross attention to obtain bi-temporal relation-aware features, and a Features Constrain Module (FCM) to achieve the final refined predictions with high-resolution constraints. Extensive experiments on four different publicly available datasets suggest that our proposed RCDT exhibits superior change detection performance compared with other competing methods. The theoretical, methodological, and experimental knowledge of this study is expected to benefit future change detection efforts that involve the cross attention mechanism.

Dual UNet: A Novel Siamese Network for Change Detection with Cascade Differential Fusion

Kaixuan Jiang , Ja Liu , Fang Liu , Wenhua Zhang , Yangguang Liu

Categories: Computer Vision

2022-08-12

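Change detection (CD) of remote sensing images detects changed regions by analyzing the differences between two bitemporal images. It is widely used in land resource planning, natural hazard monitoring, and other fields. In our study, we propose a novel Siamese neural network for the change detection task, namely Dual-UNet. In contrast to previous approaches that encode the bitemporal images individually, we design an encoder differential-attention module to focus on the spatial difference relationships of pixels. To improve the generalization of the network, it computes the attention weights between any pixels of the bitemporal images and uses them to produce more discriminative features. To improve feature fusion and avoid gradient vanishing, a multi-scale weighted variance map fusion strategy is proposed in the decoding stage. Experiments show that the proposed approach consistently outperforms state-of-the-art methods on popular seasonal change detection datasets.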

RDP-Net: Region Detail Preserving Network for Change Detection

Hongjia Chen , Fangling Pu , Rui Yang , Rui Tang , Xin Xu

Categories: Computer Vision

2022-02-20

Change detection (CD) is an essential earth observation technique. It captures the dynamic information of land objects. With the rise of deep learning, convolutional neural networks (CNN) have shown great potential in CD. However, current CNN models introduce backbone architectures that lose detailed information during learning. Moreover, current CNN models are heavy in parameters, which prevents their deployment on edge devices such as UAVs. In this work, we tackle this issue by proposing RDP-Net: a region detail preserving network for CD. We propose an efficient training strategy that constructs the training tasks during the warmup period of CNN training and lets the CNN learn from easy to hard. The training strategy enables CNN to learn more powerful features with fewer FLOPs and achieve better performance. Next, we propose an effective edge loss that increases the penalty for errors on details and improves the network's attention to details such as boundary regions and small areas. Furthermore, we provide a CNN model with a brand new backbone that achieves the state-of-the-art empirical performance in CD with only 1.70M parameters. We hope our RDP-Net would benefit the practical CD applications on compact devices and could inspire more people to bring change detection to a new level with the efficient training strategy. The code and models are publicly available at https://github.com/Chnja/RDPNet.

SiamixFormer: A Siamese Transformer Network For Building Detection And Change Detection From Bi-Temporal Remote Sensing Images

Amir Mohammadian , Foad Ghaderi

Categories: Computer Vision

2022-08-01

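Building detection and change detection using remote sensing images can help urban and rescue planning. They can also be used for building damage assessment after natural disasters. Currently, most existing models for building detection use only one image (the pre-disaster image) to detect buildings. This is based on the idea that post-disaster images reduce the model's performance because of the presence of destroyed buildings. In this paper, we propose a Siamese model, called SiamixFormer, which uses pre- and post-disaster images as input. Our model has two encoders and a hierarchical transformer architecture. The output of each stage in both encoders is given to a temporal transformer for feature fusion, such that the query is generated from the pre-disaster images and the (key, value) pair is generated from the post-disaster images. In this way, temporal features are also considered in feature fusion. Another advantage of using temporal transformers in feature fusion is that, compared with CNNs, they can better maintain the large receptive fields generated by the transformer encoders. Finally, the output of the temporal transformer at each stage is fed to a simple MLP decoder. The SiamixFormer model is evaluated on the xBD and WHU datasets for building detection and on the LEVIR-CD and CDD datasets for change detection, and outperforms the state of the art.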
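
The query/(key, value) split described in this abstract can be illustrated with plain scaled dot-product cross-attention. The sketch below is not the SiamixFormer implementation: it omits the learned projection matrices, multiple heads, and the hierarchical stages, and the names `cross_attention`, `pre_tokens`, and `post_tokens` are hypothetical. It only shows how each token from one image can be re-expressed as a weighted mix of tokens from the other image.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries and keys/values come from different inputs.

    queries: list of d-dimensional vectors (assumed here: tokens of the first image)
    keys, values: lists of vectors (assumed here: tokens of the second image)
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        fused = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(fused)
    return outputs

# Hypothetical toy tokens: two 2-dimensional tokens per image.
pre_tokens = [[1.0, 0.0], [0.0, 1.0]]
post_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(pre_tokens, post_tokens, post_tokens)
```

In this toy case each fused token leans toward the second-image token that most resembles its query, which is the behavior a temporal fusion step relies on.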

A Transformer-Based Siamese Network for Change Detection

Wele Gedara Chaminda Bandara , Vishal M. Patel

Categories: Computer Vision

2022-01-04

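This paper presents a transformer-based Siamese network architecture (abbreviated as ChangeFormer) for change detection (CD) from a pair of co-registered remote sensing images. Unlike recent CD frameworks, which are based on fully convolutional networks (ConvNets), the proposed method unifies a hierarchically structured transformer encoder with a multi-layer perceptron (MLP) decoder in a Siamese network architecture to efficiently render the multi-scale long-range details required for accurate CD. Experiments on two CD datasets show that the proposed end-to-end trainable transformer architecture achieves better CD performance than previous counterparts. Our code is available at https://github.com/wgcban/changeFormer.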

IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection

Qing Guo , Ruofei Wang , Rui Huang , Shuifa Sun , Yuxiang Zhang

Categories: Computer Vision

2022-07-15

Change detection (CD) aims to detect change regions within an image pair captured at different times, playing a significant role in diverse real-world applications. Nevertheless, most of the existing works focus on designing advanced network architectures to map the feature difference to the final change map while ignoring the influence of the quality of the feature difference. In this paper, we study the CD from a different perspective, i.e., how to optimize the feature difference to highlight changes and suppress unchanged regions, and propose a novel module denoted as iterative difference-enhanced transformers (IDET). IDET contains three transformers: two transformers for extracting the long-range information of the two images and one transformer for enhancing the feature difference. In contrast to the previous transformers, the third transformer takes the outputs of the first two transformers to guide the enhancement of the feature difference iteratively. To achieve more effective refinement, we further propose the multi-scale IDET-based change detection that uses multi-scale representations of the images for multiple feature difference refinements and proposes a coarse-to-fine fusion strategy to combine all refinements. Our final CD method outperforms seven state-of-the-art methods on six large-scale datasets under diverse application scenarios, which demonstrates the importance of feature difference enhancements and the effectiveness of IDET.
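
To make the notion of a "feature difference" concrete, here is a deliberately simple sketch, not the IDET method: one shared (Siamese) encoder is applied to both images, the features are subtracted, and the absolute difference is thresholded into a binary change map. The toy encoder (mean subtraction), the function names, and the threshold value are all assumptions chosen for illustration.

```python
def encode(image):
    """Toy shared encoder: subtract the image mean so a global brightness shift cancels out."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    return [[p - mean for p in row] for row in image]

def change_map(image_t1, image_t2, threshold=0.5):
    """Apply the same encoder to both images, then threshold the absolute feature difference."""
    f1, f2 = encode(image_t1), encode(image_t2)
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(r1, r2)]
            for r1, r2 in zip(f1, f2)]

# Hypothetical 3x3 grayscale images: the second is uniformly 0.1 brighter
# (an irrelevant lighting change) and has one genuinely changed pixel in the centre.
before = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
after = [[0.1, 0.1, 0.1], [0.1, 1.1, 0.1], [0.1, 0.1, 0.1]]
result = change_map(before, after)
```

Only the centre pixel is flagged; the uniform brightness shift is suppressed because both images pass through the same encoder. Learned methods replace the fixed encoder and threshold with trained components.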

RHA-Net: An Encoder-Decoder Network with Residual Blocks and Hybrid Attention Mechanisms for Pavement Crack Segmentation

Guijie Zhu , Zhun Fan , Jiacheng Liu , Duan Yuan , Peili Ma , Meihua Wang , Weihua Sheng , Kelvin C. P. Wang

Categories: Computer Vision | Machine Learning

2022-07-28

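The acquisition and evaluation of pavement surface data play an essential role in pavement condition evaluation. In this paper, an efficient end-to-end network for automatic pavement crack segmentation, called RHA-Net, is proposed to improve pavement crack segmentation accuracy. RHA-Net is built by integrating residual blocks (ResBlocks) and hybrid attention blocks into the encoder-decoder architecture. The ResBlocks are used to improve the ability of RHA-Net to extract high-level abstract features. The hybrid attention blocks are designed to fuse low-level and high-level features to help the model focus on the correct channels and crack regions, thereby improving the feature representation ability of RHA-Net. An image dataset containing 789 pavement crack images collected by a self-designed mobile robot is constructed and used to train and evaluate the proposed model. The proposed model is compared with other state-of-the-art networks, and the benefit of adding residual blocks and hybrid attention mechanisms is validated in a comprehensive ablation study. In addition, a lightweight version of the model, generated by introducing depthwise separable convolution, achieves better performance and a faster processing speed with 1/30 of the number of U-Net parameters. The developed system can segment pavement cracks in real time on an embedded device, the Jetson TX2 (25 FPS). The video taken in real-time experiments will be released at https://youtu.be/3xiogk0fig4.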
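
The parameter saving from depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights for a single layer, ignoring biases; the channel sizes are hypothetical and are not taken from RHA-Net, so the resulting ratio should not be read as the paper's 1/30 figure, which refers to the whole network.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)          # 73728
separable = depthwise_separable_params(64, 128, 3)   # 8768
ratio = standard / separable                         # roughly 8.4
```

For this one layer the separable form needs about 8.4 times fewer weights; the saving grows with the number of output channels and the kernel size.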

UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer

Haonan Wang , Peng Cao , Jiaqi Wang , Osmar R. Zaiane

Categories: Computer Vision | Machine Learning

2021-09-09

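Most recent semantic segmentation methods adopt a U-Net framework with an encoder-decoder architecture. It is still challenging for U-Net, with its simple skip connection scheme, to model the global multi-scale context: 1) not every skip connection setting is effective, because the feature sets of the encoder and decoder stages are incompatible, and some skip connections even harm segmentation performance; 2) the original U-Net is worse than a U-Net without any skip connection on some datasets. Based on our findings, we propose a new segmentation framework, named UCTransNet (with a proposed CTrans module in U-Net), from the channel perspective with an attention mechanism. Specifically, the CTrans module is an alternative to the U-Net skip connections. It consists of a sub-module that conducts multi-scale channel cross fusion with a transformer (named CCT) and a sub-module, Channel-wise Cross-Attention (named CCA), that guides the fused multi-scale channel-wise information to connect effectively to the decoder features and eliminate ambiguity. Hence, the proposed connection consisting of CCT and CCA is able to replace the original skip connection and close the semantic gaps for accurate automatic medical image segmentation. Experimental results suggest that our UCTransNet produces more precise segmentation performance and achieves consistent improvements for semantic segmentation across different datasets and conventional architectures involving transformers or U-shaped frameworks. Code: https://github.com/mcgregorwwwww/uctransnet.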

UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery

Libo Wang , Rui Li , Ce Zhang , Shenghui Fang , Chenxi Duan , Xiaoliang Meng , Peter M. Atkinson

Categories: Computer Vision

2021-09-18

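Semantic segmentation of remotely sensed urban scene imagery is required in a wide range of practical applications, such as land cover mapping, urban change detection, environmental protection, and economic assessment. Driven by rapid developments in deep learning technologies, the convolutional neural network (CNN) has dominated semantic segmentation for many years. CNNs adopt hierarchical feature representation, demonstrating strong capabilities for local information extraction. However, the local property of the convolution layer limits the network's ability to capture global context. Recently, as a hot topic in computer vision, the Transformer has demonstrated great potential in global information modelling, boosting many vision-related tasks such as image classification, object detection, and particularly semantic segmentation. In this paper, we propose a Transformer-based decoder and construct a UNet-like Transformer (UNetFormer) for real-time urban scene segmentation. For efficient segmentation, the UNetFormer selects the lightweight ResNet18 as the encoder and develops an efficient global-local attention mechanism to model both global and local information in the decoder. Extensive experiments show that our method not only runs faster but also produces higher accuracy than state-of-the-art lightweight models. Specifically, the proposed UNetFormer achieved 67.8% and 52.4% mIoU on the UAVid and LoveDA datasets, respectively, while the inference speed can reach up to 322.4 FPS with a 512x512 input on a single NVIDIA GTX 3090 GPU. In further exploration, the proposed Transformer-based decoder combined with a Swin Transformer encoder also achieves a state-of-the-art result (91.3% F1 and 84.1% mIoU) on the Vaihingen dataset. The source code will be freely available at https://github.com/wanglibo1995/geoseg.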

S2Looking: A Satellite Side-Looking Dataset for Building Change Detection

Li Shen , Yao Lu , Hao Chen , Hao Wei , Donghai Xie , Jiabao Yue , Rui Chen , Shouye Lv , Bitao Jiang

Categories: Computer Vision | Artificial Intelligence

2021-07-20

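Building change detection underpins many important applications, especially in the military and crisis-management domains. Recent methods for change detection have shifted towards deep learning, which depends on the quality of its training data. The assembly of large-scale annotated satellite imagery datasets is therefore essential for global building change surveillance. Existing datasets almost exclusively offer near-nadir viewing angles. This limits the range of changes that can be detected. By offering larger observation ranges, the scroll imaging mode of optical satellites presents an opportunity to overcome this restriction. This paper therefore introduces S2Looking, a building change detection dataset that contains large-scale side-looking satellite images captured at various off-nadir angles. The dataset consists of 5,000 bitemporal image pairs of rural areas and more than 65,920 annotated change instances throughout the world. The dataset can be used to train deep-learning-based change detection algorithms. It expands upon existing datasets by providing (1) larger viewing angles; (2) large illumination variances; and (3) the added complexity of rural images. To facilitate the use of the dataset, a benchmark task has been established, and preliminary tests suggest that deep learning algorithms find the dataset significantly more challenging than the closest competing near-nadir dataset, LEVIR-CD+. S2Looking may therefore promote important advances in existing building change detection algorithms. The dataset is available at https://github.com/s2looking/.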

Transformers in Remote Sensing: A Survey

Abdulaziz Amer Aleissaee , Amandeep Kumar , Rao Muhammad Anwer , Salman Khan , Hisham Cholakkal , Gui-Song Xia , Fahad Shahbaz Khan

Categories: Computer Vision

2022-09-02

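Deep-learning-based algorithms have gained widespread popularity in different areas of remote sensing image analysis over the past decade. Recently, transformer-based architectures, originally introduced in natural language processing, have pervaded the computer vision field, where the self-attention mechanism has been used as a replacement for the popular convolution operator to capture long-range dependencies. Inspired by recent advances in computer vision, the remote sensing community has also witnessed growing exploration of vision transformers for a diverse set of tasks. Although a number of surveys have focused on transformers in computer vision, to the best of our knowledge we are the first to present a systematic review of recent transformer-based advances in remote sensing. Our survey covers more than 60 recent transformer-based methods for different remote sensing problems in three sub-areas of remote sensing: very high-resolution (VHR), hyperspectral (HSI), and synthetic aperture radar (SAR) imagery. We conclude the survey by discussing different challenges and open issues of transformers in remote sensing. Additionally, we intend to frequently update and maintain the latest papers on transformers in remote sensing, with their respective code, at: https://github.com/virobo-15/transformer-in-in-remote-sensing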

DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation

Reza Azad , René Arimond , Ehsan Khodapanah Aghdam , Amirhosein Kazerouni , Dorit Merhof

Categories: Computer Vision

2022-12-27

Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens. Many architectures attempt to reduce model complexity by limiting the self-attention mechanism to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. More specifically, we reformulate the self-attention mechanism to capture both spatial and channel relations across the whole feature dimension while staying computationally efficient. Furthermore, we redesign the skip connection path by including the cross-attention module to ensure the feature reusability and enhance the localization power. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights. The code is publicly available at https://github.com/mindflow-institue/DAEFormer.
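
The quadratic cost mentioned in this abstract follows from the attention matrix holding one score per pair of tokens. The short calculation below illustrates this for standard self-attention only; it does not model DAE-Former's reformulated attention, and the image and patch sizes are hypothetical.

```python
def attention_cost(height, width, patch):
    """Return (number of tokens, number of pairwise attention scores) for one attention head."""
    tokens = (height // patch) * (width // patch)
    return tokens, tokens * tokens

# Hypothetical 224 x 224 input, split into square patches.
tokens_16, scores_16 = attention_cost(224, 224, 16)   # 196 tokens, 38416 scores
tokens_8, scores_8 = attention_cost(224, 224, 8)      # 784 tokens, 614656 scores
```

Halving the patch size quadruples the token count and multiplies the number of attention scores by sixteen, which is why restricting attention to local regions or redesigning tokenization are common ways to reduce cost.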

IDAN: Image Difference Attention Network for Change Detection

Hongkun Liu , Zican Hu , Qichen Ding , Xueyun Chen

Categories: Computer Vision

2022-08-17

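Remote sensing image change detection is of great importance in disaster assessment and urban planning. The mainstream approach is to use encoder-decoder models to detect the changed regions of two input images. Since the changed content of remote sensing images is wide-ranging and diverse, it is necessary to improve the detection accuracy of the network by adding attention mechanisms, which commonly include the squeeze-and-excitation block, the non-local block, and the convolutional block attention module, among others. These methods consider the importance of different channels, or of different location features within a channel, but fail to perceive the differences between the input images. In this paper, we propose a novel image difference attention network (IDAN). In the image preprocessing stage, we use a pre-trained model to extract the feature differences between the two input images to obtain the feature difference map (FD-map), and Canny edge detection to obtain the edge difference map (ED-map). In the image feature extraction stage, the FD-map and ED-map are input to the feature difference attention module and the edge compensation module, respectively, to optimize the features extracted by IDAN. Finally, the change detection result is obtained through the feature difference operation. IDAN comprehensively considers the differences in regional and edge features of images and thus optimizes the extracted image features. The experimental results show that the F1-score of IDAN improves by 1.62% and 1.98% compared with the baseline model on the WHU dataset and the LEVIR-CD dataset, respectively.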
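
For readers unfamiliar with the F1-score reported here, the following sketch computes it for a binary change map, treating "changed" (1) as the positive class. It is a generic illustration with made-up 2x2 maps, not the evaluation code used for IDAN.

```python
def f1_score(pred, truth):
    """F1-score of a binary change map against ground truth (1 = changed, 0 = unchanged)."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1          # changed pixel correctly detected
            elif p == 1 and t == 0:
                fp += 1          # false alarm
            elif p == 0 and t == 1:
                fn += 1          # missed change
    if tp == 0:
        return 0.0               # avoids division by zero when nothing is correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical prediction and ground truth.
pred = [[1, 1], [0, 0]]
truth = [[1, 0], [1, 0]]
score = f1_score(pred, truth)    # precision 0.5, recall 0.5
```

Because unchanged pixels usually dominate a scene, F1 is preferred over plain pixel accuracy: correctly predicted unchanged pixels do not inflate the score.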

Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection

Xiurong Jiang , Lin Zhu , Yifan Hou , Hui Tian

Categories: Computer Vision

2022-07-07

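RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair and to accurately segment all the pixels belonging to those objects. It is promising in challenging scenes such as nighttime and complex backgrounds because thermal images are insensitive to lighting conditions. Thus, the key problem of RGB-T SOD is to make the features from the two modalities complement and adjust each other flexibly, since it is inevitable that either modality of an RGB-T image pair may fail in challenging scenes such as extreme light conditions and thermal crossover. In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD. Specifically, we introduce a Transformer-based feature extraction module to effectively extract hierarchical features of RGB and thermal images. Then, through the attention-based feature interaction module and the serial multiscale dilated convolution (SDC) based feature fusion module, the proposed model achieves the complementary interaction of low-level features and the semantic fusion of deep features. Finally, based on the mirror complementary structure, the salient regions of the two modalities can be accurately extracted even when one modality is invalid. To demonstrate the robustness of the proposed model under challenging real-world scenes, we build a novel RGB-T SOD dataset, VT723, based on a large public semantic segmentation RGB-T dataset used in the autonomous driving domain. Extensive experiments on benchmark datasets and VT723 show that the proposed method outperforms state-of-the-art approaches, including CNN-based and Transformer-based methods. The code and dataset will be released later at https://github.com/jxr326/swinmcnet.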

TINYCD: A (Not So) Deep Learning Model For Change Detection

Andrea Codegoni , Gabriele Lombardi , Alessandro Ferrari

Categories: Computer Vision | Machine Learning

2022-07-26

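The aim of change detection (CD) is to detect changes by comparing two images taken at different times. The challenging part of CD is to keep track of the changes the user wants to highlight, such as new buildings, and to ignore changes due to external factors such as environmental or lighting conditions, fog, or seasonal changes. Recent developments in deep learning have enabled researchers to achieve outstanding performance in this area. In particular, different mechanisms of space-time attention make it possible to exploit the spatial features extracted by the models and to correlate them temporally by using both available images. The downside is that the models have become increasingly complex and large, often unfeasible for edge applications. These are limitations when the models must be applied in industry or in applications requiring real-time performance. In this work, we propose a novel model, called TinyCD, which proves to be both lightweight and effective, reaching state-of-the-art performance with 13-150x fewer parameters. In our approach, we exploit the importance of low-level features for comparing images. To do this, we use only a few backbone blocks. This strategy allows us to keep the number of network parameters low. To compose the features extracted from the two images, we introduce a novel, parameter-economical mixing block capable of cross-correlating features in both the space and time domains. Finally, to fully exploit the information contained in the computed features, we define the PW-MLP block, which performs pixel-wise classification. Source code, models, and results are available here: https://github.com/andreacodegoni/tiny_model_4_cd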

LEDCNet: A Lightweight and Efficient Semantic Segmentation Algorithm Using Dual Context Module for Extracting Ground Objects from UAV Aerial Remote Sensing Images

Xiaoxiang Han , Yiman Liu , Gang Liu , Qiaohong Liu

Categories: Computer Vision

2022-12-16

Semantic segmentation of UAV aerial remote sensing images offers a more efficient and convenient alternative to traditional surveying and mapping. To keep the model lightweight while improving accuracy, this research developed a new lightweight and efficient network for extracting ground features from UAV aerial remote sensing images, called LDMCNet. This research also develops a powerful lightweight backbone network for the proposed semantic segmentation model. It is called LDCNet, and it is hoped that it can become the backbone network of a new generation of lightweight semantic segmentation algorithms. The proposed model uses dual multi-scale context modules, namely the Atrous Space Pyramid Pooling module (ASPP) and the Object Context Representation module (OCR). In addition, this research constructs a private dataset for semantic segmentation of aerial remote sensing images from drones. This data set contains 2431 training sets, 945 validation sets, and 475 test sets. The proposed model performs well on this dataset: with only 1.4M parameters and 5.48G floating-point operations (FLOPs), it achieves a mean intersection-over-union (mIoU) of 71.12%, which is 7.88% higher than the baseline model. To verify the effectiveness of the proposed model, it was also trained on the public datasets "LoveDA" and "CITY-OSM", where it achieved excellent results, with mIoU of 65.27% and 74.39%, respectively.
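
The mIoU figures quoted above average the per-class intersection-over-union. The sketch below shows one common way to compute it on small label maps; skipping classes that appear in neither map is an assumption of this sketch, and benchmark toolkits may handle that case differently.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes for two integer label maps."""
    ious = []
    for c in range(num_classes):
        intersection = union = 0
        for p_row, t_row in zip(pred, truth):
            for p, t in zip(p_row, t_row):
                if p == c and t == c:
                    intersection += 1
                if p == c or t == c:
                    union += 1
        if union > 0:            # skip classes absent from both maps (assumption of this sketch)
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

# Hypothetical 2x2 label maps with two classes.
pred = [[0, 0], [1, 1]]
truth = [[0, 1], [1, 1]]
miou = mean_iou(pred, truth, num_classes=2)   # class 0: 1/2, class 1: 2/3
```

Here the single mislabelled pixel lowers the IoU of both classes, giving a mean of 7/12.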

How to Reduce Change Detection to Semantic Segmentation

Guo-Hua Wang , Bin-Bin Gao , Chengjie Wang

Categories: Computer Vision | Artificial Intelligence

2022-06-15

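Change detection (CD) aims to identify changes that occur in an image pair taken at different times. Prior methods devise specific networks from scratch to predict change masks at the pixel level, and struggle with general segmentation problems. In this paper, we propose a new paradigm that reduces CD to semantic segmentation, which means tailoring an existing and powerful semantic segmentation network to solve CD. This new paradigm conveniently benefits from mainstream semantic segmentation techniques to deal with general segmentation problems in CD. Hence, we can concentrate on studying how to detect changes. We propose a novel and important insight: different change types exist in CD, and they should be learned separately. Based on it, we devise a module named MTF to extract the change information and fuse temporal features. MTF is highly interpretable and reveals the essential characteristic of CD. Most segmentation networks can be adapted to solve CD problems with our MTF module. Finally, we propose C-3PO, a network to detect changes at the pixel level. C-3PO achieves state-of-the-art performance without bells and whistles. It is simple but effective and can be considered a new baseline in this area. Our code will be available.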

Multi-modal land cover mapping of remote sensing images using pyramid attention and gated fusion networks

Qinghui Liu , Michael Kampffmeyer , Robert Jenssen , Arnt-Børre Salberg

Categories: Computer Vision

2021-11-06

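Multi-modality data is becoming readily available in remote sensing (RS) and can provide complementary information about the Earth's surface. Effective fusion of multi-modal information is thus important for various applications in RS, but is also very challenging due to domain differences, noise, and redundancies. There is a lack of effective and scalable fusion techniques for bridging multiple modality encoders and fully exploiting complementary information. To this end, we propose a new multi-modality network (MultiModNet) for multi-modal remote sensing data, based on a novel pyramid attention fusion (PAF) module and a gated fusion unit (GFU). The PAF module is designed to efficiently obtain rich fine-grained contextual representations from each modality with a built-in cross-level and cross-view attention fusion mechanism, and the GFU module uses a novel gating mechanism for early merging of features, thereby diminishing hidden redundancies and noise. This enables supplementary modalities to effectively extract the most valuable and complementary information for late feature fusion. Extensive experiments on two representative RS benchmark datasets demonstrate the effectiveness, robustness, and superiority of MultiModNet for multi-modal land cover classification.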

DQnet: Cross-Model Detail Querying for Camouflaged Object Detection

Wei Sun , Chengao Liu , Linyan Zhang , Yu Li , Pengxu Wei , Chang Liu , Jialing Zou , Jianbin Jiao , Qixiang Ye

Categories: Computer Vision

2022-12-16

Camouflaged objects blend seamlessly into their surroundings, which makes detecting them a challenging task in computer vision. Optimizing a convolutional neural network (CNN) for camouflaged object detection (COD) tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue, which inevitably leads to missing or redundant object regions. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNNs, where the convolution operations produce local receptive fields and have difficulty capturing long-range feature dependencies among image regions. To obtain feature maps that activate the full object extent while keeping the segmentation results from being overwhelmed by noisy features, a novel framework termed Cross-Model Detail Querying network (DQnet) is proposed. It reasons about the relations between long-range-aware representations and multi-scale local details so that the enhanced representation fully highlights the object regions and eliminates noise in non-object regions. Specifically, a vanilla ViT pretrained with self-supervised learning (SSL) is employed to model long-range dependencies among image regions. A ResNet is employed to learn fine-grained spatial local details at multiple scales. Then, to effectively retrieve object-related details, a Relation-Based Querying (RBQ) module is proposed to explore window-based interactions between the global representations and the multi-scale local details. Extensive experiments conducted on widely used COD datasets show that our DQnet outperforms the current state of the art.
