translated by 谷歌翻译
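Neural network ensembles, such as Bayesian neural networks (BNNs), have shown success in uncertainty estimation and robustness. However, a crucial challenge prohibits their use in practice: BNNs require a large number of predictions to produce reliable results, which significantly increases computational cost. To alleviate this issue, we propose spatial smoothing, a method that spatially ensembles neighboring feature map points of convolutional neural networks. By simply adding a few blur layers to the models, we empirically show that spatial smoothing improves the accuracy, uncertainty estimation, and robustness of BNNs across a whole range of ensemble sizes. In particular, BNNs incorporating spatial smoothing achieve high predictive performance with only a handful of ensemble members. Moreover, the method can also be applied to canonical deterministic neural networks to improve their performance. Several lines of evidence suggest that the improvements can be attributed to stabilized feature maps and a smoother loss landscape. In addition, we provide a fundamental explanation for prior works (namely global average pooling, pre-activation, and ReLU6) by treating them as special cases of spatial smoothing. These not only enhance accuracy, but also improve uncertainty estimation and robustness by making the loss landscape smoother in the same manner as spatial smoothing. The code is available at https://github.com/xxxnell/spatial-smoothing.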
Curriculum learning and self-paced learning are training strategies that gradually feed samples from easy to more complex. They have attracted increasing attention due to their excellent performance in robotic vision. Most recent works focus on designing curricula based on the difficulty levels of input samples or on smoothing the feature maps. However, smoothing labels to control the learning utility in a curriculum manner is still unexplored. In this work, we design a paced curriculum by label smoothing (P-CBLS) using paced learning with uniform label smoothing (ULS) for classification tasks, and fuse uniform and spatially varying label smoothing (SVLS) for semantic segmentation tasks in a curriculum manner. In ULS and SVLS, a larger smoothing factor imposes a heavier smoothing penalty on the true label and limits how much information the model learns. Therefore, we design the curriculum by label smoothing (CBLS): we set a larger smoothing value at the beginning of training and gradually decrease it to zero, so that the model's learning utility moves from lower to higher. We also design a confidence-aware pacing function and combine it with our CBLS to investigate the benefits of various curricula. The proposed techniques are validated on four robotic surgery datasets covering multi-class classification, multi-label classification, captioning, and segmentation tasks. We also investigate the robustness of our method by corrupting validation data at different severity levels. Our extensive analysis shows that the proposed method improves prediction accuracy and robustness.
Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to limited memory and computation resources. Redundancy in feature maps is an important characteristic of successful CNNs, but it has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of cheap linear transformations to generate many ghost feature maps that can fully reveal the information underlying the intrinsic features. The proposed Ghost module can be used as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and the lightweight GhostNet can then be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative to convolution layers in baseline models, and that our GhostNet achieves higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 at similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https://github.com/huawei-noah/ghostnet.
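From the perspective of frequency in computer vision, previous unsupervised domain adaptation methods cannot handle the cross-domain problem. Images or feature maps from different domains can be decomposed into low-frequency and high-frequency components. This paper proposes the hypothesis that low-frequency information is more domain-invariant, while high-frequency information contains domain-related information. We therefore introduce an approach, named the low-frequency module (LFM), to extract domain-invariant feature representations. The LFM is constructed with a digital Gaussian low-pass filter. Our method is easy to implement and introduces no extra hyperparameters. We design two effective ways to use the LFM for domain adaptation; our method is complementary to other existing methods and is formulated as a plug-and-play unit that can be combined with them. Experimental results demonstrate that our LFM outperforms state-of-the-art methods on various computer vision tasks, including image classification and object detection.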
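The decomposition into low- and high-frequency components with a Gaussian low-pass filter can be illustrated on a 1-D signal. This is only a sketch under stated assumptions: the kernel size, the value of sigma, the edge-replication padding, and the names `gaussian_kernel` and `split_frequencies` are placeholders chosen for this example, and the abstract does not say how the LFM fixes these values.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 1-D Gaussian kernel; `size` is assumed to be odd."""
    half = size // 2
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def split_frequencies(signal, size=5, sigma=1.0):
    """Return (low, high) such that low[i] + high[i] == signal[i]."""
    kernel = gaussian_kernel(size, sigma)
    half = len(kernel) // 2
    n = len(signal)
    low = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Replicate edge samples instead of zero-padding (an assumption).
            j = min(max(i + k - half, 0), n - 1)
            acc += w * signal[j]
        low.append(acc)
    # The high-frequency part is whatever the low-pass filter removed.
    high = [s - l for s, l in zip(signal, low)]
    return low, high


if __name__ == "__main__":
    low, high = split_frequencies([1.0, 4.0, 2.0, 8.0, 5.0, 3.0])
    print([round(v, 3) for v in low])
    print([round(v, 3) for v in high])
```

Under the hypothesis stated in the abstract, only the `low` part would be passed on as the domain-invariant representation.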
Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification across several commonly used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservedly overlooked in modern deep networks.
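The effect of low-pass filtering before downsampling can be seen on a toy 1-D signal. The sketch below is illustrative only: the binomial kernel `(0.25, 0.5, 0.25)`, the edge-replication padding, and the function names are assumptions for this example, and it shows only the blur-then-subsample step, not how the paper integrates it with max-pooling or strided convolution.

```python
def naive_downsample(signal, stride=2):
    """Plain subsampling, which ignores the sampling theorem."""
    return list(signal[::stride])


def blur_downsample(signal, stride=2, kernel=(0.25, 0.5, 0.25)):
    """Low-pass filter with a small binomial kernel, then subsample."""
    half = len(kernel) // 2
    n = len(signal)
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Replicate edge samples so the blurred signal keeps its length.
            j = min(max(i + k - half, 0), n - 1)
            acc += w * signal[j]
        blurred.append(acc)
    return blurred[::stride]


if __name__ == "__main__":
    signal = [0.0, 1.0] * 4    # highest-frequency pattern
    shifted = [1.0, 0.0] * 4   # the same pattern shifted by one sample
    # Naive subsampling flips completely under a one-sample shift.
    print(naive_downsample(signal), naive_downsample(shifted))
    # After blurring, both outputs agree away from the boundary.
    print(blur_downsample(signal), blur_downsample(shifted))
```

In this example the naive outputs are all zeros versus all ones, while the anti-aliased outputs differ only in the first (boundary) element.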
In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well known in the neuroscience community that the receptive field sizes of visual cortical neurons are modulated by the stimulus, which has rarely been considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called the Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects at different scales, which verifies the capability of neurons to adaptively adjust their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
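We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models. Instead of randomly inserting mask tokens into the input embeddings in the spatial domain, in this paper we shift the perspective to the frequency domain. Specifically, MFM first masks out a portion of the frequency components of the input image and then predicts the missing frequencies on the frequency spectrum. Our key insight is that, because of heavy spatial redundancy, predicting masked components in the frequency domain is better suited to revealing underlying image patterns than predicting masked patches in the spatial domain. Our findings suggest that, with the right configuration of the mask-and-predict strategy, both the structural information within high-frequency components and the low-level statistics among their low-frequency counterparts are useful for learning good representations. For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even when using none of the following: (i) extra data, (ii) an extra model, (iii) mask tokens. Experimental results on ImageNet and several robustness benchmarks show the competitive performance and advanced robustness of MFM compared with recent masked image modeling approaches. Furthermore, we comprehensively investigate the effectiveness of classical image restoration tasks for representation learning from a unified frequency perspective and reveal their intriguing relations with our MFM approach. Project page: https://www.mmlab-ntu.com/project/mfm/index.html.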
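The masking step (remove part of the spectrum, keep the rest as the corrupted input) can be illustrated on a 1-D signal with a plain discrete Fourier transform. This is a toy sketch, not the MFM pipeline: real inputs are 2-D images, and the cutoff-based low/high split, the function names, and the parameter `keep` are assumptions made for this example. The prediction network is omitted.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def mask_frequencies(signal, cutoff, keep="low"):
    """Zero out either the high or the low band and return the corrupted signal."""
    n = len(signal)
    masked = []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k)  # distance from the DC term, using spectrum symmetry
        is_low = freq <= cutoff
        kept = is_low if keep == "low" else not is_low
        masked.append(coeff if kept else 0j)
    return idft(masked)


if __name__ == "__main__":
    signal = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 1.0]
    low_only = mask_frequencies(signal, cutoff=1, keep="low")
    high_only = mask_frequencies(signal, cutoff=1, keep="high")
    # A model would be trained to predict the band that was masked out.
    print([round(v, 3) for v in low_only])
    print([round(v, 3) for v in high_only])
```

Because the two masks are complementary, the low-only and high-only signals sum back to the original, which is a convenient sanity check on the split.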
Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease the depth and increase the width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior to their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at https://github.com/szagoruyko/wide-residual-networks.
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections, one between each layer and its subsequent layer, our network has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (including SVHN and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
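The connection count L(L+1)/2 follows from each layer receiving the outputs of everything before it: layer 1 has one incoming connection (the network input), layer 2 has two, and so on up to layer L, giving 1 + 2 + ... + L. A small sketch that enumerates the connections makes this concrete; the function name `dense_connections` and the convention of numbering the network input as 0 are assumptions for this example.

```python
def dense_connections(num_layers):
    """List (source, target) pairs for a densely connected block.

    Index 0 stands for the block input; layers are numbered 1..num_layers.
    Every layer receives the feature maps of all preceding layers.
    """
    connections = []
    for layer in range(1, num_layers + 1):
        for source in range(layer):
            connections.append((source, layer))
    return connections


if __name__ == "__main__":
    for num_layers in (1, 2, 5):
        count = len(dense_connections(num_layers))
        # Matches the closed form L(L+1)/2 from the abstract.
        print(num_layers, count, num_layers * (num_layers + 1) // 2)
```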
In this paper, we address the problem of blind deblurring with high efficiency. We propose a set of lightweight deep Wiener networks that perform the task at real-time speed. Each network contains a deep neural network for estimating the parameters of a Wiener network and a Wiener network for deblurring. Experimental evaluations show that our approaches have an edge over the state of the art in terms of inference time and number of parameters. Two of our models can reach a speed of 100 images per second, which qualifies for real-time deblurring. Further research may focus on real-world applications of deblurring with our models.
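Standard convolutional neural network (CNN) designs rarely focus on the importance of explicitly capturing diverse features to enhance network performance. Instead, most existing methods follow the indirect approach of increasing or tuning network depth and width, which in many cases significantly increases the computational cost. Inspired by biological visual systems, we propose a diverse and adaptive convolutional network (DA$^{2}$-Net), which enables any feed-forward CNN to explicitly capture diverse features and to adaptively select and emphasize the most informative features, efficiently boosting the network's performance. DA$^{2}$-Net incurs negligible computational overhead and is designed to be easily integrated with any CNN architecture. We extensively evaluate DA$^{2}$-Net on benchmark datasets, including CIFAR100, SVHN, and ImageNet, across CNN architectures. Experimental results show that DA$^{2}$-Net provides a significant performance improvement with minimal computational overhead.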
In this paper, we introduce Random Erasing, a new data augmentation method for training convolutional neural networks (CNNs). In training, Random Erasing randomly selects a rectangular region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter-learning free, easy to implement, and can be integrated with most CNN-based recognition models. Albeit simple, Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification. Code is available at: https://github.com/zhunzhong07/Random-Erasing.
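The augmentation described above (pick a random rectangle, overwrite it with random values) can be sketched on a grayscale image stored as a list of rows. This is a simplified illustration rather than the released implementation: the `max_fraction` bound on the rectangle's sides, the 0 to 255 value range, and the function name `random_erase` are assumptions for this example, and the erasing probability is not modeled.

```python
import random


def random_erase(image, max_fraction=0.5, rng=None):
    """Return a copy of `image` with one random rectangle filled with random values.

    `image` is a list of rows of integer pixel values. `max_fraction` limits
    each side of the rectangle relative to the image size (an assumed rule).
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    erase_h = rng.randint(1, max(1, int(height * max_fraction)))
    erase_w = rng.randint(1, max(1, int(width * max_fraction)))
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)
    erased = [row[:] for row in image]  # leave the input untouched
    for y in range(top, top + erase_h):
        for x in range(left, left + erase_w):
            erased[y][x] = rng.randint(0, 255)
    return erased


if __name__ == "__main__":
    image = [[128] * 8 for _ in range(6)]
    for row in random_erase(image, rng=random.Random(0)):
        print(row)
```

Passing a seeded `random.Random` makes the augmentation reproducible, which is convenient when debugging a training pipeline.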
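Deploying convolutional neural networks (CNNs) on mobile devices is difficult due to limited memory and computation resources. We aim to design efficient neural networks for heterogeneous devices, including CPUs and GPUs, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of cheap linear transformations to generate many ghost feature maps that can fully reveal the information underlying the intrinsic features. The proposed C-Ghost module can be used as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, and the lightweight C-GhostNet can then be easily established. We further consider efficient networks for GPU devices. To avoid involving too many GPU-inefficient operations (e.g., depth-wise convolution) in a building stage, we propose to exploit stage-wise feature redundancy to formulate a GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts: the first part is processed by the original block with fewer output channels to generate intrinsic features, and the other part is generated by cheap operations that exploit stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and G-Ghost stage. C-GhostNet and G-GhostNet achieve the best trade-off between accuracy and latency on CPU and GPU, respectively. Code is available at https://github.com/huawei-noah/cv-backbones.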
Deep convolutional networks have proven to be very successful in learning task-specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks mostly follows the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled 'seed' image patch. In contrast to supervised network training, the resulting feature representation is not class-specific. Rather, it provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets. While such generic features cannot compete with class-specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.