In this paper, we introduce the novel concept of an advisor network to address the problem of noisy labels in image classification. Deep neural networks (DNNs) are prone to performance degradation and overfitting when trained on data with noisy annotations. Loss-weighting methods mitigate the influence of noisy labels during training by completely removing their contribution. This discarding process prevents DNNs from learning wrong associations between images and their noisy labels, but it reduces the amount of data used, especially when most of the samples have noisy labels. In contrast, our method weights the features extracted by the classifier without altering the loss value of each sample. The advisor helps the classifier focus on only part of the information present in mislabeled examples, allowing it to leverage that data as well. We train the advisor with a meta-learning strategy so that it can adapt throughout the training of the main model. We tested our method on CIFAR10 and CIFAR100 with synthetic noise, and on Clothing1M, which contains real-world noise, reporting state-of-the-art results.
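The abstract describes the advisor only at a high level. A minimal sketch of feature weighting, as opposed to loss weighting, assuming the advisor is a single linear layer with a sigmoid that outputs one gate per feature dimension (the architecture and the names `advisor_gates` and `advised_features` are hypothetical, and the meta-learning update of the advisor is omitted):

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def advisor_gates(features, weights, biases):
    """Hypothetical one-layer advisor: one gate in (0, 1) per feature dimension."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]


def advised_features(features, weights, biases):
    """Rescale the classifier's features element-wise; the per-sample loss is untouched."""
    gates = advisor_gates(features, weights, biases)
    return [f * g for f, g in zip(features, gates)]


# With all-zero advisor parameters every gate is sigmoid(0) = 0.5.
print(advised_features([2.0, -4.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]))  # [1.0, -2.0]
```

Because the gates act on the features rather than on the loss, a mislabeled example still produces a gradient; the advisor only controls which parts of its representation reach the classifier head.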
Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. (We call training data biased when it is generated from a joint sample-label distribution that deviates from the distribution of the evaluation/test set [1].) A sample re-weighting strategy is commonly used to alleviate this issue: a weighting function maps training loss to sample weight, and training iterates between recalculating the weights and updating the classifier. Current approaches, however, require the weighting function and its additional hyper-parameters to be pre-specified manually. This makes them hard to apply generally in practice, since the proper weighting scheme varies significantly with the problem and the training data. To address this issue, we propose a method capable of adaptively learning an explicit weighting function directly from data. The weighting function is an MLP with one hidden layer, a universal approximator for almost any continuous function, which makes the method able to fit a wide range of weighting functions, including those assumed in conventional research. Guided by a small amount of unbiased meta-data, the parameters of the weighting function can be finely updated simultaneously with the learning process of the classifier. Synthetic and real experiments substantiate the capability of our method to achieve proper weighting functions in class-imbalance and noisy-label cases, fully complying with the common settings of traditional methods, as well as in more complicated scenarios beyond conventional cases. This naturally leads to better accuracy than other state-of-the-art methods. Source code is available at https://github.com/xjtushujun/meta-weight-net.
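A minimal sketch of the weighting function described above: a one-hidden-layer MLP that maps a per-sample loss to a weight in (0, 1) through a sigmoid output. ReLU hidden units are an assumption, the parameter values are hand-picked for illustration rather than learned, and the bilevel meta-update of the MLP on the unbiased meta-data is omitted.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class WeightNet:
    """One-hidden-layer MLP mapping a scalar training loss to a weight in (0, 1)."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1 = w1, b1  # input -> hidden (one weight/bias per hidden unit)
        self.w2, self.b2 = w2, b2  # hidden -> scalar output

    def __call__(self, loss: float) -> float:
        hidden = [relu(w * loss + b) for w, b in zip(self.w1, self.b1)]
        return sigmoid(sum(v * h for v, h in zip(self.w2, hidden)) + self.b2)


def weighted_batch_loss(losses, net):
    """Classifier objective: mean of per-sample losses scaled by the learned weights."""
    return sum(net(l) * l for l in losses) / len(losses)


# Hand-picked parameters giving a decreasing weighting function,
# the shape usually wanted under label noise (large loss -> small weight).
net = WeightNet(w1=[1.0], b1=[0.0], w2=[-2.0], b2=1.0)
print(round(net(0.0), 3), round(net(2.0), 3))  # 0.731 0.047
```

With other parameters the same MLP can represent an increasing function (as often assumed for class imbalance), which is why learning it from meta-data is attractive.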
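Label noise significantly degrades the generalization ability of deep models in applications. Effective strategies and methods, e.g., re-weighting or loss correction, are designed to alleviate the negative impact of label noise when training neural networks. These existing works usually rely on a pre-specified architecture and manual tuning of additional hyper-parameters. In this paper, we propose warped probabilistic inference (WarPI) to adaptively rectify the training procedure of the classification network within the meta-learning scenario. In contrast to deterministic models, WarPI is formulated as a hierarchical probabilistic model by learning an amortized meta-network, which can resolve sample ambiguity and is therefore more robust to severe label noise. Unlike existing approximated weighting functions that directly generate weight values from losses, our meta-network is learned to estimate a rectifying vector from the input of logits and labels, which gives it the capability to leverage the sufficient information lying in them. This provides an effective way to rectify the learning process of the classification network, demonstrating a significant improvement in generalization ability. Moreover, by modeling the rectifying vector as a latent variable, learning the meta-network can be seamlessly integrated into the SGD optimization of the classification network. We evaluate WarPI on four benchmarks of robust learning with noisy labels and achieve new state-of-the-art results under various noise types. Extensive studies and analysis also demonstrate the effectiveness of our model.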
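Deep learning has achieved remarkable success in numerous domains with the help of large amounts of big data. However, the quality of data labels is a concern because of the lack of high-quality labels in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological differences, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise rate estimation and summarize the typically used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies. All the contents will be available at https://github.com/songhwanjun/awesome-noisy-labels.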
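Meta-learning is an effective method for handling imbalanced and noisy-label learning, but it depends on a validation set containing randomly selected, manually labelled and balanced distributed samples. The random selection, manual labelling and balancing of this validation set is not only sub-optimal for meta-learning, but it also scales poorly with the number of classes. Hence, recent meta-learning papers have proposed ad-hoc heuristics to automatically build and label this validation set, but these heuristics are still sub-optimal for meta-learning. In this paper, we analyse the meta-learning algorithm and propose new criteria to characterise the utility of the validation set, based on: 1) the informativeness of the validation set; 2) the class distribution balance of the set; and 3) the correctness of the labels of the set. Furthermore, we propose a new imbalanced noisy-label meta-learning (INOLML) algorithm that automatically builds a validation set by maximising its utility using the criteria above. Our method shows significant improvements over previous meta-learning approaches and sets the new state of the art on several benchmarks.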
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noise. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically compute weights as functions of each example's cost value, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
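Perturbing each example's loss by a weight e_i (initialized at zero), taking one SGD step with learning rate α, and differentiating the validation loss with respect to e_i at e = 0 gives -α times the inner product of the validation gradient and the example gradient. Clipping negative values and normalizing therefore yields w_i proportional to max(0, <∇L_val, ∇l_i>), and α cancels in the normalization. A minimal sketch on a toy linear-regression model with closed-form gradients (the model, squared loss and function names are assumptions chosen for illustration; a deep network would obtain these gradients by automatic differentiation):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grad_sq_loss(theta, x, y):
    """Gradient of (theta . x - y)^2 with respect to theta."""
    residual = dot(theta, x) - y
    return [2.0 * residual * xi for xi in x]


def reweight(theta, train, val):
    """One-step example weights: w_i proportional to max(0, <g_val, g_i>)."""
    g_val = [0.0] * len(theta)
    for x, y in val:  # mean gradient on the clean validation set
        g = grad_sq_loss(theta, x, y)
        g_val = [a + b / len(val) for a, b in zip(g_val, g)]
    raw = [max(dot(g_val, grad_sq_loss(theta, x, y)), 0.0) for x, y in train]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw


# Clean relation y = x; the second training example is mislabeled.
print(reweight([0.0], [([1.0], 1.0), ([1.0], -1.0)], [([1.0], 1.0)]))  # [1.0, 0.0]
```

The mislabeled example's gradient points away from the validation gradient, so its weight is clipped to zero.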
Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep network, namely StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on the samples whose labels are probably correct. Unlike existing curricula, which are usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images of real-world noisy labels. The code is at https://github.com/google/mentornet.
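The memorization effect of deep neural networks (DNNs) plays a pivotal role in many state-of-the-art label-noise learning methods. To exploit this property, the early stopping trick, which stops the optimization at an early stage of training, is usually adopted. Current methods generally decide the early stopping point by considering the DNN as a whole. However, a DNN can be considered as a composition of a series of layers, and we find that the latter layers in a DNN are much more sensitive to label noise, while their former counterparts are quite robust. Therefore, selecting a stopping point for the whole network may make different DNN layers antagonistically affect each other, thus degrading the final performance. In this paper, we propose to separate a DNN into different parts and train them progressively to address this problem. Instead of early stopping, which trains the whole DNN all at once, we initially train the former DNN layers by optimizing the DNN with a relatively large number of epochs. During training, we progressively train the latter DNN layers using a smaller number of epochs, with the preceding layers fixed, to counteract the impact of noisy labels. We term the proposed method progressive early stopping (PES). Despite its simplicity, PES helps to obtain more promising and stable results than early stopping. Furthermore, by combining PES with existing approaches for training with noisy labels, we achieve state-of-the-art performance on image classification benchmarks.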
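The staged training described above can be sketched as a schedule. Assuming the network is split into `num_parts` consecutive parts indexed from input (0) to output, with hypothetical epoch counts, each stage lists the trainable parts, the frozen parts and the number of epochs; other details of the published method are not modeled.

```python
def pes_schedule(num_parts: int, first_epochs: int, later_epochs: int):
    """Return stages as (trainable part indices, frozen part indices, epochs)."""
    # Stage 0: optimize the whole network for a relatively large number of epochs.
    stages = [(list(range(num_parts)), [], first_epochs)]
    # Later stages: freeze the preceding parts, train the remaining ones briefly.
    for k in range(1, num_parts):
        stages.append((list(range(k, num_parts)), list(range(k)), later_epochs))
    return stages


for trainable, frozen, epochs in pes_schedule(3, first_epochs=20, later_epochs=7):
    print(trainable, frozen, epochs)
# [0, 1, 2] [] 20
# [1, 2] [0] 7
# [2] [0, 1] 7
```

The later, noise-sensitive parts therefore see fewer epochs than the early, robust ones, which is the point of making the stopping point layer-dependent.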
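Recent studies on learning with noisy labels have shown remarkable performance by exploiting a small clean dataset. In particular, model-agnostic meta-learning-based label correction methods further improve performance by correcting noisy labels on the fly. However, there is no safeguard against label miscorrection, resulting in unavoidable performance degradation. Moreover, every training step requires at least three back-propagations, significantly slowing down training. To mitigate these issues, we propose a robust and efficient method that learns a label transition matrix on the fly. Employing the transition matrix makes the classifier skeptical about all the corrected samples, which alleviates the miscorrection issue. We also introduce a two-head architecture to efficiently estimate the label transition matrix within a single back-propagation, so that the estimated matrix closely follows the shifted noise distribution induced by label correction. Extensive experiments demonstrate that our approach achieves comparable or better accuracy than existing methods while training more efficiently.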
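The abstract does not spell out how the transition matrix enters the loss. A common choice, assumed here, is forward loss correction: the classifier's clean-label distribution p is pushed through T to give a noisy-label distribution, and the observed noisy label is scored against it. The estimator further assumes that one head predicts noisy labels and that row i of T is the average of that head's outputs over clean samples of true class i; all function names are hypothetical.

```python
import math


def forward_corrected_nll(clean_probs, T, noisy_label):
    """-log p_noisy[noisy_label], where p_noisy[j] = sum_i p_clean[i] * T[i][j]."""
    k = len(clean_probs)
    noisy_probs = [sum(clean_probs[i] * T[i][j] for i in range(k)) for j in range(k)]
    return -math.log(noisy_probs[noisy_label])


def estimate_transition(noisy_head_probs, true_labels, num_classes):
    """Row i = mean noisy-head prediction over clean samples with true label i.
    Rows for classes absent from the clean set stay all-zero."""
    T = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for probs, y in zip(noisy_head_probs, true_labels):
        counts[y] += 1
        T[y] = [t + p for t, p in zip(T[y], probs)]
    return [[t / c for t in row] if c else row for row, c in zip(T, counts)]


T = [[0.8, 0.2], [0.2, 0.8]]  # 20% symmetric flips between two classes
print(round(forward_corrected_nll([1.0, 0.0], T, 1), 4))  # 1.6094
```

A confident clean prediction for class 0 is penalized only as -log 0.2 when the observed label is 1, instead of going to infinity, which is the sense in which T keeps the classifier "skeptical" of any given label.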
Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily-labelled data. For the problem of robust learning under such noisy data, several algorithms have been proposed. A prominent class of algorithms relies on sample selection strategies in which, essentially, a fraction of samples with loss values below a certain threshold is selected for training. These algorithms are sensitive to such thresholds, which are difficult to fix or learn. Often, these algorithms also require information such as label noise rates, which is typically unavailable in practice. In this paper, we propose an adaptive sample selection strategy that relies only on batch statistics of a given mini-batch to provide robustness against label noise. The algorithm does not have any additional hyperparameters for sample selection, does not need any information on noise rates, and does not need access to separate data with clean labels. We empirically demonstrate the effectiveness of our algorithm on benchmark datasets.
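The abstract does not give the exact selection rule. A sketch of the idea, assuming a per-class threshold equal to the mini-batch mean of the model's probability for the given label (the threshold form and the function name are assumptions; the published rule may use further batch statistics). Note that it needs no noise rate, no clean data and no tuned threshold:

```python
from collections import defaultdict


def batch_adaptive_select(label_probs, labels):
    """Keep sample i if its given-label probability >= that label's mean in the batch."""
    by_class = defaultdict(list)
    for p, y in zip(label_probs, labels):
        by_class[y].append(p)
    threshold = {y: sum(ps) / len(ps) for y, ps in by_class.items()}
    return [i for i, (p, y) in enumerate(zip(label_probs, labels)) if p >= threshold[y]]


# label_probs[i] is the model's softmax probability for sample i's given label.
print(batch_adaptive_select([0.9, 0.2, 0.8, 0.6], [0, 0, 1, 1]))  # [0, 2]
```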
Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at https://github.com/LiJunnan1992/DivideMix.
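A minimal sketch of the division step, assuming a two-component one-dimensional Gaussian mixture fitted by EM to the per-sample losses (the abstract only says "mixture model"; the variance regularizer `reg`, the iteration count and the 0.5 cut-off are hypothetical). The clean probability of a sample is its posterior under the lower-mean component:

```python
import math


def _log_gauss(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def clean_probabilities(losses, iters=50, reg=5e-4):
    """Fit a 2-component 1-D GMM by EM; return P(low-loss component | loss)."""
    n = len(losses)
    mean = sum(losses) / n
    var0 = sum((l - mean) ** 2 for l in losses) / n + reg
    mu, var, pi = [min(losses), max(losses)], [var0, var0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities, computed stably in log space.
        resp = []
        for l in losses:
            logp = [math.log(pi[k]) + _log_gauss(l, mu[k], var[k]) for k in range(2)]
            m = max(logp)
            w = [math.exp(v - m) for v in logp]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate mixing weights, means and regularized variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * l for r, l in zip(resp, losses)) / nk
            var[k] = sum(r[k] * (l - mu[k]) ** 2 for r, l in zip(resp, losses)) / nk + reg
            pi[k] = nk / n
    clean = 0 if mu[0] <= mu[1] else 1
    return [r[clean] for r in resp]


losses = [0.1, 0.15, 0.2, 0.12, 2.0, 2.2, 1.9]
labeled_set = [i for i, p in enumerate(clean_probabilities(losses)) if p > 0.5]
print(labeled_set)  # [0, 1, 2, 3]
```

Samples outside `labeled_set` would have their labels discarded and join the unlabeled set for the semi-supervised phase.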
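Training deep neural networks (DNNs) with noisy labels is practically challenging, since inaccurate labels severely degrade the generalization ability of DNNs. Previous efforts tend to handle part or all of the data in a unified denoising flow, identifying noisy data with a coarse small-loss criterion to mitigate the interference of noisy labels. They ignore the fact that noisy samples differ in difficulty, so a rigid and unified data selection pipeline cannot tackle this problem well. In this paper, we propose a coarse-to-fine robust learning method called CREMA, which handles noisy data in a divide-and-conquer manner. At the coarse level, clean and noisy sets are first separated in terms of credibility in a statistical sense. Since it is practically impossible to categorize all noisy samples correctly, we further process them in a fine-grained manner by modeling the credibility of each sample. Specifically, for the clean set, we design a memory-based modulation scheme that dynamically adjusts the contribution of each sample according to its historical credibility sequence during training, thus alleviating the effect of noisy samples incorrectly grouped into the clean set. Meanwhile, for samples categorized into the noisy set, a selective label update strategy is proposed to correct noisy labels while mitigating the problem of correction errors. Extensive experiments on benchmarks of different modalities, including image classification (CIFAR, Clothing1M, etc.) and text recognition (IMDB), with either synthetic or natural semantic noise, demonstrate the superiority and generality of CREMA.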
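The abstract does not define the modulation rule. A hypothetical sketch, assuming each sample receives a credibility score in [0, 1] every epoch and its loss weight is the mean score over a sliding window of recent epochs (the window size, the scoring and the class name are all assumptions):

```python
from collections import deque


class CredibilityMemory:
    """Hypothetical per-sample memory of the last `size` credibility scores."""

    def __init__(self, size: int = 3):
        self.history = deque(maxlen=size)  # oldest scores drop out automatically

    def update(self, credibility: float) -> None:
        self.history.append(credibility)

    def weight(self) -> float:
        """Contribution of the sample to the loss: mean recent credibility (1.0 if unseen)."""
        return sum(self.history) / len(self.history) if self.history else 1.0


memory = CredibilityMemory(size=3)
for score in [1.0, 1.0, 0.0, 0.0]:  # the sample starts to look unreliable
    memory.update(score)
print(round(memory.weight(), 3))  # 0.333
```

A sample that was wrongly placed in the clean set, and whose credibility keeps dropping, thus loses influence gradually rather than being removed outright.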
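Partial label learning (PLL) is a typical weakly supervised learning framework, where each training instance is associated with a candidate label set, among which only one label is valid. To solve PLL problems, typical methods try to disambiguate the candidate sets either by using prior knowledge, such as structural information of the training data, or by refining model outputs in a self-training manner. Unfortunately, these methods often fail to obtain favorable performance due to the lack of prior information or unreliable predictions in the early stage of model training. In this paper, we propose a novel framework for partial label learning with meta objective guided disambiguation (MoGD), which aims to recover the ground-truth label from the candidate label set by solving a meta objective on a small validation set. Specifically, to alleviate the negative impact of false positive labels, we re-weight each candidate label based on the meta loss on the validation set. The classifier is then trained by minimizing the weighted cross-entropy loss. The proposed method can be easily implemented with various deep networks and the ordinary SGD optimizer. Theoretically, we prove the convergence property of the meta objective and derive estimation error bounds for the proposed method. Extensive experiments on various benchmark datasets and real-world PLL datasets demonstrate that the proposed method achieves competitive performance compared with state-of-the-art methods.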
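A minimal sketch of the weighted cross-entropy over a candidate set. Here the per-label weights are given, whereas in MoGD they come from the meta loss on the validation set (not shown); renormalizing the weights over the candidate set is an assumption:

```python
import math


def candidate_weighted_ce(probs, candidates, weights):
    """-sum_{j in S} w_j log p_j, with weights renormalized over the candidate set S."""
    total = sum(weights[j] for j in candidates)
    return -sum(weights[j] / total * math.log(probs[j]) for j in candidates)


probs = [0.5, 0.25, 0.25]  # model prediction over 3 classes
candidates = [0, 1]        # candidate label set S
# Equal weights on both candidates: 0.5 * log 2 + 0.5 * log 4 = 1.5 * log 2.
print(round(candidate_weighted_ce(probs, candidates, [1.0, 1.0, 0.0]), 4))  # 1.0397
```

As the meta objective shifts weight away from false positive candidates, the loss approaches ordinary cross-entropy on the recovered ground-truth label.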
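Learning with noisy labels has attracted much research interest, since data annotations, especially for large-scale datasets, may inevitably be imperfect. Recent methods recast the problem as semi-supervised learning by dividing training samples into clean and noisy sets. This paradigm, however, is prone to significant degeneration under heavy label noise, as the number of clean samples is too small for conventional methods to behave well. In this paper, we introduce a novel framework, termed LC-Booster, to explicitly tackle learning under extreme noise. The core idea of LC-Booster is to incorporate label correction into sample selection, so that more purified samples, obtained through reliable label correction, can be used for training, thereby alleviating confirmation bias. Experiments show that LC-Booster advances state-of-the-art results on several noisy-label benchmarks, including CIFAR-10, CIFAR-100, Clothing1M and WebVision. Remarkably, under an extreme 90% noise ratio, LC-Booster achieves 92.9% and 48.4% accuracy on CIFAR-10 and CIFAR-100, surpassing state-of-the-art methods by a large margin.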
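A hypothetical sketch of folding label correction into sample selection: samples already selected as clean keep their labels, and unselected samples re-enter training with the model's prediction as label when it is confident enough. The confidence rule and the 0.9 threshold are assumptions for illustration, not LC-Booster's actual correction procedure.

```python
def select_with_correction(selected, probs, labels, threshold=0.9):
    """Return (index, label) training pairs: selected samples keep their labels,
    unselected samples are relabeled with the argmax prediction if it is confident."""
    pairs = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        if i in selected:
            pairs.append((i, y))
        else:
            pred = max(range(len(p)), key=p.__getitem__)
            if p[pred] >= threshold:
                pairs.append((i, pred))  # corrected label joins the "clean" pool
    return pairs


probs = [[0.8, 0.2], [0.05, 0.95], [0.6, 0.4]]
print(select_with_correction({0}, probs, [0, 0, 1]))  # [(0, 0), (1, 1)]
```

Under 90% noise the selected set alone is tiny, so recovering extra samples such as index 1 is what enlarges the purified training set.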
Despite being robust to small amounts of label noise, convolutional neural networks trained with stochastic gradient methods have been shown to easily fit random labels. When there is a mixture of correct and mislabelled targets, networks tend to fit the former before the latter. This suggests using a suitable two-component mixture model as an unsupervised generative model of sample loss values during training, to allow online estimation of the probability that a sample is mislabelled. Specifically, we propose a beta mixture to estimate this probability and correct the loss by relying on the network prediction (the so-called bootstrapping loss). We further adapt mixup augmentation to drive our approach a step further. Experiments on CIFAR-10/100 and TinyImageNet demonstrate a robustness to label noise that substantially outperforms recent state-of-the-art methods. Source code is available at https://git.io/fjsvE.
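A minimal sketch of a soft bootstrapping loss in which the per-sample weight w is the estimated probability that the sample is mislabelled (for example, the beta-mixture posterior); the beta-mixture fit itself is not shown, and the soft variant is one possible choice. The target interpolates between the given label and the network's own prediction:

```python
import math


def soft_bootstrap_loss(probs, label, w_noisy):
    """Cross-entropy against (1 - w) * one_hot(label) + w * probs.
    In an autodiff framework the prediction inside the target would be detached."""
    target = [w_noisy * p for p in probs]
    target[label] += 1.0 - w_noisy
    return -sum(t * math.log(p) for t, p in zip(target, probs))


probs = [0.7, 0.3]
print(round(soft_bootstrap_loss(probs, 0, 0.0), 4))  # 0.3567 (plain cross-entropy)
print(round(soft_bootstrap_loss(probs, 1, 1.0), 4))  # 0.6109 (label ignored)
```

With w = 0 the given label is trusted fully; with w = 1 the loss reduces to the entropy of the prediction, so a sample deemed mislabelled stops pulling the network toward its wrong label.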
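Learning with label noise is a vital topic for guaranteeing the reliable performance of deep neural networks. Recent research usually relies on dynamic noise modeling with model output probabilities and loss values, and then separates clean and noisy samples. These methods have achieved notable success. However, unlike cherry-picked data, existing approaches often fail to perform well when facing imbalanced datasets, a common scenario in the real world. We thoroughly investigate this phenomenon and point out two major issues that hinder performance: inter-class loss distribution discrepancy and misleading predictions due to uncertainty. The first issue is that existing methods often perform class-agnostic noise modeling. However, loss distributions show a significant discrepancy among classes under class imbalance, and class-agnostic noise modeling can easily confuse noisy samples with samples in minority classes. The second issue is that the model may output misleading predictions due to epistemic and aleatoric uncertainty, so existing methods that rely solely on output probabilities may fail to distinguish confident samples. Inspired by these observations, we propose an Uncertainty-aware Label Correction framework (ULC) to handle label noise on imbalanced datasets. First, we perform epistemic uncertainty-aware class-specific noise modeling to identify trustworthy clean samples and refine/discard highly confident true/corrupted labels. Then, we introduce aleatoric uncertainty into the subsequent learning process to prevent noise accumulation in the label noise modeling process. We conduct experiments on several synthetic and real-world datasets. The results demonstrate the effectiveness of the proposed method, especially on imbalanced datasets.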
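The first issue can be illustrated with a simplified selection rule that keeps samples whose loss is at most a mean threshold, computed either over the whole dataset (class-agnostic) or per class (class-specific). The mean threshold is an assumption for illustration; ULC's uncertainty-aware modeling is not shown.

```python
from collections import defaultdict


def clean_by_global_mean(losses, labels):
    """Class-agnostic: one threshold for every class."""
    mean = sum(losses) / len(losses)
    return [i for i, l in enumerate(losses) if l <= mean]


def clean_by_class_mean(losses, labels):
    """Class-specific: each class is compared against its own loss distribution."""
    per_class = defaultdict(list)
    for l, y in zip(losses, labels):
        per_class[y].append(l)
    means = {y: sum(ls) / len(ls) for y, ls in per_class.items()}
    return [i for i, (l, y) in enumerate(zip(losses, labels)) if l <= means[y]]


# Class 1 (e.g. a minority class) has systematically higher losses.
losses = [0.1, 0.2, 0.4, 1.5, 1.8, 3.0]
labels = [0, 0, 0, 1, 1, 1]
print(clean_by_global_mean(losses, labels))  # [0, 1, 2]
print(clean_by_class_mean(losses, labels))   # [0, 1, 3, 4]
```

The global threshold discards every class-1 sample, clean or not, while the class-specific one still separates them within the class.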
The existence of label noise imposes significant challenges (e.g., poor generalization) on the training process of deep neural networks (DNNs). As a remedy, this paper introduces a permutation layer learning approach termed PermLL to dynamically calibrate the training process of the DNN subject to instance-dependent and instance-independent label noise. The proposed method augments the architecture of a conventional DNN with an instance-dependent permutation layer. This layer is essentially a convex combination of permutation matrices that is dynamically calibrated for each sample. The primary objective of the permutation layer is to correct the loss of noisy samples, thereby mitigating the effect of label noise. We provide two variants of PermLL in this paper: one applies the permutation layer to the model's prediction, while the other applies it directly to the given noisy label. In addition, we provide a theoretical comparison between the two variants and show that previous methods can be seen as one of the variants. Finally, we validate PermLL experimentally and show that it achieves state-of-the-art performance on both real and synthetic datasets.
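A minimal sketch of a convex combination of permutation matrices, applied to the model's prediction (the first variant). Enumerating all K! permutations, mixing them with a softmax over per-sample logits, and the orientation p'[i] = sum_j P[i][j] p[j] are assumptions for illustration; enumeration is only feasible for small K.

```python
import math
from itertools import permutations


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def permutation_layer(logits, num_classes):
    """Convex combination of all permutation matrices, mixed by softmax(logits).
    `logits` holds one score per permutation (K! of them), e.g. output by a network."""
    perms = list(permutations(range(num_classes)))  # perms[0] is the identity
    alphas = softmax(logits)
    P = [[0.0] * num_classes for _ in range(num_classes)]
    for a, perm in zip(alphas, perms):
        for i, j in enumerate(perm):
            P[i][j] += a
    return P  # doubly stochastic: every row and column sums to 1


def apply_to_prediction(P, probs):
    """Variant applied to the model output: p'[i] = sum_j P[i][j] * p[j]."""
    return [sum(P[i][j] * probs[j] for j in range(len(probs))) for i in range(len(P))]


P = permutation_layer([50.0, 0.0, 0.0, 0.0, 0.0, 0.0], 3)  # ~identity for this sample
print([round(v, 6) for v in apply_to_prediction(P, [0.7, 0.2, 0.1])])  # [0.7, 0.2, 0.1]
```

Because P is doubly stochastic, the corrected vector remains a valid probability distribution for every sample.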
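Learning with noisy labels is a practically challenging problem in weakly supervised learning. In the existing literature, open-set noise is always considered to be poisonous for generalization, similar to closed-set noise. In this paper, we empirically show that open-set noisy labels can be non-toxic and even benefit robustness against inherent noisy labels. Inspired by this observation, we propose a simple yet effective regularization that introduces open-set samples with dynamic noisy labels (ODNL) into training. With ODNL, the extra capacity of the neural network can be largely consumed in a way that does not interfere with learning patterns from clean data. Through the lens of SGD noise, we show that the noise induced by our method is random-direction, conflict-free and biased, which may help the model converge to a flat minimum with superior stability and enforce the model to produce conservative predictions on out-of-distribution instances. Extensive experimental results on benchmark datasets with various types of noisy labels demonstrate that the proposed method not only improves the performance of many existing robust algorithms but also achieves significant improvement on out-of-distribution detection tasks, even in the label noise setting.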
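A minimal sketch of "dynamic noisy labels" for the auxiliary open-set samples, assuming the labels are drawn uniformly over the known classes and redrawn every epoch (both assumptions; the per-epoch seeding is only for reproducibility, and the function name is hypothetical):

```python
import random


def dynamic_noisy_labels(num_open_set, num_classes, epoch, seed=0):
    """Fresh uniformly random labels for the open-set samples, redrawn every epoch."""
    rng = random.Random(seed * 100003 + epoch)  # reproducible per (seed, epoch)
    return [rng.randrange(num_classes) for _ in range(num_open_set)]


labels_epoch0 = dynamic_noisy_labels(1000, 10, epoch=0)
labels_epoch1 = dynamic_noisy_labels(1000, 10, epoch=1)
print(labels_epoch0 != labels_epoch1)  # True: the labels keep changing
```

Each epoch, these (open-set sample, random label) pairs would be appended to the noisy training set; since the labels keep changing, the network cannot memorize them consistently, which is one way to read "consuming extra capacity" without fitting a fixed wrong pattern.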
Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification. Many large-scale datasets are collected from websites; however, they tend to contain inaccurate labels, termed noisy labels. Training on such noisily labeled datasets causes performance degradation because DNNs easily overfit to noisy labels. To overcome this problem, we propose a joint optimization framework of learning DNN parameters and estimating true labels. Our framework can correct labels during training by alternately updating the network parameters and the labels. We conduct experiments on the noisy CIFAR-10 datasets and the Clothing1M dataset. The results indicate that our approach significantly outperforms other state-of-the-art methods.
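A minimal sketch of the label-update half of the alternation: with the network parameters fixed, each sample's label is replaced by a function of the network's predictions. Averaging recent softmax outputs (soft labels), and optionally taking their argmax (hard labels), are assumptions here; the parameter-update half is ordinary training on the current labels.

```python
def update_soft_labels(prediction_history):
    """New soft label per sample = average of its recent softmax predictions."""
    updated = []
    for preds in prediction_history:  # preds: list of probability vectors for one sample
        k = len(preds[0])
        updated.append([sum(p[j] for p in preds) / len(preds) for j in range(k)])
    return updated


def update_hard_labels(soft_labels):
    """Hard variant: the argmax of each soft label."""
    return [max(range(len(s)), key=s.__getitem__) for s in soft_labels]


# Two samples, each with predictions from the last two epochs.
history = [[[0.2, 0.8], [0.4, 0.6]], [[0.9, 0.1], [0.7, 0.3]]]
soft = update_soft_labels(history)
print(update_hard_labels(soft))  # [1, 0]
```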
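Imbalanced data pose challenges for deep-learning-based classification models. One of the most widely used approaches for tackling imbalanced data is re-weighting, where training samples are associated with different weights in the loss function. Most existing re-weighting approaches treat the example weights as learnable parameters and optimize them on a meta set, which entails expensive bilevel optimization. In this paper, we propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view. Specifically, we view the training set as an imbalanced distribution over its samples, which is transported by OT to a balanced distribution obtained from the meta set. The weights of the training samples are the probability mass of the imbalanced distribution and are learned by minimizing the OT distance between the two distributions. Compared with existing methods, our proposed method removes the dependence of weight learning on the classifier at each iteration. Experiments on image, text and point cloud datasets demonstrate that our proposed re-weighting method performs excellently, achieving state-of-the-art results in many cases and providing a promising tool for addressing imbalanced classification.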
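The abstract does not say how the OT distance is computed. A common choice, assumed here, is entropic regularization solved with Sinkhorn iterations. In the sketch, `a` plays the role of the training-sample weights (the quantity the method learns; fixed here for illustration), `b` is a balanced meta distribution, and the 1-D features, squared-distance cost and `eps` are hypothetical.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, iters=500):
    """Entropic OT between source weights a and target weights b (Sinkhorn iterations)."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, transport_cost


x = [0.0, 0.1, 1.0]  # training features: two near class 0, one near class 1
y = [0.0, 1.0]       # meta-set features, one per class
cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
a = [1 / 3] * 3      # current training-sample weights (learned in the method)
b = [0.5, 0.5]       # balanced meta distribution
plan, c = sinkhorn(a, b, cost)
print([round(sum(row), 4) for row in plan])  # [0.3333, 0.3333, 0.3333]
```

Learning the weights would mean adjusting `a` to lower the OT cost, which shifts mass toward the under-represented class without evaluating the classifier inside the weight update.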