尽管无偏见的机器学习模型对于许多应用程序至关重要,但偏见是一个人为定义的概念,可以在任务中有所不同。只有输入标签对,算法可能缺乏足够的信息来区分稳定(因果)特征和不稳定(虚假)特征。但是,相关任务通常具有类似的偏见 - 我们可以利用在转移环境中开发稳定的分类器的观察结果。在这项工作中,我们明确通知目标分类器有关源任务中不稳定功能的信息。具体而言,我们得出一个表示,该表示通过对比源任务中的不同数据环境来编码不稳定的功能。我们通过根据此表示形式将目标任务的数据聚类来实现鲁棒性,并最大程度地降低这些集群中最坏情况的风险。我们对文本和图像分类进行评估。经验结果表明,我们的算法能够在合成生成的环境和现实环境的目标任务上保持鲁棒性。我们的代码可在https://github.com/yujiabao/tofu上找到。
translated by 谷歌翻译
在偏置数据集中培训时,分类器会偏差。作为一种补救措施,我们建议学习分裂(LS),这是一种用于自动偏置检测的算法。给定一个具有输入标签对的数据集,LS学会了将该数据集分开,以便在训练分训练上训练的预测因素不能推广到测试分配。该性能差距表明,数据集中的测试拆分代表性不足,这是潜在偏差的信号。识别不可替代的分裂是具有挑战性的,因为我们对偏见没有注释。在这项工作中,我们表明,测试拆分中每个示例的预测正确性可以用作弱监督的来源:如果我们移动正确预测的示例,将概括性能下降错误预测。 LS是任务不合时宜的,可以应用于任何监督的学习问题,从自然语言理解和图像分类到分子财产预测。经验结果表明,LS能够产生与人类识别偏见相关的惊人挑战分裂。此外,我们证明,将强大的学习算法(例如群DRO)与LS启用自动偏差确定的拆分相结合。与以前的最先进相比,当训练和验证过程中偏见的来源未知时,我们显着提高了最差的组绩效(平均为23.4%)。
translated by 谷歌翻译
Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domaininvariant. An important assumption in this area is that the training examples are partitioned into "domains" or "environments". Our focus is on the more common setting where such partitions are not provided. We propose EIIL, a general framework for domain-invariant learning that incorporates Environment Inference to directly infer partitions that are maximally informative for downstream Invariant Learning. We show that EIIL outperforms invariant learning methods on the CMNIST benchmark without using environment labels, and significantly outperforms ERM on worst-group performance in the Waterbirds and CivilComments datasets. Finally, we establish connections between EIIL and algorithmic fairness, which enables EIIL to improve accuracy and calibration in a fair prediction problem.
translated by 谷歌翻译
Models trained via empirical risk minimization (ERM) are known to rely on spurious correlations between labels and task-independent input features, resulting in poor generalization to distributional shifts. Group distributionally robust optimization (G-DRO) can alleviate this problem by minimizing the worst-case loss over a set of pre-defined groups over training data. G-DRO successfully improves performance of the worst-group, where the correlation does not hold. However, G-DRO assumes that the spurious correlations and associated worst groups are known in advance, making it challenging to apply it to new tasks with potentially multiple unknown spurious correlations. We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization -- an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them. AGRO equips G-DRO with an adversarial slicing model to find a group assignment for training examples which maximizes worst-case loss over the discovered groups. On the WILDS benchmark, AGRO results in 8% higher model performance on average on known worst-groups, compared to prior group discovery approaches used with G-DRO. AGRO also improves out-of-distribution performance on SST2, QQP, and MS-COCO -- datasets where potential spurious correlations are as yet uncharacterized. Human evaluation of ARGO groups shows that they contain well-defined, yet previously unstudied spurious correlations that lead to model errors.
translated by 谷歌翻译
Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label. Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO) require expensive group annotations for each training point, whereas approaches that do not use such group annotations typically achieve unsatisfactory worst-group accuracy. In this paper, we propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs, and then trains a second model that upweights the training examples that the first model misclassified. Intuitively, this upweights examples from groups on which standard ERM models perform poorly, leading to improved worst-group performance. Averaged over four image classification and natural language processing tasks with spurious correlations, JTT closes 75% of the gap in worst-group accuracy between standard ERM and group DRO, while only requiring group annotations on a small validation set in order to tune hyperparameters.
translated by 谷歌翻译
尽管机器学习模型迅速推进了各种现实世界任务的最先进,但鉴于这些模型对虚假相关性的脆弱性,跨域(OOD)的概括仍然是一个挑战性的问题。尽管当前的域概括方法通常着重于通过新的损耗函数设计在不同域上实施某些不变性属性,但我们提出了一种平衡的迷你批次采样策略,以减少观察到的训练分布中域特异性的虚假相关性。更具体地说,我们提出了一种两步方法,该方法1)识别虚假相关性的来源,以及2)通过在确定的来源上匹配,构建平衡的迷你批次而没有虚假相关性。我们提供了伪造来源的可识别性保证,并表明我们提出的方法是从所有培训环境中平衡,无虚拟分布的样本。实验是在三个具有伪造相关性的计算机视觉数据集上进行的,从经验上证明,与随机的迷你批次采样策略相比,我们平衡的微型批次采样策略可改善四个不同建立的域泛化模型基线的性能。
translated by 谷歌翻译
虽然神经网络在平均病例的性能方面对分类任务的成功显着,但它们通常无法在某些数据组上表现良好。这样的组信息可能是昂贵的;因此,即使在培训数据不可用的组标签不可用,较稳健性和公平的最新作品也提出了改善最差组性能的方法。然而,这些方法通常在培训时间使用集团信息的表现不佳。在这项工作中,我们假设没有组标签的较大数据集一起访问少量组标签。我们提出了一个简单的两步框架,利用这个部分组信息来提高最差组性能:训练模型以预测训练数据的丢失组标签,然后在强大的优化目标中使用这些预测的组标签。从理论上讲,我们在最差的组性能方面为我们的方法提供泛化界限,展示了泛化误差如何相对于培训点总数和具有组标签的培训点的数量。凭经验,我们的方法优于不使用群组信息的基线表达,即使只有1-33%的积分都有组标签。我们提供消融研究,以支持我们框架的稳健性和可扩展性。
translated by 谷歌翻译
当环境标签未知时,我们研究不变学习的问题。当贝叶斯最佳条件标签分布在不同环境中相同时,我们将重点放在不变的表示概念上。先前的工作通过最大化不变风险最小化(IRM)框架的罚款来进行环境推理(EI)。 EI步骤使用的参考模型侧重于虚假相关性,以有效地达到良好的环境分区。但是,尚不清楚如何找到这样的参考模型。在这项工作中,我们建议重复EI过程,并在先前的EI步骤推断出的\ textit {多数}环境上重复ERM模型。在温和的假设下,我们发现这种迭代过程有助于学习比单一步骤更好地捕获虚假相关性的表示。这会导致更好的环境推理和更好的不变学习。我们表明,该方法在合成数据集和现实世界数据集上的表现优于基准。
translated by 谷歌翻译
We propose a Target Conditioned Representation Independence (TCRI) objective for domain generalization. TCRI addresses the limitations of existing domain generalization methods due to incomplete constraints. Specifically, TCRI implements regularizers motivated by conditional independence constraints that are sufficient to strictly learn complete sets of invariant mechanisms, which we show are necessary and sufficient for domain generalization. Empirically, we show that TCRI is effective on both synthetic and real-world data. TCRI is competitive with baselines in average accuracy while outperforming them in worst-domain accuracy, indicating desired cross-domain stability.
translated by 谷歌翻译
Overparameterized neural networks can be highly accurate on average on an i.i.d.test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization-a stronger-than-typical 2 penalty or early stopping-we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
translated by 谷歌翻译
我们通过对杂散相关性的因果解释提出了一种信息 - 理论偏置测量技术,这通过利用条件相互信息来识别特征级算法偏压有效。尽管已经提出了几种偏置测量方法并广泛地研究以在各种任务中实现诸如面部识别的各种任务中的算法公平,但它们的准确性或基于Logit的度量易于导致普通预测得分调整而不是基本偏差减少。因此,我们设计针对算法偏差的新型扩张框架,其包括由所提出的信息 - 理论偏置测量方法导出的偏压正则化损耗。此外,我们介绍了一种基于随机标签噪声的简单而有效的无监督的脱叠技术,这不需要明确的偏置信息监督。通过多种标准基准测试的广泛实验,在不同的现实情景中验证了所提出的偏差测量和脱叠方法。
translated by 谷歌翻译
Empirical studies suggest that machine learning models trained with empirical risk minimization (ERM) often rely on attributes that may be spuriously correlated with the class labels. Such models typically lead to poor performance during inference for data lacking such correlations. In this work, we explicitly consider a situation where potential spurious correlations are present in the majority of training data. In contrast with existing approaches, which use the ERM model outputs to detect the samples without spurious correlations, and either heuristically upweighting or upsampling those samples; we propose the logit correction (LC) loss, a simple yet effective improvement on the softmax cross-entropy loss, to correct the sample logit. We demonstrate that minimizing the LC loss is equivalent to maximizing the group-balanced accuracy, so the proposed LC could mitigate the negative impacts of spurious correlations. Our extensive experimental results further reveal that the proposed LC loss outperforms the SoTA solutions on multiple popular benchmarks by a large margin, an average 5.5% absolute improvement, without access to spurious attribute labels. LC is also competitive with oracle methods that make use of the attribute labels. Code is available at https://github.com/shengliu66/LC.
translated by 谷歌翻译
机器学习算法通常假设培训和测试示例是从相同的分布中汲取的。然而,分发转移是现实世界应用中的常见问题,并且可以在测试时间造成模型急剧执行。在本文中,我们特别考虑域移位和亚泊素班次的问题(例如,不平衡数据)。虽然先前的作品通常会寻求明确地将模型的内部表示和预测器进行明确,以成为域不变的,但我们旨在规范整个功能而不限制模型的内部表示。这导致了一种简单的基于混合技术,它通过名为LISA的选择性增强来学习不变函数。 Lisa选择性地用相同的标签而单独地插值样本,但不同的域或具有相同的域但不同的标签。我们分析了线性设置,从理论上展示了LISA如何导致较小的最差组错误。凭经验,我们研究了LISA对从亚本化转变到域移位的九个基准的有效性,我们发现LISA一直以其他最先进的方法表达。
translated by 谷歌翻译
通过推断培训数据中的潜在群体,最近的作品将不可用的注释不可用的情况引入不变性学习。通常,在大多数/少数族裔分裂下学习群体不变性在经验上被证明可以有效地改善许多数据集的分布泛化。但是,缺乏这些关于学习不变机制的理论保证。在本文中,我们揭示了在防止分类器依赖于培训集中的虚假相关性的情况下,现有小组不变学习方法的不足。具体来说,我们提出了两个关于判断这种充分性的标准。从理论和经验上讲,我们表明现有方法可以违反标准,因此未能推广出虚假的相关性转移。在此激励的情况下,我们设计了一种新的组不变学习方法,该方法构建具有统计独立性测试的组,并按组标签重新启动样本,以满足标准。关于合成数据和真实数据的实验表明,新方法在推广到虚假相关性转移方面显着优于现有的组不变学习方法。
translated by 谷歌翻译
The goal of domain generalization algorithms is to predict well on distributions different from those seen during training. While a myriad of domain generalization algorithms exist, inconsistencies in experimental conditions-datasets, architectures, and model selection criteria-render fair and realistic comparisons difficult. In this paper, we are interested in understanding how useful domain generalization algorithms are in realistic settings. As a first step, we realize that model selection is non-trivial for domain generalization tasks. Contrary to prior work, we argue that domain generalization algorithms without a model selection strategy should be regarded as incomplete. Next, we implement DOMAINBED, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria. We conduct extensive experiments using DO-MAINBED and find that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets. Looking forward, we hope that the release of DOMAINBED, along with contributions from fellow researchers, will streamline reproducible and rigorous research in domain generalization. * Alphabetical order, equal contribution.Preprint. Under review.
translated by 谷歌翻译
分布式概括(OOD)都是关于对环境变化的学习不变性。如果每个类中的上下文分布均匀分布,则OOD将是微不足道的,因为由于基本原则,可以轻松地删除上下文:类是上下文不变的。但是,收集这种平衡的数据集是不切实际的。学习不平衡的数据使模型偏见对上下文,从而伤害了OOD。因此,OOD的关键是上下文平衡。我们认为,在先前工作中广泛采用的假设,可以直接从偏见的类预测中注释或估算上下文偏差,从而使上下文不完整甚至不正确。相比之下,我们指出了上述原则的另一面:上下文对于类也不变,这激励我们将类(已经被标记为已标记的)视为不同环境以解决上下文偏见(没有上下文标签)。我们通过最大程度地减少阶级样本相似性的对比损失,同时确保这种相似性在所有类别中不变,从而实现这一想法。在具有各种上下文偏见和域间隙的基准测试中,我们表明,配备了我们上下文估计的简单基于重新加权的分类器实现了最新的性能。我们在https://github.com/simpleshinobu/irmcon上提供了附录中的理论理由和代码。
translated by 谷歌翻译
许多数据集被指定:给定任务存在多个同样可行的解决方案。对于学习单个假设的方法,指定的指定可能是有问题的,因为实现低训练损失的不同功能可以集中在不同的预测特征上,从而在分布数据的数据上产生明显变化的预测。我们提出了Divdis,这是一个简单的两阶段框架,首先通过利用测试分布中的未标记数据来学习多种假设,以实现任务。然后,我们通过使用其他标签的形式或检查功能可视化的形式选择最小的其他监督来选择一个发现的假设之一来消除歧义。我们证明了Divdis找到在图像分类中使用强大特征的假设和自然语言处理问题的能力。
translated by 谷歌翻译
Machine learning models rely on various assumptions to attain high accuracy. One of the preliminary assumptions of these models is the independent and identical distribution, which suggests that the train and test data are sampled from the same distribution. However, this assumption seldom holds in the real world due to distribution shifts. As a result models that rely on this assumption exhibit poor generalization capabilities. Over the recent years, dedicated efforts have been made to improve the generalization capabilities of these models collectively known as -- \textit{domain generalization methods}. The primary idea behind these methods is to identify stable features or mechanisms that remain invariant across the different distributions. Many generalization approaches employ causal theories to describe invariance since causality and invariance are inextricably intertwined. However, current surveys deal with the causality-aware domain generalization methods on a very high-level. Furthermore, we argue that it is possible to categorize the methods based on how causality is leveraged in that method and in which part of the model pipeline is it used. To this end, we categorize the causal domain generalization methods into three categories, namely, (i) Invariance via Causal Data Augmentation methods which are applied during the data pre-processing stage, (ii) Invariance via Causal representation learning methods that are utilized during the representation learning stage, and (iii) Invariance via Transferring Causal mechanisms methods that are applied during the classification stage of the pipeline. Furthermore, this survey includes in-depth insights into benchmark datasets and code repositories for domain generalization methods. We conclude the survey with insights and discussions on future directions.
translated by 谷歌翻译
学习不变表示是在数据集中虚假相关驱动的机器学习模型时的重要要求。这些杂散相关性,在输入样本和目标标签之间,错误地指导了神经网络预测,导致某些组的性能差,尤其是少数群体。针对这些虚假相关性的强大培训需要每个样本的组成员资格。这种要求在少数群体或稀有群体的数据标签努力的情况下是显着费力的,或者包括数据集的个人选择隐藏敏感信息的情况。另一方面,存在这种数据收集的存在力度导致包含部分标记的组信息的数据集。最近的作品解决了完全无监督的场景,没有用于组的标签。因此,我们的目标是通过解决更现实的设置来填补文献中的缺失差距,这可以在培训期间利用部分可用的敏感或群体信息。首先,我们构造一个约束集并导出组分配所属的高概率绑定到该集合。其次,我们提出了一种从约束集中优化了优化最严格的组分配的算法。通过对图像和表格数据集的实验,我们显示少数集团的性能的改进,同时在跨组中保持整体汇总精度。
translated by 谷歌翻译
通常对机器学习分类器进行培训,以最大程度地减少数据集的平均误差。不幸的是,在实践中,这个过程通常会利用训练数据中亚组不平衡引起的虚假相关性,从而导致高平均性能,但跨亚组的性能高度可变。解决此问题的最新工作提出了使用骆驼进行模型修补。这种先前的方法使用生成的对抗网络来执行类内的群间数据增强,需要(a)训练许多计算昂贵的模型以及(b)给定域模型的合成输出的足够质量。在这项工作中,我们提出了RealPatch,这是一个基于统计匹配的简单,更快,更快的数据增强的框架。我们的框架通过使用真实样本增强数据集来执行模型修补程序,从而减轻了为目标任务训练生成模型的需求。我们证明了RealPatch在三个基准数据集,Celeba,Waterbird和IwildCam的一部分中的有效性,显示了最差的亚组性能和二进制分类中亚组性能差距的改进。此外,我们使用IMSITU数据集进行了211个类的实验,在这种设置中,基于生成模型的修补(例如骆驼)是不切实际的。我们表明,RealPatch可以成功消除数据集泄漏,同时减少模型泄漏并保持高实用程序。可以在https://github.com/wearepal/realpatch上找到RealPatch的代码。
translated by 谷歌翻译