Despite impressive success in many tasks, deep learning models have been shown to rely on spurious features, which causes them to fail catastrophically when generalizing to out-of-distribution (OOD) data. Invariant Risk Minimization (IRM) was proposed to alleviate this issue by extracting domain-invariant features for OOD generalization. Nevertheless, recent work shows that IRM is only effective for certain types of distribution shift (e.g., correlation shift) and fails in other cases (e.g., diversity shift). Meanwhile, another line of methods, Adversarial Training (AT), has shown better domain transfer performance, suggesting that it could be an effective candidate for extracting domain-invariant features. This paper investigates this possibility by exploring the similarity between the IRM and AT objectives. Inspired by this connection, we propose Domainwise Adversarial Training (DAT), an AT-inspired method that alleviates distribution shift through domain-specific perturbations. Extensive experiments show that DAT can effectively remove domain-varying features and improve OOD generalization under both correlation shift and diversity shift.
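The sketch below illustrates the domain-wise perturbation idea. It assumes a linear logistic model, labels in {-1, +1}, and a single FGSM-style sign step. Standard AT gives each example its own adversarial perturbation. Here, every example of a domain receives the same perturbation, computed from that domain's mean input gradient. The function names, the budget `eps`, and the one-step scheme are illustrative assumptions, not the paper's exact procedure.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(w, x, y):
    # log(1 + exp(-y * w.x)) for a label y in {-1, +1}
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def input_grad(w, x, y):
    # Gradient of the logistic loss with respect to the input x.
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coef = -y * sigmoid(-margin)
    return [coef * wi for wi in w]


def domainwise_perturbations(w, data, eps):
    """Return one shared perturbation per domain: eps * sign(mean input gradient)."""
    grad_sums = {}
    counts = defaultdict(int)
    for x, y, domain in data:
        g = input_grad(w, x, y)
        acc = grad_sums.setdefault(domain, [0.0] * len(x))
        grad_sums[domain] = [a + gi for a, gi in zip(acc, g)]
        counts[domain] += 1
    deltas = {}
    for domain, total in grad_sums.items():
        mean = [t / counts[domain] for t in total]
        # eps * sign(m), with sign(0) = 0
        deltas[domain] = [eps * ((m > 0) - (m < 0)) for m in mean]
    return deltas


def perturb(data, deltas):
    # Apply the same domain-level perturbation to every example of that domain.
    return [([xi + di for xi, di in zip(x, deltas[d])], y, d) for x, y, d in data]


def mean_loss(w, data):
    return sum(logistic_loss(w, x, y) for x, y, _ in data) / len(data)
```

A DAT-style training loop (hypothetical here) would minimize `mean_loss` on `perturb(data, deltas)` with respect to `w`, recomputing the domain perturbations as training proceeds. Sharing one perturbation per domain targets directions that vary across domains rather than example-specific noise.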
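We consider the problem of OOD generalization, where the goal is to train models that perform well on test distributions that differ from the training distribution. Deep learning models are known to be brittle to such shifts and can suffer large accuracy drops even for slightly different test distributions. We propose a new method, DAFT, based on the intuition that an adversarially robust combination of a large number of rich features should provide OOD robustness. Our method carefully distills the knowledge of a powerful teacher that learns several discriminative features using standard training while combining them using adversarial training. The standard adversarial training procedure is modified to produce teachers that can guide the student better. We evaluate DAFT on standard benchmarks in the DomainBed framework and demonstrate that DAFT achieves significant improvements over current state-of-the-art OOD generalization methods. DAFT consistently outperforms well-tuned ERM and distillation baselines by up to 6%, with larger gains for smaller networks.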
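DAFT trains a student to match an adversarially trained teacher. The sketch below shows only a generic temperature-scaled knowledge-distillation loss: the KL divergence between the softened teacher and student class distributions, scaled by T² as in standard distillation. The temperature value and function names are assumptions. DAFT's modified adversarial training of the teacher is not reproduced here.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-softened softmax, shifted by the max for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2-scaled KL(teacher || student) between softened class distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # Terms with zero teacher probability contribute nothing to the KL divergence.
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits up to an additive constant, and positive otherwise. Extremely large logit gaps could underflow `ps` to zero; a production version would compute log-softmax directly.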
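Research interest in applying data-driven methods to mechanics problems has grown substantially. Although traditional machine learning (ML) methods have achieved many breakthroughs, they rely on the assumption that the training (observed) data and testing (unseen) data are independent and identically distributed (i.i.d.). Consequently, traditional ML methods often break down when applied to real-world mechanics problems with unknown test environments and data distribution shifts. In contrast, out-of-distribution (OOD) generalization assumes that the test data may shift (i.e., violate the i.i.d. assumption). To date, multiple methods have been proposed to improve the OOD generalization of ML methods. However, because of the lack of benchmark datasets for OOD regression problems, the efficiency of these OOD methods on the regression problems that dominate the mechanics field remains unknown. To address this, we investigate the performance of OOD generalization methods on mechanics regression problems. Specifically, we identify three OOD problems: covariate shift, mechanism shift, and sampling bias. For each problem, we create two benchmark examples that extend the Mechanical MNIST dataset collection, and we investigate the performance of popular OOD generalization methods on these mechanics-specific regression problems. Our numerical experiments show that in most cases, while the OOD generalization methods perform better than traditional ML methods on these OOD problems, there is a compelling need to develop more robust OOD generalization methods that are effective across multiple OOD scenarios. Overall, we expect that this study, together with the associated open-access benchmark datasets, will enable further development of OOD generalization methods for mechanics-specific regression problems.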
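Learning domain-invariant representations has become one of the most popular approaches to domain adaptation/generalization. In this paper, we show that invariant representations may not be sufficient to guarantee good generalization once labeling function shift is taken into account. Inspired by this, we first derive a new generalization upper bound on the empirical risk that explicitly considers labeling function shift. We then propose Domain-specific Risk Minimization (DRM), which models the distribution shifts of different domains separately and selects the most appropriate one for the target domain. Extensive experiments on four popular domain generalization datasets (CMNIST, PACS, VLCS, and DomainNet) demonstrate the effectiveness of the proposed DRM for domain generalization, with the following advantages: 1) it significantly outperforms competitive baselines; 2) it achieves comparable or superior accuracy on all training domains compared with vanilla empirical risk minimization (ERM); 3) it remains very simple and efficient during training; and 4) it is complementary to invariant learning approaches.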
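Maximizing the diversity of synthesized domains has emerged as one of the most effective strategies for succeeding at single source domain generalization. Many recent successes have come from methods that pre-specify the types of diversity a model is exposed to during training, so that it can ultimately generalize well to new domains. However, naïve diversity-based augmentations do not work effectively for domain generalization, either because they cannot model large domain shifts or because the span of pre-specified transformations does not cover the types of shift commonly occurring in domain generalization. To address this issue, we present a novel framework that uses adversarially learned transformations (ALT), implemented by a neural network, to model plausible yet hard image transformations that fool the classifier. This network is randomly initialized for each batch and trained for a fixed number of steps to maximize the classification error. Further, we enforce consistency between the classifier's predictions on the clean and transformed images. With extensive empirical analysis, we find that this new form of adversarial transformation achieves the goals of diversity and hardness simultaneously, outperforming all existing techniques on competitive benchmarks for single source domain generalization. We also show that ALT can naturally work with existing diversity modules to produce highly distinct source domains, leading to state-of-the-art performance.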
Out-of-distribution (OOD) generalization on graphs is drawing widespread attention. However, existing efforts mainly focus on one type of OOD issue, correlation shift, while another type, covariate shift, remains largely unexplored; covariate shift is the focus of this work. From a data generation view, causal features are stable substructures in data that play key roles in OOD generalization, whereas their complementary parts, environments, are unstable features that often lead to various distribution shifts. Correlation shift establishes spurious statistical correlations between environments and labels. In contrast, covariate shift means that there exist unseen environmental features in the test data. Existing strategies of graph invariant learning and data augmentation suffer from limited environments or unstable causal features, which greatly limits their generalization ability under covariate shift. In view of this, we propose a novel graph augmentation strategy, Adversarial Causal Augmentation (AdvCA), to alleviate covariate shift. Specifically, it adversarially augments the data to explore diverse distributions of the environments, while keeping the causal features invariant across these environments. By maintaining environmental diversity while ensuring the invariance of the causal features, it effectively alleviates covariate shift. Extensive experimental results with in-depth analyses demonstrate that AdvCA outperforms 14 baselines on synthetic and real-world datasets with various covariate shifts.
Distributional shift is one of the major obstacles when transferring machine learning prediction systems from the lab to the real world. To tackle this problem, we assume that variation across training domains is representative of the variation we might encounter at test time, but also that shifts at test time may be more extreme in magnitude. In particular, we show that reducing differences in risk across training domains can reduce a model's sensitivity to a wide range of extreme distributional shifts, including the challenging setting where the input contains both causal and anticausal elements. We motivate this approach, Risk Extrapolation (REx), as a form of robust optimization over a perturbation set of extrapolated domains (MM-REx), and propose a penalty on the variance of training risks (V-REx) as a simpler variant. We prove that variants of REx can recover the causal mechanisms of the targets, while also providing some robustness to changes in the input distribution ("covariate shift"). By trading off robustness to causally induced distributional shifts and covariate shift, REx is able to outperform alternative methods such as Invariant Risk Minimization in situations where these types of shift co-occur.
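Both REx objectives take only a few lines once the per-domain training risks R_1, ..., R_m have been computed. The sketch below follows the definitions as we understand them from the REx paper. V-REx adds β·Var(R_e) to the sum of the risks. MM-REx takes the worst-case weighted sum of the risks over weights that sum to 1, with every weight at least λ_min; a negative λ_min extrapolates beyond the training domains. The function names are our own.

```python
from statistics import pvariance


def v_rex(risks, beta):
    """V-REx: sum of per-domain training risks plus beta times their (population) variance."""
    return sum(risks) + beta * pvariance(risks)


def mm_rex(risks, lambda_min):
    """MM-REx: largest weighted sum of risks over weights that sum to 1 and are each
    >= lambda_min (requires lambda_min <= 1 / m). The maximizer puts lambda_min on every
    domain except the worst one, which gets the remaining weight 1 - (m - 1) * lambda_min."""
    m = len(risks)
    return (1 - m * lambda_min) * max(risks) + lambda_min * sum(risks)
```

With risks 0.2 and 0.4, setting λ_min = −1 puts weight 2 on the worse domain and −1 on the better one, which gives 0.6. Setting λ_min = 0.5 reduces MM-REx to the plain average, 0.3. Raising β in V-REx penalizes any gap between the domain risks more strongly.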
Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domain-invariant. An important assumption in this area is that the training examples are partitioned into "domains" or "environments". Our focus is on the more common setting where such partitions are not provided. We propose EIIL, a general framework for domain-invariant learning that incorporates Environment Inference to directly infer partitions that are maximally informative for downstream Invariant Learning. We show that EIIL outperforms invariant learning methods on the CMNIST benchmark without using environment labels, and significantly outperforms ERM on worst-group performance in the Waterbirds and CivilComments datasets. Finally, we establish connections between EIIL and algorithmic fairness, which enables EIIL to improve accuracy and calibration in a fair prediction problem.
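Although machine learning models have rapidly advanced the state of the art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. While current domain generalization methods usually focus on enforcing certain invariance properties across different domains through new loss function designs, we propose a balanced mini-batch sampling strategy to reduce the domain-specific spurious correlations in the observed training distributions. More specifically, we propose a two-step method that 1) identifies the source of spurious correlations, and 2) builds balanced mini-batches free from spurious correlations by matching on the identified source. We provide an identifiability guarantee for the source of spuriousness and show that our proposed approach provably samples from a balanced, spurious-free distribution over all training environments. Experiments are conducted on three computer vision datasets with documented spurious correlations, demonstrating empirically that our balanced mini-batch sampling strategy improves the performance of four different established domain generalization model baselines compared with the random mini-batch sampling strategy.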
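The sketch below covers step 2 (matching on the identified source). It assumes the spurious attribute is already observed as a discrete value per example; the paper's step 1, identifying that source, is skipped. Drawing the same number of examples from every (label, attribute) cell makes the label and the attribute independent within the batch. The function name and the with-replacement sampling are assumptions.

```python
import random
from collections import defaultdict


def balanced_batch(labels, attrs, per_cell, rng):
    """Sample per_cell indices (with replacement) from every (label, attribute) cell.

    Every label then co-occurs equally often with every attribute value in the
    batch, so the spurious attribute carries no information about the label."""
    cells = defaultdict(list)
    for i, (y, a) in enumerate(zip(labels, attrs)):
        cells[(y, a)].append(i)
    # Balance is only possible if every combination is present in the data.
    if len(cells) != len(set(labels)) * len(set(attrs)):
        raise ValueError("every (label, attribute) combination must occur at least once")
    batch = []
    for key in sorted(cells):
        batch.extend(rng.choices(cells[key], k=per_cell))
    rng.shuffle(batch)
    return batch
```

For example, in a training set where attribute 0 mostly co-occurs with label 0 (45 of 50 examples), each batch still contains exactly `per_cell` examples of each of the four combinations. Sampling with replacement lets rare cells be oversampled rather than shrinking the batch.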
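Learning invariant (causal) features for out-of-distribution (OOD) generalization has recently attracted extensive attention, and among the proposals, Invariant Risk Minimization (IRM) (Arjovsky et al., 2019) is a notable solution. Despite its theoretical promise for linear regression, the challenges of using IRM in linear classification problems remain (Rosenfeld et al., 2020; Nagarajan et al., 2021). Along this line, a recent study (Ahuja et al., 2021) has taken a first step and proposed a learning principle of information bottleneck based invariant risk minimization (IB-IRM). In this paper, we first show that the key assumption of support overlap of invariant features used in (Ahuja et al., 2021) is rather strong for guaranteeing OOD generalization, and that it is still possible to achieve the optimal solution without this assumption. To further answer whether IB-IRM is sufficient for learning invariant features in linear classification problems, we show that IB-IRM still fails in two cases, whether or not the invariant features capture all the information about the label. To address such failures, we propose a Counterfactual Supervision-based Information Bottleneck (CSIB) learning algorithm that provably recovers the invariant features. The proposed algorithm works even when accessing data from a single environment, and has theoretically consistent results for both binary and multi-class problems. We conduct empirical experiments on three synthetic datasets to verify the efficacy of our proposed method.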
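The main challenge of domain generalization (DG) is to overcome the potential distribution shift between multiple training domains and unseen test domains. One popular class of DG algorithms aims to learn representations that have invariant causal relations across the training domains. However, certain features, called pseudo-invariant features, may be invariant across the training domains but not the test domain, and can substantially degrade the performance of existing algorithms. To address this issue, we propose a novel algorithm called Invariant Information Bottleneck (IIB), which learns a minimal sufficient representation that is invariant across training and test domains. By minimizing the mutual information between the representation and the inputs, IIB reduces its reliance on pseudo-invariant features, which is desirable for DG. To verify the effectiveness of the IIB principle, we conduct extensive experiments on large-scale DG benchmarks. The results show that IIB achieves average accuracy gains of 2.8% and 3.8% under two evaluation metrics.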
Domain generalization (DG) aims to train a model to perform well in unseen domains under different distributions. This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG), where only a single source domain is available for training. To tackle this challenge, we first try to understand when neural networks fail to generalize. We empirically identify a model property that correlates strongly with generalization, which we call "model sensitivity". Based on our analysis, we propose a novel strategy of Spectral Adversarial Data Augmentation (SADA) to generate augmented images targeted at the highly sensitive frequencies. Models trained with these hard-to-learn samples can effectively suppress the sensitivity in the frequency space, which leads to improved generalization performance. Extensive experiments on multiple public datasets demonstrate the superiority of our approach, which surpasses the state-of-the-art single-DG methods.
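Machine learning algorithms typically assume that training and test examples are drawn from the same distribution. However, distribution shift is a common problem in real-world applications and can cause models to perform dramatically worse at test time. In this paper, we specifically consider the problems of domain shifts and subpopulation shifts (e.g., imbalanced data). While prior works often seek to explicitly regularize the model's internal representations and predictors to be domain invariant, we instead aim to regularize the whole function without restricting the model's internal representations. This leads to a simple mixup-based technique that learns invariant functions via selective augmentation, called LISA. LISA selectively interpolates samples either with the same label but different domains or with the same domain but different labels. We analyze a linear setting and theoretically show how LISA leads to a smaller worst-group error. Empirically, we study the effectiveness of LISA on nine benchmarks ranging from subpopulation shifts to domain shifts, and we find that LISA consistently outperforms other state-of-the-art methods.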
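The selective interpolation described above can be sketched as follows. The sketch assumes one-hot label vectors, a per-example coin flip between the two strategies with probability `p_intra_label`, and a mixing coefficient drawn from Beta(α, α) as in standard mixup. These parameter names and defaults are illustrative, not taken from the LISA paper.

```python
import random


def pick_partner(i, labels, domains, strategy, rng):
    """Choose a mixing partner for example i.

    'intra_label':  same label, different domain.
    'intra_domain': same domain, different label.
    Returns None if no valid partner exists."""
    n = len(labels)
    if strategy == "intra_label":
        pool = [j for j in range(n) if labels[j] == labels[i] and domains[j] != domains[i]]
    elif strategy == "intra_domain":
        pool = [j for j in range(n) if domains[j] == domains[i] and labels[j] != labels[i]]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return rng.choice(pool) if pool else None


def mix(a, b, lam):
    # Element-wise convex combination lam * a + (1 - lam) * b.
    return [lam * u + (1 - lam) * v for u, v in zip(a, b)]


def lisa_augment(xs, ys, labels, domains, rng, p_intra_label=0.5, alpha=2.0):
    """Return mixed (x, y) pairs; ys are one-hot label vectors."""
    out = []
    for i in range(len(xs)):
        strategy = "intra_label" if rng.random() < p_intra_label else "intra_domain"
        j = pick_partner(i, labels, domains, strategy, rng)
        if j is None:
            out.append((xs[i], ys[i]))  # keep the original example unmixed
            continue
        lam = rng.betavariate(alpha, alpha)
        out.append((mix(xs[i], xs[j], lam), mix(ys[i], ys[j], lam)))
    return out
```

Intra-label mixing leaves the label unchanged while blending domain-specific content, which discourages reliance on domain cues. Intra-domain mixing blends labels within a domain, so the domain alone cannot predict the label.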
Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets. Finally, we present a simple setting where we can rigorously tie the phenomena we observe in practice to a misalignment between the (human-specified) notion of robustness and the inherent geometry of the data.
Recent studies show that even highly biased dense networks contain an unbiased substructure that can achieve better out-of-distribution (OOD) generalization than the original model. Existing works usually search for the invariant subnetwork using modular risk minimization (MRM) with out-domain data. Such a paradigm may bring about two potential weaknesses: 1) unfairness, due to insufficient observation of out-domain data during training; and 2) sub-optimal OOD generalization, due to feature-untargeted model pruning on the whole data distribution. In this paper, we propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures without suffering from the above weaknesses. Specifically, SFP identifies in-distribution (ID) features during training using our theoretically verified task loss, upon which SFP can perform ID-targeted model pruning that removes branches with strong dependencies on ID features. Notably, by attenuating the projections of spurious features into model space, SFP can push model learning toward invariant features and away from environmental features, yielding optimal OOD generalization. Moreover, we conduct detailed theoretical analysis to provide a rationality guarantee and a proof framework for OOD structures via model sparsity, and, for the first time, reveal how a highly biased data distribution affects the model's OOD generalization. Extensive experiments on various OOD datasets show that SFP significantly outperforms both structure-based and non-structure-based state-of-the-art OOD generalization methods, with accuracy improvements of up to 4.72% and 23.35%, respectively.
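Domain generalization (DG) methods aim to develop models that generalize to settings where the test distribution differs from the training data. In this paper, we focus on the challenging problem of multi-source zero-shot DG, where labeled training data from multiple source domains is available but no data from the target domain can be accessed. Although this problem has become an important topic of research, the simple solution of pooling all source data together and training a single classifier is, surprisingly, highly competitive on standard benchmarks. More importantly, even sophisticated approaches that explicitly optimize for invariance across different domains do not necessarily provide non-trivial gains over ERM. In this paper, for the first time, we study the important link between pre-specified domain labels and generalization performance. Using a motivating case study and a new variant of a distributionally robust optimization algorithm, we first demonstrate how inferred custom domain groups can lead to consistent improvements over the original domain labels that come with the dataset. Subsequently, we introduce a general approach for multi-domain generalization, MulDEns, that uses an ERM-based deep ensembling backbone and performs implicit domain re-labeling through a meta-optimization algorithm. Using empirical studies on multiple standard benchmarks, we show that MulDEns does not require tailoring the augmentation strategy or the training process to a specific dataset, consistently outperforms ERM by significant margins, and produces state-of-the-art generalization performance, even when compared with existing methods that exploit the domain labels.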
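We study the problem of invariant learning when environment labels are unknown. We focus on the notion of invariant representation, under which the Bayes-optimal conditional label distribution is the same across different environments. Previous work conducts environment inference (EI) by maximizing the penalty of the Invariant Risk Minimization (IRM) framework. The EI step uses a reference model that focuses on spurious correlations to efficiently reach a good environment partition. However, it is not clear how to find such a reference model. In this work, we propose to repeat the EI process and retrain an ERM model on the majority environment inferred by the previous EI step. Under mild assumptions, we find that this iterative process helps learn a representation that captures spurious correlations better than a single step does. This results in better environment inference and better invariant learning. We show that this method outperforms baselines on both synthetic and real-world datasets.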
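Domain generalization (DG) aims to learn models that generalize under distribution shifts, so as to avoid re-fitting large-scale training data. Previous works with complex loss designs and gradient constraints have not yet achieved empirical success on large-scale benchmarks. In this work, we reveal the generalizability of mixture-of-experts (MoE) models for DG by leveraging them to distributively handle multiple aspects of the predictive features across domains. To this end, we propose Sparse Fusion Mixture-of-Experts (SF-MoE), which incorporates sparsity and fusion mechanisms into the MoE framework to keep the model both sparse and predictive. SF-MoE has two dedicated modules, 1) a sparse block and 2) a fusion block, which disentangle and aggregate the diverse learned signals of an object, respectively. Extensive experiments demonstrate that SF-MoE is a domain-generalizable learner on large-scale benchmarks. It outperforms the best counterparts on 5 large-scale DG datasets (e.g., DomainNet) at the same or even lower computational cost. We further reveal the internal mechanism of SF-MoE from the perspective of distributed representation (e.g., visual attributes). We hope this framework can facilitate future research that pushes generalizable object recognition toward the real world. Code and models are released at https://github.com/luodian/sf-moe-dg.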
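The sparsity that an MoE layer relies on comes from routing each input to only a few experts. The sketch below shows generic top-k gating with a linear router and a softmax over the selected experts' gate logits. This is the textbook sparse-MoE mechanism, assumed here for illustration; it is not SF-MoE's exact sparse block, and the fusion block is not modeled.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x, gate_weights, experts, k):
    """Sparse mixture-of-experts layer with top-k gating.

    gate_weights[i] is the (hypothetical) linear router row for expert i. Only the
    k highest-scoring experts are evaluated; their outputs are combined with a
    softmax over their gate logits. Returns (output, indices of selected experts)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    outputs = [experts[i](x) for i in top]  # unselected experts are never called
    dim = len(outputs[0])
    combined = [sum(g * out[d] for g, out in zip(weights, outputs)) for d in range(dim)]
    return combined, top
```

With k = 1, each input is processed by a single expert, so compute grows with k rather than with the total number of experts. This matches the abstract's point about similar or lower computational cost.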
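Despite remarkable success in a variety of applications, it is well known that deep learning can fail when presented with out-of-distribution data. Toward addressing this challenge, we consider the domain generalization problem, wherein predictors are trained using data drawn from a family of related training domains and then evaluated on a distinct and unseen test domain. We show that under a natural model of data generation and a concomitant invariance condition, the domain generalization problem is equivalent to an infinite-dimensional constrained statistical learning problem; this problem forms the basis of our approach, which we call Model-Based Domain Generalization. Due to the inherent challenges in solving constrained optimization problems in deep learning, we exploit nonconvex duality theory to develop unconstrained relaxations of this statistical problem with tight bounds on the duality gap. Based on this theoretical motivation, we propose a novel domain generalization algorithm with convergence guarantees. In our experiments, we report improvements of up to 30 percentage points over state-of-the-art domain generalization baselines on several benchmarks, including ColoredMNIST, Camelyon17-WILDS, FMoW-WILDS, and PACS.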
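The unconstrained relaxation rests on the Lagrangian of a constrained problem. The sketch below is a generic primal-dual scheme: gradient descent on the parameter, followed by projected gradient ascent on a nonnegative multiplier. It uses a scalar parameter, and the toy objective `f` and constraint `g` stand in for the classification loss and the invariance constraint. Step sizes and loop counts are arbitrary illustrative choices, not the paper's algorithm.

```python
def primal_dual(f_grad, g, g_grad, gamma, theta=0.0, lam=0.0,
                outer_steps=300, inner_steps=50, lr_primal=0.05, lr_dual=0.5):
    """Approximately solve  min_theta f(theta)  s.t.  g(theta) <= gamma
    via the Lagrangian L(theta, lam) = f(theta) + lam * (g(theta) - gamma)."""
    for _ in range(outer_steps):
        # Primal step: a few gradient-descent updates on L with lam held fixed.
        for _ in range(inner_steps):
            theta -= lr_primal * (f_grad(theta) + lam * g_grad(theta))
        # Dual step: gradient ascent on lam (dL/dlam = constraint violation), projected onto lam >= 0.
        lam = max(0.0, lam + lr_dual * (g(theta) - gamma))
    return theta, lam
```

Consider the hypothetical toy problem f(θ) = (θ − 3)² subject to g(θ) = θ² ≤ 1. The constrained optimum is θ = 1, and the stationarity condition 2(θ − 3) + 2λθ = 0 gives the multiplier λ = 2. The loop above approaches both values when called as `primal_dual(lambda t: 2 * (t - 3), lambda t: t * t, lambda t: 2 * t, gamma=1.0)`.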
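Recently, generalization to out-of-distribution (OOD) data with correlation shift has attracted great attention. Correlation shift is caused by spurious attributes that correlate with the class label, since the correlation between them may differ between training and test data. For such a problem, we show that, given the class label, models that are conditionally independent of the spurious attributes are OOD generalizable. Based on this, we propose Conditional Spurious Variation (CSV), a metric that controls the OOD generalization error, to measure such conditional independence. To improve OOD generalization, we regularize the training process with the proposed CSV. Under mild assumptions, our training objective can be formulated as a nonconvex-concave mini-max problem, and we propose an algorithm with a provable convergence rate to solve it. Extensive empirical results verify our algorithm's efficacy in improving OOD generalization.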