在各种方法中,旨在使神经网络的学习程序更有效,科学界会根据其估计的复杂性来开发策略,以从较大的网络中蒸发蒸馏知识,或利用对抗机器学习背后的原则。最近提出了一个不同的想法,命名为友好培训,这包括通过增加自动估计的扰动来改变输入数据,其目标是促进神经分类器的学习过程。只要训练收益,转变就会逐渐消失,直到它完全消失。在这项工作中,我们重新审视并扩展了这个想法,引入了通过神经发电机在对抗机器学习的背景下的完全不同和新的方法的启发。我们提出了一种辅助多层网络,该网络负责改变输入数据,使得在训练过程的当前阶段可以更容易地处理分类器。辅助网络与神经分类器共同培训,因此本质上增加了分类器的“深度”,并且预计将在数据改变过程中发现一般规律。辅助网络的效果逐渐减少到训练结束时,当它完全下降时,分类器部署用于应用程序。我们将这种方法称为神经友好培训。涉及多个数据集和不同神经架构的扩展实验程序表明,神经友好培训克服了最初提出的友好培训技术,提高了分类器的泛化,特别是在嘈杂的数据的情况下。
translated by 谷歌翻译
对基于机器学习的分类器以及防御机制的对抗攻击已在单一标签分类问题的背景下广泛研究。在本文中,我们将注意力转移到多标签分类,其中关于所考虑的类别中的关系的域知识可以提供自然的方法来发现不连贯的预测,即与培训数据之外的对抗的例子相关的预测分配。我们在框架中探讨这种直觉,其中一阶逻辑知识被转换为约束并注入半监督的学习问题。在此设置中,约束分类器学会满足边际分布的域知识,并且可以自然地拒绝具有不连贯预测的样本。尽管我们的方法在训练期间没有利用任何对攻击的知识,但我们的实验分析令人惊讶地推出了域名知识约束可以有效地帮助检测对抗性示例,特别是如果攻击者未知这样的约束。
translated by 谷歌翻译
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
translated by 谷歌翻译
良好的培训数据是开发有用的ML应用程序的先决条件。但是,在许多域中,现有数据集不能由于隐私法规(例如,从医学研究)而被共享。这项工作调查了一种简单而非规范的方法,可以匿名数据综合来使第三方能够受益于此类私人数据。我们探讨了从不切实际,任务相关的刺激中隐含地学习的可行性,这通过激发训练有素的深神经网络(DNN)的神经元来合成。因此,神经元励磁用作伪生成模型。刺激数据用于培训新的分类模型。此外,我们将此框架扩展以抑制与特定个人相关的表示。我们使用开放和大型闭合临床研究的睡眠监测数据,并评估(1)最终用户是否可以创建和成功使用定制分类模型进行睡眠呼吸暂停检测,并且(2)研究中参与者的身份受到保护。广泛的比较实证研究表明,在刺激上培训的不同算法能够在与原始模型相同的任务上成功概括。然而,新和原始模型之间的架构和算法相似性在性能方面发挥着重要作用。对于类似的架构,性能接近使用真实数据(例如,精度差为0.56 \%,Kappa系数差为0.03-0.04)。进一步的实验表明,刺激可以在很大程度上成功地匿名匿名研究临床研究的参与者。
translated by 谷歌翻译
培训深度神经网络是一项非常苛刻的任务,尤其是具有挑战性的是如何适应体系结构以提高训练有素的模型的性能。我们可以发现,有时,浅网络比深网概括得更好,并且增加更多层会导致更高的培训和测试错误。深层残留学习框架通过将跳过连接添加到几个神经网络层来解决此降解问题。最初,需要这种跳过连接才能成功地训练深层网络,因为网络的表达性会随着深度的指数增长而成功。在本文中,我们首先通过神经网络分析信息流。我们介绍和评估批处理循环,该批处理通过神经网络的每一层量化信息流。我们从经验和理论上证明,基于梯度下降的训练方法需要正面批处理融合,以成功地优化给定的损失功能。基于这些见解,我们引入了批处理凝聚正则化,以使基于梯度下降的训练算法能够单独通过每个隐藏层来优化信息流。借助批处理正则化,梯度下降优化器可以将不可吸引的网络转换为可训练的网络。我们从经验上表明,因此我们可以训练“香草”完全连接的网络和卷积神经网络 - 没有跳过连接,批处理标准化,辍学或任何其他建筑调整 - 只需将批处理 - 凝集正则术语添加到500层中损失功能。批处理 - 注入正则化的效果不仅在香草神经网络上评估,还评估了在各种计算机视觉以及自然语言处理任务上的剩余网络,自动编码器以及变压器模型上。
translated by 谷歌翻译
Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
translated by 谷歌翻译
Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive distillation when training DNNs. We also empirically study the effectiveness of our defense mechanisms on two DNNs placed in adversarial settings. The study shows that defensive distillation can reduce effectiveness of sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be explained by the fact that distillation leads gradients used in adversarial sample creation to be reduced by a factor of 10 30 . We also find that distillation increases the average minimum number of features that need to be modified to create adversarial samples by about 800% on one of the DNNs we tested.
translated by 谷歌翻译
Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses both issues. First, we systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered. Second, we propose a common benchmark that allows for an objective comparison of approaches. It consists of five datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues. Surprisingly, we find that thorough hyper-parameter tuning on held-out validation data results in a highly competitive baseline and highlights a stunted growth of performance over the years. Indeed, only a single specialized method dating back to 2019 clearly wins our benchmark and outperforms the baseline classifier.
translated by 谷歌翻译
这是一门专门针对STEM学生开发的介绍性机器学习课程。我们的目标是为有兴趣的读者提供基础知识,以在自己的项目中使用机器学习,并将自己熟悉术语作为进一步阅读相关文献的基础。在这些讲义中,我们讨论受监督,无监督和强化学习。注释从没有神经网络的机器学习方法的说明开始,例如原理分析,T-SNE,聚类以及线性回归和线性分类器。我们继续介绍基本和先进的神经网络结构,例如密集的进料和常规神经网络,经常性的神经网络,受限的玻尔兹曼机器,(变性)自动编码器,生成的对抗性网络。讨论了潜在空间表示的解释性问题,并使用梦和对抗性攻击的例子。最后一部分致力于加强学习,我们在其中介绍了价值功能和政策学习的基本概念。
translated by 谷歌翻译
在人类和其他动物中分类的众所周知的感知后果称为分类感知,是由类别内部压缩和类别分离之间的特别特征:两个项目,在输入空间内,如果它们属于与属于不同类别的类别相同。在这里阐述认知科学的实验和理论结果,我们在这里研究人工神经网络中的分类效果。我们结合了利用互联网信息量的理论分析,以及关于增加复杂性的网络的一系列数值模拟。这些形式和数值分析提供了深层层内神经表示的几何形状的见解,随着类别边界附近的空间膨胀,远离类别边界。我们通过使用两个互补方法调查分类表示:通过不同类别的刺激之间的变形连续进行动态物理学和认知神经科学的一种模仿实验,而另一个介绍网络中的每层的分类指数,量化的分类指数量化了神经人口水平的类别。我们展示了类别学习的浅层和深度神经网络,自动诱导分类感知。我们进一步表明层更深,分类效果越强。作为我们研究的结果,我们提出了辍学正规化技术不同启发式实践的效果的相干观点。更一般地,我们的观点在神经科学文献中发现回声,坚持根据所学习的神经表示的几何形状的任何给定层中的噪声对噪声的差异影响,即该几何形状如何反映类别的结构。
translated by 谷歌翻译
来自X射线图像的近端股骨骨折的足够分类对于治疗选择和患者的临床结果至关重要。我们依赖于常用的AO系统,该系统描述了将图像分类为类型和亚型的分层知识树根据裂缝的位置和复杂性。在本文中,我们提出了一种基于卷积神经网络(CNN)自动分类近端股骨骨折的近端骨折分类为3和7 AO类。如已知所知,CNNS需要具有可靠标签的大型和代表性数据集,这很难收集手头的应用。在本文中,我们设计了一个课程学习(CL)方法,在这种情况下通过基本的CNNS性能提高。我们的小说配方团结了三个课程策略:单独加权培训样本,重新排序培训集,以及数据采样子集。这些策略的核心是评分函数排名训练样本。我们定义了两种小说评分函数:一个来自域的特定于域的先前知识和原始的自我节奏的不确定性分数。我们对近端股骨射线照片的临床数据集进行实验。课程改善了近端股骨骨折分类,达到了经验丰富的创伤外科医生的性能。最佳课程方法根据现有知识重新排列培训集,从而达到15%的分类提高。使用公开可用的MNIST DataSet,我们进一步讨论并展示了我们统一的CL配方对三个受控和具有挑战性的数字识别方案的好处:具有有限的数据,在类别 - 不平衡下以及在标签噪声存在下。我们的工作代码可在:https://github.com/ameliajimenez/curriculum-learning-prior -unctainty。
translated by 谷歌翻译
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
translated by 谷歌翻译
We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains.The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort using any of the deep learning packages.We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for descriptor learning task in the context of person re-identification application.
translated by 谷歌翻译
无线电星系的连续排放通常可以分为不同的形态学类,如FRI,Frii,弯曲或紧凑。在本文中,我们根据使用深度学习方法使用小规模数据集的深度学习方法来探讨基于形态的无线电星系分类的任务($ \ SIM 2000 $ Samples)。我们基于双网络应用了几次射击学习技术,并使用预先培训的DENSENET模型进行了先进技术的传输学习技术,如循环学习率和歧视性学习迅速训练模型。我们使用最佳表演模型实现了超过92 \%的分类准确性,其中最大的混乱来源是弯曲和周五型星系。我们的结果表明,专注于一个小但策划数据集随着使用最佳实践来训练神经网络可能会导致良好的结果。自动分类技术对于即将到来的下一代无线电望远镜的调查至关重要,这预计将在不久的将来检测数十万个新的无线电星系。
translated by 谷歌翻译
Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-ofthe-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
translated by 谷歌翻译
深度学习使用由其重量进行参数化的神经网络。通常通过调谐重量来直接最小化给定损耗功能来训练神经网络。在本文中,我们建议将权重重新参数转化为网络中各个节点的触发强度的目标。给定一组目标,可以计算使得发射强度最佳地满足这些目标的权重。有人认为,通过我们称之为级联解压缩的过程,使用培训的目标解决爆炸梯度的问题,并使损失功能表面更加光滑,因此导致更容易,培训更快,以及潜在的概括,神经网络。它还允许更容易地学习更深层次和经常性的网络结构。目标对重量的必要转换有额外的计算费用,这是在许多情况下可管理的。在目标空间中学习可以与现有的神经网络优化器相结合,以额外收益。实验结果表明了使用目标空间的速度,以及改进的泛化的示例,用于全连接的网络和卷积网络,以及调用和处理长时间序列的能力,并使用经常性网络进行自然语言处理。
translated by 谷歌翻译
Continual Learning (CL) is a field dedicated to devise algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models and that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data modeled does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drop very quickly. Overcoming this limitation is fundamental as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the new updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor over the quality of the data. Secondly, we propose one of the early works of incremental learning on ViTs architectures, comparing functional, weight and attention regularization approaches and propose effective novel a novel asymmetric loss. At the end we conclude with a study on pretraining and how it affects the performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.
translated by 谷歌翻译
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance.We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that-when trained in isolationreach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective.We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
translated by 谷歌翻译
We explore an original strategy for building deep networks, based on stacking layers of denoising autoencoders which are trained locally to denoise corrupted versions of their inputs. The resulting algorithm is a straightforward variation on the stacking of ordinary autoencoders. It is however shown on a benchmark of classification problems to yield significantly lower classification error, thus bridging the performance gap with deep belief networks (DBN), and in several cases surpassing it. Higher level representations learnt in this purely unsupervised fashion also help boost the performance of subsequent SVM classifiers. Qualitative experiments show that, contrary to ordinary autoencoders, denoising autoencoders are able to learn Gabor-like edge detectors from natural image patches and larger stroke detectors from digit images. This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations.
translated by 谷歌翻译
We introduce DropConnect, a generalization of Dropout (Hinton et al., 2012), for regularizing large fully-connected layers within neural networks. When training with Dropout, a randomly selected subset of activations are set to zero within each layer. DropConnect instead sets a randomly selected subset of weights within the network to zero. Each unit thus receives input from a random subset of units in the previous layer. We derive a bound on the generalization performance of both Dropout and DropConnect. We then evaluate DropConnect on a range of datasets, comparing to Dropout, and show state-of-the-art results on several image recognition benchmarks by aggregating multiple DropConnect-trained models.
translated by 谷歌翻译