如果神经网络更大,即使所产生的模型过度参数化,神经网络也倾向于通过训练获得更好的准确性。然而,在训练之前,期间或之后,仔细删除此类多余参数也可能会产生具有相似精度甚至提高的模型。在许多情况下,可以通过启发式方法奇怪地实现,就像去除具有最小绝对价值的权重一样 - 即使幅度并不是重量相关性的完美代理。以这样的前提是,从修剪中获得更好的性能取决于删除多个权重的综合效果的考虑,我们重新审视了基于影响的基于撞击的经典方法之一:最佳脑外科医生(obs)。我们提出了一种可拖动的启发式方法,用于求解OBS的组合扩展,其中我们选择了同时删除的权重,以及剩余权重的系统更新。我们的选择方法在高稀疏性下的其他方法优于其他方法,即使与其他方法结合使用,重量更新也是有利的。
translated by 谷歌翻译
我们考虑在具有挑战性的训练后环境中,深度神经网络(DNN)的模型压缩问题,在该设置中,我们将获得精确的训练模型,并且必须仅基于少量校准输入数据而无需任何重新培训即可压缩它。鉴于新兴软件和硬件支持通过加速修剪和/或量化压缩的模型,并且已经针对两种压缩方法独立提出了良好的表现解决方案,因此该问题已变得流行。在本文中,我们引入了一个新的压缩框架,该框架涵盖了统一环境中的重量修剪和量化,时间和空间效率高,并且在现有的后训练方法的实际性能上大大改善。在技​​术层面上,我们的方法基于[Lecun,Denker和Solla,1990年]在现代DNN的规模上的经典最佳脑外科医生(OBS)框架的第一个精确实现,我们进一步扩展到覆盖范围。重量量化。这是通过一系列可能具有独立利益的算法开发来实现的。从实际的角度来看,我们的实验结果表明,它可以在现有后训练方法的压缩 - 准确性权衡方面显着改善,并且甚至可以在训练后进行修剪和量化的准确共同应用。
translated by 谷歌翻译
有效地近似损失函数的局部曲率信息是用于深神经网络的优化和压缩的关键工具。然而,大多数现有方法近似二阶信息具有高计算或存储成本,这可以限制其实用性。在这项工作中,我们调查矩阵,用于估计逆象征的矢量产品(IHVPS)的矩阵线性时间方法,因为当Hessian可以近似为乘语 - 一个矩阵的总和时,如Hessian的经典近似由经验丰富的Fisher矩阵。我们提出了两个新的算法作为称为M-FAC的框架的一部分:第一个算法朝着网络压缩量身定制,如果Hessian给出了M $等级的总和,则可以计算Dimension $ D $的IHVP。 ,使用$ O(DM ^ 2)$预压制,$ O(DM)$代价计算IHVP,并查询逆Hessian的任何单个元素的费用$ O(m)$。第二算法针对优化设置,我们希望在反向Hessian之间计算产品,估计在优化步骤的滑动窗口和给定梯度方向上,根据预先说明的SGD所需的梯度方向。我们为计算IHVP和OHVP和O(DM + M ^ 3)$ of $ o(dm + m ^ 2)$提供算法,以便从滑动窗口添加或删除任何渐变。这两种算法产生最先进的结果,用于网络修剪和相对于现有二阶方法的计算开销的优化。在[9]和[17]可用实现。
translated by 谷歌翻译
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. When executing SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, we can reach 60% sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches.
translated by 谷歌翻译
最近对深神经网络(DNN)效率的重点已导致了模型压缩方法的重要工作,其中重量修剪是最受欢迎的方法之一。同时,有快速增长的计算支持,以有效地执行通过修剪获得的非结构化模型。但是,大多数现有的修剪方法最小化仅剩余权重的数量,即模型的大小,而不是针对推理时间进行优化。我们通过引入SPDY来解决这一差距,SPDY是一种新的压缩方法,该方法会自动确定层次的稀疏性目标,可以在给定系统上实现所需的推理速度,同时最大程度地减少准确性损失。 SPDY由两种新技术组成:第一个是一种有效的动态编程算法,用于求解一组给定的层敏感性得分,以解决加速约束的层压缩问题;第二个是一个局部搜索程序,用于确定准确的层敏感性得分。跨流行视觉和语言模型的实验表明,SPDY可以保证相对于现有策略的恢复较高的准确性,无论是一次性和逐步修剪方案,并且与大多数现有的修剪方法兼容。我们还将方法扩展到了最近实施的修剪任务,几乎没有数据,在该数据中,我们在修剪GPU支持的2:4稀疏模式时实现了最著名的准确性恢复。
translated by 谷歌翻译
We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with finetuning by backpropagation-a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, e.g. the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102) relaying only on the first order gradient information. We also show that pruning can lead to more than 10× theoretical reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier. Finally, we show results for the largescale ImageNet dataset to emphasize the flexibility of our approach.
translated by 谷歌翻译
我们为深神经网络提出了一种新的全球压缩框架,它自动分析每个层以识别最佳的每个层压缩比,同时实现所需的整体压缩。我们的算法通过将其通道切入多个组并通过低秩分解来分解每个组来铰接压缩每个卷积(或完全连接)层的想法。在我们的算法的核心处于从Eckart Young MiRSKY定理中推导了层面错误界限的推导。然后,我们利用这些界限将压缩问题框架作为优化问题,我们希望最小化层次的最大压缩误差并提出朝向解决方案的有效算法。我们的实验表明,我们的方法优于各种网络和数据集的现有低级压缩方法。我们认为,我们的结果为未来的全球性能大小的研究开辟了新的途径,即现代神经网络的全球性能大小。我们的代码可在https://github.com/lucaslie/torchprune获得。
translated by 谷歌翻译
修剪神经网络可降低推理时间和记忆成本。在标准硬件上,如果修剪诸如特征地图之类的粗粒结构(例如特征地图),这些好处将特别突出。我们为二阶结构修剪(SOSP)设计了两种新型的基于显着性的方法,其中包括所有结构和层之间的相关性。我们的主要方法SOSP-H采用了创新的二阶近似,可以通过快速的Hessian-vector产品进行显着评估。 SOSP-H因此,尽管考虑到了完整的Hessian,但仍像一阶方法一样缩放。我们通过将SOSP-H与使用公认的Hessian近似值以及许多最先进方法进行比较来验证SOSP-H。尽管SOSP-H在准确性方面的表现或更好,但在可伸缩性和效率方面具有明显的优势。这使我们能够将SOSP-H扩展到大规模视觉任务,即使它捕获了网络所有层的相关性。为了强调我们修剪方法的全球性质,我们不仅通过删除预验证网络的结构,而且还通过检测建筑瓶颈来评估它们的性能。我们表明,我们的算法允许系统地揭示建筑瓶颈,然后将其删除以进一步提高网络的准确性。
translated by 谷歌翻译
Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-tosparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static * .
translated by 谷歌翻译
修剪深度神经网络的现有方法专注于去除训练有素的网络的不必要参数,然后微调模型,找到恢复训练模型的初始性能的良好解决方案。与其他作品不同,我们的方法特别注意通过修剪神经元的压缩模型和推理计算时间的解决方案的质量。通过探索Hessian的光谱半径,所提出的算法通过探索Hessian的光谱半径来指示压缩模型的参数,这导致了更好地推广了未经看涨的数据。此外,该方法不适用于预先训练的网络,并同时执行训练和修剪。我们的结果表明,它改善了神经元压缩的最先进的结果。该方法能够在不同神经网络模型上实现具有小精度下降的非常小的网络。
translated by 谷歌翻译
深度神经网络(DNN)在解决许多真实问题方面都有效。较大的DNN模型通常表现出更好的质量(例如,精度,精度),但它们的过度计算会导致长期推理时间。模型稀疏可以降低计算和内存成本,同时保持模型质量。大多数现有的稀疏算法是单向移除的重量,而其他人则随机或贪婪地探索每层进行修剪的小权重子集。这些算法的局限性降低了可实现的稀疏性水平。此外,许多算法仍然需要预先训练的密集模型,因此遭受大的内存占地面积。在本文中,我们提出了一种新颖的预定生长和修剪(间隙)方法,而无需预先培训密集模型。它通过反复生长一个层次的层来解决以前的作品的缺点,然后在一些训练后修剪回到稀疏。实验表明,使用所提出的方法修剪模型匹配或击败高度优化的密集模型的质量,在各种任务中以80%的稀疏度,例如图像分类,客观检测,3D对象分段和翻译。它们还优于模型稀疏的其他最先进的(SOTA)方法。作为一个例子,通过间隙获得的90%不均匀的稀疏resnet-50模型在想象中实现了77.9%的前1个精度,提高了先前的SOTA结果1.5%。所有代码将公开发布。
translated by 谷歌翻译
Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and secondorder Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-ofthe-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning.
translated by 谷歌翻译
Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.
translated by 谷歌翻译
Structured channel pruning has been shown to significantly accelerate inference time for convolution neural networks (CNNs) on modern hardware, with a relatively minor loss of network accuracy. Recent works permanently zero these channels during training, which we observe to significantly hamper final accuracy, particularly as the fraction of the network being pruned increases. We propose Soft Masking for cost-constrained Channel Pruning (SMCP) to allow pruned channels to adaptively return to the network while simultaneously pruning towards a target cost constraint. By adding a soft mask re-parameterization of the weights and channel pruning from the perspective of removing input channels, we allow gradient updates to previously pruned channels and the opportunity for the channels to later return to the network. We then formulate input channel pruning as a global resource allocation problem. Our method outperforms prior works on both the ImageNet classification and PASCAL VOC detection datasets.
translated by 谷歌翻译
修剪是稀疏深神经网络的任务,最近受到了越来越多的关注。尽管最先进的修剪方法提取了高度稀疏的模型,但它们忽略了两个主要挑战:(1)寻找这些稀疏模型的过程通常非常昂贵; (2)非结构化的修剪在GPU记忆,训练时间或碳排放方面没有提供好处。我们提出了通过梯度流量保存(早期CROP)提出的早期压缩,该压缩在训练挑战(1)的培训(1)中有效提取最先进的稀疏模型,并且可以以结构化的方式应用来应对挑战(2)。这使我们能够在商品GPU上训练稀疏的网络,该商品GPU的密集版本太大,从而节省了成本并减少了硬件要求。我们从经验上表明,早期杂交的表现优于许多任务(包括分类,回归)和域(包括计算机视觉,自然语言处理和增强学习)的丰富基线。早期杂交导致准确性与密集训练相当,同时超过修剪基线。
translated by 谷歌翻译
训练有素的神经网络的性能至关重要。加上深度学习模型的不断增长的规模,这种观察激发了对学习稀疏模型的广泛研究。在这项工作中,我们专注于控制稀疏学习时的稀疏水平的任务。基于稀疏性惩罚的现有方法涉及对罚款因素的昂贵反复试验调整,因此缺乏直接控制所得模型的稀疏性。作为响应,我们采用了一个约束的公式:使用Louizos等人提出的栅极机制。 (2018年),我们制定了一个受约束的优化问题,其中稀疏以训练目标和所需的稀疏目标以端到端的方式指导。使用WIDERESNET和RESNET {18,50}模型进行了CIFAR-10/100,Tinyimagenet和ImageNet的实验验证了我们的提案的有效性,并证明我们可以可靠地实现预定的稀疏目标,而不会损害预测性能。
translated by 谷歌翻译
Neural network pruning-the task of reducing the size of a network by removing parameters-has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.
translated by 谷歌翻译
我们为神经网络提出了一种新颖,结构化修剪算法 - 迭代,稀疏结构修剪算法,称为I-Spasp。从稀疏信号恢复的思想启发,I-Spasp通过迭代地识别网络内的较大的重要参数组(例如,滤波器或神经元),这些参数组大多数对修剪和密集网络输出之间的残差贡献,然后基于这些组阈值以较小的预定定义修剪比率。对于具有Relu激活的双层和多层网络架构,我们展示了通过多项式修剪修剪诱导的错误,该衰减是基于密集网络隐藏表示的稀疏性任意大的。在我们的实验中,I-Spasp在各种数据集(即MNIST和ImageNet)和架构(即馈送前向网络,Resnet34和MobileNetv2)中进行评估,其中显示用于发现高性能的子网和改进经过几种数量级的可提供基线方法的修剪效率。简而言之,I-Spasp很容易通过自动分化实现,实现强大的经验结果,具有理论收敛保证,并且是高效的,因此将自己区分开作为少数几个计算有效,实用,实用,实用,实用,实用,实用,实用,实用和可提供的修剪算法之一。
translated by 谷歌翻译
深度学习在广泛的AI应用方面取得了有希望的结果。较大的数据集和模型一致地产生更好的性能。但是,我们一般花费更长的培训时间,以更多的计算和沟通。在本调查中,我们的目标是在模型精度和模型效率方面提供关于大规模深度学习优化的清晰草图。我们调查最常用于优化的算法,详细阐述了大批量培训中出现的泛化差距的可辩论主题,并审查了解决通信开销并减少内存足迹的SOTA策略。
translated by 谷歌翻译
The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
translated by 谷歌翻译