Deep neural networks have long training and processing times. Early exits added to neural networks allow the network to make early predictions using intermediate activations in the network in time-sensitive applications. However, early exits increase the training time of the neural networks. We introduce QuickNets: a novel cascaded training algorithm for faster training of neural networks. QuickNets are trained in a layer-wise manner such that each successive layer is only trained on samples that could not be correctly classified by the previous layers. We demonstrate that QuickNets can dynamically distribute learning and have a reduced training cost and inference cost compared to standard backpropagation. Additionally, we introduce commitment layers that significantly improve the early exits by identifying over-confident predictions, and we demonstrate their success.
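The cascaded idea described above, where each stage trains only on samples the earlier stages got wrong, can be illustrated with a toy sketch. This is not the QuickNets algorithm itself: `MajorityStage` and `cascade_fit` are hypothetical names, the stage is a trivial majority-label predictor standing in for a trainable layer with an exit, and, unlike a real cascade, later stages here do not consume earlier activations.

```python
from collections import Counter


class MajorityStage:
    """Toy stand-in for one trainable stage: predicts the most common label it saw."""

    def fit(self, xs, ys):
        self.label = Counter(ys).most_common(1)[0][0]

    def predict(self, x):
        return self.label


def cascade_fit(stages, xs, ys):
    """Train stages in order; each stage sees only samples earlier stages misclassified.

    Returns the training-set size seen by each stage that was actually trained.
    """
    remaining = list(zip(xs, ys))
    sizes = []
    for stage in stages:
        if not remaining:
            break  # nothing left to learn; deeper stages are never trained
        sizes.append(len(remaining))
        stage.fit([x for x, _ in remaining], [y for _, y in remaining])
        # Keep only the samples this stage still gets wrong.
        remaining = [(x, y) for x, y in remaining if stage.predict(x) != y]
    return sizes
```

The shrinking training sets are where a cascaded scheme would save training cost: each deeper stage processes fewer samples than the one before it.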
Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real-time and energy-sensitive applications. To address this issue, we present BranchyNet, a novel deep network architecture that is augmented with additional side branch classifiers. The architecture allows prediction results for a large portion of test samples to exit the network early via these branches when samples can already be inferred with high confidence. BranchyNet exploits the observation that features learned at an early layer of a network may often be sufficient for the classification of many data points. For more difficult samples, which are expected less frequently, BranchyNet will use further or all network layers to provide the best likelihood of correct prediction. We study the BranchyNet architecture using several well-known networks (LeNet, AlexNet, ResNet) and datasets (MNIST, CIFAR10) and show that it can both improve accuracy and significantly reduce the inference time of the network.
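The early-exit inference described above can be sketched as follows. This is a minimal illustration, not the BranchyNet implementation: the function names and the 0.9 threshold are assumptions, and maximum softmax probability is used here as one possible confidence measure. Passing a generator for `exit_logits` lets deeper exits be computed only when needed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Walk the exits from shallow to deep and stop at the first confident one.

    exit_logits: iterable of logit vectors, one per exit (hypothetical input format).
    Returns (predicted_class, exit_index); the last exit always answers.
    """
    result = None
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        result = (probs.index(conf), i)
        if conf >= threshold:
            break  # confident enough: skip all deeper exits
    return result
```

Lowering `threshold` sends more samples out through shallow exits, trading some accuracy for lower average inference cost.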
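State-of-the-art neural networks with early exit mechanisms often need a considerable amount of training and fine-tuning to achieve good performance at low computational cost. We propose a novel early exit technique, Early Exit Class Means (E$^2$CM), based on class means of samples. Unlike most existing schemes, E$^2$CM does not require gradient-based training of internal classifiers, and it does not modify the base network in any way. This makes it particularly useful for neural network training on low-power devices, as in wireless edge networks. We evaluate the performance and overheads of E$^2$CM over base networks such as MobileNetV3, EfficientNet, and ResNet, and datasets such as CIFAR-100, ImageNet, and KMNIST. Our results show that, given a fixed training time budget, E$^2$CM achieves higher accuracy than existing early exit mechanisms. Moreover, if there is no limit on the training time budget, E$^2$CM can be combined with an existing early exit scheme to boost the latter's performance, achieving a better trade-off between computational cost and network accuracy. We also show that E$^2$CM can be used to decrease the computational cost in unsupervised learning tasks.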
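A rough sketch of a class-means exit, with no gradient-based training, is shown below. It is an illustration under stated assumptions rather than the E$^2$CM procedure: `class_means` and `nearest_mean_exit` are hypothetical names, features are plain lists of floats taken from some intermediate layer, and the exit rule (nearest mean at most `ratio` times as far as the second nearest) is an assumed stand-in for whatever decision rule the paper uses.

```python
import math


def class_means(features, labels):
    """Average the feature vectors of each class; no gradient steps involved."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest_mean_exit(x, means, ratio=0.5):
    """Return (label_of_nearest_mean, should_exit) for one feature vector.

    Assumed rule: exit when the nearest class mean is much closer than the runner-up.
    """
    dists = sorted(((math.dist(x, m), y) for y, m in means.items()),
                   key=lambda t: t[0])
    best_d, best_y = dists[0]
    if len(dists) == 1:
        return best_y, True
    return best_y, best_d <= ratio * dists[1][0]
```

Because the means are computed in a single pass over the training features, the base network's weights are left untouched.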
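Reducing the processing time of large deep learning models is a fundamental challenge in many real-world applications. Early exit methods strive towards this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its predictions are discarded and its computations are effectively wasted. To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses the predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across various datasets and architectures to demonstrate that ZTW achieves a significantly better accuracy vs. inference time trade-off than other recently proposed early exit methods.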
While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
Early-exiting dynamic neural networks (EDNNs), as one type of dynamic neural network, have been widely studied recently. A typical EDNN has multiple prediction heads at different layers of the network backbone. During inference, the model will exit at either the last prediction head or an intermediate prediction head where the prediction confidence is higher than a predefined threshold. To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data. This brings a train-test mismatch problem: all the prediction heads are optimized on all types of data in the training phase, while the deeper heads will only see difficult inputs in the testing phase. Treating training and testing inputs differently in the two phases causes a mismatch between training and testing data distributions. To mitigate this problem, we formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively. We name our method BoostNet. Our experiments show it achieves state-of-the-art performance on the CIFAR100 and ImageNet datasets in both anytime and budgeted-batch prediction modes. Our code is released at https://github.com/SHI-Labs/Boosted-Dynamic-Networks.
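Multi-exit architectures consist of a backbone and branch classifiers that offer shortened inference pathways to reduce the run-time of deep neural networks. In this paper, we analyze different branching patterns that vary in how they allocate computational complexity to the branch classifiers. Constant-complexity branching keeps all branches the same, while complexity-increasing and complexity-decreasing branching place more complex branches later or earlier in the backbone, respectively. Through extensive experiments on multiple backbones and datasets, we find that complexity-decreasing branches are more effective than constant-complexity or complexity-increasing branches, achieving the best accuracy-cost trade-off. We investigate the cause by using knowledge consistency to probe the effect of adding branches onto a backbone. Our findings show that complexity-decreasing branching causes the least disruption to the feature abstraction hierarchy of the backbone, which explains the effectiveness of this branching pattern.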
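Deploying machine learning (ML) on milliwatt-scale edge devices (tinyML) is gaining popularity due to recent breakthroughs in ML and IoT. However, the capabilities of tinyML are restricted by strict power and compute constraints. Most contemporary research in tinyML focuses on model compression techniques, such as model pruning and quantization, to fit ML models on low-end devices. Nevertheless, the improvements in energy consumption and inference time obtained by existing techniques are limited, because aggressive compression quickly shrinks model capacity and accuracy. Another approach to improving inference time and/or reducing power while preserving model capacity is through early-exit networks. These networks place intermediate classifiers along a baseline neural network, which facilitate an early exit from neural network computation if an intermediate classifier exhibits sufficient confidence in its prediction. Previous work on early-exit networks has focused on large networks, beyond what would typically be used for tinyML applications. In this paper, we discuss the challenges of adding early exits to state-of-the-art tiny CNNs and devise an early-exit architecture, T-RECX, that addresses these challenges. In addition, we develop a method to alleviate the effect of network overthinking at the final exit by leveraging the high-level representations learned by the early exit. We evaluate T-RECX on three CNNs from the MLPerf Tiny benchmark suite for image classification, keyword spotting, and visual wake word detection tasks. Our results show that T-RECX improves the accuracy of the baseline network and significantly reduces the average inference time of tiny CNNs. T-RECX achieves a 32.58% average reduction in FLOPs in exchange for 1% accuracy across all evaluated models. In addition, our techniques increase the accuracy of the baseline network in two out of the three models we evaluate.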
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values resulting in 32× memory saving. In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings. XNOR-Nets offer the possibility of running state-of-the-art networks on CPUs (rather than GPUs) in real-time. Our binary networks are simple, accurate, efficient, and work on challenging visual tasks. We evaluate our approach on the ImageNet classification task. The classification accuracy with a Binary-Weight-Network version of AlexNet is the same as the full-precision AlexNet. We compare our method with recent network binarization methods, BinaryConnect and BinaryNets, and outperform these methods by large margins on ImageNet, more than 16% in top-1 accuracy. Our code is available at: http://allenai.org/plato/xnornet.
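The binary-weight approximation mentioned above can be illustrated with a small sketch. It assumes the common formulation in which a real-valued filter W is replaced by a scaling factor times its sign pattern, with the scale taken as the mean absolute weight; the function names are hypothetical and the filter is flattened to a plain list for simplicity. The 32× memory figure corresponds to storing one bit per weight instead of a 32-bit float.

```python
def binarize_filter(weights):
    """Approximate a real-valued filter as alpha * signs, with signs in {-1, +1}.

    alpha is the mean absolute weight (assumed scaling rule for this sketch).
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def binary_dot(alpha, signs, inputs):
    """Dot product with a binarized filter: additions/subtractions, then one multiply."""
    acc = sum(x if s > 0 else -x for s, x in zip(signs, inputs))
    return alpha * acc
```

For example, the filter `[0.5, -1.5, 1.0, -1.0]` becomes `alpha = 1.0` with signs `[1, -1, 1, -1]`, so only the sign bits and a single float need to be stored.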
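The number of parameters in state-of-the-art neural networks has increased drastically in recent years. This surge of interest in large-scale neural networks has motivated the development of new distributed training strategies that make such models possible. One such strategy is model-parallel distributed training. Unfortunately, model parallelism suffers from poor resource utilization, which leads to wasted resources. In this work, we improve upon a recent idealized model-parallel optimization setting: local learning. Motivated by poor resource utilization, we introduce a class of intermediary strategies between local and global learning, referred to as interlocking backpropagation. These strategies preserve many of the compute-efficiency advantages of local optimization while recovering much of the task performance achieved by global optimization. We assess our strategies on image classification and Transformer language models, finding that our strategy consistently outperforms local learning in terms of task performance and outperforms global learning in terms of training efficiency.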
While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient. We exploit this observation by learning to skip convolutional layers on a per-input basis. We introduce SkipNet, a modified residual network, that uses a gating network to selectively skip convolutional blocks based on the activations of the previous layer. We formulate the dynamic skipping problem in the context of sequential decision making and propose a hybrid learning algorithm that combines supervised learning and reinforcement learning to address the challenges of non-differentiable skipping decisions. We show SkipNet reduces computation by 30-90% while preserving the accuracy of the original model on four benchmark datasets and outperforms the state-of-the-art dynamic networks and static compression methods. We also qualitatively evaluate the gating policy to reveal a relationship between image scale and saliency and the number of layers skipped.
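Deploying deep learning models in time-critical applications with limited computational resources, for instance in edge computing systems and IoT networks, is a challenging task that often relies on dynamic inference methods such as early exiting. In this paper, we introduce a novel architecture for early exiting based on the vision transformer architecture, as well as a fine-tuning strategy that significantly increases the accuracy of early exit branches compared to conventional approaches while introducing less overhead. Through extensive experiments on image and audio classification as well as audiovisual crowd counting, we show that our method works for both classification and regression problems, and in both single- and multi-modal settings. Additionally, we introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis, which can lead to more fine-grained dynamic inference.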
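Deep learning is pervasive in our daily lives, in self-driving cars, virtual assistants, social network services, healthcare services, face recognition, and more. However, deep neural networks demand substantial compute resources during training and inference. The machine learning community has mainly focused on model-level optimizations, such as architectural compression of deep learning models, while the systems community has focused on implementation-level optimization. In between, various arithmetic-level optimization techniques have been proposed in the arithmetic community. This article provides a survey of resource-efficient deep learning techniques in terms of model-, arithmetic-, and implementation-level techniques, and identifies the research gaps for resource-efficient deep learning across the three levels. Based on our definition of a resource-efficiency metric, our survey clarifies the influence from higher- to lower-level techniques and discusses future trends in resource-efficient deep learning research.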
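Deep learning technologies have demonstrated remarkable effectiveness in a wide range of tasks, and deep learning holds the potential to advance a multitude of applications, including in edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that while the application of deep models often incurs significant memory and computational costs, edge devices typically offer only very limited storage and computational capabilities, which may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require only a little storage, and incur only low computational overheads. This survey offers comprehensive coverage of design automation techniques for deep learning models targeting edge computing. It offers an overview and comparison of the key metrics commonly used to quantify the proficiency of models in terms of effectiveness, lightness, and computational cost. The survey then covers three categories of state-of-the-art deep model design automation techniques: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open issues and directions for future research.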
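Several image processing tasks, such as image classification and object detection, have been significantly improved using convolutional neural networks (CNNs). Many architectures, such as ResNet and EfficientNet, achieved outstanding results on at least one dataset at the time of their creation. A critical factor in training concerns the network's regularization, which prevents the structure from overfitting. This work analyzes several regularization methods developed in the past few years that show significant improvements for different CNN models. The works are classified into three main areas. The first is called "data augmentation", where all the techniques focus on performing changes in the input data. The second, named "internal changes", describes procedures that modify the feature maps generated by the neural network or the kernels. The last, called "label", concerns transforming the labels of a given input. This work differs from other available surveys on regularization in two main ways: (i) the papers gathered in the manuscript are no more than five years old, and (ii) regarding reproducibility, all works referred to here have their code available in public repositories or have been implemented directly in a framework such as TensorFlow or Torch.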
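Recent approximations to backpropagation (BP) have mitigated many of BP's computational inefficiencies and incompatibilities with biology, but important limitations still remain. Moreover, the approximations significantly decrease accuracy in benchmarks, suggesting that an entirely different approach may be more fruitful. Here, grounded on recent theory for Hebbian learning in soft winner-take-all networks, we present multilayer SoftHebb, an algorithm that trains deep neural networks without any feedback, target, or error signals. As a result, it achieves efficiency by avoiding weight transport, non-local plasticity, time-locking of layer updates, iterative equilibria, and (self-)supervisory or other feedback signals, which were necessary in other approaches. Its increased efficiency and biological compatibility do not trade off accuracy compared to state-of-the-art bio-plausible learning, but rather improve it. With up to five hidden layers and an added linear classifier, accuracies on MNIST, CIFAR-10, STL-10, and ImageNet reach 99.4%, 80.3%, 76.2%, and 27.3%, respectively. In conclusion, SoftHebb shows, with an approach radically different from BP, that deep learning over a few layers may be plausible in the brain, and it increases the accuracy of bio-plausible machine learning.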
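Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others. However, continuously executing the entire DNN on the mobile device can quickly deplete its battery. Although task offloading to cloud/edge servers may decrease the mobile device's computational burden, erratic patterns in channel quality, network load, and edge server load can lead to significant delays in task execution. Recently, approaches based on split computing (SC) have been proposed, where the DNN is split into a head model and a tail model, executed on the mobile device and on the edge server, respectively. Ultimately, this may reduce bandwidth usage as well as energy consumption. Another approach, called early exiting (EE), trains models to present multiple "exits" in the architecture, each providing increasingly higher target accuracy. The trade-off between accuracy and delay can therefore be tuned according to current conditions or application demands. In this paper, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the paper by providing a set of compelling research challenges.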
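Deep neural networks (DNNs) have become an essential component in many application domains, including web-based services. These services require high throughput and (close to) real-time behavior, for instance to respond or react to users' requests or to process a stream of incoming data on time. However, the trend in DNN design is toward larger models with many layers and parameters to achieve more accurate results. Although these models are often pre-trained, the computational complexity of such large models can still be relatively significant, hindering low inference latency. Implementing a caching mechanism is a typical systems engineering solution for speeding up service response time. However, traditional caching is often not suitable for DNN-based services. In this paper, we propose an end-to-end automated solution to improve the performance of DNN-based services in terms of their computational complexity and inference latency. Our caching method adopts the ideas of self-distillation of DNN models and early exits. The proposed solution is an automated online layer caching mechanism that allows early exiting of a large model at inference time if the cache model at one of the early exits is confident enough. One of the main contributions of this paper is that we implement the idea as an online cache, meaning that the cache models do not need access to training data and operate solely on the incoming data at run-time, making the approach suitable for applications that use pre-trained models. Our experimental results on two downstream tasks (face and object classification) show that, on average, caching can reduce the computational complexity of these services by up to 58% (in terms of FLOPs count) and improve their inference latency by up to 46%, with low to zero reduction in accuracy.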
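Deep neural networks have significantly improved performance on a range of tasks, but with an ever-increasing demand for computational resources, which makes deployment on low-resource devices (with limited memory and battery power) infeasible. Binary neural networks (BNNs) tackle the issue to an extent, with extreme compression and speed-up gains compared to real-valued models. We propose a simple but effective method to accelerate inference by unifying BNNs with an early-exiting strategy. Our approach allows simple instances to exit early based on a decision threshold and uses output layers added to different intermediate layers to avoid executing the entire binary model. We extensively evaluate our method on three audio classification tasks and across four BNN architectures. Our method demonstrates favorable quality-efficiency trade-offs while being controllable through an entropy-based threshold specified by the system user. It also achieves better speed-ups (latency less than 6 ms) with a single model based on existing BNN architectures, without retraining for different efficiency levels. It also provides a straightforward way to estimate sample difficulty and a better understanding of the uncertainty around certain classes in the dataset.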
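An entropy-based exit threshold of the kind described above might look like the following sketch. The normalization by log K (so the score lies in [0, 1] for K ≥ 2 classes), the function names, and the default threshold of 0.3 are assumptions made for illustration, not details taken from the paper.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to [0, 1] by log(K).

    Requires at least two classes; zero-probability entries contribute nothing.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def should_exit(probs, threshold=0.3):
    """Exit at this output layer when the prediction's entropy is low enough."""
    return normalized_entropy(probs) <= threshold
```

A user-specified `threshold` near 0 forces almost every sample through the full model, while a value near 1 lets nearly all samples leave at the first exit; the same entropy score can also serve as a rough per-sample difficulty estimate.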