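Deep learning-based face recognition models follow the common trend in deep neural networks by using full-precision floating-point networks with high computational costs. Because of the large memory required by the full-precision model, deploying these networks in use cases constrained by computational requirements is often infeasible. Previous compact face recognition approaches proposed designing special compact architectures and training them from scratch on real training data, which may not be available in real-world scenarios due to privacy concerns. In this work, we present QuantFace, a solution based on model quantization to low-bit precision formats. QuantFace reduces the computational cost of existing face recognition models without the need to design a particular architecture or to access real training data. QuantFace introduces privacy-friendly synthetic face data into the quantization process to mitigate potential privacy concerns and issues related to access to real training data. Through extensive evaluation experiments on seven benchmarks and four network architectures, we demonstrate that QuantFace can reduce the model size by up to 5x while largely maintaining the verification performance of the full-precision model, without accessing real training datasets.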
translated by 谷歌翻译
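Deep neural networks have rapidly become the mainstream method for face recognition (FR). However, these models contain an extremely large number of parameters, which limits their deployment on embedded and low-end devices. In this work, we present an extremely lightweight and accurate FR solution, namely PocketNet. We use neural architecture search (NAS) to develop a new family of lightweight face-specific architectures. We also propose a novel training paradigm based on knowledge distillation (KD), the multi-step KD, in which knowledge is distilled from the teacher model to the student model at different stages of training maturity. We conduct a detailed ablation study that demonstrates both the soundness of using NAS for the specific task of FR rather than general object classification and the benefits of our proposed multi-step KD. We present an extensive experimental evaluation and comparison with state-of-the-art (SOTA) compact FR models on nine different benchmarks, including large-scale evaluation benchmarks such as IJB-B, IJB-C, and MegaFace. At the same level of model compactness, PocketNets consistently advance the SOTA FR performance on nine mainstream benchmarks. With 0.92M parameters, our smallest network, PocketNetS-128, achieves results that are very competitive with recent SOTA compact models containing up to 4M parameters.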
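Recent deep face recognition models proposed in the literature used large-scale public datasets such as MS-Celeb-1M and VGGFace2 to train very deep neural networks, achieving state-of-the-art performance on mainstream benchmarks. Recently, many of these datasets, e.g., MS-Celeb-1M and VGGFace2, were retracted due to credible privacy and ethical concerns. This motivates this work to propose and investigate the feasibility of using a privacy-friendly, synthetically generated face dataset to train face recognition models. To this end, we use a class-conditional generative adversarial network to generate class-labeled synthetic face images, namely SFace. To address the privacy aspect of using such data to train a face recognition model, we provide extensive evaluation experiments on the identity relation between the synthetic dataset and the original real dataset used to train the generative model. Our reported evaluation shows that associating an identity in the real dataset with the same class label in the synthetic dataset is hardly possible. We also propose to train face recognition on our privacy-friendly dataset using three different learning strategies: multi-class classification, label-free knowledge transfer, and combined learning of multi-class classification and knowledge transfer. The evaluation results reported on five real face benchmarks demonstrate that the privacy-friendly synthetic dataset has high potential for training face recognition models, achieving, for example, a verification accuracy on LFW of 91.87% using multi-class classification and 99.13% using the combined learning strategy.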
To obtain lower inference latency and a smaller memory footprint for deep neural networks, model quantization has been widely employed in deep model deployment by converting floating-point values to low-precision integers. However, previous methods (such as quantization-aware training and post-training quantization) require the original data for fine-tuning or calibrating the quantized model, which makes them inapplicable when the original data cannot be accessed due to privacy or security concerns. This gave rise to data-free quantization methods based on synthetic data generation. However, current data-free quantization methods still suffer from severe performance degradation when quantizing a model to lower bit-widths, caused by the low inter-class separability of semantic features. To this end, we propose a new and effective data-free quantization method termed ClusterQ, which uses feature distribution alignment for synthetic data generation. To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics to imitate the distribution of real data, so that the performance degradation is alleviated. Moreover, we incorporate diversity enhancement to solve class-wise mode collapse. We also employ an exponential moving average to update the centroid of each cluster for further feature distribution improvement. Extensive experiments based on different deep models (e.g., ResNet-18 and MobileNet-V2) on the ImageNet dataset demonstrate that our proposed ClusterQ model obtains state-of-the-art performance.
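The conversion of floating-point values to low-precision integers mentioned above can be illustrated with a minimal sketch of asymmetric uniform quantization. This is a generic textbook scheme, not the ClusterQ method itself; the function names `quantize` and `dequantize` and the min/max range estimate are assumptions made for illustration.

```python
def quantize(values, num_bits=8):
    """Asymmetric uniform quantization of floats to unsigned integers.

    Generic illustration only: the range is taken from the min and max
    of `values`, which is an assumption, not a detail from the abstract.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Guard against a constant input, which would give a zero scale.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
q, scale, zero_point = quantize(weights, num_bits=8)
restored = dequantize(q, scale, zero_point)
```

Each restored value differs from its original by at most one quantization step (`scale`), which is the error that calibration or fine-tuning data is normally used to compensate.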
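Data-free quantization is the task of compressing a neural network to low bit-width without access to the original training data. Most existing data-free quantization methods suffer severe performance degradation due to inaccurate activation clipping ranges and quantization error, especially at low bit-widths. In this paper, we present a simple yet effective data-free quantization method with accurate activation clipping and adaptive batch normalization. Accurate activation clipping (AAC) improves model accuracy by exploiting accurate activation information from the full-precision model. Adaptive batch normalization is the first proposal to address the quantization error caused by distribution changes by adaptively updating the batch normalization layers. Extensive experiments demonstrate that the proposed data-free quantization method yields surprisingly good performance, achieving 64.33% top-1 accuracy for ResNet18 on the ImageNet dataset, an absolute improvement of 3.7% over existing state-of-the-art methods.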
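The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques, including progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed-precision convolutional neural networks (CNNs) targeting edge computing. Our method establishes a new Pareto frontier in model accuracy and memory footprint, demonstrating a range of quantized models that need less than 4.3 MB for weights (wgts.) and activations (acts.). Our main contributions are: (i) hardware-aware heterogeneous differentiable quantization with tensor-sliced learned precision, (ii) targeted gradient modification for wgts. and acts. to mitigate quantization errors, and (iii) a multi-phase learning schedule to address the learning instability arising from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across models including EfficientNet-Lite0 (e.g., 4.14 MB of wgts. and acts. at 67.66% accuracy) and MobileNetV2 (e.g., 3.51 MB of wgts. and acts. at …% accuracy).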
Although considerable progress has been made in neural network quantization for efficient inference, existing methods do not scale to heterogeneous devices, as one dedicated model needs to be trained, transmitted, and stored for each specific hardware setting, incurring considerable costs in model training and maintenance. In this paper, we study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one. With this representation, we can theoretically obtain a network of any precision for on-demand service while only needing to train and maintain one model. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design incorporates a cascade downsampling mechanism that allows us to obtain multiple quantized networks from one full-precision source model by progressively mapping the higher-precision weights to their adjacent lower-precision counterparts. Then, with networks of different bit-widths derived from one source model, multi-objective optimization is employed to train the shared source model weights so that they can be updated simultaneously, considering the performance of all networks. In this way, the shared weights are optimized to balance the performance of the different quantized models, making the weights transferable among different bit-widths. Experiments show that the proposed vertical-layered representation and the once QAT scheme are effective in embodying multiple quantized networks in a single one and allow one-time training, delivering performance comparable to that of quantized models tailored to any specific bit-width. Code will be available.
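The idea of progressively mapping higher-precision weights to their adjacent lower-precision counterparts can be sketched as follows. The abstract does not specify the mapping, so this example assumes signed integer weights and a round-half-up halving at each step; `downsample_one_bit` and `cascade` are hypothetical names.

```python
def downsample_one_bit(q_weights, bits):
    """Map signed `bits`-bit integers to (bits - 1)-bit integers.

    Assumption: each step halves the value with round-half-up and
    clamps to the narrower signed range.
    """
    new_bits = bits - 1
    qmin, qmax = -(2 ** (new_bits - 1)), 2 ** (new_bits - 1) - 1
    out = []
    for q in q_weights:
        halved = (q + 1) >> 1  # floor((q + 1) / 2), i.e. round half up
        out.append(max(qmin, min(qmax, halved)))
    return out


def cascade(q_weights, start_bits, end_bits):
    """Return a dict mapping each bit-width to its derived weights."""
    levels = {start_bits: list(q_weights)}
    current = list(q_weights)
    for bits in range(start_bits, end_bits, -1):
        current = downsample_one_bit(current, bits)
        levels[bits - 1] = current
    return levels


levels = cascade([-128, -3, 0, 5, 127], start_bits=8, end_bits=6)
# levels[7] == [-64, -1, 0, 3, 63]
# levels[6] == [-32, 0, 0, 2, 31]
```

Because every lower-precision model is derived from the same source weights, only the highest-precision copy would need to be stored in a scheme of this kind.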
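Mixed-precision deep neural networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that preserves accuracy is not easy to find, especially given the abundance of models, datasets, and quantization techniques, which creates an enormous search space. To tackle this difficulty, a body of literature has emerged recently, and several frameworks that achieve promising accuracy results have been proposed. In this paper, we first summarize the quantization techniques commonly used in the literature. We then present a thorough survey of mixed-precision frameworks, categorized according to their optimization techniques, such as reinforcement learning, and their quantization techniques, such as deterministic rounding. Furthermore, we discuss the advantages and shortcomings of each framework and present a side-by-side comparison. Finally, we give guidelines for future mixed-precision frameworks.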
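Model quantization has emerged as an indispensable technique for accelerating deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements of hardware deployment. In this work, we propose the Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate a wide range of state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge connecting algorithms and hardware. We conduct a comprehensive analysis and find a considerable number of intuitive and counter-intuitive insights. By aligning the training settings, we find that existing algorithms have about the same performance on the conventional academic track. For hardware-deployable quantization, however, there is a huge accuracy gap that remains unsettled. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work can inspire future research directions.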
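The quality of face images significantly influences the performance of the underlying face recognition algorithms. Face image quality assessment (FIQA) estimates the utility of a captured image for achieving reliable and accurate recognition performance. In this work, we propose a novel learning paradigm that learns internal network observations during the training process. Based on this, our proposed CR-FIQA uses this paradigm to estimate the face image quality of a sample by predicting its relative classifiability. This classifiability is measured from the position of the training sample's feature representation in angular space with respect to its class center and the nearest negative class center. We experimentally illustrate the correlation between face image quality and the relative classifiability of a sample. As this property is only observable for the training dataset, we propose to learn it from the training dataset and use it to predict the quality measure on unseen samples. This training is performed simultaneously with optimizing the class centers through an angular margin penalty-based softmax loss used for face recognition model training. Through extensive evaluation experiments on eight benchmarks and four face recognition models, we demonstrate the superiority of our proposed CR-FIQA over state-of-the-art (SOTA) FIQA algorithms.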
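A rough sketch of a relative classifiability score follows. The abstract only states that the measure relates a sample's feature to its own class center and to the nearest negative class center in angular space; the specific ratio used here, with the denominator shifted by 1 to keep it positive, is an assumption for illustration, and `relative_classifiability` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def relative_classifiability(feature, centers, label, eps=1e-9):
    """Ratio of own-center similarity to nearest-negative similarity.

    Assumed form: own / (nearest_negative + 1 + eps). The shift by 1
    keeps the denominator positive for cosine values in [-1, 1].
    """
    sims = [cosine(feature, c) for c in centers]
    own = sims[label]
    nearest_negative = max(s for i, s in enumerate(sims) if i != label)
    return own / (nearest_negative + 1.0 + eps)


centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
easy = relative_classifiability([1.0, 0.0], centers, label=0)
hard = relative_classifiability([1.0, 1.0], centers, label=0)
```

Under this assumed form, a sample lying on its own class center (`easy`) scores higher than one lying halfway between its class center and a negative center (`hard`), which matches the intuition that ambiguous samples are of lower utility.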
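The emergence of the global COVID-19 pandemic poses new challenges for biometrics. Not only are contactless biometric options becoming more important, but face recognition has also recently been confronted with the frequent wearing of masks. These masks affect the performance of previous face recognition systems, as they hide important identity information. In this paper, we propose a mask-invariant face recognition solution (MaskInv) that uses template-level knowledge distillation within a training paradigm that aims to produce embeddings of masked faces that are similar to those of non-masked faces of the same identities. In addition to the distilled knowledge, the student network benefits from additional guidance by a margin-based identity classification loss, ElasticFace, using masked and non-masked faces. In a step-wise ablation study on two real masked face databases and five mainstream databases with synthetic masks, we demonstrate the rationale of our MaskInv approach. Our proposed solution outperforms previous state-of-the-art (SOTA) academic solutions in the recent MFRC-21 challenge in both the masked vs. masked and the masked vs. non-masked scenarios, and it also outperforms the previous solutions on the MFR2 dataset. Furthermore, we demonstrate that the proposed model still performs well on unmasked faces, with only a minor loss in verification performance. The code, the trained models, and the evaluation protocol for the synthetically masked data are publicly available: https://github.com/fdbtrs/masked-face-recognition-kd.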
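Recently, generative data-free quantization has emerged as a practical approach that compresses neural networks to low bit-width without accessing real data. It generates data by using the batch normalization (BN) statistics of the full-precision counterpart in order to quantize the network. However, our study shows that, in practice, synthetic data constrained by BN statistics suffers from severe homogenization at both the distribution and the sample level, which causes serious accuracy degradation in the quantized network. This paper presents a generic Diverse Sample Generation (DSG) scheme for generative data-free post-training quantization and quantization-aware training to mitigate this detrimental homogenization. In our DSG, we first slacken the statistics alignment for features in the BN layers to relax the distribution constraint. Then, we strengthen the loss impact of specific BN layers for different samples and inhibit the correlation among samples in the generation process, in order to diversify samples from the statistical and the spatial perspective, respectively. Extensive experiments show that, for large-scale image classification tasks, our DSG consistently outperforms existing data-free quantization methods on various neural architectures, especially at ultra-low bit-widths (e.g., a 22% gain under the W4A4 setting). Moreover, the data diversification brought by our DSG yields a general gain across various quantization methods, demonstrating that diversity is an important property of high-quality synthetic data for data-free quantization.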
Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible for various reasons, including cost and privacy issues. By using the learned parameters (statistics) of FP32 pre-trained models, zero-shot quantization schemes focus on generating synthetic data by minimizing the distance between the learned parameters ($\mu$ and $\sigma$) and the distributions of intermediate activations. Subsequently, they distill knowledge from the pre-trained model (teacher) to the quantized model (student) so that the quantized model can be optimized with the synthetic dataset. In general, zero-shot quantization comprises two major elements: synthesizing datasets and quantizing models. However, thus far, zero-shot quantization has primarily been discussed in the context of quantization-aware training methods, which require task-specific losses and optimization as lengthy as retraining. We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours or even half an hour. Furthermore, we propose a framework called Genie that generates data suited for post-training quantization. With the data synthesized by Genie, we can produce high-quality quantized models without real datasets, with quality comparable to few-shot quantization. We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining them, we can bridge the gap between zero-shot and few-shot quantization while significantly improving the quantization performance compared to that of existing approaches. In other words, we can obtain a unique state-of-the-art zero-shot quantization approach.
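The statistic-matching objective described above, which minimizes the distance between the learned parameters ($\mu$ and $\sigma$) and the statistics of intermediate activations, can be sketched for scalar channels as below. The squared-difference distance and the function names are assumptions for illustration; real implementations operate on tensors and backpropagate this loss to the synthetic inputs.

```python
import math


def channel_stats(activations):
    """Mean and (population) standard deviation of one channel."""
    n = len(activations)
    mean = sum(activations) / n
    var = sum((a - mean) ** 2 for a in activations) / n
    return mean, math.sqrt(var)


def bn_matching_loss(layer_activations, bn_stats):
    """Sum of squared gaps between activation stats and stored BN stats.

    `layer_activations` holds one list of activations per channel and
    `bn_stats` the matching (mu, sigma) pairs. The squared-difference
    distance is an assumed choice, not taken from the abstract.
    """
    loss = 0.0
    for acts, (mu, sigma) in zip(layer_activations, bn_stats):
        mean, std = channel_stats(acts)
        loss += (mean - mu) ** 2 + (std - sigma) ** 2
    return loss


matched = bn_matching_loss([[1.0, 3.0]], [(2.0, 1.0)])
shifted = bn_matching_loss([[1.0, 3.0]], [(0.0, 1.0)])
```

Here `matched` is zero because the activations already have mean 2 and standard deviation 1, while `shifted` is 4 because the mean is off by 2.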
While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and is key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature, which can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques, as well as their potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs, and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
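Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs. However, existing works either suffer from a severe performance drop at ultra-low precision of 4 bits or lower, or require a heavy fine-tuning process to recover the performance. To our knowledge, this vulnerability to low precision stems from two statistical observations about feature map values. First, the distribution of feature map values varies significantly per channel and per input image. Second, feature maps have outliers that can dominate the quantization error. Based on these observations, we propose a novel distribution-aware quantization scheme (DAQ) that facilitates accurate training-free quantization at ultra-low precision. A simple function in DAQ determines the dynamic range of feature maps and weights with a low computational burden. Furthermore, our method enables mixed-precision quantization by calculating the relative sensitivity of each channel, without any training process involved. Nonetheless, quantization-aware training is also applicable for an additional performance gain. Our new method outperforms recent training-free and even training-based quantization methods on state-of-the-art image super-resolution networks at ultra-low precision.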
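Large DNNs with mixed-precision quantization can achieve ultra-high compression while retaining high classification performance. However, because of the challenge of finding an accurate metric to guide the optimization process, these methods either sacrifice significant performance compared to the 32-bit floating-point (FP-32) baseline or rely on a compute-expensive iterative training policy that requires a pre-trained baseline to be available. To address this issue, this paper presents BMPQ, a training method that uses bit gradients to analyze layer sensitivities and yield mixed-precision quantized models. BMPQ requires a single training iteration and does not need a pre-trained baseline. It uses an integer linear program (ILP) to dynamically adjust the precision of layers during training, subject to a fixed hardware budget. To evaluate the efficacy of BMPQ, we conduct extensive experiments with VGG16 and ResNet18 on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Compared to the baseline FP-32 models, BMPQ yields models with 15.4x fewer parameter bits and a negligible drop in accuracy. Compared to the SOTA "during training" mixed-precision training scheme, our models are 2.1x, 2.2x, and 2.9x smaller on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively, with an accuracy improvement of up to 14.54%.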
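Choosing per-layer precision under a fixed budget can be sketched as a small integer program. The abstract does not give the ILP formulation, so the objective below (maximize the sum of sensitivity times bit-width, subject to a total parameter-bit budget) is an assumption, and it is solved by exhaustive search rather than an ILP solver; `allocate_bits` and the example numbers are hypothetical.

```python
from itertools import product


def allocate_bits(sensitivities, param_counts, budget_bits, choices=(2, 4, 8)):
    """Pick one bit-width per layer under a total parameter-bit budget.

    Assumed objective: give more bits to more sensitive layers by
    maximizing sum(sensitivity * bits). Exhaustive search stands in
    for an ILP solver and only suits a handful of layers.
    Returns None when no assignment fits the budget.
    """
    best, best_score = None, float("-inf")
    for assignment in product(choices, repeat=len(sensitivities)):
        cost = sum(n * b for n, b in zip(param_counts, assignment))
        if cost > budget_bits:
            continue  # violates the hardware budget
        score = sum(s * b for s, b in zip(sensitivities, assignment))
        if score > best_score:
            best, best_score = assignment, score
    return best


# Two layers of 100 parameters each and a budget of 1000 bits.
bits = allocate_bits([3.0, 1.0], [100, 100], budget_bits=1000)
# bits == (8, 2): the more sensitive first layer gets the higher precision.
```

Re-running such an allocation as sensitivities change during training is one way the precision of layers could be adjusted dynamically while the budget stays fixed.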
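Deep learning technologies have demonstrated remarkable effectiveness in a wide range of tasks, and deep learning holds the potential to advance a multitude of applications, including in edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that, while the application of deep models often incurs substantial memory and computational costs, edge devices typically offer only very limited storage and computational capabilities, which may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require only little storage, and incur only low computational overhead. This survey offers comprehensive coverage of design automation techniques for deep learning models targeting edge computing. It provides an overview and comparison of key metrics that are commonly used to quantify the proficiency of models in terms of effectiveness, lightness, and computational cost. The survey then covers three categories of state-of-the-art deep model design automation techniques: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open issues and directions for future research.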
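Deep learning is pervasive in our daily lives, including self-driving cars, virtual assistants, social network services, healthcare services, face recognition, and more. However, deep neural networks demand substantial compute resources during training and inference. The machine learning community has mainly focused on model-level optimizations, such as architectural compression of deep learning models, while the systems community has focused on implementation-level optimization. In between, various arithmetic-level optimization techniques have been proposed in the arithmetic community. This article provides a survey of resource-efficient deep learning techniques at the model, arithmetic, and implementation levels, and identifies the research gaps for resource-efficient deep learning techniques across these three levels. Based on our definition of resource-efficiency metrics, our survey clarifies the influence of lower-level techniques and discusses future trends in resource-efficient deep learning research.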
As a neural network compression technique, post-training quantization (PTQ) transforms a pre-trained model into a quantized model using a lower-precision data type. However, the prediction accuracy decreases because of the quantization noise, especially in extremely low-bit settings. How to determine appropriate quantization parameters (e.g., scaling factors and the rounding of weights) is the main problem at present. Many existing methods determine the quantization parameters by minimizing the distance between features before and after quantization. Using this distance as the metric for optimizing the quantization parameters considers only local information. We analyze the problem of minimizing local metrics and show that it does not result in optimal quantization parameters. Furthermore, the quantized model suffers from overfitting due to the small number of calibration samples in PTQ. In this paper, we propose PD-Quant to solve these problems. PD-Quant uses the difference between the network predictions before and after quantization to determine the quantization parameters. To mitigate the overfitting problem, PD-Quant adjusts the distribution of activations in PTQ. Experiments show that PD-Quant leads to better quantization parameters and improves the prediction accuracy of quantized models, especially in low-bit settings. For example, PD-Quant pushes the accuracy of ResNet-18 up to 53.08% and RegNetX-600MF up to 40.92% with 2-bit weights and 2-bit activations. The code will be released at https://github.com/hustvl/PD-Quant.
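A global metric based on the difference between predictions before and after quantization can be sketched as below. The abstract does not say which divergence PD-Quant uses, so the KL divergence between softmax outputs is an assumption here, and `prediction_difference` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_difference(fp_logits, q_logits):
    """KL divergence from the full-precision to the quantized prediction.

    Assumed metric: KL(p_fp || p_q). Unlike a layer-wise feature
    distance, it reflects the effect on the final network output.
    """
    p, q = softmax(fp_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


same = prediction_difference([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
drift = prediction_difference([2.0, 0.5, -1.0], [0.5, 2.0, -1.0])
```

Candidate quantization parameters for a layer could then be ranked by this score on the calibration samples, with lower values indicating that the quantized network's predictions stay closer to the full-precision ones.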
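Learning to synthesize data has emerged as a promising direction in zero-shot quantization (ZSQ), which represents neural networks with low-bit integers without accessing any real data. In this paper, we observe an interesting phenomenon of intra-class heterogeneity in real data and show that existing methods fail to retain this property in their synthetic images, which leads to a limited performance gain. To address this issue, we propose a novel zero-shot quantization method referred to as IntraQ. First, we propose a local object reinforcement that locates the target objects at different scales and positions in the synthetic images. Second, we introduce a marginal distance constraint to form class-related features distributed in a coarse area. Finally, we devise a soft inception loss that injects a soft prior label to prevent the synthetic images from overfitting to a fixed object. Our IntraQ is shown to retain the intra-class heterogeneity in the synthetic images well and is also observed to achieve state-of-the-art performance. For example, compared to advanced ZSQ methods, our IntraQ obtains a 9.17% increase in top-1 accuracy on ImageNet when all layers of MobileNetV1 are quantized to 4 bits. Code is at https://github.com/viperit/interq.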