适应数据分布的结构(例如对称性和转型Imarerces)是机器学习中的重要挑战。通过架构设计或通过增强数据集,可以内在学习过程中内置Inhormces。两者都需要先验的了解对称性的确切性质。缺乏这种知识,从业者求助于昂贵且耗时的调整。为了解决这个问题,我们提出了一种新的方法来学习增强变换的分布,以新的\ emph {转换风险最小化}(trm)框架。除了预测模型之外,我们还优化了从假说空间中选择的转换。作为算法框架,我们的TRM方法是(1)有效(共同学习增强和模型,以\ emph {单训练环}),(2)模块化(使用\ emph {任何训练算法),以及(3)一般(处理\ \ ich {离散和连续}增强)。理论上与标准风险最小化的TRM比较,并在其泛化误差上给出PAC-Bayes上限。我们建议通过块组成的新参数化优化富裕的增强空间,导致新的\ EMPH {随机成分增强学习}(SCALE)算法。我们在CIFAR10 / 100,SVHN上使用先前的方法(快速自身自动化和武术器)进行实际比较规模。此外,我们表明规模可以在数据分布中正确地学习某些对称性(恢复旋转Mnist上的旋转),并且还可以改善学习模型的校准。
translated by 谷歌翻译
The cyber-physical convergence is opening up new business opportunities for industrial operators. The need for deep integration of the cyber and the physical worlds establishes a rich business agenda towards consolidating new system and network engineering approaches. This revolution would not be possible without the rich and heterogeneous sources of data, as well as the ability of their intelligent exploitation, mainly due to the fact that data will serve as a fundamental resource to promote Industry 4.0. One of the most fruitful research and practice areas emerging from this data-rich, cyber-physical, smart factory environment is the data-driven process monitoring field, which applies machine learning methodologies to enable predictive maintenance applications. In this paper, we examine popular time series forecasting techniques as well as supervised machine learning algorithms in the applied context of Industry 4.0, by transforming and preprocessing the historical industrial dataset of a packing machine's operational state recordings (real data coming from the production line of a manufacturing plant from the food and beverage domain). In our methodology, we use only a single signal concerning the machine's operational status to make our predictions, without considering other operational variables or fault and warning signals, hence its characterization as ``agnostic''. In this respect, the results demonstrate that the adopted methods achieve a quite promising performance on three targeted use cases.
translated by 谷歌翻译
Large-scale data is an essential component of machine learning as demonstrated in recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much more expensive and slower as each operator can control only a single robot at a time. To make this costly data collection process efficient and scalable, we propose Policy Assisted TeleOperation (PATO), a system which automates part of the demonstration collection process using a learned assistive policy. PATO autonomously executes repetitive behaviors in data collection and asks for human input only when it is uncertain about which subtask or behavior to execute. We conduct teleoperation user studies both with a real robot and a simulated robot fleet and demonstrate that our assisted teleoperation system reduces human operators' mental load while improving data collection efficiency. Further, it enables a single operator to control multiple robots in parallel, which is a first step towards scalable robotic data collection. For code and video results, see https://clvrai.com/pato
translated by 谷歌翻译
Federated learning has been predominantly concerned with collaborative training of deep networks from scratch, and especially the many challenges that arise, such as communication cost, robustness to heterogeneous data, and support for diverse device capabilities. However, there is no unified framework that addresses all these problems together. This paper studies the challenges and opportunities of exploiting pre-trained Transformer models in FL. In particular, we propose to efficiently adapt such pre-trained models by injecting a novel attention-based adapter module at each transformer block that both modulates the forward pass and makes an early prediction. Training only the lightweight adapter by FL leads to fast and communication-efficient learning even in the presence of heterogeneous data and devices. Extensive experiments on standard FL benchmarks, including CIFAR-100, FEMNIST and SpeechCommandsv2 demonstrate that this simple framework provides fast and accurate FL while supporting heterogenous device capabilities, efficient personalization, and scalable-cost anytime inference.
translated by 谷歌翻译
3D gaze estimation is most often tackled as learning a direct mapping between input images and the gaze vector or its spherical coordinates. Recently, it has been shown that pose estimation of the face, body and hands benefits from revising the learning target from few pose parameters to dense 3D coordinates. In this work, we leverage this observation and propose to tackle 3D gaze estimation as regression of 3D eye meshes. We overcome the absence of compatible ground truth by fitting a rigid 3D eyeball template on existing gaze datasets and propose to improve generalization by making use of widely available in-the-wild face images. To this end, we propose an automatic pipeline to retrieve robust gaze pseudo-labels from arbitrary face images and design a multi-view supervision framework to balance their effect during training. In our experiments, our method achieves improvement of 30% compared to state-of-the-art in cross-dataset gaze estimation, when no ground truth data are available for training, and 7% when they are. We make our project publicly available at https://github.com/Vagver/dense3Deyes.
translated by 谷歌翻译
Near infrared (NIR) to Visible (VIS) face matching is challenging due to the significant domain gaps as well as a lack of sufficient data for cross-modality model training. To overcome this problem, we propose a novel method for paired NIR-VIS facial image generation. Specifically, we reconstruct 3D face shape and reflectance from a large 2D facial dataset and introduce a novel method of transforming the VIS reflectance to NIR reflectance. We then use a physically-based renderer to generate a vast, high-resolution and photorealistic dataset consisting of various poses and identities in the NIR and VIS spectra. Moreover, to facilitate the identity feature learning, we propose an IDentity-based Maximum Mean Discrepancy (ID-MMD) loss, which not only reduces the modality gap between NIR and VIS images at the domain level but encourages the network to focus on the identity features instead of facial details, such as poses and accessories. Extensive experiments conducted on four challenging NIR-VIS face recognition benchmarks demonstrate that the proposed method can achieve comparable performance with the state-of-the-art (SOTA) methods without requiring any existing NIR-VIS face recognition datasets. With slightly fine-tuning on the target NIR-VIS face recognition datasets, our method can significantly surpass the SOTA performance. Code and pretrained models are released under the insightface (https://github.com/deepinsight/insightface/tree/master/recognition).
translated by 谷歌翻译
A backdoor attack places triggers in victims' deep learning models to enable a targeted misclassification at testing time. In general, triggers are fixed artifacts attached to samples, making backdoor attacks easy to spot. Only recently, a new trigger generation harder to detect has been proposed: the stylistic triggers that apply stylistic transformations to the input samples (e.g., a specific writing style). Currently, stylistic backdoor literature lacks a proper formalization of the attack, which is established in this paper. Moreover, most studies of stylistic triggers focus on text and images, while there is no understanding of whether they can work in sound. This work fills this gap. We propose JingleBack, the first stylistic backdoor attack based on audio transformations such as chorus and gain. Using 444 models in a speech classification task, we confirm the feasibility of stylistic triggers in audio, achieving 96% attack success.
translated by 谷歌翻译
Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Quality (LQ) input images, which is a domain-specific image restoration problem in the low-level computer vision area. The early face restoration methods mainly use statistic priors and degradation models, which are difficult to meet the requirements of real-world applications in practice. In recent years, face restoration has witnessed great progress after stepping into the deep learning era. However, there are few works to study deep learning-based face restoration methods systematically. Thus, this paper comprehensively surveys recent advances in deep learning techniques for face restoration. Specifically, we first summarize different problem formulations and analyze the characteristic of the face image. Second, we discuss the challenges of face restoration. Concerning these challenges, we present a comprehensive review of existing FR methods, including prior based methods and deep learning-based methods. Then, we explore developed techniques in the task of FR covering network architectures, loss functions, and benchmark datasets. We also conduct a systematic benchmark evaluation on representative methods. Finally, we discuss future directions, including network designs, metrics, benchmark datasets, applications,etc. We also provide an open-source repository for all the discussed methods, which is available at https://github.com/TaoWangzj/Awesome-Face-Restoration.
translated by 谷歌翻译
随着深度神经网络(DNN)的出现,成为许多计算机视觉任务中的骨干,它们在现实世界中的消费应用程序中的采用不断扩大。鉴于智能设备的丰富性和无所不能,正在形成“智能生态系统”,同时进行感应而不是独立。这将处式推理范式转移到在边缘部署集中式神经加工单元(NPU),其中多个设备(例如,在智能家居或自动驾驶汽车中)可以通过动态速率流式传输数据以进行处理。尽管这为输入批处理提供了增强的潜力,但幼稚的解决方案可以导致表现不佳的性能和经验质量,尤其是在尖峰负载下。同时,动态DNN的部署,包括随机计算图(例如早期 - 外观(EE)模型),引入了此类系统中动态行为的新维度。在这项工作中,我们提出了一种新颖的早期感知的调度算法,该算法允许在运行时进行样本抢占,以说明到达和早期外来过程引入的动态性。同时,我们向NPU硬件体系结构的设计空间介绍了两个新颖的维度,即流体批处理和可堆叠的处理元素,这些元素可以使运行时适应性适应不同的批次尺寸,并显着改善了NPU利用率,即使在小批次尺寸下也是如此。我们的评估表明,我们的系统分别在平均延迟和尾部潜伏期SLO满意度方面,平均达到1.97倍和6.7倍的改善。
translated by 谷歌翻译
如今,越来越多的数据集已发布针对系统和模型的研究和开发,从而直接比较,解决方案的持续改进以及研究人员参与实验,现实生活数据。但是,尤其是在结构健康监测(SHM)领域中,在许多情况下,新的研究项目具有结构设计和实施,传感器选择和技术推动因素的独特组合,这些组合不符合相关个人研究的配置文学。因此,由于我们没有找到任何相关存储库,因此我们将案例研究中的数据分享到研究界。更具体地说,在本文中,我们提出了一个新颖的时间序列数据集,用于使用陶瓷压电传感器(PZTS)连接到物联网(IOT)设备(IOT)设备的陶瓷压电传感器(PZTS),用于塑料薄板上的撞击检测和本地化,朝着结构性健康监测应用。数据集是从低速,低能冲击事件的实验过程中收集的,该过程包括每个独特的实验至少3个重复,而输入测量值来自放置在板的角落的4个PZT传感器。对于每个重复和传感器,以100 kHz的采样率存储5000个值。该系统用钢球激发,释放的高度从10厘米到20厘米不等。该数据集可在GitHub(https://github.com/smart-objects/impact-events-dataset)中获得。
translated by 谷歌翻译