In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces unnatural-looking code. Applying neural machine translation (NMT) methods to code has successfully broadened the set of programs for which natural-looking translations can be obtained. However, they treat code as sequences of text tokens and still do not differentiate well enough between similar pieces of code that have different semantics in different languages. The result is low-quality translation, reducing the practicality of NMT and stressing the need for an approach that significantly increases its accuracy. Here we propose to augment code translation with IRs, specifically LLVM IR, with results on the C++, Java, Rust, and Go languages. Our method improves upon the state of the art for unsupervised code translation, increasing the number of correct translations by 11% on average, and by as much as 79% for the Java-Rust pair. We extend previous test sets for code translation by adding hundreds of Go and Rust functions. Additionally, we study the problems of generating programming source code from IR (IR decompilation) and of using IRs as an intermediary pivot for translation.
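The appeal of the IR pivot is that semantically equivalent programs written in different languages lower to very similar LLVM IR. As an illustrative sketch only (not the paper's pipeline), the hypothetical helper below canonicalizes SSA value names in an IR snippet, so that IR emitted with different register naming conventions compares equal and can be paired up for training data:

```python
import re

def canonicalize_ir(ir_text: str) -> str:
    """Strip comments and rename SSA values to a canonical sequence
    (%v0, %v1, ...) so semantically identical LLVM IR snippets
    compare equal regardless of register naming."""
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"%v{len(mapping)}"
        return mapping[name]

    # Drop ';'-comments line by line, then rewrite local value names.
    lines = [re.sub(r";.*$", "", ln).rstrip() for ln in ir_text.splitlines()]
    body = "\n".join(ln for ln in lines if ln.strip())
    return re.sub(r"%[A-Za-z0-9._]+", rename, body)

# Same computation, different register names:
ir_a = "%tmp = add i32 %x, 1 ; increment\n%res = mul i32 %tmp, %tmp"
ir_b = "%1 = add i32 %0, 1\n%2 = mul i32 %1, %1"
```

After canonicalization both snippets become the identical string, which is the kind of invariance that makes IR a useful shared representation across source languages.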
This work focuses on unsupervised representation learning in person re-identification (ReID). Recent self-supervised contrastive learning methods learn invariance by maximizing the representation similarity between two augmented views of the same image. However, traditional data augmentation may introduce undesirable distortions of identity features, which is not always favorable in id-sensitive ReID tasks. In this paper, we propose to replace traditional data augmentation with a generative adversarial network (GAN) targeted at generating augmented views for contrastive learning. A 3D mesh guided person image generator is proposed to disentangle a person image into id-related and id-unrelated features. Deviating from previous GAN-based ReID methods that only work in id-unrelated space (pose and camera style), we conduct GAN-based augmentation on both id-unrelated and id-related features. We further propose specific contrastive losses to help our network learn invariance from id-unrelated and id-related augmentations. By jointly training the generative and the contrastive modules, our method achieves new state-of-the-art unsupervised person ReID performance on mainstream large-scale benchmarks.
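The contrastive objective underlying this family of methods can be summarized with a standard InfoNCE/NT-Xent loss, sketched below in plain NumPy; here each anchor's "positive" would be its GAN-generated augmented view. This is the generic formulation, not the paper's specialized id-related/id-unrelated losses:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE / NT-Xent loss: each anchor embedding should be most
    similar to its own positive (its augmented view) among all
    positives in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                     # (N, N) cosine sims
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # diagonal = true pairs
```

The loss is low when each anchor's matching view sits on the diagonal of the similarity matrix, and high when the pairing is scrambled, which is what drives the learned invariance.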
Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches have been adopted to refine representations via auxiliary self-supervised losses while simultaneously learning decision policies, learning compositional representations from hand-designed and context-independent self-supervised losses (multi-view) still adapts relatively slowly to the real world, which contains many non-IID subspaces requiring rapid distribution shift in both time and spatial attention patterns at varying levels of abstraction. In contrast, supervised language model cascades have shown the flexibility to adapt to many diverse manifolds, and hints of self-learning needed for autonomous task transfer. However, to date, transfer methods for language models like few-shot learning and fine-tuning still require human supervision and transfer learning using self-learning methods has been underexplored. We propose a self-supervised loss policy called contrastive distillation which manifests latent variables with high mutual information with both source and target tasks from weights to tokens. We show how this outperforms common methods of transfer learning and suggests a useful design axis of trading off compute for generalizability for online transfer. Contrastive distillation is improved through sampling from memory and suggests a simple algorithm for more efficiently sampling negative examples for contrastive losses than random sampling.
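The closing claim about sampling negatives from memory can be illustrated with a toy buffer that retains recent embeddings and returns the most similar ones as hard negatives, instead of sampling uniformly. This is only a sketch in that spirit, under simplified assumptions, not the authors' algorithm:

```python
import numpy as np

class NegativeMemory:
    """Keep a bounded memory of past embeddings and draw negatives
    that are most similar to the anchor (hard negatives) rather
    than sampling uniformly at random."""

    def __init__(self, dim, size=256):
        self.buf = np.zeros((0, dim))
        self.size = size

    def add(self, emb):
        # Append new embeddings, keeping only the most recent `size`.
        self.buf = np.vstack([self.buf, emb])[-self.size:]

    def hard_negatives(self, anchor, k=4):
        # Cosine similarity of every stored embedding to the anchor.
        sims = self.buf @ anchor / (
            np.linalg.norm(self.buf, axis=1) * np.linalg.norm(anchor) + 1e-9)
        return self.buf[np.argsort(sims)[-k:]]         # top-k most similar
```

Hard negatives concentrate the contrastive gradient on the pairs that are currently hardest to tell apart, which is one intuition for why memory-based sampling can beat random sampling.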
3D autonomous driving semantic segmentation using deep learning has become a well-studied subject, providing methods that can reach very high performance. Nonetheless, because of the limited size of the training datasets, these models cannot see every type of object and scene found in real-world applications. The ability to be reliable in these various unknown environments is called domain generalization. Despite its importance, domain generalization is relatively unexplored in the case of 3D autonomous driving semantic segmentation. To fill this gap, this paper presents the first benchmark for this application by testing state-of-the-art methods and discussing the difficulty of tackling LiDAR domain shifts. We also propose the first method designed to address this domain generalization, which we call 3DLabelProp. This method relies on leveraging the geometry and sequentiality of the LiDAR data to enhance its generalization performance by working on partially accumulated point clouds. It reaches a mIoU of 52.6% on SemanticPOSS while being trained only on SemanticKITTI, making it the state-of-the-art method for generalization (+7.4% better than the second-best method). The code for this method will be available on GitHub.
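The core preprocessing idea of working on partially accumulated point clouds amounts to registering successive scans into a common reference frame using the sensor poses. The helper below is a minimal NumPy sketch of that step only; the actual 3DLabelProp method does considerably more (e.g. propagating labels through the accumulated cloud):

```python
import numpy as np

def accumulate_scans(scans, poses):
    """Merge successive LiDAR scans into the frame of the first scan.
    scans: list of (N_i, 3) point arrays, each in its own sensor frame.
    poses: list of 4x4 sensor-to-world transforms, one per scan."""
    world_to_ref = np.linalg.inv(poses[0])
    merged = []
    for pts, pose in zip(scans, poses):
        hom = np.hstack([pts, np.ones((len(pts), 1))])      # homogeneous coords
        merged.append((hom @ (world_to_ref @ pose).T)[:, :3])
    return np.vstack(merged)
```

Accumulation densifies the geometry seen by the network, which is one reason it transfers better across LiDAR sensors with different beam patterns.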
In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy, and seeking to approximate it with a hypernetwork that can generate near-optimal value functions and policies given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
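The hypernetwork view can be made concrete with a minimal sketch: a function of the task parameters emits the flat weight vector of a small target policy, which is then applied to an observation. The single tanh layer, the linear target policy, and all shapes here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def hyper_policy(context, hyper_W, hyper_b, obs, obs_dim=3, act_dim=2):
    """The hypernetwork (a single tanh layer with its own trainable
    parameters hyper_W, hyper_b) maps task parameters `context` to
    the flat weights of a small linear policy, which is then applied
    to the observation `obs`."""
    flat = np.tanh(context @ hyper_W + hyper_b)           # generated weights
    W = flat[:obs_dim * act_dim].reshape(obs_dim, act_dim)
    b = flat[obs_dim * act_dim:obs_dim * act_dim + act_dim]
    return obs @ W + b                                    # action for this task
```

At test time only the context changes: a new task's parameters produce a new set of policy weights in a single forward pass, which is what enables zero-shot transfer.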
Voice activity detection (VAD) aims to detect speech segments in an audio signal, a necessary first step for many of today's speech-based applications. Current state-of-the-art methods focus on training neural networks directly on acoustic features such as Mel filter banks (MFBs). Such methods therefore require an extra normalization step to adapt to a new domain where the acoustics differ, which may be due simply to a change of speaker, microphone, or environment. In addition, this normalization step is usually a rather rudimentary method with certain limitations, such as being highly susceptible to the amount of data available for the new domain. Here, we exploit the crowdsourced Common Voice (CV) corpus to show that representations based on self-supervised learning (SSL) adapt well to different domains, because they are computed over speech utterances spanning multiple domains. SSL representations also achieve better results than systems based on handcrafted representations (MFBs) and off-the-shelf VADs, with significant improvements in cross-domain settings.
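For contrast with learned SSL features, a classic handcrafted VAD front end can be as simple as thresholding frame log-energy. The toy baseline below illustrates the kind of system SSL representations are compared against, and why such systems are fragile under domain shift (the decision depends directly on the recording's acoustics):

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Toy energy-based VAD: frames whose log energy lies within
    `threshold_db` of the loudest frame are marked as speech.
    A change of microphone, gain, or noise floor shifts the frame
    energies and hence the decisions, which is the domain-shift
    fragility that SSL features aim to avoid."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = np.array([np.mean(f ** 2) + 1e-12 for f in frames])
    log_e = 10 * np.log10(energy)
    return log_e > (log_e.max() + threshold_db)          # True = speech frame
```

On a clip that is silence followed by a tone, the silent frames fall far below the threshold and the tonal frames exceed it, as the test below checks.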
Most automatic emotion recognition systems exploit time-continuous annotations of emotion to provide fine-grained descriptions of spontaneous expressions as observed in real life. Because emotion is rather subjective, annotation is usually performed by several annotators, each providing a trace for a given dimension, i.e. a time-continuous series describing a dimension such as arousal or valence. However, annotations of the same expression are rarely consistent in time or value, which adds bias and delay to the traces used to learn emotion prediction models. We therefore propose a method that can dynamically compensate for the inconsistencies between annotators and synchronize the traces with the corresponding acoustic features, using recurrent neural networks. Experimental evaluations were carried out on several emotion datasets including Chinese, French, German, and Hungarian participants who interacted remotely in either noise-free conditions or in the wild. The results show that, for both arousal and valence, our method can significantly increase inter-annotator agreement as well as the correlation between the traces and the audio features. Improvements are also obtained in the automatic prediction of these dimensions using simple lightweight models, especially for valence in noise-free conditions and for arousal on recordings captured in the wild.
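The annotation-delay problem can be illustrated with a simple cross-correlation baseline: find the shift of a trace, relative to a reference, that maximizes Pearson correlation, then undo it. The paper learns this compensation dynamically with recurrent networks; the sketch below is only the static baseline:

```python
import numpy as np

def best_lag(trace, reference, max_lag=50):
    """Estimate the annotation delay of `trace` relative to
    `reference` as the shift (in frames) maximizing Pearson
    correlation, and return the delay-compensated trace."""
    t = trace - trace.mean()
    r = reference - reference.mean()

    def corr(lag):
        if lag >= 0:
            a, b = t[lag:], r[:len(r) - lag]
        else:
            a, b = t[:lag], r[-lag:]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return (a @ b) / denom if denom else 0.0

    lag = max(range(-max_lag, max_lag + 1), key=corr)
    return lag, np.roll(trace, -lag)
```

A single global lag is exactly what this baseline cannot get right when annotator delay varies over time, which motivates the dynamic, learned compensation proposed in the paper.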
Bias in datasets can be very detrimental to proper statistical estimation. To counter this problem, importance weighting methods have been developed to match any biased distribution to its corresponding target unbiased distribution. The seminal kernel mean matching (KMM) method is still today considered the state of the art in this field of research. One of the main drawbacks of this method, however, is its computational burden on large datasets. Building on the previous works of Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm that scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets under various sample biases, that our proposed approach drastically reduces the computation time on large datasets while maintaining sample-bias correction performance similar to other importance weighting methods. The proposed approach appears to be the only one able to provide relevant reweighting in a reasonable time for large datasets of up to two million points.
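A scalable, learning-based alternative to the KMM quadratic program can be illustrated with the classic discriminative density-ratio trick: train a classifier to tell source from target samples and convert its odds into importance weights. This is a generic sketch in that spirit (a tiny logistic model fit by gradient descent), not the authors' exact algorithm:

```python
import numpy as np

def importance_weights(source, target, lr=0.1, steps=500):
    """Estimate w(x) ~ p_target(x) / p_source(x) by training a
    logistic discriminator to separate the two samples, then
    converting its probabilities into density-ratio weights."""
    X = np.vstack([source, target])
    y = np.concatenate([np.zeros(len(source)), np.ones(len(target))])
    X1 = np.hstack([X, np.ones((len(X), 1))])          # add bias feature
    w = np.zeros(X1.shape[1])
    for _ in range(steps):                              # plain gradient descent
        p = 1 / (1 + np.exp(-X1 @ w))
        w -= lr * X1.T @ (p - y) / len(y)
    S1 = np.hstack([source, np.ones((len(source), 1))])
    p_src = 1 / (1 + np.exp(-S1 @ w))
    # odds p/(1-p), corrected for the source/target sample-size ratio
    return p_src / (1 - p_src) * len(source) / len(target)
```

Replacing the logistic model with a deeper network is what lets this style of weight predictor scale to millions of points, where solving a KMM-style quadratic program becomes impractical.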
Current self-supervised approaches to skeleton action representation learning often focus on constrained scenarios where videos and skeleton data are recorded in laboratory settings. When dealing with skeleton data estimated from real-world videos, such methods perform poorly because of the large variations across subjects and camera viewpoints. To address this issue, we introduce self-supervised skeleton action representation learning via a novel view-invariant autoencoder (ViA). ViA leverages motion retargeting between different human performers as a pretext task, in order to disentangle latent action-specific 'Motion' features on top of the visual representation of 2D or 3D skeleton sequences. Such 'Motion' features are invariant to skeleton geometry and camera view, and enable cross-subject and cross-view action classification tasks. We conduct a study focusing on transfer learning for skeleton-based action recognition, with self-supervised pre-training on real-world data such as Posetics. Our results show that the skeleton representations learned by ViA improve state-of-the-art action classification accuracy, not only on 3D laboratory datasets such as NTU-RGB+D 60 and NTU-RGB+D 120, but also on real-world datasets where only 2D data can be accurately estimated, e.g. Toyota Smarthome, UAV-Human, and Penn Action.
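Learned view invariance can be contrasted with simple geometric normalization of the input skeleton, which removes only translation and scale and is the kind of preprocessing the learned 'Motion' features go far beyond. A minimal sketch of that geometric baseline, with hypothetical joint indices:

```python
import numpy as np

def normalize_skeleton(joints, root=0, ref_pair=(1, 2)):
    """Center a (J, 2) skeleton on its root joint and rescale by a
    reference bone length, removing translation and scale so poses
    from different subjects and camera distances become more
    comparable. Joint indices here are illustrative."""
    centered = joints - joints[root]
    scale = np.linalg.norm(centered[ref_pair[0]] - centered[ref_pair[1]]) + 1e-9
    return centered / scale
```

Such normalization cannot undo out-of-plane rotation or differing body proportions, which is precisely the variation the motion-retargeting pretext task is designed to factor out.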
Deep neural networks have become prevalent in human analysis, boosting the performance of applications such as biometric recognition, action recognition, and person re-identification. However, the performance of such networks scales with the available training data. In human analysis, the demand for large-scale datasets poses a serious challenge, as data collection is tedious, time-consuming, expensive, and must comply with data protection laws. Current research investigates the generation of synthetic data as an efficient and privacy-preserving alternative to collecting real data in the field. This survey introduces the basic definitions and methodologies essential for generating and employing synthetic data for human analysis. It summarizes current state-of-the-art methods and the main benefits of using synthetic data. We also provide an overview of publicly available synthetic datasets and generation models. Finally, we discuss the limitations of the field as well as open research questions. This survey is intended for researchers and practitioners in the field of human analysis.