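In this paper, we describe the top-scoring submissions of team RTZR to the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), closed-dataset speaker verification Track 1. The top-performing system is a fusion of 7 models that cover 3 different types of model architecture. We focus on training models to learn periodic information, so all models were trained with 4-6 second segments of each utterance. In addition, for some of our fusion models we adopted the large-margin fine-tuning strategy, which performed well in previous challenges. During evaluation, we applied scoring methods with adaptive symmetric normalization (AS-Norm) and matrix score averaging (MSA). Finally, we fused all trained models with logistic regression. The final submission achieves 0.165 DCF and 2.912% EER on the VoxSRC-22 test set.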
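The AS-Norm scoring step normalizes a raw trial score s using statistics from an impostor cohort: s' = ½[(s − μ_e)/σ_e + (s − μ_t)/σ_t], where μ and σ are the mean and standard deviation of the top-K cohort scores of the enrollment and test embeddings. Below is a minimal sketch, assuming cosine scoring. The cohort, `top_k`, and the function names are hypothetical, because the abstract gives neither the team's cohort nor its K.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_stats(embedding, cohort, top_k):
    """Mean and standard deviation of the top_k highest cohort scores."""
    scores = sorted((cosine(embedding, c) for c in cohort), reverse=True)[:top_k]
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return mean, std


def as_norm(enroll, test, cohort, top_k=300):
    """AS-Norm: average of the raw score normalized by each side's top-K cohort stats."""
    raw = cosine(enroll, test)
    mu_e, sd_e = top_k_stats(enroll, cohort, top_k)
    mu_t, sd_t = top_k_stats(test, cohort, top_k)
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 16
    # Hypothetical impostor cohort made of random embeddings.
    cohort = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(300)]
    speaker = [rng.gauss(0, 1) for _ in range(dim)]
    same = [x + 0.3 * rng.gauss(0, 1) for x in speaker]  # noisy same-speaker trial
    other = [rng.gauss(0, 1) for _ in range(dim)]        # different-speaker trial
    print(as_norm(speaker, same, cohort, top_k=30))
    print(as_norm(speaker, other, cohort, top_k=30))
```

Because both sides are normalized and then averaged, the score is symmetric in enrollment and test. Adapting to the top-K closest impostors, instead of the whole cohort, is what makes the normalization "adaptive".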
Online personalized recommendation services are generally hosted in the cloud, where users query the cloud-based model to receive recommendations such as merchandise of interest or news feeds. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, little attention has been paid to potential information leakage through sparse features. These sparse features are used to track users' behavior, e.g., their click history and object interactions, and can therefore carry each user's private information. Sparse features are represented as learned embedding vectors stored in large tables, and personalized recommendation is performed by using a specific user's sparse features to index into the tables. Even with recently proposed methods that hide the computation happening in the cloud, an attacker in the cloud may still be able to track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse-feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models hosted in an untrusted cloud, and then demonstrate how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.
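The toy sketch below shows why access patterns alone leak information. The embedding values are assumed to be hidden, for example inside a secure enclave, but the row indices that each lookup touches are logged as an observer would see them. `ObservedEmbeddingTable` and `jaccard` are hypothetical names, and sum pooling is an assumption. This is not one of the paper's attacks. It only illustrates that the logged indices expose a user's click history and can link sessions.

```python
import random


class ObservedEmbeddingTable:
    """Toy embedding table that records which rows each lookup reads.

    The stored vectors stand in for values hidden from the attacker;
    `access_log` models the memory access pattern that remains observable.
    """

    def __init__(self, num_rows, dim, seed=0):
        rng = random.Random(seed)
        self.rows = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_rows)]
        self.access_log = []

    def lookup(self, sparse_ids):
        """Sum-pool the embeddings of a user's sparse feature IDs."""
        self.access_log.append(sorted(set(sparse_ids)))  # what the attacker sees
        pooled = [0.0] * len(self.rows[0])
        for i in sparse_ids:
            pooled = [p + v for p, v in zip(pooled, self.rows[i])]
        return pooled


def jaccard(a, b):
    """Overlap between two observed index sets, used to link sessions."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


if __name__ == "__main__":
    table = ObservedEmbeddingTable(num_rows=1000, dim=8)
    alice_monday, bob, alice_tuesday = [3, 17, 256, 512], [9, 44, 128, 999], [3, 17, 256, 700]
    for query in (alice_monday, bob, alice_tuesday):
        table.lookup(query)
    log = table.access_log
    print(log[0])                                   # Alice's item IDs, in the clear
    print(jaccard(log[0], log[2]), jaccard(log[0], log[1]))
```

Two sessions from the same user share most of their indices, so even a simple set-overlap score separates them from other users.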
An important challenge in vision-based action recognition is embedding spatiotemporal features from two or more heterogeneous modalities into a single feature. In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. The 3D deformable transformer consists of three attention modules: 3D deformable attention, local joint stride attention, and temporal stride attention. The two cross-modal tokens are input into the 3D deformable attention module to create a cross-attention token that reflects spatiotemporal correlation. Local joint stride attention is applied to spatially combine attention and pose tokens. Temporal stride attention reduces the number of input tokens along the temporal axis within the attention module, supporting temporal expression learning without using all tokens simultaneously. The deformable transformer is iterated L times, and the final cross-modal tokens are combined for classification. The proposed 3D deformable transformer was tested on the NTU60, NTU120, FineGYM, and Penn Action datasets and achieved results better than or comparable to pre-trained state-of-the-art methods, even without a pre-training process. In addition, by visualizing important joints and correlations during action recognition through spatial joint and temporal stride attention, we show the potential for explainable action recognition.
We propose a domain adaptation method, MoDA, which adapts a pretrained embodied agent to a new, noisy environment without ground-truth supervision. Map-based memory provides important contextual information for visual navigation, and exhibits unique spatial structure mainly composed of flat walls and rectangular obstacles. Our adaptation approach encourages the inherent regularities on the estimated maps to guide the agent to overcome the prevalent domain discrepancy in a novel environment. Specifically, we propose an efficient learning curriculum to handle the visual and dynamics corruptions in an online manner, self-supervised with pseudo clean maps generated by style transfer networks. Because the map-based representation provides spatial knowledge for the agent's policy, our formulation can deploy the pretrained policy networks from simulators in a new setting. We evaluate MoDA in various practical scenarios and show that our proposed method quickly enhances the agent's performance in downstream tasks including localization, mapping, exploration, and point-goal navigation.
The recent advent of play-to-earn (P2E) systems in massively multiplayer online role-playing games (MMORPGs) has made in-game goods more interchangeable with real-world value than ever before. Goods in P2E MMORPGs can be exchanged directly for cryptocurrencies such as Bitcoin, Ethereum, or Klaytn via blockchain networks. Unlike traditional in-game goods, once P2E goods have been written to the blockchain, they cannot be restored by the game operation teams, even in cases of chargeback fraud such as payment fraud, cancellation, or refund. To tackle this problem, we propose a novel chargeback fraud prediction method, PU GNN, which leverages graph attention networks with a positive-unlabeled (PU) loss to capture both the players' in-game behavior and their P2E token transaction patterns. By adopting a modified GraphSMOTE, the proposed model handles the imbalanced label distribution in chargeback fraud datasets. Experiments on two real-world P2E MMORPG datasets demonstrate that PU GNN outperforms previously suggested methods.
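The abstract does not spell out the form of its PU loss. One widely used choice is the non-negative PU risk estimator of Kiryo et al. (2017), sketched here with a sigmoid surrogate loss. The known class prior `prior` is an assumption, and this swapped-in estimator is not necessarily the loss used in PU GNN. Labeled positives would be confirmed fraud cases, and everything else is treated as unlabeled.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)) for label y in {+1, -1}."""
    t = label * score
    if t >= 0:
        e = math.exp(-t)  # numerically stable branch for large positive t
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk (Kiryo et al., 2017).

    pos_scores: classifier outputs on labeled positives (e.g. known fraud).
    unl_scores: classifier outputs on unlabeled samples.
    prior: assumed fraction of positives in the unlabeled data.
    """
    r_pos_plus = sum(sigmoid_loss(s, +1) for s in pos_scores) / len(pos_scores)
    r_pos_minus = sum(sigmoid_loss(s, -1) for s in pos_scores) / len(pos_scores)
    r_unl_minus = sum(sigmoid_loss(s, -1) for s in unl_scores) / len(unl_scores)
    # Estimated risk on the hidden negatives; clamp at 0 so it cannot go negative.
    negative_risk = r_unl_minus - prior * r_pos_minus
    return prior * r_pos_plus + max(0.0, negative_risk)


if __name__ == "__main__":
    print(nn_pu_risk([0.0, 0.0], [0.0, 0.0, 0.0], prior=0.2))
    print(nn_pu_risk([10.0], [-10.0], prior=0.3))  # clamped negative-risk term
```

Without the `max(0.0, ...)` clamp, a flexible model such as a GNN can drive the estimated negative risk below zero and overfit. The original nnPU training procedure also changes the gradient step when that term goes negative. This sketch only evaluates the risk.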
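Split learning and inference propose running the training/inference of a large model split across client devices and the cloud. However, such model splitting raises privacy concerns, because the activations flowing through the split layer may leak information about the clients' private input data. There is currently no good way to quantify how much private information leaks through the split layer, nor a good way to raise privacy to a desired level. In this work, we propose using Fisher information as a privacy metric to measure and control information leakage. We show that Fisher information provides an intuitive understanding of how much private information leaks through the split layer, in the form of an error bound for an unbiased reconstruction attacker. We then propose a privacy-enhancing technique, REFIL, that enforces a user-desired level of Fisher information leakage at the split layer to achieve high privacy while maintaining reasonable utility.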
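A toy case shows how a Fisher information bound works. Assume a linear split layer with additive Gaussian noise, z = Wx + n with n ~ N(0, σ²I). The Fisher information about x is then I = WᵀW/σ². By the Cramér-Rao bound, any unbiased reconstruction x̂ satisfies E‖x̂ − x‖²/d ≥ tr(I⁻¹)/d ≥ d/tr(I), and the second inequality follows from the AM-HM inequality on the eigenvalues of I. The linear layer, the Gaussian noise, and the helper names are all assumptions for illustration. This is not REFIL itself. It only shows how the noise scale σ sets a target leakage level.

```python
import numpy as np


def fisher_information_linear_gaussian(weight, sigma):
    """Fisher information of x for z = W x + n, n ~ N(0, sigma^2 I)."""
    return weight.T @ weight / sigma**2


def unbiased_mse_lower_bound(fim):
    """Lower bound d / tr(I) on the per-coordinate MSE of any unbiased estimator."""
    d = fim.shape[0]
    return d / np.trace(fim)


def sigma_for_target_bound(weight, target_mse):
    """Noise std that makes d / tr(W^T W / sigma^2) equal to target_mse."""
    d = weight.shape[1]
    return np.sqrt(target_mse * np.trace(weight.T @ weight) / d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(32, 8))  # toy split layer: 8 inputs -> 32 activations
    sigma = sigma_for_target_bound(W, target_mse=0.1)
    bound = unbiased_mse_lower_bound(fisher_information_linear_gaussian(W, sigma))

    # Least squares is an unbiased reconstruction attack for this toy layer.
    x = rng.normal(size=8)
    Z = W @ x + sigma * rng.normal(size=(4000, 32))
    x_hat = np.linalg.lstsq(W, Z.T, rcond=None)[0].T
    print(bound, np.mean((x_hat - x) ** 2))  # empirical MSE stays above the bound
```

Raising σ lowers tr(I) and so raises the attacker's minimum error, at the cost of noisier activations for the cloud-side model. That is the privacy/utility trade-off the abstract describes.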
Wearable sensor-based human activity recognition (HAR) has emerged as a principal research area and is used in a variety of applications. Recently, deep learning-based methods have achieved significant improvements in HAR with the development of human-computer interaction applications. However, standard convolutional neural networks operate only on a local neighborhood, and correlations between sensors at different body positions are ignored. In addition, these methods still suffer significant performance degradation due to large gaps between the training and test data distributions and behavioral differences between subjects. In this work, we propose a novel Transformer-based Adversarial learning framework for human activity recognition using wearable sensors via Self-KnowledgE Distillation (TASKED) that accounts for individual sensor orientations as well as spatial and temporal features. The proposed method learns cross-domain embedding feature representations from multi-subject datasets, using adversarial learning and maximum mean discrepancy (MMD) regularization to align the data distributions across multiple domains. We also adopt teacher-free self-knowledge distillation to improve the stability of training and the performance of human activity recognition. Experimental results show that TASKED not only outperforms state-of-the-art methods on four real-world public HAR datasets (alone or combined) but also effectively improves generalization across subjects.
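The MMD regularizer penalizes the distance between feature distributions of different domains, here different subjects. Below is a minimal sketch of the biased empirical squared MMD with a Gaussian RBF kernel. The kernel choice and bandwidth `gamma` are assumptions, since the abstract does not state which kernel TASKED uses.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    k_xx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = random.Random(0)
    subject_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
    subject_b = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]  # same distribution
    subject_c = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(60)]  # shifted features
    print(mmd2(subject_a, subject_b, gamma=0.5), mmd2(subject_a, subject_c, gamma=0.5))
```

Added to the training loss, a term like this pushes the encoder toward features whose distribution does not depend on which subject produced them.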
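Federated learning (FL) aims to perform privacy-preserving machine learning on distributed data held by multiple data owners. To this end, FL requires the data owners to perform training locally and share gradient updates (instead of private inputs) with a central server, where the updates are securely aggregated over multiple data owners. Although aggregation by itself does not provably offer privacy protection, prior work has shown that it may suffice if the batch size is sufficiently large. In this paper, we propose the Cocktail Party Attack (CPA), which, contrary to prior belief, is able to recover private inputs from gradients aggregated over a very large batch. CPA leverages the crucial insight that the aggregate gradient of a fully connected layer is a linear combination of its inputs, which leads us to frame gradient inversion as a blind source separation (BSS) problem, informally known as the cocktail party problem. We adapt independent component analysis (ICA), a classic solution to the BSS problem, to recover private inputs for fully connected and convolutional networks, and show that CPA significantly outperforms prior gradient inversion attacks, scales to ImageNet-sized inputs, and works with batch sizes as large as 1024.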
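CPA's key insight can be checked numerically. For a fully connected layer y = Wx + b, the weight gradient summed over a batch is Σᵢ δᵢxᵢᵀ, where δᵢ = ∂L/∂yᵢ. Every row of the aggregated gradient is therefore a linear combination of the batch inputs xᵢ. The NumPy sketch below uses a squared-error loss and helper names chosen for illustration. It verifies the linear-combination property but does not implement the ICA unmixing step.

```python
import numpy as np


def fc_weight_grad(weight, bias, inputs, targets):
    """Gradient of 0.5 * sum_i ||W x_i + b - t_i||^2 w.r.t. W, summed over the batch.

    inputs: (batch, in_dim), targets: (batch, out_dim).
    """
    outputs = inputs @ weight.T + bias  # (batch, out_dim)
    deltas = outputs - targets          # dL/dy_i for squared error
    return deltas.T @ inputs            # sum_i delta_i x_i^T, shape (out_dim, in_dim)


def max_residual_outside_input_span(grad, inputs):
    """Largest entry of what remains after projecting gradient rows onto the input span."""
    # Least squares: find coeffs so that inputs.T @ coeffs ≈ grad.T.
    coeffs, *_ = np.linalg.lstsq(inputs.T, grad.T, rcond=None)
    residual = grad.T - inputs.T @ coeffs
    return float(np.abs(residual).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, in_dim, out_dim = 8, 64, 10
    W, b = rng.normal(size=(out_dim, in_dim)), rng.normal(size=out_dim)
    X, T = rng.normal(size=(batch, in_dim)), rng.normal(size=(batch, out_dim))
    G = fc_weight_grad(W, b, X, T)
    print(max_residual_outside_input_span(G, X))  # ~0: rows lie in span{x_i}
    print(np.linalg.matrix_rank(G))               # at most the batch size
```

Since G = ΔᵀX, each gradient row is an observed "mixture" and the private inputs are the "sources". Recovering them from many such mixtures is exactly the setting ICA is designed for.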
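FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats. In this paper, we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings: E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for representing special values, E4M3's dynamic range is extended by not representing infinities and by having only one mantissa bit pattern for NaNs. We demonstrate the efficacy of the FP8 format on a variety of image and language tasks, effectively matching the result quality achieved by 16-bit training sessions. Our study covers the main modern neural network architectures (CNNs, RNNs, and Transformer-based models), leaving all hyperparameters unchanged from the 16-bit baseline training sessions. Our training experiments include large language models of up to 175B parameters. We also examine FP8 post-training quantization of language models trained in 16-bit formats that resisted fixed-point INT8 quantization.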
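The two encodings can be decoded as follows. The sketch is written from the format description above together with exponent biases of 7 (E4M3) and 15 (E5M2), which give maximum finite magnitudes of 448 and 57344. `decode_fp8` is a hypothetical helper and should be checked against a reference implementation before it is relied on.

```python
import math


def decode_fp8(byte, fmt):
    """Decode an 8-bit pattern as FP8 'E4M3' or 'E5M2' and return a Python float."""
    if fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7
    elif fmt == "E5M2":
        exp_bits, man_bits, bias = 5, 2, 15
    else:
        raise ValueError(f"unknown FP8 format: {fmt}")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if fmt == "E5M2" and exponent == exp_max:
        # IEEE 754 style: an all-ones exponent encodes infinities and NaNs.
        return sign * math.inf if mantissa == 0 else math.nan
    if fmt == "E4M3" and exponent == exp_max and mantissa == man_max:
        # E4M3 has no infinities; only S.1111.111 is NaN.
        return math.nan
    if exponent == 0:
        # Subnormal: no implicit leading 1.
        return sign * (mantissa / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)


if __name__ == "__main__":
    print(decode_fp8(0x7E, "E4M3"), decode_fp8(0x7B, "E5M2"))  # largest finite values
    print(decode_fp8(0x78, "E4M3"), decode_fp8(0x7C, "E5M2"))  # E4M3 reuses inf's slot
```

The last line shows the trade-off described above. The bit pattern that would be infinity in an IEEE-style layout is an ordinary number (256) in E4M3. Together with reserving only S.1111.111 for NaN, this extends E4M3's range to 448.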
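The empirical success of reinforcement learning (RL) in contact-rich manipulation leaves much to be understood from a model-based perspective, where the key difficulties are often attributed to (i) the explosion of contact modes, (ii) stiff, non-smooth contact dynamics and the resulting exploding or discontinuous gradients, and (iii) the non-convexity of the planning problem. The stochastic nature of RL addresses (i) and (ii) by effectively sampling and averaging over contact modes. Model-based methods, on the other hand, have tackled the same challenges by smoothing contact dynamics analytically. Our first contribution is to establish the theoretical equivalence of the two approaches for simple systems, and to provide qualitative and empirical evidence of their equivalence on a number of complex examples. To further alleviate (ii), our second contribution is a convex, differentiable, and quasi-dynamic formulation of contact dynamics that is amenable to both smoothing schemes and that experiments show to be highly effective for contact-rich planning. Our final contribution addresses (iii): we show that classical sampling-based motion planning algorithms can be effective for global planning when contact modes are abstracted via smoothing. Applying our method to a collection of challenging contact-rich manipulation tasks, we demonstrate that efficient model-based motion planning can achieve results comparable to RL with dramatically less computation. Video: https://youtu.be/12ew4xc-vwa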
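The sampling-based smoothing attributed to RL above can be illustrated in one dimension. A step f(x) = 1 if x > 0 else 0, a crude stand-in for making or breaking contact, has zero gradient almost everywhere. Its Gaussian smoothing f_σ(x) = E[f(x + w)], with w ~ N(0, σ²), equals Φ(x/σ) and has gradient φ(x/σ)/σ. That gradient can be estimated from samples alone via ∇f_σ(x) = E[(f(x + w) − f(x)) w]/σ². The step function and the estimator are our illustration and not the paper's quasi-dynamic contact model.

```python
import math
import random


def step(x):
    """Non-smooth toy 'contact' function: its gradient is zero almost everywhere."""
    return 1.0 if x > 0 else 0.0


def smoothed_grad_mc(f, x, sigma, num_samples, rng):
    """Zeroth-order Monte Carlo estimate of d/dx E[f(x + w)], w ~ N(0, sigma^2)."""
    base = f(x)  # baseline: leaves the mean unchanged (E[w] = 0) but lowers variance
    total = 0.0
    for _ in range(num_samples):
        w = rng.gauss(0.0, sigma)
        total += (f(x + w) - base) * w
    return total / (num_samples * sigma**2)


def smoothed_grad_exact(x, sigma):
    """Analytic gradient of the smoothed step: phi(x / sigma) / sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    rng = random.Random(0)
    print(smoothed_grad_mc(step, 0.1, 0.5, 100_000, rng), smoothed_grad_exact(0.1, 0.5))
```

The two numbers agree closely, which is the 1-D version of the equivalence claimed above: averaging over sampled contact outcomes matches differentiating an analytically smoothed model.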