Audio is one of the most common means of human communication, but at the same time it can easily be abused to deceive people. With the AI revolution, the relevant technologies are now accessible to almost everyone, making it simple for criminals to commit crimes and forgeries. In this work, we introduce a deep learning approach to develop a classifier that blindly classifies an input audio clip as real or mimicked. The proposed model is trained on a set of important features extracted from a large audio dataset, and the resulting classifier is then tested on the same features extracted from different audio clips. Two datasets were created for this work: an all-English dataset and a mixed (Arabic and English) dataset. These datasets have been made available to the research community via GitHub at https://github.com/sass7/dataset. For comparison, the audio clips were also classified through human inspection with native-speaker subjects. The ensuing results are interesting and show strong accuracy.
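As a rough illustration of such a feature-based pipeline (not the paper's exact features or model), the Python sketch below summarizes each clip with MFCC statistics via librosa and trains a simple binary classifier with scikit-learn; the random stand-in feature matrix, the RandomForest model, and all hyperparameters are assumptions made only to keep the example runnable.

```python
# Illustrative sketch of a real-vs-mimicked audio classifier: MFCC summary
# features plus a simple binary classifier. Feature set, model, and
# hyperparameters are placeholders, not the paper's exact pipeline.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_features(path, n_mfcc=20, sr=16000):
    """Summarize one audio file with the mean and std of its MFCCs."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# In practice: X = np.stack([extract_features(p) for p in audio_paths]),
# with labels 1 = real and 0 = mimicked. Random stand-ins keep this runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```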
Devising domain- and model-agnostic evaluation metrics for generative models is an important and still unresolved problem. Most existing metrics, tailored solely to the image synthesis setup, have limited capacity for diagnosing the different failure modes of generative models across broader application domains. In this paper, we introduce a 3-dimensional evaluation metric ($\alpha$-Precision, $\beta$-Recall, Authenticity) that characterizes the fidelity, diversity, and generalization performance of any generative model. Our metric unifies statistical divergence measures through precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity. We introduce generalization as an additional, independent dimension (beyond the fidelity-diversity trade-off) that quantifies the extent to which a model copies the training data, a crucial performance indicator when modeling sensitive data with privacy requirements. The three metric components correspond to (interpretable) probabilistic quantities and are estimated via sample-level binary classification. The sample-level nature of our metric motivates a novel use case that we call model auditing, in which we judge the quality of individual samples generated by a (black-box) model, discarding low-quality samples and thereby improving overall model performance in a post-hoc fashion.
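The sketch below is only in the spirit of these sample-level estimates: it flags individual synthetic samples as inside or outside a k-nearest-neighbour estimate of the real-data support, which is the kind of per-sample judgment that also enables model auditing. It is a deliberately simplified stand-in, not the paper's $\alpha$-Precision/$\beta$-Recall/Authenticity estimators, and the embeddings, the value of k, and the Gaussian toy data are assumptions.

```python
# Simplified sample-level fidelity check via k-NN support estimation.
# Only in the spirit of the alpha-precision idea, not the exact metric.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_radii(real_emb, k=5):
    """Distance from each real embedding to its k-th nearest real neighbour."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(real_emb)   # +1 to skip self
    dists, _ = nn.kneighbors(real_emb)
    return nn, dists[:, -1]

def audit_samples(real_emb, fake_emb, k=5):
    """Flag each synthetic sample as inside the estimated real support or not."""
    nn, radii = knn_radii(real_emb, k)
    d, idx = nn.kneighbors(fake_emb, n_neighbors=1)
    # A synthetic sample passes if it lies within the k-NN ball of its nearest real point.
    return d[:, 0] <= radii[idx[:, 0]]

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 16))                # stand-in real embeddings
fake = rng.normal(loc=0.2, size=(500, 16))        # stand-in synthetic embeddings
keep = audit_samples(real, fake)
print("fraction of samples kept after auditing:", keep.mean())
```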
Cartesian impedance control is a type of motion control strategy for robots that improves safety in partially unknown environments by achieving a compliant behavior of the robot with respect to its external forces. This compliant robot behavior has the added benefit of allowing physical human guidance of the robot. In this paper, we propose a C++ implementation of compliance control valid for any torque-commanded robotic manipulator. The proposed controller implements Cartesian impedance control to track a desired end-effector pose. Additionally, joint impedance is projected in the nullspace of the Cartesian robot motion to track a desired robot joint configuration without perturbing the Cartesian motion of the robot. The proposed implementation also allows the robot to apply desired forces and torques to its environment. Several safety features such as filtering, rate limiting, and saturation are included in the proposed implementation. The core functionalities are in a re-usable base library and a Robot Operating System (ROS) ros_control integration is provided on top of that. The implementation was tested with the KUKA LBR iiwa robot and the Franka Emika Robot (Panda) both in simulation and with the physical robots.
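The underlying control law is compact enough to sketch. The numpy snippet below shows a generic task-space impedance torque plus a joint-impedance term projected into the task nullspace; it only illustrates the control structure, not the package's C++/ROS code, and the gains, the pure spring-damper form, and the kinematic pseudoinverse-based nullspace projector are illustrative assumptions.

```python
# Illustrative Cartesian impedance law with nullspace joint impedance (numpy
# sketch of the general control structure, not the package's C++ code).
import numpy as np

def impedance_torques(e_x, dx, J, e_q, dq,
                      K_cart, D_cart, K_null, D_null, wrench_des=None):
    """Joint torques for Cartesian impedance control with nullspace joint impedance.

    e_x        : 6D task-space pose error (desired minus current)       (6,)
    dx         : task-space velocity J @ dq                             (6,)
    J          : geometric Jacobian                                     (6, n)
    e_q        : joint configuration error (desired minus current)      (n,)
    dq         : joint velocities                                       (n,)
    K_*, D_*   : stiffness / damping matrices (6x6 task, n x n joint)
    wrench_des : optional desired end-effector wrench to apply          (6,)
    """
    # Task-space spring-damper, mapped to joint torques.
    f_task = K_cart @ e_x - D_cart @ dx
    if wrench_des is not None:
        f_task = f_task + wrench_des            # feed-forward desired wrench
    tau_task = J.T @ f_task

    # Joint-space spring-damper projected into the nullspace of the task,
    # so it does not disturb the Cartesian motion.
    n = J.shape[1]
    N = np.eye(n) - J.T @ np.linalg.pinv(J.T)   # kinematic nullspace projector
    tau_null = N @ (K_null @ e_q - D_null @ dq)

    return tau_task + tau_null                  # gravity/Coriolis assumed compensated

# Toy example with a random 7-DoF Jacobian.
rng = np.random.default_rng(0)
J = rng.normal(size=(6, 7))
tau = impedance_torques(e_x=np.zeros(6), dx=np.zeros(6), J=J,
                        e_q=0.1 * np.ones(7), dq=np.zeros(7),
                        K_cart=np.diag([500] * 3 + [30] * 3),
                        D_cart=np.diag([40] * 3 + [5] * 3),
                        K_null=10 * np.eye(7), D_null=2 * np.eye(7))
print(tau.round(3))
```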
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
In this paper, we assess the viability of transformer models in end-to-end InfoSec settings, in which no intermediate feature representations or processing steps occur outside the model. We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files - in a novel end-to-end approach, and explore a variety of architectural designs, training regimes, and experimental settings to determine the ingredients necessary for performant detection models. We show that in contrast to conventional transformers trained on more standard NLP-related tasks, our URL transformer model requires a different training approach to reach high performance levels. Specifically, we show that 1) pre-training on a massive corpus of unlabeled URL data for an auto-regressive task does not readily transfer to binary classification of malicious or benign URLs, but 2) that using an auxiliary auto-regressive loss improves performance when training from scratch. We introduce a method for mixed objective optimization, which dynamically balances contributions from both loss terms so that neither one of them dominates. We show that this method yields quantitative evaluation metrics comparable to that of several top-performing benchmark classifiers. Unlike URLs, binary executables contain longer and more distributed sequences of information-rich bytes. To accommodate such lengthy byte sequences, we introduce additional context length into the transformer by providing its self-attention layers with an adaptive span similar to Sukhbaatar et al. We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets, but also point out the need for further exploration into model improvements in scalability and compute efficiency.
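One plausible way to realize such dynamic balancing (not necessarily the authors' exact scheme) is to normalize each loss term by a running estimate of its own magnitude, as in the PyTorch sketch below; the momentum value and the commented-out loss definitions are assumptions.

```python
# Hedged sketch of mixed-objective training: a classification loss plus an
# auxiliary auto-regressive (next-token) loss, re-weighted so that neither term
# dominates. One plausible balancing scheme, not necessarily the paper's.
import torch

class MixedObjective:
    def __init__(self, momentum=0.99):
        self.momentum = momentum
        self.scale = {"cls": 1.0, "ar": 1.0}   # running magnitude of each loss

    def __call__(self, loss_cls, loss_ar):
        for name, loss in (("cls", loss_cls), ("ar", loss_ar)):
            m = self.momentum
            self.scale[name] = m * self.scale[name] + (1 - m) * float(loss.detach())
        # Divide each term by its running magnitude so both contribute roughly equally.
        return loss_cls / self.scale["cls"] + loss_ar / self.scale["ar"]

mixer = MixedObjective()
# Inside a training loop one would compute something like:
#   loss_cls = F.binary_cross_entropy_with_logits(cls_logits, labels)
#   loss_ar  = F.cross_entropy(lm_logits.transpose(1, 2), next_tokens)
#   loss = mixer(loss_cls, loss_ar); loss.backward()
# Tiny scalar demo so this snippet runs on its own:
loss = mixer(torch.tensor(0.7, requires_grad=True), torch.tensor(3.2, requires_grad=True))
print(float(loss))
```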
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has recently been extended to exploit prior knowledge of the cardinality of each cluster. Such knowledge is used to improve performance as well as solution quality. In this paper, we propose an exact approach based on branch-and-cut techniques to solve the cardinality-constrained MSSC. For the lower-bounding routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy that enforces pairwise constraints, we reduce the complexity of the problems arising at the child nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that, for the first time, the proposed algorithm solves to global optimality real-world instances ten times larger than those solved by state-of-the-art exact methods.
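For context, the classical Peng-Wei SDP relaxation of MSSC, which relaxations of this kind strengthen and adapt to cardinality constraints, can be stated in a few lines of cvxpy. The sketch below shows only that basic relaxation as a lower-bounding routine on toy data; it is not the strengthened, cardinality-aware relaxation derived in the paper.

```python
# Basic Peng-Wei SDP relaxation of MSSC in cvxpy: a lower bound on the optimal
# clustering objective. Shown for context only; the paper's relaxations are
# stronger and account for cluster cardinalities.
import numpy as np
import cvxpy as cp

def mssc_sdp_lower_bound(X, k):
    n = X.shape[0]
    # Matrix of pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T

    Z = cp.Variable((n, n), symmetric=True)
    constraints = [
        Z >> 0,                  # positive semidefinite
        Z >= 0,                  # entrywise nonnegative
        cp.sum(Z, axis=1) == 1,  # each row sums to one
        cp.trace(Z) == k,        # k clusters
    ]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum(cp.multiply(D, Z))), constraints)
    prob.solve()
    return prob.value            # lower bound on the optimal MSSC objective

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(3, 0.3, (10, 2))])
print("SDP lower bound:", round(mssc_sdp_lower_bound(X, k=2), 3))
```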
We formulate and test a technique to use Emergent Communication (EC) with a pre-trained multilingual model to improve modern unsupervised NMT systems, especially for low-resource languages. It has been argued that the currently dominant paradigm in NLP of pre-training on text-only corpora will not yield robust natural language understanding systems, and the need for grounded, goal-oriented, and interactive language learning has been highlighted. In our approach, we embed a modern multilingual model (mBART; Liu et al. 2020) in an EC image-reference game, in which the model is incentivized to use multilingual generations to accomplish a vision-grounded task, with the hypothesis that this will align multiple languages to a shared task space. We present two variants of EC fine-tuning (Steinert-Threlkeld et al. 2022), one of which outperforms a backtranslation-based baseline in 6/8 translation settings and proves especially beneficial for the low-resource languages Nepali and Sinhala.
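To make the game setup concrete, the PyTorch sketch below implements a tiny generic referential game: a sender maps a target's features to a discrete message via Gumbel-softmax, and a receiver must pick the target among distractors. Random feature vectors and small randomly initialized networks stand in for images and mBART, so this is only a toy illustration of the EC objective, not the paper's setup.

```python
# Toy referential game: sender describes a target with a discrete message,
# receiver picks the target among distractors. Generic EC sketch, not the
# paper's mBART-based game.
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG_DIM, VOCAB, MSG_LEN, HID, N_CANDS, BATCH = 64, 32, 4, 128, 5, 16

class Sender(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(IMG_DIM, VOCAB * MSG_LEN)
    def forward(self, target_feats, tau=1.0):
        logits = self.proj(target_feats).view(-1, MSG_LEN, VOCAB)
        # Differentiable discrete message via straight-through Gumbel-softmax.
        return F.gumbel_softmax(logits, tau=tau, hard=True)

class Receiver(nn.Module):
    def __init__(self):
        super().__init__()
        self.msg_enc = nn.Linear(VOCAB * MSG_LEN, HID)
        self.img_enc = nn.Linear(IMG_DIM, HID)
    def forward(self, message, candidate_feats):
        m = self.msg_enc(message.flatten(1))        # (B, HID)
        c = self.img_enc(candidate_feats)           # (B, N_CANDS, HID)
        return torch.einsum("bh,bnh->bn", m, c)     # score each candidate

sender, receiver = Sender(), Receiver()
opt = torch.optim.Adam(list(sender.parameters()) + list(receiver.parameters()), lr=1e-3)

for step in range(100):
    cands = torch.randn(BATCH, N_CANDS, IMG_DIM)    # stand-in image features
    target_idx = torch.randint(N_CANDS, (BATCH,))
    target = cands[torch.arange(BATCH), target_idx]
    scores = receiver(sender(target), cands)
    loss = F.cross_entropy(scores, target_idx)      # reward: identify the target
    opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", float(loss))
```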
Expressive yet computationally inexpensive bipartite Graph Neural Networks (GNNs) have been shown to be an important component of deep-learning-based Mixed-Integer Linear Program (MILP) solvers. Recent works have demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) heuristic in branch-and-bound (B&B) solvers. These GNNs are trained offline, on a collection of instances, to imitate a very good but computationally expensive branching heuristic: strong branching. Given that B&B results in a tree of sub-MILPs, we ask (a) whether the target heuristic exhibits strong dependencies between neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them into our training procedure. Specifically, we find that, with the strong branching heuristic, a child node's best choice is often the parent's second-best choice. We call this the "lookback" phenomenon. Surprisingly, the typical branching GNN of Gasse et al. (2019) often misses this simple "answer". To imitate the target behavior more closely by incorporating the lookback phenomenon into GNNs, we propose two methods: (a) target smoothing for the standard cross-entropy loss function, and (b) adding a Parent-as-Target (PAT) lookback regularization term. Finally, we propose a model selection framework to incorporate harder-to-formulate objectives, such as solving time, into the final models. Through extensive experiments on standard benchmark instances, we show that our proposal leads to a 22% reduction in the size of the B&B tree and a 15% improvement in solving times.
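As a rough sketch of how a lookback signal could enter the imitation loss, the snippet below shows (a) a smoothed target that moves a little probability mass onto the parent's second-best variable, and (b) a cross-entropy loss with an added parent-as-target style term; the smoothing weight, the regularization coefficient, and the assumption that the parent's second-best index is available per node are illustrative choices, not the paper's exact formulation.

```python
# Illustrative lookback-aware imitation losses for a branching GNN.
# Weights and the source of `parent_second_best` are assumptions.
import torch
import torch.nn.functional as F

def smoothed_target_loss(logits, sb_target, parent_second_best, eps=0.1):
    """(a) Cross entropy against a smoothed target distribution."""
    n_vars = logits.shape[-1]
    rows = torch.arange(logits.shape[0])
    target = torch.zeros(logits.shape[0], n_vars)
    target[rows, sb_target] = 1.0 - eps                 # strong-branching choice
    target[rows, parent_second_best] += eps             # parent's second-best choice
    return -(target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def pat_regularized_loss(logits, sb_target, parent_second_best, lam=0.5):
    """(b) Standard cross entropy plus a term pulling toward the parent's 2nd best."""
    ce = F.cross_entropy(logits, sb_target)
    pat = F.cross_entropy(logits, parent_second_best)
    return ce + lam * pat

logits = torch.randn(8, 50, requires_grad=True)   # scores over 50 candidate variables
sb = torch.randint(50, (8,))                      # strong-branching choice per node
parent_2nd = torch.randint(50, (8,))              # parent's second-best choice
print(float(smoothed_target_loss(logits, sb, parent_2nd)),
      float(pat_regularized_loss(logits, sb, parent_2nd)))
```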
The Universal Morphology (UniMorph) project is a collaborative effort to instantiate broad-coverage, normalized morphological inflection tables for hundreds of the world's languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the past couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to address issues such as missing gender and macron information. We have also amended the schema to use a hierarchical structure required by morphological phenomena such as multiple-argument agreement and case stacking, and added some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Finally, this new release pushes toward the inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
Current supervised visual detectors, though impressive within their training distribution, often fail to segment out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the task of image classification. In our work, we find evidence that these losses can be insufficient for instance segmentation tasks, without also considering architectural inductive biases. For image segmentation, recent slot-centric generative models break such dependence on supervision by attempting to segment scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised instance segmentation model equipped with a slot-centric inductive bias, that is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives. We show that test-time adaptation in Slot-TTA greatly improves instance segmentation in out-of-distribution scenes. We evaluate Slot-TTA in several 3D and 2D scene instance segmentation benchmarks and show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors and self-supervised test-time adaptation methods.
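The per-scene adaptation loop itself is easy to sketch: copy the model, take a few gradient steps on the self-supervised reconstruction objective for the given test scene, then read out the segmentation. The PyTorch snippet below is a generic version of that loop with a toy stand-in model; the model interface, step count, and learning rate are assumptions rather than Slot-TTA's actual code.

```python
# Generic per-scene test-time adaptation loop in the spirit of Slot-TTA:
# adapt a copy of the model on each test scene with a reconstruction loss,
# then read out the segmentation. Model interface is an assumption.
import copy
import torch

def adapt_and_segment(model, scene, n_steps=10, lr=1e-4):
    """`model(scene)` is assumed to return (reconstruction, segmentation)."""
    adapted = copy.deepcopy(model)        # adapt per scene, keep the original intact
    adapted.train()
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)
    for _ in range(n_steps):
        recon, _ = adapted(scene)
        loss = torch.nn.functional.mse_loss(recon, scene)   # self-supervised objective
        opt.zero_grad(); loss.backward(); opt.step()
    adapted.eval()
    with torch.no_grad():
        _, segmentation = adapted(scene)
    return segmentation

# Tiny stand-in model so the loop runs end to end; a slot-centric model with a
# reconstruction or novel-view-synthesis decoder would replace this.
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, x):
        recon = self.net(x)
        seg = recon.argmax(dim=1)         # placeholder "segmentation"
        return recon, seg

scene = torch.rand(1, 3, 32, 32)
print(adapt_and_segment(ToyModel(), scene).shape)
```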