情感语音综合旨在使人类的声音具有各种情感影响。当前的研究主要集中于模仿属于特定情感类型的平均风格。在本文中,我们试图在运行时与情感混合在一起。我们提出了一种新颖的表述,可以衡量不同情绪的语音样本之间的相对差异。然后,我们将公式纳入序列到序列情感文本到语音框架中。在培训期间,该框架不仅明确地表征了情感风格,而且还通过用其他情感量化差异来探索情绪的序数。在运行时,我们通过手动定义情感属性向量来控制模型以产生所需的情绪混合物。客观和主观评估验证了拟议框架的有效性。据我们所知,这项研究是关于言语中混合情绪的建模,综合和评估混合情绪的第一项研究。
translated by 谷歌翻译
得益于语音情绪识别(SER),计算机可以以情感智能的方式理解并与人互动。但是,可以显着改善SER在交叉和现实世界中的实时数据供稿方案中的性能。无法将现有模型调整到新域是SER方法的缺点之一。为了应对这一挑战,研究人员开发了域的适应技术,这些技术转移了模型在整个领域中学习的知识。尽管现有的域适应技术已经改善了跨域的性能,但可以改进它们以适应现实世界中的实时数据提要情况,在这种情况下,模型可以在部署时可以自动调整。在本文中,我们提出了一种基于强化的学习策略(RL-DA),用于在与环境互动并收集持续反馈的同时,将预训练的模型调整为现实世界中的实时数据供稿设置。 RL-DA对SER任务进行了评估,包括跨语言和跨语言域自适应模式。评估结果表明,在实时数据供稿设置中,RL-DA在跨科普斯和跨语言场景中的基线策略分别优于基线策略。
translated by 谷歌翻译
情绪转换(EVC)寻求转换话语的情绪状态,同时保留语言内容和扬声器身份。在EVC,情绪通常被视为离散类别,忽略了言论也传达了听众可以感知的各种强度水平的情绪。在本文中,我们的目标是明确地表征和控制情绪强度。我们建议解开语言内容的扬声器风格,并将扬声器风格编码成一个嵌入的嵌入空间,形成情绪嵌入的原型。我们进一步从情感标记的数据库中了解实际的情感编码器,并研究使用相对属性来表示细粒度的情绪强度。为确保情绪可理解性,我们将情感分类损失和情感嵌入了EVC网络培训中的相似性损失。根据需要,所提出的网络控制输出语音中的细粒度情绪强度。通过目标和主观评估,我们验证了建议网络的情感表达和情感强度控制的有效性。
translated by 谷歌翻译
We present 3D Highlighter, a technique for localizing semantic regions on a mesh using text as input. A key feature of our system is the ability to interpret "out-of-domain" localizations. Our system demonstrates the ability to reason about where to place non-obviously related concepts on an input 3D shape, such as adding clothing to a bare 3D animal model. Our method contextualizes the text description using a neural field and colors the corresponding region of the shape using a probability-weighted blend. Our neural optimization is guided by a pre-trained CLIP encoder, which bypasses the need for any 3D datasets or 3D annotations. Thus, 3D Highlighter is highly flexible, general, and capable of producing localizations on a myriad of input shapes. Our code is publicly available at https://github.com/threedle/3DHighlighter.
translated by 谷歌翻译
We present a neural technique for learning to select a local sub-region around a point which can be used for mesh parameterization. The motivation for our framework is driven by interactive workflows used for decaling, texturing, or painting on surfaces. Our key idea is to incorporate segmentation probabilities as weights of a classical parameterization method, implemented as a novel differentiable parameterization layer within a neural network framework. We train a segmentation network to select 3D regions that are parameterized into 2D and penalized by the resulting distortion, giving rise to segmentations which are distortion-aware. Following training, a user can use our system to interactively select a point on the mesh and obtain a large, meaningful region around the selection which induces a low-distortion parameterization. Our code and project page are currently available.
translated by 谷歌翻译
There is no settled universal 3D representation for geometry with many alternatives such as point clouds, meshes, implicit functions, and voxels to name a few. In this work, we present a new, compelling alternative for representing shapes using a sequence of cross-sectional closed loops. The loops across all planes form an organizational hierarchy which we leverage for autoregressive shape synthesis and editing. Loops are a non-local description of the underlying shape, as simple loop manipulations (such as shifts) result in significant structural changes to the geometry. This is in contrast to manipulating local primitives such as points in a point cloud or a triangle in a triangle mesh. We further demonstrate that loops are intuitive and natural primitive for analyzing and editing shapes, both computationally and for users.
translated by 谷歌翻译
We propose RANA, a relightable and articulated neural avatar for the photorealistic synthesis of humans under arbitrary viewpoints, body poses, and lighting. We only require a short video clip of the person to create the avatar and assume no knowledge about the lighting environment. We present a novel framework to model humans while disentangling their geometry, texture, and also lighting environment from monocular RGB videos. To simplify this otherwise ill-posed task we first estimate the coarse geometry and texture of the person via SMPL+D model fitting and then learn an articulated neural representation for photorealistic image generation. RANA first generates the normal and albedo maps of the person in any given target body pose and then uses spherical harmonics lighting to generate the shaded image in the target lighting environment. We also propose to pretrain RANA using synthetic images and demonstrate that it leads to better disentanglement between geometry and texture while also improving robustness to novel body poses. Finally, we also present a new photorealistic synthetic dataset, Relighting Humans, to quantitatively evaluate the performance of the proposed approach.
translated by 谷歌翻译
We propose a novel deep neural network architecture to learn interpretable representation for medical image analysis. Our architecture generates a global attention for region of interest, and then learns bag of words style deep feature embeddings with local attention. The global, and local feature maps are combined using a contemporary transformer architecture for highly accurate Gallbladder Cancer (GBC) detection from Ultrasound (USG) images. Our experiments indicate that the detection accuracy of our model beats even human radiologists, and advocates its use as the second reader for GBC diagnosis. Bag of words embeddings allow our model to be probed for generating interpretable explanations for GBC detection consistent with the ones reported in medical literature. We show that the proposed model not only helps understand decisions of neural network models but also aids in discovery of new visual features relevant to the diagnosis of GBC. Source-code and model will be available at https://github.com/sbasu276/RadFormer
translated by 谷歌翻译
Machine learning is the study of computer algorithms that can automatically improve based on data and experience. Machine learning algorithms build a model from sample data, called training data, to make predictions or judgments without being explicitly programmed to do so. A variety of wellknown machine learning algorithms have been developed for use in the field of computer science to analyze data. This paper introduced a new machine learning algorithm called impact learning. Impact learning is a supervised learning algorithm that can be consolidated in both classification and regression problems. It can furthermore manifest its superiority in analyzing competitive data. This algorithm is remarkable for learning from the competitive situation and the competition comes from the effects of autonomous features. It is prepared by the impacts of the highlights from the intrinsic rate of natural increase (RNI). We, moreover, manifest the prevalence of the impact learning over the conventional machine learning algorithm.
translated by 谷歌翻译
Named entity recognition models (NER), are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting when NER models are available as a black-box service for identifying sensitive information in user documents and show that these models are vulnerable to membership inference on their training datasets. With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack capitalizes on unintended memorization in the NER's underlying neural network, a phenomenon NNs are known to be vulnerable to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data. We show that different functional paths of words within the training dataset in contrast to words not previously seen have measurable differences in execution time. Revealing membership status of training samples has clear privacy implications, e.g., in text redaction, sensitive words or phrases to be found and removed, are at risk of being detected in the training dataset. Our experimental evaluation includes the redaction of both password and health data, presenting both security risks and privacy/regulatory issues. This is exacerbated by results that show memorization with only a single phrase. We achieved 70% AUC in our first attack on a text redaction use-case. We also show overwhelming success in the timing attack with 99.23% AUC. Finally we discuss potential mitigation approaches to realize the safe use of NER models in light of the privacy and security implications of membership inference attacks.
translated by 谷歌翻译