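Cross-speaker style transfer aims to extract the speech style of a given reference utterance so that it can be reproduced in the timbre of an arbitrary target speaker. Existing approaches to this topic have explored utterance-level style labels to perform style transfer through global- or local-scale style representations. However, audiobook datasets are typically characterized by both local prosody and global genre, and are rarely accompanied by utterance-level style labels. Properly transferring the reading style across different speakers therefore remains a challenging task. This paper introduces a chunk-wise multi-scale cross-speaker style model that captures both the global genre and the local prosody of audiobook speech. Moreover, by disentangling speaker timbre and style with the proposed switchable adversarial classifiers, the extracted reading style can be adapted to the timbre of different speakers. Experimental results confirm that the model manages to transfer a given reading style to new target speakers. With the support of local prosody and global genre type predictors, the potential of the proposed method for multi-speaker audiobook generation is further revealed.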
Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS). With the development of large-scale pre-trained text representations, Bidirectional Encoder Representations from Transformers (BERT) has been shown to embody semantic information and has recently been employed in TTS. However, original or simply fine-tuned BERT embeddings still cannot provide sufficient semantic knowledge that expressive TTS models should take into account. In this paper, we propose a word-level semantic representation enhancing method based on dependency structure and pre-trained BERT embeddings. The BERT embedding of each word is reprocessed according to its specific dependencies and related words in the sentence, generating a more effective semantic representation for TTS. To better utilize the dependency structure, a relational gated graph network (RGGN) is introduced to let semantic information flow and aggregate through the dependency structure. Experimental results show that the proposed method can further improve the naturalness and expressiveness of synthesized speech on both Mandarin and English datasets.
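The abstract does not give the RGGN update equations. As a rough illustration only, the sketch below assumes a gated-graph-neural-network style step: each word receives relation-specific messages along dependency arcs, and a GRU-like gate mixes them into its state. The function name `rggn_step`, the relation ids, and the random weights and embeddings (standing in for trained parameters and real BERT vectors) are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rggn_step(h, edges, params):
    """One relational gated graph update (hypothetical GGNN-style sketch).

    h:      (n, d) word states, e.g. BERT embeddings of the words
    edges:  list of (src, dst, rel) dependency arcs between word indices
    params: "W_rel" (R, d, d) per-relation message matrices and
            "Wz", "Uz", "Wr", "Ur", "Wh", "Uh" (d, d) gate matrices
    """
    m = np.zeros_like(h)
    # Sum relation-specific messages arriving at each word.
    for src, dst, rel in edges:
        m[dst] += h[src] @ params["W_rel"][rel]
    z = sigmoid(m @ params["Wz"] + h @ params["Uz"])  # update gate
    r = sigmoid(m @ params["Wr"] + h @ params["Ur"])  # reset gate
    h_cand = np.tanh(m @ params["Wh"] + (r * h) @ params["Uh"])
    # Gated interpolation between the old state and the candidate state.
    return (1.0 - z) * h + z * h_cand

# Toy sentence "the cat sat" with hypothetical relation ids:
# 0 = det, 1 = nsubj, 2 = det (reversed), 3 = nsubj (reversed).
rng = np.random.default_rng(0)
d, num_rel = 8, 4
h0 = rng.normal(size=(3, d))  # stand-in for BERT word embeddings
params = {"W_rel": rng.normal(scale=0.1, size=(num_rel, d, d))}
for name in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]:
    params[name] = rng.normal(scale=0.1, size=(d, d))
edges = [(0, 1, 0), (1, 0, 2),  # det arc between "the" (0) and "cat" (1)
         (1, 2, 1), (2, 1, 3)]  # nsubj arc between "cat" (1) and "sat" (2)

h = h0
for _ in range(2):  # two propagation steps
    h = rggn_step(h, edges, params)
print(h.shape)  # (3, 8)
```

Adding reversed arcs as separate relation types lets information flow both from heads to dependents and back, which is a common choice in relational graph networks but is an assumption here.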
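Despite the success of the Sylvester equation in various graph mining applications, such as semi-supervised label learning and network alignment, it still has several limitations. Its inability to model nonlinear relations and its inflexibility in being tuned toward different tasks restrict its performance. In this paper, we propose SymGNN, an end-to-end neural framework composed of a multi-network neural aggregation module and a prior multi-network association incorporation learning module. The proposed framework inherits the key idea of the Sylvester equation while generalizing it to overcome the above limitations. Empirical evaluations on real-world datasets show that the instantiations of SymGNN overall outperform the baselines on the geometric matrix completion task, and that its low-rank instantiation reduces memory consumption by 16.98%.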
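For reference, the classical linear building block is the Sylvester equation AX + XB = C. The sketch below solves it through column-major vectorization, using vec(AX + XB) = (I ⊗ A + Bᵀ ⊗ I) vec(X). How a specific graph task maps onto A, B and C is left abstract; the toy Laplacian-based matrices are hypothetical inputs, and the dense Kronecker system is only practical for tiny problems (`scipy.linalg.solve_sylvester` uses the Bartels-Stewart algorithm instead).

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A X + X B = C for X via column-major vectorization.

    vec(A X + X B) = (I_m kron A + B^T kron I_n) vec(X).
    A unique solution exists when A and -B share no eigenvalue.
    Dense O((n m)^3) solve: for small illustrative problems only.
    """
    n, m = C.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((n, m), order="F")

def laplacian(adj):
    return np.diag(adj.sum(axis=1)) - adj

# Hypothetical toy graphs: a 3-node path and a single 2-node edge.
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A2 = np.array([[0, 1], [1, 0]], dtype=float)
A = laplacian(A1) + np.eye(3)  # positive definite
B = laplacian(A2) + np.eye(2)  # positive definite, so A and -B share no eigenvalue
C = np.ones((3, 2))            # e.g. a uniform prior association matrix

X = solve_sylvester_kron(A, B, C)
print(np.allclose(A @ X + X @ B, C))  # True
```

Because the solution is a fixed linear function of C, this formulation cannot capture nonlinear relations, which is the limitation the abstract says SymGNN addresses.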
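Dynamic and multimodal characteristics are two important properties that are widespread in real-world optimization problems. The former means that the objectives and/or constraints of a problem change over time, while the latter means that more than one optimal solution (sometimes including acceptable local solutions) exists in each environment. Dynamic multimodal optimization problems (DMMOPs), which have both characteristics, have been studied in evolutionary computation and swarm intelligence for years and are attracting increasing attention. Solving such problems requires optimization algorithms to track multiple optima simultaneously in changing environments. Decision makers can then pick one optimal solution in each environment according to their experience and preferences, or quickly switch to another solution when the current one no longer works well. This is very helpful for decision makers, especially when facing changing environments. In this competition, a test suite for DMMOPs is provided, which models real-world applications. Specifically, the test suite adopts 8 multimodal functions and 8 change modes to construct 24 typical dynamic multimodal optimization problems. A metric is also provided to measure algorithm performance, which considers the average number of optimal solutions found across all environments. The competition is expected to be very helpful in promoting the development of dynamic multimodal optimization algorithms.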
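The performance metric described above averages, over environments, how many optima an algorithm has found. The competition's exact acceptance criterion is not stated here, so the sketch below assumes an optimum counts as found when some solution lies within a Euclidean distance `eps` of it; the function names and the toy data are hypothetical.

```python
import numpy as np

def optima_found(population, optima, eps):
    """Count known optima with at least one solution within eps (Euclidean)."""
    pop = np.asarray(population, dtype=float)
    opt = np.asarray(optima, dtype=float)
    # dists[i, j]: distance from optimum i to solution j
    dists = np.linalg.norm(opt[:, None, :] - pop[None, :, :], axis=-1)
    return int(np.sum(dists.min(axis=1) <= eps))

def average_optima_found(populations, optima_per_env, eps=1e-2):
    """Average number of optima found over all environments."""
    counts = [optima_found(p, o, eps) for p, o in zip(populations, optima_per_env)]
    return float(np.mean(counts))

# Two hypothetical environments, each with two known optima.
optima_per_env = [[[0.0, 0.0], [1.0, 1.0]],
                  [[2.0, 2.0], [3.0, 3.0]]]
populations = [[[0.0, 0.001], [0.5, 0.5]],   # finds only the first optimum
               [[2.0, 2.0], [3.0, 3.005]]]   # finds both optima
print(average_optima_found(populations, optima_per_env))  # 1.5
```

A real benchmark would typically also check that a matching solution's fitness is close to the optimum's value; that refinement is omitted here.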
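Surface reconstruction from noisy, non-uniform, and unoriented point clouds is a fascinating yet challenging problem in computer vision and graphics. With innovations in 3D scanning technology, it is highly desirable to directly convert raw scan data, which typically contain severe noise, into a manifold triangle mesh. Existing learning-based approaches aim to learn an implicit function whose zero-level surface represents the underlying shape. However, most of them fail to obtain desirable results for noisy and sparse point clouds, which limits their use in practice. In this paper, we introduce Neural-IMLS, a novel approach that directly learns a noise-resistant signed distance function (SDF) from unoriented raw point clouds. By minimizing a loss derived from the implicit moving least-squares (IMLS) function, our method learns the underlying SDF from the raw point cloud in a self-supervised fashion, without explicitly learning priors. Specifically, one SDF is given by the IMLS function and another by our neural network, and the gradient of our predictor defines a tangent bundle that facilitates the computation of IMLS. We prove that when these SDFs coincide, our neural network predicts a signed implicit function whose zero-level set serves as a good approximation of the underlying surface. We conduct extensive experiments on various benchmarks, including synthetic and real-world scans, to demonstrate the ability to reconstruct faithful shapes from various inputs, especially point clouds with noise or gaps.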
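The IMLS function referred to above is, in its common form, a weighted average of signed distances to the tangent planes of nearby points: f(x) = Σ wᵢ(x)⟨x − pᵢ, nᵢ⟩ / Σ wᵢ(x), with Gaussian weights wᵢ(x) = exp(−‖x − pᵢ‖²/h²). In Neural-IMLS the normals nᵢ come from the network's gradient; the sketch below assumes known normals instead, and the bandwidth `h` and the unit-circle data are illustrative choices.

```python
import numpy as np

def imls(x, points, normals, h=0.5):
    """IMLS signed distance estimate at query point x.

    f(x) = sum_i w_i <x - p_i, n_i> / sum_i w_i,  w_i = exp(-|x - p_i|^2 / h^2)
    """
    diff = np.asarray(x, dtype=float)[None, :] - points  # (N, dim)
    d2 = np.sum(diff**2, axis=1)
    # Subtracting min(d2) scales every weight by the same factor, so f is
    # unchanged, while avoiding underflow far from the point cloud.
    w = np.exp(-(d2 - d2.min()) / h**2)
    proj = np.sum(diff * normals, axis=1)  # signed distance to each tangent plane
    return float(np.sum(w * proj) / np.sum(w))

# Samples on the unit circle with (assumed known) outward unit normals.
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
points = np.stack([np.cos(theta), np.sin(theta)], axis=1)
normals = points.copy()

print(imls([0.0, 0.0], points, normals))  # about -1: the origin is inside
print(imls([2.0, 0.0], points, normals))  # positive: the query is outside
```

The zero-level set of f approximates the sampled curve; near the surface, the value is slightly biased by curvature because the kernel averages over several tangent planes.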
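This document introduces the Generalized Moving Peaks Benchmark (GMPB), which generates continuous dynamic optimization problem instances. The landscapes generated by GMPB are constructed by assembling several components with a variety of controllable characteristics, ranging from unimodal to highly multimodal, symmetric to highly asymmetric, and smooth to highly irregular, with various degrees of variable interaction and ill-conditioning. In this document, we explain how these characteristics can be generated through different parameter settings of GMPB. The MATLAB source code of GMPB is also explained.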
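To give a feel for how peak-based dynamic landscapes are assembled, the sketch below implements a simplified moving-peaks-style function, not GMPB's actual formulation (which adds further transformations such as rotation and irregularity; see the MATLAB source for that). Each peak is a cone with its own height and per-dimension widths (unequal widths give ill-conditioned peaks), and every environmental change shifts all centers by a fixed-length random vector and perturbs the heights. The class name, parameter names, and default ranges are assumptions.

```python
import numpy as np

class SimpleMovingPeaks:
    """Simplified moving-peaks landscape (hypothetical; not the GMPB definition).

    f(x) = max_k [ h_k - sqrt(sum_d w_kd * (x_d - c_kd)^2) ]
    """

    def __init__(self, n_peaks=5, dim=2, shift=1.0, height_sev=7.0,
                 bounds=(-50.0, 50.0), seed=0):
        self.rng = np.random.default_rng(seed)
        self.lo, self.hi = bounds
        self.shift, self.height_sev = shift, height_sev
        self.centers = self.rng.uniform(self.lo, self.hi, size=(n_peaks, dim))
        self.heights = self.rng.uniform(30.0, 70.0, size=n_peaks)
        self.widths = self.rng.uniform(1.0, 12.0, size=(n_peaks, dim))

    def __call__(self, x):
        diff = np.asarray(x, dtype=float)[None, :] - self.centers
        dist = np.sqrt(np.sum(self.widths * diff**2, axis=1))  # weighted distance
        return float(np.max(self.heights - dist))

    def change(self):
        """Apply one environmental change."""
        v = self.rng.normal(size=self.centers.shape)
        v *= self.shift / np.linalg.norm(v, axis=1, keepdims=True)  # fixed-length moves
        self.centers = np.clip(self.centers + v, self.lo, self.hi)
        noise = self.height_sev * self.rng.normal(size=self.heights.shape)
        self.heights = np.clip(self.heights + noise, 30.0, 70.0)

mpb = SimpleMovingPeaks()
best = int(np.argmax(mpb.heights))
print(mpb(mpb.centers[best]) == mpb.heights[best])  # True: global optimum at the tallest peak
mpb.change()
```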
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which suppresses graph-based account detection research. To address these issues, we propose the Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
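Selecting user property features by information gain, as described above, can be sketched as ranking each feature by IG(Y; X) = H(Y) − Σᵥ P(X = v) H(Y | X = v) and keeping the top k. The abstract does not say how continuous properties were discretized, so this sketch assumes already-discrete features; the function names and toy data are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature."""
    ig = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        ig -= mask.mean() * entropy(labels[mask])
    return ig

def top_k_features(X, y, k):
    """Return indices of the k features with highest information gain, plus all gains."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-gains)[:k], gains

# Hypothetical toy data: 4 accounts, 3 discretized properties, labels 0 = human, 1 = bot.
y = np.array([0, 0, 1, 1])
X = np.array([[0, 0, 0],
              [0, 1, 0],
              [1, 0, 0],
              [1, 1, 1]])
idx, gains = top_k_features(X, y, k=2)
print(idx)  # [0 2]
```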
The interview has been regarded as one of the most crucial steps in recruitment. To fully prepare for interviews with recruiters, job seekers usually practice by holding mock interviews with each other. However, such mock interviews with peers are generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like real interviewers. Due to the rapid growth of online recruitment in recent years, recruiters tend to conduct interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but remain low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
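The abstract does not detail how PCA is applied to the mixed-frequency data. One plausible reading, assumed here, is that the high-frequency readings collected between two consecutive decision points form a fixed-length block, PCA compresses that block into a few leading scores, and those scores are concatenated with the low-frequency covariates to form the state fed to the deep Q-network (not shown). The class and function names, the 288 readings per block (e.g. glucose every 5 minutes over a day), and the synthetic data are hypothetical.

```python
import numpy as np

class BlockPCA:
    """PCA summarizer for fixed-length blocks of high-frequency covariates."""

    def __init__(self, n_components):
        self.k = n_components

    def fit(self, blocks):
        # blocks: (n_samples, L) readings collected between two decision points
        self.mean_ = blocks.mean(axis=0)
        _, s, vt = np.linalg.svd(blocks - self.mean_, full_matrices=False)
        self.components_ = vt[: self.k]                # (k, L) principal directions
        self.explained_ = (s**2 / np.sum(s**2))[: self.k]  # variance ratios
        return self

    def transform(self, blocks):
        return (blocks - self.mean_) @ self.components_.T  # (n_samples, k) scores

def build_state(low_freq, high_freq_blocks, pca):
    """Concatenate low-frequency covariates with PCA scores of the high-frequency block."""
    return np.hstack([low_freq, pca.transform(high_freq_blocks)])

# Synthetic data: 200 decision points, 288 high-frequency readings each,
# driven by two latent daily patterns plus noise, and 2 low-frequency covariates.
rng = np.random.default_rng(1)
n, L = 200, 288
t = np.arange(L)
a, b = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
blocks = (a * np.sin(2 * np.pi * t / L) + b * np.cos(2 * np.pi * t / L)
          + 0.1 * rng.normal(size=(n, L)))
low_freq = rng.normal(size=(n, 2))

pca = BlockPCA(n_components=3).fit(blocks)
state = build_state(low_freq, blocks, pca)
print(state.shape)  # (200, 5)
```

Compressing each 288-dimensional block to 3 scores keeps the Q-network input small while retaining most of the variation, since two components explain nearly all of it in this synthetic example.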
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods overlook two indispensable issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as its start and end timestamps, and the video downsampling process may lose these two frames and take adjacent irrelevant frames as new boundaries. 2) Reasoning bias: such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and generate soft labels on the boundaries for more accurate frame-query reasoning. This mechanism is also able to supplement the sparsely sampled frames with the missing consecutive visual semantics for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
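The boundary-bias argument can be made concrete with frame indices. The sketch below samples a fixed number of frames at a uniform stride, measures how far the nearest sampled frame is from an annotated boundary, and then adds a second sampling offset by half a stride as extra contextual frames. The half-stride offset and the Gaussian soft boundary labels are assumptions chosen for illustration, not SSRN's actual design; all names and numbers are hypothetical.

```python
import numpy as np

def uniform_sample(num_frames, num_samples, offset=0.0):
    """Indices of num_samples evenly strided frames, shifted by offset * stride."""
    stride = num_frames / num_samples
    idx = np.floor((np.arange(num_samples) + offset) * stride).astype(int)
    return np.clip(idx, 0, num_frames - 1)

def soft_boundary_labels(sample_idx, boundary_frame, sigma):
    """Gaussian soft labels over sampled frames, peaked at the annotated boundary."""
    logits = -((sample_idx - boundary_frame) ** 2) / (2.0 * sigma**2)
    w = np.exp(logits - logits.max())
    return w / w.sum()

num_frames, num_samples = 1000, 32   # stride of 31.25 frames
start = 515                          # hypothetical annotated start frame

base = uniform_sample(num_frames, num_samples)
print(np.abs(base - start).min())    # 15: the true boundary frame was dropped

siamese = uniform_sample(num_frames, num_samples, offset=0.5)
combined = np.union1d(base, siamese)  # original plus extra contextual frames
print(np.abs(combined - start).min())  # 0

labels = soft_boundary_labels(combined, start, sigma=20.0)
```

Soft labels spread supervision over frames near the boundary rather than forcing a single hard choice among sparsely sampled frames.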