Accent plays a significant role in speech communication, affecting comprehension and conveying a speaker's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. The framework can synthesize speech in a selected speaker's voice, converted to any desired target accent. Thorough experiments using both objective and subjective evaluations validate the effectiveness of the proposed framework. The results also show remarkable performance in manipulating accents in the synthesized speech and point to a promising avenue for future accented TTS research.
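The abstract names only the model class. As a rough illustration of how a conditional VAE lets the accent be swapped at synthesis time, here is a minimal NumPy sketch of a generic CVAE. The assumptions are these: the accent is a one-hot condition fed to both the encoder and the decoder, untrained random linear maps stand in for real networks, and every name and size (`N_ACCENTS`, `FEAT_DIM`, `encode`, `decode`, and so on) is hypothetical. The paper's actual architecture, text encoder, and speaker conditioning are not described here.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACCENTS = 4    # hypothetical number of accent classes
FEAT_DIM = 80    # hypothetical frame feature size (e.g. mel bins)
LATENT_DIM = 8   # hypothetical latent size

def one_hot(idx, n):
    v = np.zeros(n)
    v[idx] = 1.0
    return v

# Untrained random linear maps stand in for the encoder and decoder networks.
W_enc = rng.normal(scale=0.1, size=(FEAT_DIM + N_ACCENTS, 2 * LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM + N_ACCENTS, FEAT_DIM))

def encode(x, accent):
    """q(z | x, c): mean and log-variance of a diagonal Gaussian."""
    h = np.concatenate([x, one_hot(accent, N_ACCENTS)]) @ W_enc
    return h[:LATENT_DIM], h[LATENT_DIM:]

def reparameterize(mu, logvar):
    """z = mu + sigma * eps, so sampling stays differentiable in mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, accent):
    """p(x | z, c): the accent label is fed to the decoder alongside z."""
    return np.concatenate([z, one_hot(accent, N_ACCENTS)]) @ W_dec

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the regularizer in the CVAE loss."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def negative_elbo(x, accent):
    """Squared-error reconstruction (a Gaussian likelihood up to constants) plus KL."""
    mu, logvar = encode(x, accent)
    z = reparameterize(mu, logvar)
    recon = decode(z, accent)
    return np.sum((recon - x) ** 2) + kl_to_standard_normal(mu, logvar)

# Accent conversion at inference: encode with the source accent, decode with the target.
x = rng.standard_normal(FEAT_DIM)  # stand-in for one frame of the speaker's features
mu, _ = encode(x, accent=0)
converted = decode(mu, accent=2)
print(converted.shape)  # (80,)
```

Conditioning the decoder on the accent label is what makes conversion possible in this setup: once trained, the latent `z` is meant to capture what the label does not, so decoding the same `z` with a different label changes the accent while, in principle, keeping the rest of the utterance.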
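In this paper, we introduce Jointist, an instrument-aware multi-instrument framework capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that uses the instrument information and the transcription results. The instrument conditioning is designed to provide explicit multi-instrument functionality, while the connection between the transcription and source separation modules is intended to improve transcription performance. Our challenging problem formulation makes the model highly useful in real-world settings, since modern popular music typically consists of multiple instruments. However, this novelty calls for a new perspective on how to evaluate such a model. In our experiments, we evaluate the model from various aspects, providing a new evaluation perspective for multi-instrument transcription. We also argue that transcription models can serve as a preprocessing module for other music analysis tasks. In experiments on several downstream tasks, the symbolic representation produced by our transcription model proved helpful, alongside spectrograms, for downbeat detection, chord recognition, and key estimation.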
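According to a recent Rackspace Technology survey of 1,870 companies, a total of 34% of AI research and development projects fail or are abandoned. Based on a thorough literature review, we propose a new strategic framework, AIStrom, that enables managers to create a successful AI strategy. It offers a unique, integrated approach that guides managers and lead developers through the various challenges of the implementation process. In the AIStrom framework, we first identify the top N potential projects (typically 3-5). For each of them, seven focus areas are analyzed thoroughly. These areas include creating a data strategy that accounts for unique cross-departmental machine learning data requirements, as well as security and legal requirements. AIStrom then guides managers in thinking about how to assemble an interdisciplinary artificial intelligence (AI) implementation team given the scarcity of AI talent. Once the AI team strategy has been established, the team needs to be positioned within the organization, either across departments or as a separate department. Other considerations include AI as a service (AIaaS) or outsourced development. When adopting new technologies, we must consider challenges such as bias, the legality of black-box models, and keeping humans in the loop. Next, as with any project, we need value-based key performance indicators (KPIs) to track and validate progress. Depending on the company's risk strategy, a SWOT analysis (strengths, weaknesses, opportunities, and threats) can help further classify the shortlisted projects. Finally, we should ensure that our strategy includes continuous education of employees to foster a culture of adoption. This unique and comprehensive framework offers a valuable tool for managers and lead developers.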
Following the success of the transformer architecture in natural language processing, transformer-like architectures have recently been widely applied to symbolic music. Symbolic music and text, however, are different modalities. Symbolic music contains multiple attributes, both absolute (e.g., pitch) and relative (e.g., pitch interval). These relative attributes shape human perception of musical motifs. However, existing symbolic music modeling methods mostly ignore these important relative attributes, mainly because they lack a musically meaningful embedding space in which both the absolute and the relative embeddings of symbolic music tokens can be represented efficiently. In this paper, we propose the Fundamental Music Embedding (FME) for symbolic music, based on a bias-adjusted sinusoidal encoding in which both absolute and relative attributes can be embedded and fundamental musical properties (e.g., translational invariance) are explicitly preserved. Building on FME, we further propose a novel attention mechanism based on the relative index, pitch, and onset embeddings (RIPO attention), so that musical domain knowledge can be fully exploited in symbolic music modeling. Experimental results show that our proposed model, the RIPO transformer, which uses FME and RIPO attention, outperforms state-of-the-art transformers (i.e., Music Transformer and Linear Transformer) on a melody completion task. Moreover, when the RIPO transformer is used in a downstream music generation task, we observe that the notorious degeneration phenomenon no longer occurs, and the music it generates outperforms that of state-of-the-art transformer models in both subjective and objective evaluations.
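The abstract does not give the exact FME formula. The sketch below only illustrates the translational-invariance property of a bias-adjusted sinusoidal encoding, under these assumptions: standard transformer frequencies w_k = base^(-2k/d), interleaved sin/cos channels, and a fixed additive bias vector. The function name `fme_like_embedding` and its parameters are hypothetical, and the paper's parameterization (e.g., trainable biases) may differ.

```python
import numpy as np

def fme_like_embedding(values, d=16, base=10000.0, bias=None):
    """Hypothetical bias-adjusted sinusoidal embedding (a sketch, not the paper's exact FME).

    values: scalar token attributes such as MIDI pitches, shape (n,)
    d: embedding dimension (must be even)
    bias: optional vector of shape (d,) added to every embedding
    """
    values = np.asarray(values, dtype=float)
    k = np.arange(d // 2)
    freqs = base ** (-2.0 * k / d)             # w_k, as in the standard transformer encoding
    angles = values[:, None] * freqs[None, :]  # shape (n, d/2)
    emb = np.empty((values.shape[0], d))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    if bias is not None:
        emb = emb + bias                       # constant shift; cancels in differences
    return emb

rng = np.random.default_rng(0)
bias = rng.normal(size=16)
pitches = np.array([48, 55, 60, 67, 72, 79])   # three pairs, each a perfect fifth (7 semitones)
E = fme_like_embedding(pitches, d=16, bias=bias)
dists = [np.linalg.norm(E[i + 1] - E[i]) for i in (0, 2, 4)]
print(np.allclose(dists, dists[0]))  # True
```

The distance depends only on the interval because, for each frequency, (sin(w(p+Δ)) - sin(wp))² + (cos(w(p+Δ)) - cos(wp))² = 2 - 2cos(wΔ), independent of the starting pitch p, and the constant bias cancels in the difference.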