Next-Generation Expressive Personalized Voices for Speech-Generating Devices

Basic Information

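Grant number:
10547241
Principal investigator:
H TIMOTHY Bunnell
Amount:
$275,800
Host institution:
SYNFONICA, LLC
Host institution country:
United States
Project category:
Fiscal year:
2022
Funding country:
United States
Project period:
2022-08-15 to 2024-08-14
Project status:
Completed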

Source:
https://reporter.nih.gov/project-details/10547241
Keywords:
ALS patients; Adoption; Adult; Age; Algorithms; Amyotrophic Lateral Sclerosis; Augmentative and Alternative Communication; Characteristics; Child; Child Health; Client; Depressed mood; Disease; Dysarthria; Emotions; Encapsulated; Evaluation; Female; Generations; Goals; Government; Human; Hybrids; Individual; Knowledge; Laboratory Research; Learning; Linguistics; Machine Learning; Methods; Modeling; Network-based; Neurodegenerative Disorders; Onset of illness; Outcome; Output; Persons; Phase; Process; Production; Reading; Records; Rehabilitation therapy; Risk; Running; Services; Speech; Structure; Surveys; System; Technology; Text; Training; Voice; Voice Quality; base; commercial application; communication device; deep neural network; design; experience; experimental study; improved; knowledge base; machine learning algorithm; male; mimetics; next generation; novel; sound; success; virtual; vocal tract

Project Abstract

The creation of personalized synthetic voices has wide application in medical/rehabilitation settings for patients who rely on a speech-generating device (SGD) for communication. One common application is voice banking, wherein a person who risks losing their voice, such as somebody with a neurodegenerative disease like Amyotrophic Lateral Sclerosis (ALS), records their own speech before the onset of disease-related dysarthria for later use in an SGD that mimics their natural speech characteristics. While the technology underlying the creation of such personalized synthetic voices is growing in maturity and adoption by SGD users, it still suffers from two primary limitations: a lack of expressiveness and a burdensome amount of recording needed to create highly natural-sounding voices. The proposed project aims to remedy this situation by marrying the machine-learning technology behind ModelTalker, a pioneering voice-banking text-to-speech service developed at Nemours Children’s Health, with the knowledge-based technology underlying Synfony, a rule-based text-to-speech system developed by Synfonica LLC, which is capable of generating a variety of speech styles and expressive modes. The expert knowledge built into Synfonica will be used to design an optimal set of sentences for voice bankers to record, and its algorithms for the generation of natural-sounding prosody in different modes and styles will be integrated into ModelTalker’s machine-learning algorithms, creating a hybrid system that embraces the best qualities of both approaches. The new text-to-speech (TTS) system resulting from this project will (a) require a minimal amount of recorded speech from the voice banker, (b) accurately capture their vocal identity, and (c) be structured such that new expressive modes and speech styles can be added easily without additional recording.
The feasibility of the project will be demonstrated by recording the voices of an adult male, an adult female, and a child, and generating TTS voices that can speak in three expressive modes (neutral, happy, and sad). Perceptual experiments will be run to evaluate their intelligibility, naturalness, success in capturing the vocal identity of the speaker, and the appropriateness of their expressive modes. In general, the project will be a major step forward in enabling the users of personalized synthetic voices to express their emotions and intentions.
