Generating Personalized Synthetic Speech for Progressive Dysarthria Using Severity-Appropriate Adaptation Strategies for Neural Text-to-Speech and Voice Conversion

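Basic Information

Award number: 10525903
Principal investigator: Mili Kuruvilla-Dugdale
Amount: $226,300
Institution: UNIVERSITY OF MISSOURI-COLUMBIA
Institution country: United States
Project category:
Fiscal year: 2022
Funding country: United States
Project period: 2022-07-01 to 2024-06-30
Project status: Completed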

Project Summary

More than 2 million Americans have a complex communication disorder that impairs their ability to talk. Loss of speech is among the most debilitating effects of neurological diseases such as amyotrophic lateral sclerosis (ALS), in which 95% of patients progressively lose the ability to speak and become trapped in isolation. Communication devices with electronic voice output allow patients to augment or replace verbal communication as their speech deteriorates. Users access the text on these devices (alphabet, messages) directly with body parts that still function (fingers, head, eyes), and text-to-speech (TTS) technology converts the selected text to speech. The electronic TTS voices on current devices offer limited options for age, sex, and dialect, which diminishes the experience of genuine conversation because neither the user nor the communication partner can relate to the device voice. Voice is an integral part of a person’s identity; without a voice that captures this identity, users tend to withdraw from interactions, which greatly reduces their quality of life and leads to low acceptance of the technology. Personalized TTS voices are therefore a critical need for people with ALS, so that they can continue to communicate freely in the face of major life changes. The long-term goal of this research is software-based, high-performance personalized speech synthesis that people with communication disorders can use on mobile platforms and commercial speech devices. Our short-term goal is to investigate innovative methods that leverage state-of-the-art end-to-end neural TTS to generate intelligible, natural, and personalized synthetic speech for people who already show speech loss from ALS. Neural TTS significantly outperforms previous generations of TTS technology and has lowered the barrier to developing high-quality TTS systems.
Although neural TTS is clearly desirable, it requires large quantities of high-quality speech data, which prevents training such a system directly on the speech of people with ALS. This exploratory project addresses the problem through two specific aims: (i) adapt neural TTS output by using voice conversion to personalize TTS voice options for ALS, and (ii) adapt neural TTS input features and network parameters to personalize TTS voice options for ALS. Both aims use modest amounts of speech data from persons with ALS and are designed to enhance voice similarity while preserving the intelligibility and naturalness of the TTS speech. The adapted neural TTS system is expected to generate personalized synthetic speech that carries the voice characteristics of individual ALS users while remaining intelligible and natural, promoting both communication and listening comfort. The project goals align with the NIH-NIDCD priority area “Advancing Research in Novel Augmentative and Alternative Communication (AAC) Approaches.” The project outcomes are expected to provide a significant number of people with communication disorders of varying etiologies (ALS, stroke, trauma) with personalized vocal expression and social identity.
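
The summary does not say how aim (ii) adapts network parameters. As a toy illustration only, and not the project's method, the sketch below assumes a hypothetical "pretrained" linear decoder whose weights stay frozen, and adapts only a small speaker-embedding vector from a handful of target frames by gradient descent. The names `decode`, `adapt_speaker_embedding`, `W_text`, `W_spk`, the dimensions, and the linear model itself are all hypothetical stand-ins for a real neural TTS network.

```python
import numpy as np

# Toy stand-in for a pretrained acoustic model (hypothetical, not the project's network):
# frames = text_feats @ W_text.T + spk_emb @ W_spk.T
TEXT_DIM, SPK_DIM, ACOUSTIC_DIM = 16, 4, 20
rng = np.random.default_rng(0)
W_text = rng.normal(size=(ACOUSTIC_DIM, TEXT_DIM))  # frozen "pretrained" weights
W_spk = rng.normal(size=(ACOUSTIC_DIM, SPK_DIM))    # frozen "pretrained" weights


def decode(text_feats, spk_emb):
    """Map (N, TEXT_DIM) text features and one speaker embedding to (N, ACOUSTIC_DIM) frames."""
    return text_feats @ W_text.T + spk_emb @ W_spk.T


def adapt_speaker_embedding(text_feats, target_frames, steps=2000, lr=0.005):
    """Fit only the speaker embedding to a few target frames; W_text and W_spk never change."""
    spk_emb = np.zeros(SPK_DIM)
    for _ in range(steps):
        residual = decode(text_feats, spk_emb) - target_frames  # (N, ACOUSTIC_DIM)
        # Gradient of the per-frame squared error, averaged over frames, w.r.t. spk_emb only.
        grad = 2.0 * (residual @ W_spk).mean(axis=0)
        spk_emb = spk_emb - lr * grad
    return spk_emb


if __name__ == "__main__":
    true_emb = np.array([0.5, -1.0, 0.25, 2.0])  # hypothetical target-speaker embedding
    few_text = rng.normal(size=(8, TEXT_DIM))    # a handful of adaptation frames
    few_target = decode(few_text, true_emb)      # noise-free toy "recordings"
    learned = adapt_speaker_embedding(few_text, few_target)
    print(np.allclose(learned, true_emb, atol=1e-3))  # True: embedding recovered, decoder untouched
```

Updating only a few speaker-specific parameters, rather than the whole network, is one common way to keep adaptation feasible when target data are scarce; a real system would use a trained neural acoustic model and a vocoder instead of random linear weights, and real recordings would not be noise-free.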
