RI: Small: Collaborative Research: Automatic Creation of New Speech Sound Inventories
Basic Information
- Award Number: 1909075
- Principal Investigator:
- Amount: $239,200
- Host Institution:
- Host Institution Country: United States
- Award Type: Standard Grant
- Fiscal Year: 2019
- Funding Country: United States
- Period: 2019-07-01 to 2023-06-30
- Status: Completed
- Source:
- Keywords:
Project Abstract
Speech technology is supposed to be available to everyone, but in reality it is not. There are 7,000 languages spoken in the world, but speech technology (speech-to-text recognition and text-to-speech synthesis) works in only a few hundred of them. This project will solve that problem by automatically figuring out the set of phonemes for each new language, that is, the set of speech sounds that define differences between words (for example, "peek" versus "peck": long-E and short-E are distinct phonemes in English). Phonemes are the link between speaking and writing. A neural net that converts speech into text using some kind of phoneme inventory, and then back again, can be said to have used the correct phoneme inventory if its resynthesized speech always has the same meaning as the speech it started with. This approach can even be tested in languages that don't have any standard written form, because the text doesn't have to be real text: it could be chat alphabet (the kind of pseudo-Roman alphabet that speakers of Arabic and Hindi sometimes use on Twitter), or it could even be a picture (showing, in an image, what the user was describing). This research will make it possible for people to talk to their artificial intelligence systems (smart speakers, smart phones, smart cars, etc.) using their native languages. It will also advance science by providing big-data tools that scientists can use to study languages that do not have a (standard) writing system.

End-to-end neural network methods can be used to develop speech-to-text-to-speech (S2T2S) and other spoken language processing applications with little additional software infrastructure and little background knowledge. In fact, toolkits provide recipes so that a researcher with no prior speech experience can train an end-to-end neural system after only a few hours of data preparation. End-to-end systems are only practical, however, for languages with thousands of hours of transcribed data.
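The round-trip criterion described above can be illustrated with a toy sketch: represent "speech" frames with a small discrete inventory, resynthesize each frame from its unit, and measure how much is lost. Plain k-means stands in for the project's neural system here, and the synthetic data, dimensions, and function names are all illustrative assumptions, not the project's method.

```python
# Toy illustration of round-trip inventory validation: encode frames to a
# discrete unit inventory, decode back, and measure reconstruction loss.
import numpy as np

rng = np.random.default_rng(0)

def learn_inventory(frames, k, iters=20):
    """Plain k-means; each centroid plays the role of one discrete unit."""
    centroids = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((frames[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = frames[labels == j].mean(axis=0)
    return centroids

def round_trip_error(frames, centroids):
    """Encode each frame as its nearest unit, decode to the centroid, score."""
    labels = np.argmin(((frames[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    resynth = centroids[labels]
    return float(np.mean((frames - resynth) ** 2))

# Fake "speech": frames drawn from 5 well-separated sound categories.
true_means = rng.normal(0, 5, size=(5, 13))
frames = np.vstack([m + rng.normal(0, 0.3, size=(40, 13)) for m in true_means])

err_small = round_trip_error(frames, learn_inventory(frames, k=2))
err_right = round_trip_error(frames, learn_inventory(frames, k=5))
print(err_small, err_right)  # an inventory matched to the data loses less
```

The analogy to the proposal: an inventory that captures the language's real sound distinctions preserves more of the signal on the way back, just as the correct phoneme inventory preserves meaning through S2T2S resynthesis.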
For under-resourced languages (languages with very little transcribed speech), cross-language adaptation is necessary; for unwritten languages (those lacking any standard, well-known orthographic convention), it is necessary to define a spoken language task that doesn't require writing before one can even attempt cross-language adaptation. Preliminary evidence suggests that both types of cross-language adaptation are performed more accurately if the system has available, or creates, a phoneme inventory for the under-resourced language, and leverages that inventory to facilitate adaptation. The aim of this project is to automatically infer the acoustic phoneme inventory of under-resourced and unwritten languages in order to maximize the speech technology quality of an end-to-end neural system adapted into each language. The research team has demonstrated that it is possible to visualize sub-categorical distinctions between sounds as a neural net adapts to a new phoneme category; proposed experiments 1 and 2 leverage visualizations of this type, along with other methods of phoneme inventory validation, to improve cross-language adaptation. Experiments 3 and 4 go one step further by adapting to languages without orthography; for a speech technology system to be trained and used in a language without orthography, it must first learn a useful phoneme inventory.
Innovations unique to this project include: (1) the use of articulatory feature transcription as a multi-task training criterion for an end-to-end neural system that seeks to learn the phoneme set of a new language; (2) the use of visualization error rate as a training criterion in multi-task learning, based on a method recently developed to visualize the adaptation of phoneme categories in a neural network; (3) the application of cross-language adaptation to improve the error rates of image2speech applications in a language without orthography; (4) the use of non-standard orthography (chat alphabet) to transcribe speech in an unwritten language; and (5) the use of non-native transcription (mismatched crowdsourcing) to jump-start the speech2chat training task. The methods proposed here will facilitate the scientific study of language, for example by helping phoneticians to document the phoneme inventories of undocumented languages, thereby expediting the study of currently undocumented endangered languages before they disappear. Conversely, in minority languages with active but shrinking native-speaker populations, the planned methods will help develop end-to-end neural training methods with which native speakers can easily develop new speech applications. All planned software will be packaged as recipes for the Speech Recognition Virtual Kitchen, permitting high school students and undergraduates with no speech expertise to develop systems for their own languages, and encouraging their interest in speech.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
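Innovation (1), multi-task training with an articulatory-feature criterion, amounts to optimizing a phoneme loss plus a weighted auxiliary loss computed from a second output head on the same shared features. A minimal numerical sketch, in which the linear "heads", the weight `lam`, and all dimensions are assumptions chosen only for illustration:

```python
# Sketch of a multi-task loss: a phoneme head plus an auxiliary
# articulatory-feature head sharing one encoder output.
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, targets):
    """Mean negative log-probability of the target class per example."""
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(targets)), targets] + 1e-12)))

# Shared encoder output for a batch of 32 frames (toy random features).
feats = rng.normal(size=(32, 16))
W_phone = rng.normal(size=(16, 40)) * 0.1   # phoneme head: 40 classes
W_artic = rng.normal(size=(16, 8)) * 0.1    # articulatory head: 8 classes
y_phone = rng.integers(0, 40, size=32)      # phoneme labels
y_artic = rng.integers(0, 8, size=32)       # articulatory-feature labels

lam = 0.3  # weight on the auxiliary task (an assumed value)
loss_phone = cross_entropy(feats @ W_phone, y_phone)
loss_artic = cross_entropy(feats @ W_artic, y_artic)
loss = loss_phone + lam * loss_artic
print(round(loss, 3))
```

Training against `loss` instead of `loss_phone` alone is what makes the articulatory transcription act as a regularizing second task; the project's actual networks and loss weighting are not specified in this abstract.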
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Najim Dehak
Deep Stroop: Integrating eye tracking and speech processing to characterize people with neurodegenerative disorders while performing neuropsychological tests
- DOI: 10.1016/j.compbiomed.2024.109398
- Publication Date: 2025-01-01
- Journal:
- Impact Factor:
- Authors: Trevor Meyer; Anna Favaro; Esther S. Oh; Ankur Butala; Chelsie Motley; Pedro Irazoqui; Najim Dehak; Laureano Moro-Velázquez
- Corresponding Author: Laureano Moro-Velázquez
Other Grants by Najim Dehak
RI:Small: Nonlinear signal representations for speech applications
- Award Number: 1816165
- Fiscal Year: 2018
- Amount: $239,200
- Award Type: Standard Grant
Similar NSFC Grants
Research and Application of Key Technologies for Cooperative Swarms of Small and Micro Unmanned Systems Based on Ultra-Wideband Technology
- Award Number:
- Year Approved: 2020
- Amount: ¥570,000
- Program Type: General Program
Research on Interference Coordination Based on Cooperative Precoding in Heterogeneous Cloud Small-Cell Networks
- Award Number: 61661005
- Year Approved: 2016
- Amount: ¥300,000
- Program Type: Regional Science Fund Program
Research on Novel Access Theory and Technology for Dense Small Base Station Systems
- Award Number: 61301143
- Year Approved: 2013
- Amount: ¥240,000
- Program Type: Young Scientists Fund Program
Experimental Study of ScFVCD3-9R Loaded with Bcl-6-Targeting Small Interfering RNA for the Treatment of EAMG
- Award Number: 81072465
- Year Approved: 2010
- Amount: ¥310,000
- Program Type: General Program
Research on Sensor Networks Based on Small-World Networks
- Award Number: 60472059
- Year Approved: 2004
- Amount: ¥210,000
- Program Type: General Program
Similar Overseas Grants
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award Number: 2313131
- Fiscal Year: 2023
- Amount: $239,200
- Award Type: Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
- Award Number: 2345528
- Fiscal Year: 2023
- Amount: $239,200
- Award Type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award Number: 2232298
- Fiscal Year: 2023
- Amount: $239,200
- Award Type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award Number: 2232055
- Fiscal Year: 2023
- Amount: $239,200
- Award Type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award Number: 2232054
- Fiscal Year: 2023
- Amount: $239,200
- Award Type: Standard Grant