RI: Medium: Collaborative Research: Explicit Articulatory Models of Spoken Language, with Application to Automatic Speech Recognition


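Basic Information

Award number:
0905341
Principal investigator:
Jeffrey Bilmes
Amount:
approx. $378,000
Institution:
University of Washington
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2009
Funding country:
United States
Project period:
2009-07-01 to 2013-06-30
Project status:
Completed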

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0905341&HistoricalAwards=false
Keywords:
RI Medium Collaborative Research Explicit

Project Abstract

Proposal Title: RI: Medium: Collaborative Research: Explicit Articulatory Models of Spoken Language, with Application to Automatic Speech Recognition
Institution: Toyota Technological Institute at Chicago
Abstract Date: 05/22/09

This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

One of the main challenges in automatic speech recognition is variability in speaking style, including speaking rate changes and coarticulation. Models of the articulators (such as the lips and tongue) can succinctly represent much of this variability. Most previous work on articulatory models has focused on the relationship between acoustics and articulation, but more significant improvements require models of the hidden articulatory state structure. This work has both a technological goal of improving recognition and a scientific goal of better understanding articulatory phenomena.

The project considers larger model classes than previously studied. In particular, the project develops graphical models, including dynamic Bayesian networks and conditional random fields, designed to take advantage of articulatory knowledge. A new framework for hybrid directed and undirected graphical models is being developed, in recognition of the benefits of both directed and undirected models, and of both generative and discriminative training. The project activities include major extension of earlier articulatory models with context modeling, asynchrony structures, and specialized training; development of factored conditional random field models of articulatory variables; and discriminative training to alleviate word confusability.

The scientific goal addresses questions about the ways in which articulatory trajectories vary in different contexts. Existing databases are used, and initial work in manual articulatory annotation is being extended. In addition, the project uses articulatory models to perform forced transcription of larger data sets, providing an additional resource for the research community.

Other broad impacts include new models and techniques with applicability to other time-series modeling problems. Extending the applicability of speech recognition will help it fulfill its promise of enabling more efficient storage of and access to spoken information, and equalizing the technological playing field for those with hearing or motor disabilities.

NATIONAL SCIENCE FOUNDATION Proposal Abstract
Proposal: 0905633
PI Name: Livescu, Karen
Printed from eJacket: 06/10/09
