CAREER: Breaking the phonetic code: novel acoustic-lexical modeling techniques for robust automatic speech recognition

Basic Information

Award Number:
0643901
Principal Investigator:
Eric Fosler-Lussier
Amount:
$503,000 (rounded)
Institution:
Ohio State University Research Foundation
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2006
Funding Country:
United States
Duration:
2006-12-15 to 2012-11-30
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0643901&HistoricalAwards=false
Keywords:
CAREER Breaking phonetic code novel

Abstract

Spontaneous speech, accented speech, and speech in noise continue to pose significant challenges for automatic speech recognition (ASR) technology; error rates of ASR systems remain unacceptably high for these types of speech. This project establishes a consistent framework that seeks to cope with all of these conditions. The novel approach to phonetic variability investigated here views the problem as one of phonetic information underspecification: some subset of the information that the listener receives will be missing or uncertain. Lexical access is thus a phonetic code-breaking problem: how can a system accumulate phonetic cues in each of these conditions to recognize words on the basis of incomplete evidence? The research program takes a multidisciplinary approach to integrating linguistic theory with speech recognition technology; discriminative statistical models of linguistic features are employed to model the nonlinear, overlapping phonological effects observed in speech. The framework also allows new linguistic insights to be derived through analysis of trained systems. The educational program fosters interdisciplinary research (with cross-disciplinary graduate seminars) and increases participation of underrepresented students in Computer Science by introducing language technology topics early in the undergraduate curriculum and encouraging undergraduate research. Beyond cultivating a new way of thinking about pronunciation variation in ASR, the broader impacts of this research include collaborative resources that the ASR and linguistics communities can discuss in tutorial and workshop settings. Addressing noise, accent, and speaking style in a consistent framework will also improve ASR technology for the many people who are underserved by current systems.
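To make the "code-breaking on incomplete evidence" framing concrete, here is a minimal Python sketch. It is not the project's method: the abstract refers to discriminative models of linguistic features, whereas this toy simply multiplies per-feature likelihoods under an independence assumption. The lexicon, feature values, and the names `LEXICON`, `word_log_score`, and `recognize` are all hypothetical. It assumes evidence already aligned one-to-one with lexicon segments, with each cue given as a posterior P(feature = 1) or `None` when the cue is missing (e.g. masked by noise).

```python
import math
from typing import Dict, List, Optional, Tuple

# A segment is a bundle of binary phonological features (1 = present, 0 = absent).
Segment = Dict[str, int]
# Evidence for one segment: posterior P(feature = 1), or None if the cue is missing.
Evidence = Dict[str, Optional[float]]

EPS = 1e-6  # keeps log() finite if a posterior is exactly 0 or 1

# Hypothetical toy lexicon; feature values are illustrative only.
VOWEL_A: Segment = {"voice": 1, "nasal": 0, "labial": 0}
FINAL_T: Segment = {"voice": 0, "nasal": 0, "labial": 0}
LEXICON: Dict[str, List[Segment]] = {
    "bat": [{"voice": 1, "nasal": 0, "labial": 1}, VOWEL_A, FINAL_T],
    "mat": [{"voice": 1, "nasal": 1, "labial": 1}, VOWEL_A, FINAL_T],
    "pat": [{"voice": 0, "nasal": 0, "labial": 1}, VOWEL_A, FINAL_T],
}


def word_log_score(word: List[Segment], evidence: List[Evidence]) -> float:
    """Accumulate log-likelihoods over the feature cues that were actually observed."""
    if len(word) != len(evidence):
        return -math.inf  # this sketch assumes segment-aligned evidence
    total = 0.0
    for target, observed in zip(word, evidence):
        for feat, value in target.items():
            p = observed.get(feat)
            if p is None:
                continue  # missing cue: treated as uninformative, adds nothing
            p = min(max(p, EPS), 1.0 - EPS)
            total += math.log(p if value == 1 else 1.0 - p)
    return total


def recognize(evidence: List[Evidence]) -> Tuple[str, Dict[str, float]]:
    """Return the best word and word posteriors under a uniform prior."""
    scores = {w: word_log_score(segs, evidence) for w, segs in LEXICON.items()}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())  # log-sum-exp normalizer
    posteriors = {w: math.exp(s - top) / z for w, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Usage: labiality of the onset is masked, the vowel's cues are entirely missing.
evidence = [
    {"voice": 0.9, "nasal": 0.3, "labial": None},
    {"voice": None, "nasal": None, "labial": None},
    {"voice": 0.2, "nasal": None, "labial": None},
]
best, post = recognize(evidence)
print(best, {w: round(p, 3) for w, p in post.items()})
```

Because missing cues are skipped rather than guessed, the word decision rests only on the voicing and nasality evidence that survived; if the nasality cue were also missing, "bat" and "mat" would tie, which is the kind of ambiguity a richer model of overlapping cues would need to resolve.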
