A scheme for continuous speech recognition in a large context based on the human process of spoken language recognition


Basic information

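Grant number:
03452164
Principal investigator:
FUJISAKI Hiroya
Amount:
$44,800
Host institution:
Science University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for General Scientific Research (B)
Fiscal year:
1991
Funding country:
Japan
Project period:
1991 to 1992
Project status:
Completed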

Source:
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-03452164/
Keywords:
Spoken Language; Human Processes of Recognition; Large Context; Continuous Speech; Speech Recognition System; Syntactic Information; Semantic Information; Discourse Information; Recognition Process (認識過程); Human (人間); Mental Lexicon (内部辞書); Lexical Search (辞書検索)

Project abstract

Most current systems for automatic speech recognition fail to match the recognition performance of human listeners, because they are built without regard to the human processes of spoken language recognition. From this point of view, the present study investigates those processes and incorporates the findings into a scheme for automatic recognition of continuous speech in a large context. The main results are as follows.

1. Experimental investigation and modeling of the human processes of spoken language recognition
Using natural utterances with controlled acoustic, syntactic and semantic information as stimuli, the following findings were obtained on the human processes of spoken language recognition.
(1) The unit of speech recognition varies widely, from phones and syllables to words and phrases, depending on the experimental condition and context.
(2) Larger units generally require less accuracy of representation for correct recognition.
(3) The amount of acoustic information necessary for recognition of a given unit varies widely depending on the size of the context and the listener's prior knowledge.
(4) The accuracy and speed of access to the mental lexicon vary dynamically depending on the acoustic, syntactic, semantic and discourse information available to the listener.
Based on these findings, a model has been constructed for the human processes of spoken language recognition.

2. Proposal and implementation of a scheme for automatic recognition of spoken language
Based on the above findings and the model, a scheme for automatic recognition of continuous speech in a large context has been proposed, featuring (1) use of multiple unit sizes and accuracies of acoustic feature representation, (2) use of prosodic features for word and phrase boundary detection, and (3) extraction of syntactic, semantic, and idiosyncratic information from a large context. The main components of the system have been implemented.

3. Demonstration of the validity of the proposed scheme
The proposed scheme has been tested in recognition experiments on phones, syllables and words in continuous speech with a large context, and the results have confirmed its essential validity and feasibility.
