Large Vocabulary Continuous Speech Recognition System on Japanese Newspaper Reading Task

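Basic Information

Grant number:
10680368
Principal investigator:
KOHDA Masaki
Amount:
$21,100
Host institution:
Yamagata University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
1998
Funding country:
Japan
Project period:
1998 to 2000
Project status:
Completed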

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-10680368/
Keywords:
Large Vocabulary Continuous Speech Recognition; Acoustic Model; Language Model; Decoder; Hidden Markov Net; N-gram; Speaker Adaptation; Task Adaptation; class N-gram; perplexity; word error rate; ergodic HMM; multi-pass search; phoneme graph; word graph; HM-Net; SCFG; MLLR speaker adaptation

Project Abstract

We investigated a large vocabulary continuous speech recognition (LVCSR) system for the Japanese newspaper reading task and obtained the following results.

(1) Acoustic models: A Hidden Markov Network (HM-Net) is a highly accurate and robust acoustic model that represents the tied-state structure of context-dependent hidden Markov models as a network. We propose a rapid topology design method based on state clustering to generate high-accuracy HM-Nets for LVCSR. Furthermore, we investigate MLLR (Maximum Likelihood Linear Regression)-based speaker adaptation of acoustic models and propose a regression class selection algorithm based on the BIC principle.

(2) Language models: We investigate an N-gram task adaptation method that uses a large corpus of the general task (TI text) and a small corpus of the specific task (AD text), and employs a simple weighting to mix the TI and AD texts. Furthermore, we propose a new SCFG (Stochastic Context-Free Grammar) model that uses a phrase-based dependency grammar instead of a general CFG. The word error rate obtained with a mixture model based on the proposed SCFG model and a trigram is lower than that obtained with the trigram alone.

(3) Decoder: We investigate fast search strategies for LVCSR and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage; at the main recognition stage, the best word sequence is then searched while the expansion of hypotheses is restricted using information from the phoneme graph. In a multiple-pass LVCSR system that uses a word graph as an intermediate data structure, decoder parameters should be optimized in order to generate a good word graph. We propose a new method to optimize these parameters: it rescores the word graph with a bigram LM instead of generating many word graphs for each parameter setting.

(4) Software tool: We describe a statistical language model toolkit for word and class-based n-grams. This toolkit has command-level compatibility with the CMU-Cambridge SLM Toolkit and supports class n-grams and n-gram count mixture as well as combined language models using linear interpolation.
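
The simple weighting used to mix the TI and AD texts in item (2) can be illustrated with a small sketch. This is not the project's implementation: the function names, the toy corpora, the weight of 3.0, and the unsmoothed maximum-likelihood bigram estimate are all assumptions made for illustration. The idea shown is only that counts from the small task-specific corpus are scaled up before being added to counts from the large general corpus.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count bigrams over tokenized sentences, adding boundary markers."""
    counts = Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts

def mix_counts(general, specific, weight):
    """Add task-specific counts, scaled by `weight`, to the general counts."""
    mixed = Counter()
    for bigram, c in general.items():
        mixed[bigram] += c
    for bigram, c in specific.items():
        mixed[bigram] += weight * c
    return mixed

def bigram_prob(counts, prev, word):
    """Unsmoothed P(word | prev); returns 0.0 if the history was never seen."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, word)] / total if total else 0.0

# Toy stand-ins (hypothetical) for the general (TI) and task-specific (AD) texts.
ti_text = [["stocks", "rose", "today"], ["stocks", "fell", "today"]]
ad_text = [["stocks", "rose", "sharply"]]

mixed = mix_counts(bigram_counts(ti_text), bigram_counts(ad_text), weight=3.0)
p = bigram_prob(mixed, "stocks", "rose")  # (1 + 3.0 * 1) / 5 = 0.8
```

With a weight of 1.0 the two corpora would simply be pooled; a larger weight lets the small task-specific text influence the estimates more than its size alone would. A practical model would also need smoothing for unseen n-grams, which this sketch omits.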
