A study on content summarization for large spoken documents and content retrieval through spoken dialogue

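Basic information

Grant number:
13480095
Principal investigator:
NAKAGAWA Seiichi
Amount:
About $94,700
Host institution:
Toyohashi University of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2001
Funding country:
Japan
Project period:
2001 to 2004
Project status:
Completed

Project abstract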

To develop an accurate large-vocabulary continuous speech recognition (LVCSR) system for open-domain spoken document retrieval, we proposed a search method that runs two search algorithms in parallel to achieve efficient and accurate decoding. We evaluated this new search algorithm and obtained a significant improvement in recognition performance without a severe increase in computational cost.

We also proposed applying machine learning techniques to the task of combining the outputs of multiple LVCSR models. The proposed technique had advantages over voting schemes such as ROVER, especially when the majority of participating models are not reliable. Using this technique in a speech-driven Web retrieval task, we improved the speech recognition accuracy of spoken queries and, in turn, the retrieval accuracy.

We also worked on the summarization of spoken lectures. For this purpose, we investigated the relations between surface linguistic information and human summarization results, and identified useful surface linguistic information. We then summarized spoken lectures based on this information and compared the output with the human results. As a result, we obtained a better F-measure and a k-value comparable with the human results.

We developed a portable speech recognition module and an interpreter module for a spoken dialogue system. We also developed a dialogue strategy design tool, applied it to Mt. Fuji sightseeing guidance retrieval, literature retrieval, and hotel reservation retrieval, and confirmed its usefulness.
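
The abstract reports an F-measure for the lecture summaries but does not say how it was computed. As a minimal sketch, assuming the summaries are built by selecting sentences and are scored against a human selection of the same lecture, the F-measure is the harmonic mean of precision and recall over the selected sentence indices. The function name `f_measure` and the sample index sets are hypothetical and are not taken from the project.

```python
def f_measure(system: set[int], reference: set[int]) -> float:
    """Harmonic mean of precision and recall over selected sentence indices.

    Assumption: both summaries are sets of sentence indices from the same
    lecture transcript. The project's actual scoring unit is not stated.
    """
    overlap = len(system & reference)
    if overlap == 0:
        # Covers empty inputs as well as summaries with no sentence in common.
        return 0.0
    precision = overlap / len(system)    # share of system picks that humans also picked
    recall = overlap / len(reference)    # share of human picks that the system recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data, for illustration only.
system_summary = {0, 3, 5, 8}
human_summary = {0, 2, 3, 8, 9}
score = f_measure(system_summary, human_summary)
print(f"F-measure: {score:.3f}")
```

Here three of the four system sentences match the human selection (precision 0.75) and three of the five human sentences are recovered (recall 0.6). Agreement between human annotators, which the abstract refers to through its k-value, would need a separate chance-corrected statistic and is not covered by this sketch.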
