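Speaker Recognition and Language Recognition Fusing Articulatory Information from the Speech Production System with Mid-Level Discriminative Representations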

Project Introduction

Basic Information

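Approval number:
61401524
Project category:
Young Scientists Fund
Funding amount:
280,000 CNY
Principal investigator:
李明 (Li Ming)
Host institution:
Sun Yat-sen University (中山大学)
Discipline classification:
F0117. Multimedia information processing
Completion year:
2017
Approval year:
2014
Project status:
Completed
Project period:
2015-01-01 to 2017-12-31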

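Project participants:
许银亮; 禹之鼎; 曾宇钿; 谭宇泉
Keywords:
speaker recognition; mid-level discriminative representation; speech production; language recognition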

Project Abstract

(Limit: 3000 characters): Speaker recognition and language recognition continue to attract attention and remain active research topics in speech processing. Conventional methods mainly rely on information at three levels: phonetic, acoustic, and prosodic. In this project, we plan to use an electromagnetic articulography device to collect a large-scale real-time articulatory trajectory database from multiple speakers. Although this database focuses on Chinese, its content covers multiple languages and regional Chinese dialects. We intend to use this database to study the variability caused by different speakers and languages in the articulatory space. We also plan to study speaker-independent acoustic-to-articulatory inversion based on multiple exemplar speakers and languages, in order to estimate articulatory features for any telephone or microphone test speech. These estimated articulatory features can be used together with acoustic features to improve system performance. This project also intends to apply the recent idea of mid-level discriminative patches from image-based scene classification to language recognition and speaker recognition. We will study a mid-level discriminative tokenization framework for speech data, covering the definition of mid-level units, segmentation, discriminative learning of tokens, representation, and back-end classifiers. This project not only provides two new approaches to speaker and language recognition but also brings new ideas to speech production and paralinguistic auditory perception, which is valuable from both theoretical and practical points of view.

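(Limit: 400 Chinese characters): Speaker recognition and language recognition have long been active research topics in speech information processing. Conventional methods mainly use information at the phonetic, acoustic, and prosodic levels. This project studies speaker recognition and language recognition along two directions: articulatory information from the speech production system, and mid-level discriminative representations. We plan to use an electromagnetic articulograph to collect a fairly large-scale articulatory trajectory database that is centered on Chinese and covers multiple languages or dialects and multiple speakers. On this basis we will study how speakers and languages differ at the articulatory level and propose new features. We will study acoustic-to-articulatory inversion based on multiple reference speakers and languages to estimate articulatory features under ordinary channel conditions, and use them to improve recognition performance. The project also plans to apply mid-level discriminative image patches, a recent research focus in image scene analysis, to language recognition and speaker recognition in order to improve overall system performance. We will study a series of core problems for mid-level discriminative representations of speech: defining the mid level, segmentation, learning representative units, representation, and back-end classification. This project not only provides two new approaches to speaker and language recognition but also brings new perspectives to speech production and to auditory cognitive models of paralinguistic information, and has significant theoretical and practical value.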
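The abstract proposes using estimated articulatory features together with acoustic features. One common way to combine two frame-level feature streams is feature-level fusion by concatenation, sketched below. This is only an illustration under stated assumptions, not the project's actual pipeline: the function names `cmvn` and `fuse_features`, the feature dimensions (39 and 12), and the random placeholder data are all hypothetical, and the two streams are assumed to be already aligned frame by frame.

```python
import numpy as np


def cmvn(frames: np.ndarray) -> np.ndarray:
    """Per-utterance mean and variance normalization along the time axis."""
    mean = frames.mean(axis=0, keepdims=True)
    std = frames.std(axis=0, keepdims=True)
    # Guard against division by zero for constant dimensions.
    return (frames - mean) / np.maximum(std, 1e-8)


def fuse_features(acoustic: np.ndarray, articulatory: np.ndarray) -> np.ndarray:
    """Concatenate two frame-aligned feature streams of shape (frames, dims)."""
    if acoustic.shape[0] != articulatory.shape[0]:
        raise ValueError("feature streams must have the same number of frames")
    # Normalize each stream separately so neither dominates by scale.
    return np.concatenate([cmvn(acoustic), cmvn(articulatory)], axis=1)


# Placeholder data standing in for real features (hypothetical dimensions).
rng = np.random.default_rng(0)
acoustic_feats = rng.normal(size=(200, 39))      # e.g. acoustic features
articulatory_feats = rng.normal(size=(200, 12))  # e.g. inverted articulatory features

fused = fuse_features(acoustic_feats, articulatory_feats)
print(fused.shape)
```

The fused matrix keeps one row per frame and could then be passed to whatever back-end models the fused stream; score-level fusion of separately trained systems is an alternative design that this sketch does not cover.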

Final Report Abstract

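Speaker recognition and language recognition have long been active research topics in speech information processing. Conventional methods mainly use information at the phonetic, acoustic, and prosodic levels. This project studied speaker recognition and language recognition along two directions: articulatory information from the speech production system, and mid-level discriminative representations. Using an electromagnetic articulograph, the project collected a fairly large-scale articulatory trajectory database that is centered on Chinese and covers multiple languages or dialects and multiple speakers, and studied how fusing features obtained by acoustic-to-articulatory inversion can improve the performance of speaker recognition systems. The project also studied the role of mid-level discriminative representations in speaker recognition and language recognition, and proposed a generalized total variability factor analysis algorithm together with an algorithm that fuses phoneme-level posterior probability information at the feature level; both achieved significant performance gains on NIST2010 and LRE07. In addition, the project proposed several improved variants of the conventional probabilistic linear discriminant analysis (PLDA) algorithm. Some of the proposed algorithms were also applied to other tasks, including recognition of other paralinguistic speech attributes, voice conversion, and voice-changing attack detection, and improved system performance there. This project not only provides two new approaches to speaker and language recognition but also brings new perspectives to speech production and paralinguistic speech attribute recognition, and has significant theoretical and practical value.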

Project Outcomes

Journal papers (4)

Monographs (0)

Research awards (1)

Conference papers (15)

Patents (0)

Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification

DOI:
10.1007/s11265-015-1019-z
Publication date:
2015-07
Journal:
Journal of Signal Processing Systems for Signal Image and Video Technology
Impact factor:
1.8
Authors:
Li Ming; Liu Lun; Cai Weicheng; Liu Wenbo
Corresponding author:
Liu Wenbo

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals.

DOI:
10.1016/j.csl.2015.05.003
Publication date:
2016-03
Journal:
Computer Speech & Language
Impact factor:
4.3
Authors:
Li M; Kim J; Lammert A; Ghosh PK; Ramanarayanan V; Narayanan S
Corresponding author:
Narayanan S

Identifying Children with Autism Spectrum Disorder Based on Their Face Processing Abnormality: A Machine Learning Framework

DOI:
10.1002/aur.1615
Publication date:
2016-08-01
Journal:
Autism Research
Impact factor:
4.7
Authors:
Liu, Wenbo; Li, Ming; Yi, Li
Corresponding author:
Yi, Li

Cancellable Speech Template via Random Binary Orthogonal Matrices Projection Hashing

DOI:
10.1016/j.patcog.2017.10.041
Publication date:
2018-04
Journal:
Pattern Recognition
Impact factor:
8
Authors:
Kong-Yik Chee; Zhe Jin; Danwei Cai; Ming Li; Wun-She Yap; Yen-Lung Lai; Bok-Min Goi
Corresponding author:
Bok-Min Goi