Mismatch between training and testing data is a major source of error for both automatic speech recognition (ASR) and automatic speaker identification (ASI). In this paper, we first present a statistical weighting concept that exploits the unequal sensitivity of mel-frequency cepstral coefficient (MFCC) components to mismatch arising from sources such as ambient noise, recording equipment, transmission channels, and inter-speaker variation. Building on this concept, we then design a new Kullback-Leibler (KL) distance-based weighting algorithm for real-world problems, in which label information is often unavailable. We evaluate the algorithm on ASR with mismatch caused by different speakers and on ASI with mismatch caused by channel noise. Experimental results demonstrate the effectiveness and robustness of the proposed method.
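The abstract does not say how the KL distance is turned into weights. The sketch below is one plausible, hypothetical reading of that idea, not the authors' algorithm. It models each MFCC dimension of the training and test features as a univariate Gaussian and measures per-dimension mismatch with a symmetric KL distance. Dimensions that drift more then receive smaller weights in a weighted Euclidean distance. Because only unlabeled feature frames are used, no label information is needed. The Gaussian model, the `exp(-KL)` mapping, and all function names (`gaussian_kl`, `symmetric_kl`, `kl_weights`, `weighted_distance`) are assumptions made for illustration.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) between two univariate Gaussians P = N(mu_p, var_p), Q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrized KL distance: KL(P || Q) + KL(Q || P)."""
    return gaussian_kl(mu_p, var_p, mu_q, var_q) + gaussian_kl(mu_q, var_q, mu_p, var_p)


def kl_weights(train_frames, test_frames, eps=1e-8):
    """Return one weight per MFCC dimension (hypothetical scheme).

    train_frames, test_frames: lists of equal-length MFCC vectors; no labels are used.
    Dimensions whose train/test distributions differ more get smaller weights.
    """
    dims = len(train_frames[0])
    distances = []
    for d in range(dims):
        tr = [frame[d] for frame in train_frames]
        te = [frame[d] for frame in test_frames]
        # eps keeps variances strictly positive so the log and division are defined.
        distances.append(
            symmetric_kl(fmean(tr), pvariance(tr) + eps, fmean(te), pvariance(te) + eps)
        )
    # Assumed mapping: weight decays as exp(-KL). Subtracting the minimum distance
    # keeps at least one raw weight equal to 1, so the sum never underflows to 0.
    d_min = min(distances)
    raw = [math.exp(-(dist - d_min)) for dist in distances]
    # Normalize so the weights average to 1 across dimensions.
    scale = dims / sum(raw)
    return [r * scale for r in raw]


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two MFCC vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))
```

For example, the following toy input has a first dimension that is identically distributed in training and test, and a second dimension that is shifted by 5 in the test data. Under these assumptions, the first dimension receives nearly all of the weight:

```python
train = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
test = [[0.0, 5.0], [1.0, 6.0], [2.0, 7.0]]
w = kl_weights(train, test)
```

Normalizing the weights to a mean of 1 keeps the overall scale of the distance comparable to an unweighted one. Any other monotonically decreasing function of the KL distance could be substituted.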