The term “World Englishes” captures one aspect of the current state of English as an international language: the view that English need not be standardized and that its diversity should be accepted. The purpose of this study is to propose a technique for clustering pronunciations on an individual basis, so that each user can be told where his or her pronunciation lies within the diversity of World Englishes pronunciations. In this paper, we attempt to predict inter-speaker pronunciation distances from the speech signals alone. First, experiments were conducted to investigate how suitable the proposed definition of pronunciation distance is for World Englishes clustering. Next, for automatic prediction of the distances, we introduced absolute features, which are derived by directly comparing the corresponding phoneme HMMs of a speaker pair. By increasing the temporal and spectral resolutions, an HMM supervector was derived as one kind of absolute feature. Experiments showed that these features were very effective for prediction, and we expect that the structural features proposed in our previous studies will perform much better when their resolutions are likewise increased.
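As a rough illustration of what "direct comparison between corresponding phoneme HMMs of a speaker pair" could look like, the sketch below builds one distance value per phoneme state. The abstract does not specify the model topology or the distance measure, so everything here is an assumption made for illustration: each HMM state is reduced to a single diagonal-covariance Gaussian, the Bhattacharyya distance is used as the comparison, and the names `bhattacharyya_diag` and `absolute_features` are hypothetical.

```python
import math


def bhattacharyya_diag(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = 0.5 * (va + vb)  # averaged variance for this dimension
        dist += 0.125 * (ma - mb) ** 2 / v
        dist += 0.5 * math.log(v / math.sqrt(va * vb))
    return dist


def absolute_features(speaker_a, speaker_b):
    """Compare corresponding phoneme models of two speakers.

    Each speaker is a dict: phoneme -> list of states, where a state is
    a (mean, variance) pair of equal-length lists. This layout is an
    assumption for the sketch, not the paper's actual data format.
    Returns one distance per (phoneme, state), in sorted phoneme order.
    """
    features = []
    for phoneme in sorted(set(speaker_a) & set(speaker_b)):
        for (ma, va), (mb, vb) in zip(speaker_a[phoneme], speaker_b[phoneme]):
            features.append(bhattacharyya_diag(ma, va, mb, vb))
    return features


# Toy two-phoneme, one-state models with 2-dimensional features.
spk1 = {"aa": [([0.0, 0.0], [1.0, 1.0])], "iy": [([1.0, 1.0], [1.0, 1.0])]}
spk2 = {"aa": [([2.0, 0.0], [1.0, 1.0])], "iy": [([1.0, 1.0], [1.0, 1.0])]}

feats = absolute_features(spk1, spk2)
```

In this toy case only the "aa" models differ, so only the first feature is non-zero. A feature vector of this kind could then be passed to a regressor that predicts the inter-speaker pronunciation distance; using more states per phoneme or higher-dimensional spectral features would lengthen the vector, which is one possible reading of "increasing temporal and spectral resolutions".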