Robust Syllable Recognition in the Acoustic-Waveform Domain

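Basic Information

Grant number:
EP/D053005/1
Principal investigator:
Zoran Cvetkovic
Amount:
$264,400
Host institution:
King's College London
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2006
Funding country:
United Kingdom
Duration:
2006 to (no data)
Project status:
Completed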

Source:
https://gtr.ukri.org/projects?ref=EP%2FD053005%2F1
Keywords:
Robust Syllable Recognition Acoustic Waveform

Project Abstract

This proposal is concerned with robust classification/recognition of speech units (phonemes and consonant-vowel syllables) in the domain of acoustic waveforms. The motivation for this research comes from the idea that speech units should be much better separated in the high-dimensional spaces formed by acoustic waveforms than in the smaller representation spaces which are used in state-of-the-art speech recognition systems and which involve significant compression and dimension reduction. Hence, recognition/classification in the acoustic waveform domain should exhibit a higher level of robustness to additive noise than classification in low-dimensional feature spaces.

In the first phase of the project we will investigate classification of speech units in the acoustic waveform domain under severe noise conditions, around 0 dB signal-to-noise ratio and below, while in the second phase we will study techniques which would make classification robust to linear filtering as well. The particular tasks that will be tackled in the first phase can be summarized as follows:

1. Study the detailed structure of the sets of acoustic waveforms of individual speech units; in particular their intrinsic dimensions, and the existence of possible nonlinear surfaces on which the data are concentrated.
2. Guided by the findings from item 1 above, estimate statistical models of the distribution of speech units in the acoustic waveform domain. We will then design and systematically assess so-called generative classifiers, whose defining property is that they are based on such statistical models.
3. Investigate classification of speech units in the acoustic waveform domain using discriminative classification techniques (artificial neural networks, support vector machines, and relevance vector machines). These can be a useful alternative to generative techniques because they focus directly on the classification problem without building explicit models of waveform distributions for each speech unit.
4. Construct classifiers by grouping speech units hierarchically. Top-level classifiers will be constructed to distinguish between a small number of groups of similar speech units, followed by classifiers separating groups into subgroups and so on. Different methods for defining subgroups will be explored, including confusion matrices of the classifiers from item 3, appropriate distance measures between the statistical models obtained in item 2, and possibly perceptual experiments.

A potential argument against our approach is that classification in the acoustic waveform domain will break down in the presence of linear filtering. However, this can be avoided by considering narrow-band signals: for these, the effect of linear filtering is approximately equivalent to amplitude scaling and time delay. In the second phase of the project, we will therefore consider speech classification using narrow-band components of acoustic waveforms. For classification of signals in individual sub-bands, the techniques investigated in the first phase of the project will be considered. A new issue is then how to combine the results of sub-band classifiers to minimize the overall classification error. Here recently developed machine learning techniques will be used, as specified in the case for support.

As explained, individual sub-band classifiers should be robust to linear filtering because the latter does not significantly alter the shape of narrow-band signals. On the other hand, the dimension of the spaces of sub-band waveforms will still be high enough to facilitate classification robust to additive noise. Hence, the overall scheme is expected to be robust to both additive noise and linear filtering.
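
The narrow-band argument in the abstract can be illustrated numerically. The sketch below is not taken from the proposal: the function names `fir_filter` and `scale_delay_error`, the three-tap filter, and both test signals are assumptions chosen for illustration. It filters a narrow-band tone burst and a broadband noise signal with the same filter, then measures how well each output is matched by a scaled and delayed copy of its input. The filter is symmetric (linear phase), so its delay is a whole number of samples; a general filter would need a finer delay search.

```python
import math
import random


def fir_filter(h, x):
    """Causal FIR filtering: y[n] = sum_k h[k] * x[n - k], same length as x."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def scale_delay_error(x, y, max_delay):
    """Smallest relative error ||y - g * x[n - d]|| / ||y|| over integer
    delays d in 0..max_delay, using the least-squares gain g for each d."""
    energy_y = sum(v * v for v in y)
    best = float("inf")
    for d in range(max_delay + 1):
        delayed = [0.0] * d + x[: len(x) - d]
        dot_xy = sum(a * b for a, b in zip(delayed, y))
        dot_xx = sum(a * a for a in delayed)
        gain = dot_xy / dot_xx
        residual = sum((b - gain * a) ** 2 for a, b in zip(delayed, y))
        best = min(best, math.sqrt(residual / energy_y))
    return best


N = 512
h = [0.25, 0.5, 0.25]  # assumed illustrative filter, not from the proposal

# Narrow-band signal: a slowly varying Gaussian envelope on a fixed carrier.
narrow = [
    math.exp(-((n - N / 2) ** 2) / (2 * 40.0 ** 2)) * math.cos(0.4 * math.pi * n)
    for n in range(N)
]

# Broadband signal: seeded white Gaussian noise.
rng = random.Random(0)
wide = [rng.gauss(0.0, 1.0) for _ in range(N)]

err_narrow = scale_delay_error(narrow, fir_filter(h, narrow), len(h) - 1)
err_wide = scale_delay_error(wide, fir_filter(h, wide), len(h) - 1)

print(f"narrow-band relative error: {err_narrow:.4f}")
print(f"broadband relative error:   {err_wide:.4f}")
```

Under these assumptions the narrow-band error should come out far smaller than the broadband one, because the filter gain is nearly constant over the narrow band but varies widely across the full spectrum. This is the property the second phase relies on when it classifies sub-band components separately.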

Project Outputs

Journal articles (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Combined Features and Kernel Design for Noise Robust Phoneme Classification Using Support Vector Machines

DOI:
10.1109/tasl.2010.2090657
Publication year:
2011
Journal:
IEEE Transactions on Audio, Speech, and Language Processing
Impact factor:
0
Authors:
Yousafzai J
Corresponding author:
Yousafzai J

Towards robust phoneme classification: Augmentation of PLP models with acoustic waveforms

DOI:
Publication year:
2008
Journal:
European Signal Processing Conference
Impact factor:
0
Authors:
Ager M.
Corresponding author:
Ager M.

Tuning support vector machines for robust phoneme classification with acoustic waveforms

DOI:
Publication year:
2009
Journal:
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Impact factor:
0
Authors:
Yousafzai J.
Corresponding author:
Yousafzai J.

Robust phoneme classification: exploiting the adaptability of acoustic waveform models

DOI:
Publication year:
Journal:
European Signal Processing Conference, EUSIPCO 2009
Impact factor:
0
Authors:
Matthew Ager
Corresponding author:
Matthew Ager

Combined PLP - acoustic waveform classification for robust phoneme recognition using support vector machines

DOI:
Publication year:
2008
Journal:
European Signal Processing Conference, EUSIPCO 2008
Impact factor:
0
Authors:
J Yousafzai
Corresponding author:
J Yousafzai

Other publications by Zoran Cvetkovic

Overcomplete expansions and robustness

DOI:
10.1109/tfsa.1996.547479
Publication year:
1996
Journal:
Proceedings of Third International Symposium on Time-Frequency and Time-Scale Analysis (TFTS-96)
Impact factor:
0
Authors:
Zoran Cvetkovic; Martin Vetterli
Corresponding author:
Martin Vetterli