We propose the notion of latent acoustic topics to capture contextual information embedded within a collection of audio signals. The central idea is to learn, in an unsupervised manner, a probability distribution over a set of latent topics for a given audio clip, assuming that latent acoustic topics exist and that each audio clip can be described in terms of them. To this end, we use latent Dirichlet allocation (LDA) to implement acoustic topic models over elemental acoustic units, referred to as acoustic words, and perform text-like audio signal processing. Experiments on audio tag classification with the BBC sound effects library demonstrate the usefulness of the proposed latent audio context modeling schemes. In particular, the proposed method is shown to be superior to other latent structure analysis methods, such as latent semantic analysis and probabilistic latent semantic analysis. We also demonstrate that topic models can be used as complementary features to content-based features, offering about 9% relative improvement in audio classification when combined with the traditional Gaussian mixture model (GMM)–support vector machine (SVM) technique.
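For readers unfamiliar with LDA, the generative assumption behind an acoustic topic model can be sketched in the standard LDA notation. The symbols below are assumptions introduced here for illustration and do not appear in the abstract: $D$ audio clips, $K$ latent acoustic topics, $N_d$ acoustic words in clip $d$, a Dirichlet prior parameter $\alpha$, and per-topic distributions $\beta_k$ over the acoustic-word vocabulary.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), && d = 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), && n = 1,\dots,N_d \\
w_{d,n} \mid z_{d,n} &\sim \mathrm{Categorical}(\beta_{z_{d,n}}), && z_{d,n} \in \{1,\dots,K\}
\end{align*}
```

Under this sketch, $\theta_d$ is the distribution over latent topics for clip $d$, and an estimate of it inferred from the observed acoustic words $w_{d,1},\dots,w_{d,N_d}$ would serve as the clip's topic-based feature. Exact posterior inference in LDA is intractable, so it is usually approximated, for example with variational inference or Gibbs sampling.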