Principal component analysis (PCA) and Fisher discriminant analysis (FDA) are two classical linear methods for feature extraction. In the least-squares sense, PCA best represents the data while FDA best separates the data, each based on a different scatter measure computed from the samples. This paper discusses a regularized scatter measure (RSM), defined as a linear combination of the within-class and between-class scatters, for feature extraction. The tradeoff between representation and discrimination is controlled by suitable regularization parameters, and the corresponding eigenvalue problem is solved without singularity. Experiments on two data sets of different sizes demonstrate the effectiveness of the method. In addition, the counterpart of PCA, minor component analysis (MCA), optimizes a special case of the RSM. This provides another simple way of understanding why MCA outperforms PCA for feature extraction in one-class classification problems.
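To make the construction concrete, the following is a minimal sketch, assuming the regularized scatter measure takes the form alpha*S_w + beta*S_b with regularization parameters alpha and beta (the exact weighting used in the paper may differ), and that features are obtained from the leading eigenvectors of this symmetric matrix. The names `scatter_matrices` and `rsm_projection` are illustrative, not the paper's own.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class scatter S_w and between-class scatter S_b for samples X (n, d) with labels y."""
    mean_total = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_total).reshape(-1, 1)
        S_b += Xc.shape[0] * diff @ diff.T
    return S_w, S_b

def rsm_projection(X, y, alpha, beta, n_components):
    """Project X onto the leading eigenvectors of the regularized scatter alpha*S_w + beta*S_b."""
    S_w, S_b = scatter_matrices(X, y)
    M = alpha * S_w + beta * S_b          # regularized scatter measure (assumed form)
    # M is symmetric, so a plain eigendecomposition suffices; unlike FDA's
    # generalized eigenproblem, no matrix inverse is required, which avoids
    # the singularity issue mentioned in the abstract.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]     # largest eigenvalues first
    W = eigvecs[:, order[:n_components]]
    return X @ W, W
```

With beta = 0 and a sign flip on alpha, minimizing the within-class scatter recovers an MCA-style criterion, which is consistent with the abstract's remark that MCA optimizes a special case of the RSM.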