Motivation: Brain imaging genetics, which studies the linkage between genetic variations and structural or functional measures of the human brain, has become increasingly important in recent years. One major task in imaging genetics is discovering the bi-multivariate relationship between genetic markers such as single-nucleotide polymorphisms (SNPs) and neuroimaging quantitative traits (QTs). Sparse Canonical Correlation Analysis (SCCA) is a popular technique in this area because it identifies bi-multivariate relationships while performing feature selection. Existing SCCA methods impose either the ℓ1-norm or one of its variants to induce sparsity. The ℓ0-norm penalty is the ideal sparsity-inducing tool, but optimizing it is an NP-hard problem.
Results: In this paper, we propose the truncated ℓ1-norm penalized SCCA to improve the performance and effectiveness of the ℓ1-norm based SCCA methods. We also propose an efficient optimization algorithm to solve this novel SCCA problem. The proposed method is an adaptive shrinkage method controlled by the parameter τ. Given a reasonably small τ, it can avoid time-intensive parameter tuning. Furthermore, we extend it to the truncated group-lasso (TGL) and propose the TGL-SCCA model to improve the group-lasso-based SCCA methods. The experimental results, compared with four benchmark methods, show that our SCCA methods identify better or similar correlation coefficients, and better canonical loading profiles, than the competing methods. This demonstrates the effectiveness and efficiency of our methods in discovering interesting imaging genetic associations.
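The abstract does not state the truncated ℓ1-norm penalty explicitly. As an illustration only, the sketch below assumes the form commonly used in the literature, sum_i min(|w_i|/τ, 1), which may differ in detail from the penalty defined in the paper. Under that assumption, the penalty approaches the ℓ0-norm (the count of nonzero entries) as τ shrinks, which is one way to read the role of a small τ. The function names `truncated_l1` and `l0_norm` are hypothetical.

```python
def truncated_l1(weights, tau):
    """Assumed truncated l1 penalty: sum over i of min(|w_i| / tau, 1)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Entries with |w_i| >= tau each contribute exactly 1 (as in the l0-norm);
    # smaller entries contribute |w_i| / tau (a scaled l1 term).
    return sum(min(abs(w) / tau, 1.0) for w in weights)


def l0_norm(weights):
    """Number of nonzero entries."""
    return sum(1 for w in weights if w != 0)


if __name__ == "__main__":
    w = [0.0, 0.02, -0.5, 3.0]  # made-up loading vector for illustration
    for tau in (1.0, 0.1, 0.01):
        print(tau, truncated_l1(w, tau), l0_norm(w))
```

Running the script prints the penalty for decreasing τ next to the ℓ0-norm of the same vector; once τ is below the smallest nonzero magnitude, the two values coincide.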