The identification of genes involved in complex human diseases remains a great challenge in computational systems biology. Although methods have been developed that combine disease phenotypic similarities with a protein-protein interaction network to prioritize candidate genes, these methods have largely overlooked other valuable omics data sources.
To address this, we proposed a method called BRIDGE that prioritizes candidate genes by integrating disease phenotypic similarities with omics data such as protein-protein interactions, gene sequence similarities, gene expression patterns, Gene Ontology annotations, and pathway memberships. BRIDGE uses a multiple regression model with a lasso penalty to weight the different data sources automatically, and it can discover genes associated with diseases whose genetic bases are completely unknown.
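The abstract does not spell out the regression model, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that, for a query disease, the response is the vector of phenotypic similarities between that disease and a set of known diseases, and that each predictor column holds one omics source's similarity between a candidate gene and the known genes of each disease. A lasso fit then weights the sources, and the fit's R² serves as the candidate's score. The function names (`lasso_cd`, `score_candidate`), the coordinate-descent solver, the value `alpha=0.01` and the synthetic data are all hypothetical illustrations, not BRIDGE's actual implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: the closed-form lasso update for one coefficient."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent.

    X and y are assumed to be centred, so no intercept is fitted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * beta[j]          # drop feature j from the current fit
            rho = X[:, j] @ residual / n           # covariance of feature j with the partial residual
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * beta[j]          # refit with the updated coefficient
    return beta


def score_candidate(phenotype_sim, source_sims, alpha=0.01):
    """Hypothetical concordance score for one candidate gene.

    phenotype_sim: (n_diseases,) similarity of the query disease to each known disease.
    source_sims:   (n_diseases, n_sources) similarity of the candidate gene to the
                   known genes of each disease, one column per omics source.
    Returns (R^2 of the lasso fit, per-source weights).
    """
    y = phenotype_sim - phenotype_sim.mean()
    X = source_sims - source_sims.mean(axis=0)
    beta = lasso_cd(X, y, alpha)
    resid = y - X @ beta
    ss_tot = y @ y
    r2 = 1.0 - (resid @ resid) / ss_tot if ss_tot > 0 else 0.0
    return r2, beta


# Synthetic example: 200 known diseases, 5 omics sources (all values made up).
rng = np.random.default_rng(0)
n_diseases, n_sources = 200, 5
true_gene = rng.uniform(size=(n_diseases, n_sources))
random_gene = rng.uniform(size=(n_diseases, n_sources))
# Phenotypic similarity driven by sources 0 and 1 of the "true" gene, plus noise.
phenotype = 0.8 * true_gene[:, 0] + 0.5 * true_gene[:, 1] + 0.05 * rng.normal(size=n_diseases)

r2_true, beta_true = score_candidate(phenotype, true_gene)
r2_rand, beta_rand = score_candidate(phenotype, random_gene)
print(f"true gene:   R^2={r2_true:.3f}, weights={np.round(beta_true, 3)}")
print(f"random gene: R^2={r2_rand:.3f}, weights={np.round(beta_rand, 3)}")
```

In this toy setup the gene whose sources actually track the phenotypic similarities gets a much higher R² than the unrelated gene. The L1 penalty tends to push weights of uninformative sources toward zero, which is the sense in which a lasso can "automatically weight" data sources.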
We conducted large-scale cross-validation experiments and showed that BRIDGE ranks more than 60% of known disease genes first in simulated linkage intervals, suggesting the superior performance of this method. We further performed two comprehensive case studies, applying BRIDGE to predict novel genes and transcriptional networks involved in obesity and type II diabetes.
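The "ranked top one in simulated linkage intervals" metric can be illustrated with a small sketch. It assumes that each cross-validation case holds out one known disease gene, places it in a simulated interval alongside other candidate genes, and checks whether its score is the highest in that interval. The interval size and construction are not given in the abstract. The helper names, the pessimistic tie rule (ties count against the held-out gene) and the toy scores below are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def rank_in_interval(target_score: float, neighbour_scores: Sequence[float]) -> int:
    """1-based rank of the held-out disease gene among the genes of its simulated
    linkage interval; ties are counted against the target (pessimistic)."""
    return 1 + sum(1 for s in neighbour_scores if s >= target_score)


def top_one_fraction(cases: Iterable[Tuple[float, Sequence[float]]]) -> float:
    """Fraction of held-out genes ranked first; each case is (target_score, neighbour_scores)."""
    ranks = [rank_in_interval(t, n) for t, n in cases]
    return sum(r == 1 for r in ranks) / len(ranks)


# Hypothetical toy data: three held-out cases with made-up scores.
cases = [
    (0.9, [0.1, 0.4, 0.3]),   # ranked 1st
    (0.5, [0.7, 0.2]),        # ranked 2nd
    (0.8, [0.8, 0.1]),        # tie -> ranked 2nd under the pessimistic rule
]
print(top_one_fraction(cases))  # prints 0.3333333333333333
```

Counting ties pessimistically avoids inflating the top-one rate when a scoring method assigns identical scores to many genes.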
The proposed method provides an effective and scalable way to integrate multi-omics data for inferring disease genes. Further applications of BRIDGE should help identify novel disease genes and reveal the mechanisms underlying human diseases.