The use of high-throughput techniques to generate large volumes of protein-protein interaction (PPI) data has increased the need for methods that systematically and automatically suggest functional relationships among proteins. In a yeast PPI network, previous work has shown that local connection topology, in particular two proteins sharing an unusually large number of neighbors, can predict functional association. In this study we improved this prediction scheme by developing a new algorithm, and applied it to a human PPI network to make genome-wide functional inferences. The new algorithm measures and reduces the influence of hub proteins when detecting function-associated protein pairs. We used annotations from the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as benchmarks to evaluate functional relevance. Applying our algorithms to human PPI data yielded 4,233 significant functional associations among 1,754 proteins. Further functional comparisons between the associated proteins allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins, with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and made functional inferences from a detailed analysis of one subcluster highly enriched in the TGF-β signaling pathway (P < 10⁻⁵⁰). Analysis of four other subclusters also suggested potential new players in six signaling pathways that merit further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in the post-genomic era.
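One standard way to make "an unusually large number of shared neighbors" precise is a hypergeometric test. Given N proteins in the network, a protein A with degree n_A, and a protein B with degree n_B, the test asks how likely it is that B's neighbors overlap A's neighbors in at least m proteins by chance. The sketch below is an illustration of that generic common-neighbor test, not the new hub-correcting algorithm described in this study, whose details are not given here. The function names `shared_neighbor_pvalue` and `score_pairs`, the toy network, and the `alpha` cutoff are all hypothetical. Note that the degree terms already penalize hubs to some extent: a high-degree protein that shares m neighbors gets a larger (less significant) P value than a low-degree protein that shares the same m.

```python
from itertools import combinations
from math import comb


def shared_neighbor_pvalue(n_total, deg_a, deg_b, shared):
    """Upper-tail hypergeometric P(X >= shared).

    X counts how many of B's deg_b neighbors fall among A's deg_a neighbors,
    when deg_b proteins are drawn at random from n_total proteins.
    """
    denom = comb(n_total, deg_b)
    upper = min(deg_a, deg_b)
    tail = sum(
        comb(deg_a, k) * comb(n_total - deg_a, deg_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / denom  # exact integer ratio, converted to float once


def score_pairs(adjacency, alpha=1e-3):
    """Score every protein pair that shares at least one neighbor.

    adjacency: dict mapping protein -> set of interacting proteins
    (symmetric, no self-loops). Hypothetical interface for illustration.
    Returns (a, b, shared, p) tuples with p < alpha, most significant first.
    No multiple-testing correction is applied here.
    """
    n_total = len(adjacency)
    results = []
    for a, b in combinations(sorted(adjacency), 2):
        shared = len(adjacency[a] & adjacency[b])
        if shared == 0:
            continue
        p = shared_neighbor_pvalue(
            n_total, len(adjacency[a]), len(adjacency[b]), shared
        )
        if p < alpha:
            results.append((a, b, shared, p))
    return sorted(results, key=lambda r: r[3])


if __name__ == "__main__":
    # Hypothetical toy network: A and B both interact with X, Y, and Z.
    edges = [("A", "X"), ("A", "Y"), ("A", "Z"),
             ("B", "X"), ("B", "Y"), ("B", "Z"),
             ("C", "D"), ("E", "F"), ("G", "C")]
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    for a, b, shared, p in score_pairs(adjacency, alpha=0.05):
        print(f"{a}-{b}: {shared} shared neighbors, P = {p:.4f}")
```

In the toy network, the pair A-B ranks first because two degree-3 proteins sharing all three neighbors is unlikely by chance. A real genome-wide run would also need a multiple-testing correction, since the abstract reports false discovery rates rather than raw P values.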