Phosphorylation is a crucial mechanism for regulating protein activity in vivo in many eukaryotic organisms. Experimental methods for determining phosphorylation sites in substrates are usually limited by the in vitro conditions of the enzymes and are time- and labor-intensive. Although several in silico methods and web servers have been introduced for the automatic detection of phosphorylation sites, more sophisticated methods are still urgently needed to further improve prediction performance. Protein primary sequences can help predict the phosphorylation sites catalyzed by different protein kinases (PKs), and most computational approaches make predictions from a short local peptide around the site. However, useful information may be lost if prediction considers only the residues immediately around the phosphorylation site and ignores conserved residues farther away, which can hamper the prediction results. This paper presents a novel prediction method named IEPP (Information-Entropy based Phosphorylation Prediction) for the automatic detection of potential phosphorylation sites. In prediction, the positions around the phosphorylation site are selected or excluded according to their entropy values. The algorithm was compared with other methods such as GSP and PPSP on the ABL, MAPK and PKA PK families, and it achieved higher prediction accuracy on various measures, including sensitivity (Sn) and specificity (Sp). Furthermore, when compared with several online prediction web servers on newly discovered phosphorylation sites, IEPP also yielded the best performance. IEPP is another useful computational resource for identifying PK-specific phosphorylation sites, with the added advantages of simplicity, efficiency and convenience.
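The entropy-based position selection can be illustrated with a short sketch. It assumes the training data are equal-length peptide windows centered on known phosphorylation sites. It measures conservation at each position j with Shannon entropy, H_j = -sum over residues a of p(j,a) * log2 p(j,a), and keeps the low-entropy (conserved) positions. The cutoff `threshold`, the helper names (`position_entropies`, `select_positions`, `score_peptide`), the frequency-based score and the toy peptides are all hypothetical illustrations. They are not IEPP's published selection rule, scoring function or data.

```python
import math
from collections import Counter


def position_entropies(peptides):
    """Shannon entropy (in bits) of the residue distribution at each aligned position."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptide windows must have the same length")
    entropies = []
    n = len(peptides)
    for j in range(length):
        counts = Counter(p[j] for p in peptides)
        # H_j = sum_a p * log2(1/p); written this way it is never negative (no -0.0)
        entropies.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return entropies


def select_positions(entropies, threshold, center):
    """Keep conserved (low-entropy) positions, skipping the phosphorylated residue itself.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    return [j for j, h in enumerate(entropies) if h <= threshold and j != center]


def score_peptide(peptide, positives, positions):
    """Hypothetical score: average frequency, among known sites, of the candidate's
    residue at each selected position (not IEPP's published scoring function)."""
    if not positions:
        return 0.0
    total = 0.0
    for j in positions:
        counts = Counter(p[j] for p in positives)
        total += counts[peptide[j]] / len(positives)
    return total / len(positions)


if __name__ == "__main__":
    # Made-up 5-residue windows centered (index 2) on a phosphoserine.
    positives = ["RRSLG", "RKSVA", "RRSIE", "RRSLD"]
    ents = position_entropies(positives)
    keep = select_positions(ents, threshold=1.0, center=2)
    print(keep)  # [0, 1]
    print(score_peptide("RRSAK", positives, keep))  # 0.875
```

Because selection is driven by entropy rather than by distance from the center, a conserved position anywhere in the window can be kept while a variable position next to the site can be dropped. In this toy example, positions 3 and 4 have entropies of 1.5 and 2.0 bits and are excluded at a cutoff of 1.0.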