Protein surfaces serve as an interface with the molecular environment and are thus tightly linked to protein function. On the surface, geometric and chemical complementarity to other molecules provides interaction specificity for ligand binding, docking of bio-macromolecules, and enzymatic catalysis.
To date, there is no accepted general scheme for representing protein surfaces. Furthermore, most research on protein surfaces focuses on regions of specific interest, such as interaction, ligand-binding, and docking sites. We present a first step toward a general-purpose representation of protein surfaces: a novel surface patch library that represents most surface patches (~98%) in a data set, regardless of their functional roles.
Surface patches, in this work, are small fractions of the protein surface. Using a measure of inter-patch distance, we clustered patches extracted from a data set of high-quality, non-redundant proteins. The surface patch library is the collection of all the cluster centroids; thus, each of the data set patches is close to one of the elements in the library.
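The clustering idea above can be illustrated with a minimal sketch. It is not the method used in this work: it assumes, purely for illustration, that each patch is a fixed-length numeric vector, that the inter-patch distance is Euclidean, and that clusters are formed by a simple greedy "leader" scheme whose representatives stand in for the cluster centroids. The names `build_library`, `coverage`, and `threshold` are hypothetical.

```python
import math

def build_library(patches, threshold):
    """Greedy leader clustering (an illustrative stand-in, not the paper's method).

    A patch that is farther than `threshold` from every current
    representative starts a new cluster and is added to the library.
    """
    library = []
    for patch in patches:
        if all(math.dist(patch, rep) > threshold for rep in library):
            library.append(patch)
    return library

def coverage(patches, library, threshold):
    """Fraction of patches that lie within `threshold` of some library element."""
    covered = sum(
        1 for p in patches
        if any(math.dist(p, rep) <= threshold for rep in library)
    )
    return covered / len(patches)

# Toy 2-D "patches" (made-up numbers, not real surface data).
patches = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
library = build_library(patches, threshold=1.0)
print(len(library), coverage(patches, library, threshold=1.0))
```

In this toy scheme every patch is within the threshold of a representative by construction, so coverage is complete. A real library could cover fewer patches, for example if very small clusters were discarded.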
We demonstrate the biological significance of our method through the library's ability to capture surface characteristics of native protein structures, as opposed to those of decoy sets generated by state-of-the-art protein structure prediction methods. The patches of the decoys are significantly less compatible with the library than those of their corresponding native structures, allowing us to reliably distinguish native structures from server-generated models. This trend, however, does not extend to comparisons among the decoys themselves: their similarity to the native structure does not correlate with their compatibility with the library.
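One way to picture the native-versus-decoy comparison is to score a structure by the fraction of its patches that have a nearby library element. The sketch below is an assumption-laden illustration rather than the scoring used in this work: patches are again hypothetical numeric vectors, the distance is Euclidean, and `compatibility`, `threshold`, and all data values are invented for the example.

```python
import math

def compatibility(structure_patches, library, threshold):
    """Hypothetical score: fraction of a structure's patches whose nearest
    library element is within `threshold`."""
    if not structure_patches:
        return 0.0
    matched = sum(
        1 for p in structure_patches
        if min(math.dist(p, rep) for rep in library) <= threshold
    )
    return matched / len(structure_patches)

# Made-up toy data, not results from the paper.
library = [(0.0, 0.0), (5.0, 5.0)]
native = [(0.2, 0.1), (4.9, 5.1), (0.0, 0.3)]
decoy = [(0.2, 0.1), (2.5, 2.5), (8.0, 1.0)]

native_score = compatibility(native, library, threshold=1.0)
decoy_score = compatibility(decoy, library, threshold=1.0)
print(native_score, decoy_score)
```

Under such a score, a structure whose patches mostly fall far from the library would rank below one whose patches are well represented, which is the kind of separation described above.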
We expect that this high-quality, generic surface patch library will add a new perspective to the description of protein structures and improve our ability to predict them. In particular, we expect that it will help improve the prediction of surface features that are apparently neglected by current techniques.
The surface patch libraries are publicly available at http://www.cs.bgu.ac.il/~keasar/patchLibrary.