Database search has been the main approach for proteoform identification by top-down tandem mass spectrometry. However, when the target proteoform that produced the spectrum contains post-translational modifications (PTMs) and/or mutations, aligning a query spectrum against every unmodified protein sequence in a large database is very time-consuming. Consequently, efficient and sensitive filtering algorithms are essential for speeding up database search.
In this paper, we propose a spectrum graph matching (SGM) based protein sequence filtering method for top-down mass spectral identification. It uses the subspectra of a query spectrum to generate spectrum graphs and searches them against a protein database to report the best candidates. Whereas the sequence tag and gapped tag approaches require a preprocessing step to extract and select tags, the SGM filtering method circumvents this step, thus simplifying data processing. We evaluated the filtration efficiency of the SGM filtering method with various parameter settings on an Escherichia coli top-down mass spectrometry data set, and compared the performance of the SGM filtering method with that of two tag-based filtering methods on a data set of MCF-7 cells.
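To make the idea of matching a spectrum graph against protein sequences concrete, the following toy sketch may help. It is not the SGM algorithm of this paper: it ignores subspectra, peak intensities, and PTMs, and every function name, the residue-mass subset, the 0.01 Da tolerance, and the scoring rule (longest protein substring spelled by a path in the graph) are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) for a small subset of amino acids.
# Assumption: a subset keeps the sketch short (I is omitted; it is isobaric with L).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}

Edges = Dict[int, List[Tuple[int, str]]]


def prefix_masses(seq: str, offset: float = 0.0) -> List[float]:
    """Hypothetical helper: ideal fragment masses for seq, shifted by offset."""
    masses, total = [offset], offset
    for aa in seq:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def build_spectrum_graph(masses: List[float], tol: float = 0.01) -> Edges:
    """Nodes are peaks sorted by mass; an edge i -> j labelled with a residue
    exists when the mass difference matches that residue within tol."""
    peaks = sorted(masses)
    edges: Edges = {i: [] for i in range(len(peaks))}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j] - peaks[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))
    return edges


def match_score(edges: Edges, protein: str) -> int:
    """Length of the longest substring of protein spelled by a graph path."""
    # best[node][pos]: longest matching path ending at node whose last edge
    # label is aligned to protein[pos].
    best: Dict[int, Dict[int, int]] = {i: {} for i in edges}
    top = 0
    for i in sorted(edges):  # edges only go to heavier peaks, so this is a topological order
        for j, aa in edges[i]:
            for pos, residue in enumerate(protein):
                if residue == aa:
                    length = 1 + best[i].get(pos - 1, 0)
                    if length > best[j].get(pos, 0):
                        best[j][pos] = length
                        top = max(top, length)
    return top


def filter_proteins(masses: List[float], database: Dict[str, str],
                    top_k: int = 1) -> List[Tuple[str, int]]:
    """Rank database entries by match score and keep the top_k candidates."""
    edges = build_spectrum_graph(masses)
    scored = [(name, match_score(edges, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Usage: fragments of "GASPVT" with an arbitrary 500 Da shift on every peak.
database = {"P1": "KKGASPVTKK", "P2": "FEDNLTVKAG"}
spectrum = prefix_masses("GASPVT", offset=500.0)
candidates = filter_proteins(spectrum, database, top_k=1)
```

Because the score depends only on mass differences between peaks, a constant shift on all fragment masses does not change the ranking in this toy setting; no tag extraction or selection step is needed before the database is scanned.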
Experimental results on the data sets show that the SGM filtering method achieves high sensitivity in protein sequence filtration. When coupled with a spectral alignment algorithm, the SGM filtering method significantly increases the number of identified proteoform-spectrum matches compared with the tag-based methods in top-down mass spectrometry data analysis.
The online version of this article (10.1186/s12864-018-5026-x) contains supplementary material, which is available to authorized users.