Electrospray tandem mass spectrometry (ESI-MS/MS) is commonly used in high-throughput metabolomics. A key obstacle to the effective use of this technology is the difficulty of interpreting measured spectra to identify metabolites accurately and efficiently. Traditional methods for automated metabolite identification compare the target MS or MS/MS spectrum to the spectra in a reference database, ranking candidates by the closeness of the match. However, the limited coverage of available databases has led to interest in computational methods for predicting reference MS/MS spectra from chemical structures. This work proposes a probabilistic generative model for the MS/MS fragmentation process, which we call competitive fragmentation modeling (CFM), and a machine learning approach for learning the parameters of this model from MS/MS data. We show that CFM can be used both in an MS/MS spectrum prediction task (i.e., predicting the mass spectrum from a chemical structure) and in a putative metabolite identification task (ranking possible structures for a target MS/MS spectrum). In the MS/MS spectrum prediction task, CFM shows significantly improved performance compared to a full enumeration of all peaks corresponding to substructures of the molecule. In the metabolite identification task, CFM obtains substantially better rankings for the correct candidate than existing methods (MetFrag and FingerID) on tripeptide and metabolite data, when querying PubChem or KEGG for candidate structures of similar mass.
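To make the identification task concrete, the following is a minimal sketch of ranking candidate structures by how closely a spectrum associated with each candidate matches the target spectrum. It is an illustration only and not the CFM scoring procedure: the cosine-style similarity, the greedy peak pairing, the 0.01 Da tolerance, the function names `match_score` and `rank_candidates`, and the toy peak lists are all assumptions introduced here. In practice the per-candidate spectra would come from a reference database or from a predictive model.

```python
from math import sqrt

def match_score(target, predicted, tol=0.01):
    """Cosine-style similarity between two peak lists [(mz, intensity), ...].

    Each target peak is paired greedily with the closest unused predicted
    peak whose m/z differs by at most `tol` (an assumed tolerance in Da).
    """
    used = set()
    dot = 0.0
    for mz_t, int_t in target:
        best_j, best_diff = None, tol
        for j, (mz_p, _) in enumerate(predicted):
            diff = abs(mz_t - mz_p)
            if j not in used and diff <= best_diff:
                best_j, best_diff = j, diff
        if best_j is not None:
            used.add(best_j)
            dot += int_t * predicted[best_j][1]
    norm_t = sqrt(sum(i * i for _, i in target))
    norm_p = sqrt(sum(i * i for _, i in predicted))
    if norm_t == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_t * norm_p)

def rank_candidates(target, candidates, tol=0.01):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, match_score(target, spec, tol))
              for name, spec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): a target spectrum and three candidate spectra.
target = [(100.0, 1.0), (150.0, 0.5)]
candidates = {
    "A": [(100.0, 1.0), (150.0, 0.5)],  # shares both peaks with the target
    "B": [(100.0, 1.0), (200.0, 0.5)],  # shares one peak
    "C": [(300.0, 1.0)],                # shares none
}
ranking = rank_candidates(target, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The quality of such a ranking depends on how well the per-candidate spectra reflect real fragmentation, which is the gap that a learned fragmentation model is meant to address.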