Motivation: Liquid chromatography coupled to mass spectrometry has enabled high-throughput profiling of the metabolite composition of biological samples. However, the large amount of data obtained can be difficult to analyse and often requires computational processing to determine which metabolites are present in a sample. This article addresses the dual problem of annotating each peak in a sample with a metabolite and putatively determining whether each metabolite is present in the sample. The starting point of the approach is a Bayesian clustering of peaks into groups, each corresponding to putative adducts and isotopes of a single metabolite.
Results: The Bayesian modelling introduced here combines information from the mass-to-charge ratio, retention time and intensity of each peak, together with a model of the inter-peak dependency structure, to increase the accuracy of peak annotation. The results inherently contain a quantitative estimate of confidence in the peak annotations and allow an accurate trade-off between precision and recall. Extensive validation experiments using authentic chemical standards show that this system is able to produce more accurate putative identifications than other state-of-the-art systems, while at the same time giving a probabilistic measure of confidence in the annotations.
Availability and implementation: The software has been implemented as part of the mzMatch metabolomics analysis pipeline, which is available for download at http://mzmatch.sourceforge.net/.
Contact: Ronan.Daly@glasgow.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
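Illustration: The Results paragraph notes that each peak annotation carries a probabilistic confidence, which allows a trade-off between precision and recall. The sketch below is not part of mzMatch and does not reproduce the article's model; it only shows, under assumed inputs, how thresholding such probabilities moves that trade-off. The function name `precision_recall`, the probabilities and the ground-truth labels are all hypothetical.

```python
from typing import List, Tuple


def precision_recall(
    probs: List[float], truth: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when accepting annotations with probability >= threshold.

    probs  -- hypothetical posterior probabilities, one per candidate annotation
    truth  -- whether each candidate annotation is actually correct
              (e.g. as confirmed with an authentic standard)
    """
    # Keep the truth labels of the annotations that pass the threshold.
    accepted = [t for p, t in zip(probs, truth) if p >= threshold]
    true_positives = sum(accepted)
    total_correct = sum(truth)
    # Convention assumed here: precision is 1.0 when nothing is accepted.
    precision = true_positives / len(accepted) if accepted else 1.0
    recall = true_positives / total_correct if total_correct else 0.0
    return precision, recall


# Made-up example values, for illustration only.
example_probs = [0.95, 0.80, 0.60, 0.40, 0.10]
example_truth = [True, True, False, True, False]

for cutoff in (0.9, 0.5, 0.3):
    p, r = precision_recall(example_probs, example_truth, cutoff)
    print(f"threshold={cutoff:.1f} precision={p:.2f} recall={r:.2f}")
```

In this made-up example, lowering the threshold accepts more annotations, so recall rises while precision can fall. This is the kind of trade-off that a calibrated confidence estimate lets a user choose explicitly.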