Abstract
The correlation between messenger RNA (mRNA) and protein abundances has long been debated. RNA sequencing (RNA-seq) is a commonly used high-throughput method for analyzing transcriptional dynamics, but it remains unclear whether gene signatures identified by RNA-seq can be translated directly into protein changes. In this study, we used a set of 17 widely assessed immune and wound-healing mediators in the context of canine volumetric muscle loss to investigate the correlation between mRNA and protein abundances. Our data reveal overall agreement between mRNA and protein levels across these 17 mediators when samples from the same experimental condition (e.g. the same biopsy) are examined. However, we observed a lack of correlation between mRNA and protein levels for individual genes across different conditions, underscoring the challenge of converting transcriptional changes into protein changes. To address this discrepancy, we developed a machine learning model that predicts protein abundances from RNA-seq data with high accuracy. Our approach also effectively corrected multiple extreme outliers measured by antibody-based protein assays. Additionally, the model has the potential to detect post-translational modification events, as shown by its accurate estimation of activated transforming growth factor β1 levels. This study presents a promising approach for converting RNA-seq data into protein abundances and for assessing their biological significance.
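The abstract contrasts two different correlations. One is computed across genes within a single sample. The other is computed for a single gene across conditions. The first can be strong even when the second is absent. The sketch below illustrates this with hypothetical toy numbers that are not taken from the study. It assumes log-scale abundances and uses Pearson correlation via NumPy. The study's actual statistic and data are not stated here, and the helper names are illustrative only.

```python
import numpy as np

# Hypothetical toy data (NOT from the study): log-scale abundances for
# three genes (rows) measured under four conditions (columns).
mrna = np.array([
    [1.0, 1.2, 1.4, 1.6],
    [5.0, 5.2, 5.4, 5.6],
    [9.0, 9.2, 9.4, 9.6],
])
protein = np.array([
    [2.0, 1.8, 1.6, 1.4],
    [6.0, 5.8, 5.6, 5.4],
    [10.0, 9.8, 9.6, 9.4],
])


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def across_gene_r(mrna, protein):
    """Per-condition correlation across genes (same sample, different genes)."""
    return [pearson(mrna[:, j], protein[:, j]) for j in range(mrna.shape[1])]


def within_gene_r(mrna, protein):
    """Per-gene correlation across conditions (same gene, different samples)."""
    return [pearson(mrna[i, :], protein[i, :]) for i in range(mrna.shape[0])]


if __name__ == "__main__":
    print("across genes, per condition:", np.round(across_gene_r(mrna, protein), 3))
    print("within gene, across conditions:", np.round(within_gene_r(mrna, protein), 3))
```

In this contrived case, the genes differ by orders of magnitude in baseline abundance. That difference dominates the across-gene correlation, which is close to +1 in every condition. Meanwhile, each gene's protein level moves opposite to its mRNA across conditions, so the within-gene correlation is close to -1.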