A predictive model of mRNA stability and translation for variant interpretation and mRNA therapeutics
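Basic information
- Award number: 9894822
- Principal investigator: Georg Seelig
- Amount: $473,100
- Institution:
- Institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-06-05 to 2021-03-31
- Project status: Completed
- Source: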
- Keywords: 3' Untranslated Regions, 5' Untranslated Regions, Address, Affect, Alternative Splicing, Binding Sites, Biological, Biological Assay, Biology, Biotechnology, Code, Computational Technique, DNA Library, Data, Data Set, Disease, Elements, Engineering, Expression Library, Gene Expression, Genes, Genetic Programming, Genetic Transcription, Genetic Translation, Genetic Variation, Genome, Genomics, Human, Human Engineering, Human Genetics, Human Genome, Image, In Vitro, Learning, Libraries, Machine Learning, Measures, Messenger RNA, Methods, Modeling, Neural Network Simulation, Performance, Polyribosomes, Production, Property, Proteins, RNA, RNA Splicing, RNA Stability, RNA-Binding Proteins, Randomized, Regulator Genes, Regulatory Element, Research Project Grants, Resolution, Ribosomes, Role, Site, Source, Structure, Techniques, Testing, Therapeutic, Thiouridine, Time, Training, Transcript, Translating, Translations, Untranslated Regions, Validation, Variant, Work, base, comparative, computer science, convolutional neural network, design, experimental study, genetic variant, mRNA Stability, machine vision, member, neural network, novel, polysome profiling, practical application, predictive modeling, protein expression, ribosome profiling, screening, stable cell line, statistical learning, synthetic construct, voice recognition
Project abstract
The leading and trailing untranslated regions (UTRs) of an mRNA, along with the coding sequence (CDS),
control protein production by modulating translation and mRNA stability. However, although we have identified
a vast number of regulatory features in these regions, we are still far from being able to predict, for example,
whether and how a sequence variant affects the levels of protein being made. Here, we propose to combine
high-throughput experimental characterization of protein expression in synthetic libraries with machine learning
to create predictive models of translation and mRNA stability, addressing an urgent need. Recent progress in
machine vision, voice recognition and other fields of computer science has been driven by the availability of
enormous data sets on which to train models. Machine learning approaches have also had remarkable impact
in biology, but biological data sets often are comparatively small, limiting the quality of models that can be
learned. For example, there are only around 20,000 genes in the human genome, a restrictively small set of
examples for training a predictive model that captures the full extent of the genome’s “regulatory code.” In this
proposal, we aim to overcome this data size limitation by training predictive models of protein expression on
data from millions of synthetic constructs, a data set several orders of magnitude larger than the number of
genes in the genome. Specifically, we will create libraries of in vitro transcribed mRNA with targeted variation
in the UTRs and CDS and will assay protein expression of each library member by performing high-throughput
polysome profiling, ribosome profiling, and mRNA stability assays. We will then use neural network
approaches to learn predictive models of the relationship between mRNA sequence and levels of protein
production. We will apply our models to three applications of practical importance: first, we expect to uncover
novel biology, for example identifying regulatory sequence elements and interactions between them. Second,
we will validate our models through the de novo design and experimental testing of sequences that result in
higher levels of protein production than any of the millions of randomly generated members of the original
library or than the endogenous UTR sequences currently used in biotechnology. Such stable, efficiently
translated mRNA constructs would be of particular value for the field of mRNA therapeutics. Third, we will
predict the functional consequences of genetic variation in UTRs on protein production and we will validate
these predictions experimentally. We are far from understanding which genetic variants compromise gene
regulatory function in ways that may contribute to disease, making such a comprehensive and quantitative
analysis of variants valuable.
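Polysome profiling separates mRNAs by the number of ribosomes bound to them. One widely used way to turn per-fraction sequencing counts into a single translation score for each library member is the mean ribosome load (MRL). The abstract does not say which summary the project uses, so the following is only a minimal sketch under stated assumptions: each fraction is assigned a hypothetical ribosome number, reads are normalized by each fraction's total read count, and differences in RNA yield between fractions are ignored. The function name `mean_ribosome_load` and the example construct IDs are illustrative.

```python
from typing import Dict, List


def mean_ribosome_load(
    counts: Dict[str, List[int]],
    ribosomes_per_fraction: List[int],
) -> Dict[str, float]:
    """Estimate a mean ribosome load (MRL) for each library member.

    counts: construct ID -> read counts in each polysome fraction (hypothetical input).
    ribosomes_per_fraction: ribosome number assumed for each fraction (an assumption).
    """
    n = len(ribosomes_per_fraction)
    for cid, c in counts.items():
        if len(c) != n:
            raise ValueError(f"{cid}: expected {n} fraction counts, got {len(c)}")

    # Total reads per fraction, used to correct for unequal sequencing depth.
    totals = [sum(c[i] for c in counts.values()) for i in range(n)]

    mrl: Dict[str, float] = {}
    for cid, c in counts.items():
        # Depth-normalized abundance of this construct in each fraction.
        norm = [c[i] / totals[i] if totals[i] else 0.0 for i in range(n)]
        total = sum(norm)
        if total == 0:
            continue  # construct never observed: no estimate
        # Weighted mean ribosome number over the construct's fraction distribution.
        mrl[cid] = sum(w / total * r for w, r in zip(norm, ribosomes_per_fraction))
    return mrl


if __name__ == "__main__":
    example = {"utrA": [10, 10, 0], "utrB": [0, 10, 10]}
    print(mean_ribosome_load(example, [1, 2, 3]))  # utrA ≈ 1.33, utrB ≈ 2.67
```

Normalizing within each fraction before averaging keeps a deeply sequenced fraction from dominating the estimate; weighting fractions by measured RNA yield would be a natural refinement.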
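The modeling step is a neural network (the keywords name a convolutional neural network) that maps mRNA sequence to protein output. Once trained, such a model serves both downstream applications: scoring a variant as the change in prediction between alternate and reference sequence, and searching sequence space for designs with higher predicted output. The sketch below illustrates this with a deliberately tiny model, and everything in it is an assumption. The single filter and its weights are hand-set rather than learned, chosen only to mimic the well-known repressive effect of upstream AUGs in 5' UTRs. `predict`, `variant_effect` and `greedy_design` are hypothetical names. Greedy single-substitution hill climbing is one simple design strategy, not necessarily the one the project uses.

```python
from typing import List, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """One-hot encode a sequence (RNA 'U' is treated as 'T')."""
    seq = seq.upper().replace("U", "T")
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


# (filter, conv bias, output weight): hypothetical hand-set values standing in for
# learned CNN weights. An exact "ATG" match gives activation 3 - 2 = 1, and the
# negative output weight mimics the repressive effect of upstream AUGs.
FILTERS: List[Tuple[List[List[float]], float, float]] = [
    (one_hot("ATG"), -2.0, -1.0),
]
OUTPUT_BIAS = 1.0  # hypothetical baseline score


def predict(seq: str) -> float:
    """Toy one-layer CNN: convolution + ReLU + global max pool + linear output."""
    x = one_hot(seq)
    score = OUTPUT_BIAS
    for filt, conv_bias, out_w in FILTERS:
        width = len(filt)
        pooled = 0.0  # ReLU output is never below 0
        for start in range(len(x) - width + 1):
            act = conv_bias + sum(
                x[start + k][j] * filt[k][j] for k in range(width) for j in range(4)
            )
            pooled = max(pooled, act)
        score += out_w * pooled
    return score


def variant_effect(ref: str, pos: int, alt_base: str) -> float:
    """Predicted effect of a single-nucleotide variant: predict(alt) - predict(ref)."""
    alt = ref[:pos] + alt_base + ref[pos + 1:]
    return predict(alt) - predict(ref)


def greedy_design(seq: str, rounds: int = 5) -> str:
    """Hill climbing: each round applies the single substitution that most raises the score."""
    for _ in range(rounds):
        best_seq, best_score = seq, predict(seq)
        for i in range(len(seq)):
            for b in BASES:
                if b != seq[i]:
                    cand = seq[:i] + b + seq[i + 1:]
                    s = predict(cand)
                    if s > best_score:
                        best_seq, best_score = cand, s
        if best_seq == seq:
            break  # no single substitution improves the prediction
        seq = best_seq
    return seq


if __name__ == "__main__":
    print(variant_effect("GGACGCCA", 3, "T"))  # C->T creates an upstream ATG: -1.0
    print(greedy_design("GGATGCCA"))  # one substitution removes the ATG
```

In a real model the filters and output weights would be fitted to library measurements, and variant scoring and design would query that trained model instead of this toy scorer.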
Project outputs
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Georg Seelig
Engineering cell type-specific splicing regulation
- Award number: 10633765
- Fiscal year: 2023
- Amount: $473,100
- Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Award number: 10625987
- Fiscal year: 2021
- Amount: $473,100
- Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Award number: 10375354
- Fiscal year: 2021
- Amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Award number: 9886581
- Fiscal year: 2020
- Amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Award number: 10341212
- Fiscal year: 2020
- Amount: $473,100
- Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Award number: 10161803
- Fiscal year: 2020
- Amount: $473,100
- Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Award number: 9977420
- Fiscal year: 2020
- Amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Award number: 10112854
- Fiscal year: 2020
- Amount: $473,100
- Project category:
Predictive Modeling of Alternative Splicing and Polyadenylation from Millions of Random Sequences
- Award number: 9306648
- Fiscal year: 2017
- Amount: $473,100
- Project category:
Similar international grants
Emerging mechanisms of viral gene regulation from battles between host and SARS-CoV-2
- Award number: 10725416
- Fiscal year: 2023
- Amount: $473,100
- Project category:
Regulation of RNA sensing and viral restriction by RNA structures
- Award number: 10667802
- Fiscal year: 2023
- Amount: $473,100
- Project category:
Mechanisms of viral RNA maturation by co-opting cellular exonucleases
- Award number: 10814079
- Fiscal year: 2023
- Amount: $473,100
- Project category: