Computational Methods to Characterize Alternative Splicing from Massive Collections of RNA-seq Data
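Basic information
- Grant number: 10387065
- Principal investigator:
- Amount: $45,600
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2019
- Funding country: United States
- Project period: 2019-09-20 to 2023-06-30
- Project status: Completed
- Source: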
- Keywords: Alternative Splicing; Big Data; Bioinformatics; Capillary Electrophoresis; Collection; Complex; Computing Methodologies; Data; Data Set; Disease; Epigenetic Process; Genes; Graph; Health; High-Throughput Nucleotide Sequencing; Human; Human Biology; Individual; Introns; Methods; Modeling; Mutation; Noise; Performance; Physiology; Quantitative Reverse Transcriptase PCR; RNA; RNA Splicing; RNA analysis; Regulation; Regulator Genes; Role; Sampling; Sequence Alignment; Spliced Genes; Statistical Methods; Structure; Surveys; System; Techniques; Testing; Thyroid carcinoma; Tissues; Transcript; Variant; base; cell type; deep learning; design; feature detection; feature selection; heterogenous data; human disease; improved; innovation; next generation; novel; programs; sample collection; simulation; tool; transcriptome sequencing
Project abstract
SUMMARY
Alternative splicing (AS) is a gene regulatory mechanism with important roles in human biology and disease.
High throughput sequencing of RNA (RNA-seq) is making it possible to survey the expressed genes and their
alternative splicing variations in a wide variety of cellular conditions. However, the short reads are challenging
to analyze, demanding highly sophisticated computational methods that can extract meaningful AS information
efficiently, accurately, and in a comprehensive way. While there has been great progress so far, current
methods based on assembling the short reads into transcript annotations have reached a plateau. We propose
two innovations that can help overcome the limits. The first is one-step simultaneous analyses of multiple
samples in an RNA-seq collection, in contrast with the current two-step approach that analyzes each sample
separately and then merges the results. The second is to create and interrogate assembly-free representations
of AS. The project will design a suite of tools that will leverage the latent information in large collections of
samples and from heterogeneous data types to build complete and accurate AS signatures of tissues and cell
types, and to elucidate the regulatory circuitry of AS and its functional implications. Aim 1 will develop a high-
performance multi-sample transcript assembly tool, combining subexon graph representations of genes and
AS variations, statistical methods for improved feature detection, and search space reduction techniques for
efficient sample processing. Aim 2 will build highly efficient and accurate feature selection tools to detect and
characterize assembly-free AS variations (subexons and introns), simultaneously from collections of RNA-seq
samples. It will combine novel regularized programs with complex models of intronic 'noise' and other RNA-seq
confounders, and will enable analyses of differential splicing and the identification of individual- and group-specific variations.
Lastly, Aim 3 will develop a system to comprehensively model the regulatory and functional circuitry of AS and
the effects of mutations, starting from deep learning models of sequences and alignments and integrating
expression, sequence, epigenetic and mutation data across tissues, cell types and conditions. We will
rigorously test and evaluate all tools in simulations and on large public data sets, as well as on thyroid and
head and neck cancer data provided by our collaborators, and we will experimentally validate random subsets
of predictions with capillary electrophoresis and qRT-PCR. Collectively, the concepts, methods and tools will
establish a new framework for analyzing RNA-seq data that can efficiently tackle the 'big data' challenges,
leading to more complete discovery and annotation of AS structure and function in human health and disease.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Liliana D Florea
Other grants by Liliana D Florea
Computational Methods to Characterize Alternative Splicing from Massive Collections of RNA-seq Data
- Grant number: 10021689
- Fiscal year: 2019
- Funding amount: $45,600
- Project category:
Computational Methods to Characterize Alternative Splicing from Massive Collections of RNA-seq Data
- Grant number: 10218209
- Fiscal year: 2019
- Funding amount: $45,600
- Project category:
Computational Methods to Characterize Alternative Splicing from Massive Collections of RNA-seq Data
- Grant number: 10450006
- Fiscal year: 2019
- Funding amount: $45,600
- Project category:
Similar National Natural Science Foundation of China (NSFC) grants
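Predictive model of Alzheimer's disease symptom progression based on medical big data
- Grant number: 61802360
- Year approved: 2018
- Funding amount: 270,000 yuan
- Project category: Young Scientists Fund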
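Consensus molecular subtyping of pancreatic ductal adenocarcinoma and screening of new drug targets based on multi-omics and clinical big data analysis
- Grant number: 81802384
- Year approved: 2018
- Funding amount: 210,000 yuan
- Project category: Young Scientists Fund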
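Study of gene regulatory mechanisms based on big data mining of Drosophila piRNA
- Grant number: 61802256
- Year approved: 2018
- Funding amount: 260,000 yuan
- Project category: Young Scientists Fund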
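Research on fishery forecasting models based on deep learning of marine big data
- Grant number: 41776142
- Year approved: 2017
- Funding amount: 590,000 yuan
- Project category: General Program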
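Local pattern mining and search methods for data-intensive computing
- Grant number: 61702161
- Year approved: 2017
- Funding amount: 260,000 yuan
- Project category: Young Scientists Fund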
Similar overseas grants
RNA splicing regulation during alcohol withdrawal
- Grant number: 10785159
- Fiscal year: 2023
- Funding amount: $45,600
- Project category:
Computational tools and resources to study alternative splicing and mRNA isoform variation
- Grant number: 10669330
- Fiscal year: 2022
- Funding amount: $45,600
- Project category:
Generalizable biomedical informatics strategies for predictive modeling of treatment response
- Grant number: 10259888
- Fiscal year: 2020
- Funding amount: $45,600
- Project category:
Generalizable biomedical informatics strategies for predictive modeling of treatment response
- Grant number: 10463755
- Fiscal year: 2020
- Funding amount: $45,600
- Project category:
Generalizable biomedical informatics strategies for predictive modeling of treatment response
- Grant number: 10117702
- Fiscal year: 2020
- Funding amount: $45,600
- Project category: