Computational Methods for Assembling Multiple RNA-seq Samples
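Basic information
- Grant number: 10350634
- Principal investigator:
- Amount: $370,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-02-12 to 2026-01-31
- Project status: Ongoing
- Source: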
- Keywords: Address; Algorithm Design; Algorithms; Biological; Biological Assay; Biomedical Research; Biotechnology; Communities; Complement; Complication; Computing Methodologies; Consensus; Data; Effectiveness; Foundations; Genes; Genotype-Tissue Expression Project; Graph; Individual; Introns; Learning; Length; Measures; Methods; Modeling; Nature; Neurofibrillary Tangles; Outcome; Phase; Protein Isoforms; Protocols documentation; RNA Splicing; Reproducibility; Research; Reverse Transcriptase Polymerase Chain Reaction; Sampling; Scallop; Signal Transduction; Statistical Methods; Statistical Models; Structure; Testing; The Cancer Genome Atlas; Time; Tissues; Transcript; Typology; Update; Vision; Weight; biological research; data structure; design; experimental study; genomic locus; improved; indexing; learning algorithm; novel; open source; preservation; repository; simulation; transcriptome; transcriptome sequencing
PROJECT SUMMARY / ABSTRACT
RNA-seq has become a standard assay in many biological and biomedical experiments for studying gene activity.
The first step of RNA-seq analysis is usually to quantify the expression abundance of each transcript in the
reference transcriptome. However, studies have shown that current reference transcriptomes are incomplete, which
limits the accuracy of expression quantification. Now that large-scale RNA-seq data are available, an efficient and
robust way to construct a transcriptome is to assemble the full-length expressed transcripts from a set of RNA-seq
samples, a computational problem known as meta-assembly. This proposal addresses this problem and aims to
develop efficient meta-assemblers for short-read and long-read RNA-seq data.
In previous work, we developed Scallop (Nature Biotechnology, 2017; for short-read RNA-seq) and Scallop-LR
(for long-read RNA-seq), so far the most accurate single-sample assemblers. The core of Scallop
and Scallop-LR is a splice graph combined with phasing paths, which encode reads spanning more than two
vertices, to represent read alignments, together with a novel algorithm that decomposes the splice graph while
preserving all phasing paths. This data structure and the idea of “phase-preserving” decomposition provide the
algorithmic foundations for our proposed meta-assembly algorithms.
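
The data structure described above can be illustrated with a minimal Python sketch. This is not Scallop's actual implementation: the class `SpliceGraph`, its fields, and the helpers `contains` and `preserves_phasing` are hypothetical names chosen for illustration, and reads are assumed to arrive already reduced to the ordered list of vertices (exons) they cover. The sketch stores edge weights and phasing paths, then checks whether a candidate set of transcripts keeps every phasing path intact.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpliceGraph:
    """Toy splice graph (hypothetical): vertices are exon ids."""
    # (u, v) -> number of reads supporting the junction from u to v
    edges: dict = field(default_factory=lambda: defaultdict(float))
    # tuple of vertices -> number of reads that span exactly this path
    phasing_paths: dict = field(default_factory=lambda: defaultdict(int))

    def add_read(self, vertices):
        """Record one aligned read, given the ordered vertices it covers."""
        vertices = tuple(vertices)
        for u, v in zip(vertices, vertices[1:]):
            self.edges[(u, v)] += 1.0
        if len(vertices) > 2:  # spans more than two vertices: a phasing path
            self.phasing_paths[vertices] += 1


def contains(path, sub):
    """True if `sub` occurs as a contiguous run of vertices inside `path`."""
    path, sub = tuple(path), tuple(sub)
    m = len(sub)
    return any(path[i:i + m] == sub for i in range(len(path) - m + 1))


def preserves_phasing(transcripts, graph):
    """True if every phasing path lies inside at least one transcript."""
    return all(
        any(contains(t, p) for t in transcripts)
        for p in graph.phasing_paths
    )


# Two reads share the middle vertex "m" but come from different isoforms.
g = SpliceGraph()
g.add_read(["a", "m", "c"])
g.add_read(["b", "m", "d"])

ok = preserves_phasing([("a", "m", "c"), ("b", "m", "d")], g)   # True
bad = preserves_phasing([("a", "m", "d"), ("b", "m", "c")], g)  # False
```

Both candidate decompositions in the usage lines cover every edge of the graph, but only the first respects the read evidence, which is the distinction a phase-preserving decomposition is meant to enforce.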
The key to meta-assembly is to take advantage of the shared and complementary information in the given samples.
We propose to combine multiple samples at the splice graph level. Specifically, for each gene locus, we construct
a single combined splice graph by merging the individual splice graphs and pooling their phasing paths. To
keep the information in the individual splice graphs, their topologies will be encoded as additional phasing paths.
The entire data structure is therefore space-efficient and lossless, and can be piped into the subsequent
phase-preserving algorithms for decomposition. We will specialize our existing phase-preserving algorithms to
handle paired-end phasing paths and long phasing paths. Finally, statistical methods will be developed to infer the
statistical significance of each assembled transcript, and multiple hypothesis testing will be performed to control
the overall rate of falsely discovered transcripts. We also propose a new consensus approach that learns a
discriminator to automatically select the optimal algorithm for different meta-assembly instances.
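
The graph-level merging step can be sketched as follows. This is only one possible reading of the abstract, not the project's implementation: the function name `merge_splice_graphs`, the dictionary layout, the choice to sum edge weights, and in particular the way each sample's topology is encoded (every edge stored as an extra two-vertex path) are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_splice_graphs(samples):
    """Merge per-sample splice graphs of one gene locus (illustrative only).

    Each sample is a dict with two keys:
      "edges":         (u, v) -> read-support weight
      "phasing_paths": tuple of vertices -> read count
    """
    edges = defaultdict(float)
    phasing = defaultdict(int)
    for sample in samples:
        for edge, weight in sample["edges"].items():
            # Union of edges; summing weights is an assumed combination rule.
            edges[edge] += weight
            # Assumption: keep each sample's topology by recording every
            # one of its edges as an additional two-vertex path.
            phasing[tuple(edge)] += 1
        for path, count in sample["phasing_paths"].items():
            # Pool the phasing paths observed in the individual samples.
            phasing[tuple(path)] += count
    return {"edges": dict(edges), "phasing_paths": dict(phasing)}


sample_1 = {
    "edges": {("a", "b"): 2.0, ("b", "c"): 1.0},
    "phasing_paths": {("a", "b", "c"): 1},
}
sample_2 = {
    "edges": {("a", "b"): 3.0, ("b", "d"): 2.0},
    "phasing_paths": {("a", "b", "d"): 2},
}
merged = merge_splice_graphs([sample_1, sample_2])
# merged["edges"][("a", "b")] is 5.0: both samples support this junction.
# merged["phasing_paths"][("a", "b")] is 2: the edge appears in two samples.
```

Under these assumptions the merged structure needs one graph per locus rather than one per sample, while the extra two-vertex paths still record which junctions each sample contributed.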
The outcomes of this project will be open-source, easy-to-use, reproducible, and accurate meta-assemblers for
short-read and long-read RNA-seq data, respectively. These meta-assemblers will enable more accurate
identification of novel isoforms and annotation of gene structures. Combined with large-scale RNA-seq data,
they will allow data-driven transcriptomes to be constructed, benefiting downstream analyses such as RNA-seq
quantification and differential analysis.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by MINGFU SHAO
Other grants by MINGFU SHAO
Computational Methods for Assembling Multiple RNA-seq Samples
- Grant number: 10550251
- Fiscal year: 2021
- Funding amount: $370,400
- Project category:
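Similar NSFC (National Natural Science Foundation of China) grants
Designing efficient, cellular-context-specific CRISPR-Cas13d gRNAs by combining intracellular RNA structure information with deep learning algorithms
- Grant number: 32300521
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Polynomial-time solvability and approximation algorithm design for packing and covering problems on hypergraphs
- Grant number: 12361065
- Approval year: 2023
- Funding amount: CNY 270,000
- Project category: Regional Science Fund
Design and hardware implementation of ensemble learning algorithms under resource constraints
- Grant number: 62372198
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Theory, models, and algorithm design for graph machine learning
- Grant number: 62376007
- Approval year: 2023
- Funding amount: CNY 510,000
- Project category: General Program
Regulatory challenges and regulatory design for WealthTech: a study from the perspective of algorithmic investing
- Grant number: 72303197
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund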
Similar overseas grants
A multicenter study in bronchoscopy combining Stimulated Raman Histology with Artificial intelligence for rapid lung cancer detection - The ON-SITE study
- Grant number: 10698382
- Fiscal year: 2023
- Funding amount: $370,400
- Project category:
HEAR-HEARTFELT (Identifying the risk of Hospitalizations or Emergency depARtment visits for patients with HEART Failure in managed long-term care through vErbaL communicaTion)
- Grant number: 10723292
- Fiscal year: 2023
- Funding amount: $370,400
- Project category:
Traumatic Brain Injury Anti-Seizure Prophylaxis in the Medicare Program
- Grant number: 10715238
- Fiscal year: 2023
- Funding amount: $370,400
- Project category:
Brain Digital Slide Archive: An Open Source Platform for data sharing and analysis of digital neuropathology
- Grant number: 10735564
- Fiscal year: 2023
- Funding amount: $370,400
- Project category:
Enhanced Medication Management to Control ADRD Risk Factors Among African Americans and Latinos
- Grant number: 10610975
- Fiscal year: 2023
- Funding amount: $370,400
- Project category: