Algorithms and Software for Provably Accurate De Novo RNA-Seq Assembly
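Basic information
- Grant number: 9145263
- Principal investigator:
- Amount: $458,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2015
- Funding country: United States
- Project period: 2015-09-16 to 2018-06-30
- Project status: Completed
- Source: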
- Keywords: Address; Algorithmic Software; Algorithms; Alternative Splicing; Animal Model; Automobile Driving; Benchmarking; Biological Assay; Biological Sciences; Complex; Complication; Computer software; Data; Data Set; Detection; Development; Diagnostic; Evaluation; Foundations; Funding; Gene Structure; Generations; Genes; Genome; Government; Health; High-Throughput Nucleotide Sequencing; Human; Individual; Industry; Information Theory; Joints; Lead; Length; Letters; Malignant Neoplasms; Measurement; Memory; Methodology; Methods; Monitor; Organism; Performance; Process; Protein Isoforms; Pythons; RNA; Reading; Sampling; Side; Speed; Structure; System; Technology; Testing; Time; Tissues; Transcript; Work; Writing; base; clinical application; design; heuristics; improved; insight; nanopore; novel strategies; parallelization; personalized medicine; programs; prototype; reconstruction; reference genome; research study; software development; theories; transcriptome; transcriptome sequencing; transcriptomics
Project abstract
DESCRIPTION (provided by applicant): RNA-Seq has revolutionized transcriptomics and is one of the most important high-throughput sequencing assays invented in recent years. The key computational problem is de novo assembly: reconstructing the transcripts and their abundances from tens to hundreds of millions of short reads. The problem is challenging because several factors coincide: a large number of different transcripts (tens of thousands), long repeats across transcripts due to alternative splicing, widely varying abundances across transcripts, and the presence of read errors. Existing assemblers are mostly designed around heuristic considerations and implement ad hoc methods that lead to unreliable transcriptome reconstructions. An accurate RNA-Seq assembler would enable more accurate identification of fusions in cancer transcriptomes, better gene annotations in model and non-model organisms, and more complete analyses of the dynamics of alternative splicing that drive developmental and regulatory programs. In this proposal, we offer a systematic approach to the design of RNA-Seq assemblers based on information-theoretic principles. We start by determining conditions on the data that guarantee there is enough information to reconstruct the transcriptome, and then propose an assembly algorithm that can reconstruct it with the minimal information. This algorithm optimally uses the available read information to resolve repeats and disambiguate isoforms. A key insight derived from the information-theoretic approach is that widely varying abundances across transcripts, rather than being a complication, can actually be exploited as signatures of different transcripts to disambiguate among them. Based on our initial ideas, we have built and evaluated an initial prototype and compared it with several existing software packages on both real and simulated data.
The encouraging results provide evidence that our approach, which we will fully develop, implement, and evaluate during the funded period, can significantly outperform existing software. Additional functionality, such as mixed short/long-read assembly, genome-assisted assembly, and joint processing of multiple RNA samples, will be designed and incorporated into the software as part of the proposed project.
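As a toy illustration of the abundance-as-signature idea in the abstract (not the applicants' algorithm), consider two transcript segments entering a shared repeat and two leaving it, where short reads cannot span the repeat. The minimal Python sketch below assumes hypothetical per-segment coverage values and equal numbers of incoming and outgoing segments, and pairs the segments whose coverages agree most closely. The function name `pair_by_abundance` and all numbers are made up for illustration.

```python
from itertools import permutations


def pair_by_abundance(in_cov, out_cov):
    """Pair each incoming segment with one outgoing segment so that the
    paired coverages are as similar as possible (brute force).

    Assumes len(in_cov) == len(out_cov); both map segment name -> coverage.
    """
    ins = sorted(in_cov)
    outs = sorted(out_cov)
    best, best_cost = None, float("inf")
    # Try every one-to-one assignment of outgoing to incoming segments.
    for perm in permutations(outs):
        cost = sum(abs(in_cov[i] - out_cov[o]) for i, o in zip(ins, perm))
        if cost < best_cost:
            best, best_cost = list(zip(ins, perm)), cost
    return best, best_cost


# Hypothetical coverages: one abundant and one rare transcript share a repeat.
in_cov = {"A": 100.0, "B": 9.0}
out_cov = {"D": 11.0, "E": 98.0}
pairs, cost = pair_by_abundance(in_cov, out_cov)
print(pairs, cost)
```

Here the high-coverage segments A and E are paired, as are the low-coverage segments B and D, which suggests the transcripts A-repeat-E and B-repeat-D. If the two abundances were nearly equal, both pairings would score about the same and coverage alone could not resolve the repeat.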
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Sreeram Kannan
Defining causal roles of genomic variants on gene regulatory networks with spatiotemporally-resolved single-cell multiomics
- Grant number: 10297331
- Fiscal year: 2021
- Funding amount: $458,800
- Project category:
Defining causal roles of genomic variants on gene regulatory networks with spatiotemporally-resolved single-cell multiomics
- Grant number: 10474569
- Fiscal year: 2021
- Funding amount: $458,800
- Project category:
Algorithms and Software for Provably Accurate De Novo RNA-Seq Assembly
- Grant number: 9624586
- Fiscal year: 2015
- Funding amount: $458,800
- Project category:
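Similar National Natural Science Foundation of China (NSFC) grants
Algorithms, software, and databases for machine-learning prediction of molecular thermodynamic and transport properties driven by high-throughput molecular simulation
- Grant number: 21973060
- Approval year: 2019
- Funding amount: CNY 660,000
- Project category: General Program
Real-time disruption recovery models, algorithms, software, and emergency decision management for network optimization
- Grant number: 70471034
- Approval year: 2004
- Funding amount: CNY 140,000
- Project category: General Program
Research on several rapidly convergent optimization methods and algorithmic software
- Grant number: 19371084
- Approval year: 1993
- Funding amount: CNY 41,000
- Project category: General Program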
Similar overseas grants
Medcircuit, the algorithmic software reducing waiting times in emergency department and general practice waiting rooms.
- Grant number: 133416
- Fiscal year: 2018
- Funding amount: $458,800
- Project category: Feasibility Studies
SHF: Small: Programming Abstractions for Algorithmic Software Synthesis
- Grant number: 0916351
- Fiscal year: 2009
- Funding amount: $458,800
- Project category: Standard Grant