Improving overlap-finding techniques for whole genome shotgun data

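Basic information

  • Award number:
    0312360
  • Principal investigator:
  • Amount:
    $99,400
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2003
  • Funding country:
    United States
  • Project period:
    2003-07-15 to 2005-06-30
  • Project status:
    Completed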

Project abstract

Yorke
A genome (the DNA in a cell) can be represented by a sequence of letters called "bases." A large genome can consist of billions of bases. Chemical techniques allow scientists to read only a few hundred bases at a time. The whole genome shotgun (WGS) assembly technique creates a draft of the sequence of a whole genome by selecting such short fragments at random from the genome, determining the sequence of the fragments, and then computationally re-assembling millions of these fragments. Two fragments are said to "overlap" if it is plausible that they come from the same part of the genome, based on a comparison of their sequences. The goal of this project is to focus efforts on producing an extremely robust set of overlaps, using a combination of sophisticated error-correction techniques, as well as "localizing" fragments to validate overlaps by ensuring that both fragments come from the same vicinity of the genome.
Several issues complicate the determination of which pairs of fragments overlap. First, most genomes contain many "repeat regions," i.e., two or more almost identical copies of long stretches of sequence. Thus, two fragments that do not actually overlap may look like they do. Second, the random sampling technique results in many base errors: bases can be mis-read or missed entirely. These errors, combined with the fact that repeat regions usually differ slightly, make it very difficult to distinguish a spurious overlap from a true overlap in which one or both fragments contain read errors. Thus, if extreme care is not taken, it is easy to use a spurious overlap and thereby mistakenly connect distant parts of the genome. Preliminary results in collaboration with Celera Genomics, the Baylor College of Medicine, and The Institute for Genomic Research (TIGR) have demonstrated that the investigator's current techniques can already produce more sequence at higher quality. The goal is to improve these techniques and make them widely available.
The determination and interpretation of genetic information is one of the great challenges of the twenty-first century. The genome, i.e., all the DNA in a cell, is the molecular basis of diversity and the cornerstone of genetic information. Draft genomes have been obtained for human, mouse, and some insects, fish, plants, and bacteria. This is a start, but a full understanding of biological processes cannot be had by studying the genomes of only a handful of species. The federal government is spending about 100 million dollars per year generating sequence data. In the first stage, millions of small pieces of a genome are sampled from the genome. The second stage is called "assembly," when these pieces are re-assembled on a computer like a giant jigsaw puzzle. The puzzle is complicated by two facts: first, many of the puzzle pieces have small errors that make them mis-fit against pieces that they SHOULD fit with; and second, many pieces that should NOT go together actually fit together quite well. This makes it extremely difficult to correctly assemble a genome. There are two ways to decrease the ambiguities. First, one could generate more pieces. However, each new piece costs about $2, and one would need to generate millions of new pieces to have a significant effect on assembly quality. The investigators use a second route: they attempt to squeeze as much information out of the existing pieces as possible. The latter route is substantially cheaper, and there is still much room for improvement here over existing techniques. The investigators are using sophisticated mathematics to help discern with extreme precision those pairs of pieces that do, and those that do not, fit together. Preliminary results of the investigators, in collaboration with several large sequencing centers, have demonstrated that using their techniques to "pre-process" the pieces can produce more of the genome, with fewer errors. This project aims at extending these ideas further and making them freely accessible to all investigators. The impact on the federal genome (biotechnology) projects is potentially great.
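To make the notion of an "overlap" in the abstract concrete, here is a minimal Python sketch of a suffix-prefix overlap test that tolerates a bounded fraction of mismatched bases. It is an illustration only, not the investigators' method: the function name `best_overlap`, the parameters `min_len` and `max_error_rate`, and the toy sequences are all assumptions made for this example, and it handles only mis-read bases, not missed ones.

```python
def best_overlap(a, b, min_len=5, max_error_rate=0.1):
    """Longest suffix of `a` that matches a prefix of `b` within the error rate.

    Returns (overlap_length, mismatches), or None if no overlap of at
    least `min_len` bases qualifies. Only substitutions are modeled;
    inserted or missing bases are ignored in this sketch.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_error_rate * length:
            return length, mismatches
    return None


# Toy "genome" and two fragments sampled from it (hypothetical data).
genome = "ACGTTGCATGCCGATAGCTA"
read1 = genome[0:12]          # ACGTTGCATGCC
read2 = genome[6:18]          # CATGCCGATAGC, shares 6 bases with read1
read2_err = "CATGACGATAGC"    # same fragment with one mis-read base

print(best_overlap(read1, read2, max_error_rate=0.0))
print(best_overlap(read1, read2_err, max_error_rate=0.0))
print(best_overlap(read1, read2_err, max_error_rate=0.2))
```

With exact matching, the single mis-read base hides the true overlap; loosening the tolerance recovers it, but the same tolerance would also accept two nearly identical repeat copies from distant parts of the genome. That trade-off is the difficulty the abstract describes.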

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by James Yorke

Mathematical Modeling of DNA Repeats and HIV Epidemics
  • Award number:
    0616585
  • Fiscal year:
    2006
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant
Applications of Nonlinear Dynamics
  • Award number:
    0104087
  • Fiscal year:
    2001
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant
Chaos with Multiple Positive Lyapunov Exponents
  • Award number:
    9870183
  • Fiscal year:
    1998
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant
Mathematical Sciences: "Chaos with Multiple Positive Lyapunov Exponents"
  • Award number:
    9423843
  • Fiscal year:
    1995
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant
Attractor Reconstruction from Experimental Data
  • Award number:
    9116391
  • Fiscal year:
    1992
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant
Mathematical Sciences: Bifurcation and Global Continuation
  • Award number:
    8117967
  • Fiscal year:
    1982
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant
Qualitative Behavior For Generalized Dynamical Processes
  • Award number:
    7818221
  • Fiscal year:
    1979
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant
Qualitative Behavior For Generalized Dynamical Processes
  • Award number:
    7624432
  • Fiscal year:
    1976
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant
Qualitative Behavior For Generalized Dynamical Processes
  • Award number:
    7424310
  • Fiscal year:
    1974
  • Funding amount:
    $99,400
  • Project type:
    Continuing Grant

Similar grants from the National Natural Science Foundation of China (NSFC)

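Research on overlapping community detection methods for dynamic heterogeneous learner relationship networks in social learning environments
  • Award number:
    62077045
  • Approval year:
    2020
  • Funding amount:
    CNY 480,000
  • Project type:
    General Program
Research on personalized location recommendation methods integrating contextual information and overlapping community detection
  • Award number:
    61806083
  • Approval year:
    2018
  • Funding amount:
    CNY 250,000
  • Project type:
    Young Scientists Fund
Research on overlapping community detection methods for microblog users based on graph aggregation techniques
  • Award number:
    61762078
  • Approval year:
    2017
  • Funding amount:
    CNY 390,000
  • Project type:
    Regional Science Fund
Information fusion across heterogeneous multiple social networks and multidimensional evolution of hot events
  • Award number:
    61772133
  • Approval year:
    2017
  • Funding amount:
    CNY 650,000
  • Project type:
    General Program
Research on overlapping community detection methods for hierarchically granulated uncertain polymorphic networks
  • Award number:
    61503273
  • Approval year:
    2015
  • Funding amount:
    CNY 210,000
  • Project type:
    Young Scientists Fund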

Similar overseas grants

Exploring the overlap between neurodevelopmental disorders and traits with adolescent hypomania
  • Award number:
    2886920
  • Fiscal year:
    2023
  • Funding amount:
    $99,400
  • Project type:
    Studentship
The cardiovascular consequences of sleep apnea plus COPD (Overlap syndrome)
  • Award number:
    10733384
  • Fiscal year:
    2023
  • Funding amount:
    $99,400
  • Project type:
Domestic Abuse Proceedings In Family Courts: Overlap And Pathways In Private And Public Family Justice
  • Award number:
    ES/X011399/1
  • Fiscal year:
    2023
  • Funding amount:
    $99,400
  • Project type:
    Fellowship
Integrating Epidemiologic and Genomic Data to Elucidate the Genetic Overlap Between Congenital Anomalies and Pediatric Cancer
  • Award number:
    10749761
  • Fiscal year:
    2023
  • Funding amount:
    $99,400
  • Project type:
The Changing Structure of the International Court of Justice: Overlap of Dispute Settlement and International Control
  • Award number:
    23K01112
  • Fiscal year:
    2023
  • Funding amount:
    $99,400
  • Project type:
    Grant-in-Aid for Scientific Research (C)