NEW WORD BASED METHODS FOR DNA SEQUENCE ASSEMBLY


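Basic Information

Grant number:
2536784
Principal investigator:
MAXIMILLIAN A KARLOVITZ
Amount:
$100,000
Host institution:
DANIEL H. WAGNER ASSOCIATES, INC.
Host institution country:
United States
Project category:
Fiscal year:
1998
Funding country:
United States
Project period:
1998-02-01 to 1999-07-31
Project status:
Completed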

Source:
https://reporter.nih.gov/project-details/2536784
Keywords:
artificial intelligence; computer assisted sequence analysis; computer program /software; computer system design /evaluation; method development; nucleic acid sequence

Project Abstract

The growing use of DNA sequence data in research, databases, diagnostic and therapeutic biotechnology, and even litigation dramatically increases the need to improve the quality of the data being used. This proposal addresses the problem of assembling a large set of sequenced DNA fragments into a finished consensus. For a sequencing project to produce high-quality finished sequence data, the assembly of sequence fragments must be correct and accurate both in its large-scale structure and in the fine-scale detail of the alignment of individual base calls. We propose to investigate new algorithms for consensus estimation and assembly of DNA sequence fragments. Recent novel word-based approaches to consensus estimation offer promise as a method for de novo assembly and for exploring alternative assemblies on the large scale. This will be especially important when sequences contain large exact or approximate repeats. We propose to develop several main enhancements to these algorithms. In particular, we will develop a global optimization algorithm for determining consensus sequences, replacing current locally optimizing methods. We also propose to develop algorithms allowing alternative alignments in regions of ambiguity. This approach will allow us to assess alignment accuracy at both the large-scale and fine-scale levels.

PROPOSED COMMERCIAL APPLICATION: Accurate assemblies are at the heart of many sequencing projects central to biopharmaceutical, agricultural, and basic research as well as to the Human Genome Project. The proposed advances will provide the potential for simultaneously increasing reliability and automation in a bioinformatics software market totaling about 100 million dollars per year.
