Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
Basic information
- Grant number: 8020878
- Principal investigator:
- Amount: $394,800
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-09-27 to 2013-06-30
- Project status: Completed
- Source:
- Keywords: Address; Algorithms; Blast Cell; Chimerism; Clinical; Computer software; Computers; Computing Methodologies; Consensus; Conserved Sequence; DNA; Data; Data Analyses; Data Set; Genes; Genome; Goals; Health; Homologous Gene; Hour; Human; Human Microbiome; Imagery; Individual; Informatics; Internet; Maps; Metagenomics; Methods; Modeling; Morphologic artifacts; Performance; Process; Protocols documentation; Reading; Recruitment Activity; Research Personnel; Resources; Ribosomal RNA; Running; Sampling; Solid; Speed; Statistical Methods; Technology; Testing; Time; Variant; Work; base; exhaust; gene discovery; heuristics; improved; metagenomic sequencing; microbiome; new technology; next generation; novel; open source; programs; public health relevance; tool
Project abstract
DESCRIPTION (provided by applicant): The human microbiota is thought to have a profound influence on human health. The goal of the Human Microbiome Project (HMP) is to expand our understanding of the human microbiome by generating reference microbiome genomes, identifying "core" genomes, studying their variation related to human health, and developing new technologies and informatics tools. The HMP has generated huge amounts of sequence data using metagenomics and next-generation sequencing technologies. Managing and analyzing the HMP data is becoming very challenging for existing resources and methods. The challenges are imposed not only by the huge volume but also by the great diversity and complexity of the sequence data. To address these challenges, we propose several new computational methods to rapidly and effectively analyze very large HMP datasets.
(1) Consensus-based meta-assembler and pre-assembly processing. This aim is to significantly improve the assembly of metagenomic sequences. Instead of developing another assembly program, we will build a meta-assembler on top of available assemblers. We will also develop a pre-assembly protocol to filter and handle redundant and problematic sequences.
(2) Fast fragment recruitment and large-scale clustering. We plan to develop a fast program to align raw metagenomic reads to reference or homologous genomes. It is intended to fill the gap among very fast but very stringent mapping programs (e.g. Bowtie), very slow but very sensitive alignment programs (e.g. BLAST), and fast but less sensitive ones (e.g. BLAT). We also plan to enable our clustering program CD-HIT to handle very large next-generation sequencing datasets.
(3) Dedicated utilities for annotation and comparison of metagenomes. In recent years, we have developed an HMM-based method for identifying rRNAs from raw reads, a fast method to identify artificial 454 duplicates, an automated workflow for metagenome annotation, a rapid and reliable reciprocal sequence comparison protocol, and a statistical method to compare many metagenomes with a unique visualization interface. We plan to improve these metagenomics-specific tools to achieve much better speed, performance and capability.
The methods will be available as open source software, as web servers, or both. We have obtained very promising preliminary results. The proposed tools will effectively help researchers in HMP data analysis. Other HMP-related informatics tools for gene prediction, binning and assembly will also benefit greatly from the proposed work.
PUBLIC HEALTH RELEVANCE: The large amount of sequence data from the Human Microbiome Project (HMP) creates great challenges in data analysis. This proposal aims to address these challenges by developing novel and effective computational methods for metagenome assembly, annotation and comparison. The proposed methods will help researchers carry out preliminary data analysis, annotation, clinical sample comparison, novel gene discovery and other analyses rapidly.
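The abstract names CD-HIT as the applicants' clustering program but does not describe how it works. As a rough illustration only, the sketch below shows greedy incremental clustering, the general scheme CD-HIT is known for: sequences are processed longest first, and each one joins the first cluster representative it matches at or above an identity threshold, or otherwise starts a new cluster. The function names, the 0.9 threshold, the sample reads, and the ungapped `identity` measure are all assumptions made for this example; they are not taken from the proposal or from the CD-HIT code.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Toy identity (an assumption, not CD-HIT's measure): fraction of matching
    positions over the shorter sequence, aligned at the first base, no gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy incremental clustering.

    Returns a mapping from each representative's name to the names of the
    sequences in its cluster (the representative itself comes first).
    """
    clusters: dict[str, list[str]] = {}
    # Longest sequences first; ties broken by name so the result is deterministic.
    order = sorted(seqs, key=lambda name: (-len(seqs[name]), name))
    for name in order:
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)  # join the first matching cluster
                break
        else:
            clusters[name] = [name]  # no match: this sequence becomes a representative
    return clusters


# Hypothetical reads, made up for illustration.
reads = {
    "r1": "ACGTACGTACGTACGT",
    "r2": "ACGTACGTACGTACG",   # identical to the start of r1
    "r3": "ACGTACGAACGT",      # one mismatch against the start of r1
    "r4": "TTTTGGGGCCCCAAAA",  # unrelated
}

if __name__ == "__main__":
    for rep, members in greedy_cluster(reads).items():
        print(rep, members)
```

Because each sequence is compared only against existing representatives, never against every other sequence, the number of comparisons stays far below all-against-all, which is the property that matters when scaling to very large read sets. A real tool would replace the toy `identity` with a proper alignment and add filtering to skip most comparisons.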
Project outputs
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Weizhong Li
A study of antibiotics usage on early gut microbiome colonization and establishment in young children
- Grant number: 10113538
- Fiscal year: 2020
- Funding amount: $394,800
- Project type:
Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
- Grant number: 8150493
- Fiscal year: 2010
- Funding amount: $394,800
- Project type:
Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
- Grant number: 8294893
- Fiscal year: 2010
- Funding amount: $394,800
- Project type:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7892867
- Fiscal year: 2009
- Funding amount: $394,800
- Project type:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7495498
- Fiscal year: 2008
- Funding amount: $394,800
- Project type:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7682840
- Fiscal year: 2008
- Funding amount: $394,800
- Project type:
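Similar NSFC (National Natural Science Foundation of China) grants
Efficient structure-preserving algorithms for stochastic damped wave equations
- Grant number: 12301518
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Large-scale sparse optimization algorithms on Riemannian manifolds and their applications
- Grant number: 12371306
- Approval year: 2023
- Funding amount: CNY 435,000
- Project type: General Program
Hardware acceleration of quantum information processing algorithms based on an arbitrary-precision computing architecture
- Grant number: 62304037
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex, nonsmooth optimization problems
- Grant number: 12371308
- Approval year: 2023
- Funding amount: CNY 435,000
- Project type: General Program
Algorithms for retrieving evaporation ducts from radar echo data using physics-informed neural networks
- Grant number: 42305048
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund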
Similar overseas grants
Developing Machine Learning Models for the Analysis of Splicing Data in Large Heterogeneous Cohorts
- Grant number: 10672974
- Fiscal year: 2021
- Funding amount: $394,800
- Project type:
Developing Machine Learning Models for the Analysis of Splicing Data in Large Heterogeneous Cohorts
- Grant number: 10315802
- Fiscal year: 2021
- Funding amount: $394,800
- Project type:
Developing Machine Learning Models for the Analysis of Splicing Data in Large Heterogeneous Cohorts
- Grant number: 10506326
- Fiscal year: 2021
- Funding amount: $394,800
- Project type:
Validating Cases of Dementia and Mild Cognitive Impairment in OEF/OIF Veterans
- Grant number: 9033326
- Fiscal year: 2016
- Funding amount: $394,800
- Project type:
Validating Cases of Dementia and Mild Cognitive Impairment in OEF/OIF Veterans
- Grant number: 9198736
- Fiscal year: 2016
- Funding amount: $394,800
- Project type: