2020BBSRC-NSF/BIO: REDEFINE - Development of efficient, large-scale metagenomics sequence comparison algorithms to facilitate novel genomic insights


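Basic information

  • Grant number:
    BB/W002965/1
  • Principal investigator:
  • Amount:
    $ 635,700
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2022
  • Funding country:
    United Kingdom
  • Start and end dates:
    2022 to (no data)
  • Project status:
    Not yet concluded

Project abstract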

Microbes are ubiquitous and perform essential roles that help sustain life on Earth, e.g. environmental oxygenation, soil nutrient cycling to support plant growth, and facilitating animal digestion. They also cause many diseases in plants and animals and can evolve rapidly to exploit new niches and/or combat antimicrobials. Metagenomics, a relatively new field, is a culture-independent method that applies sophisticated DNA sequencing technologies to analyse the total microbial genetic material from any environment. It is now possible to reassemble the millions of short DNA sequences to produce representations of the microbial genomes in a sample, termed metagenome-assembled genomes (MAGs), especially for bacteria. While this approach remains computationally expensive, the computer algorithms used to recover these genomes have been substantially improved to increase the accuracy of MAGs. In the past five years alone, many large-scale studies, including our own, have successfully applied these techniques to cumulatively generate millions of MAGs. This has provided scientists with novel insights into the ~99% of organisms yet to be experimentally cultured and has dramatically expanded the Tree of Life. These MAGs are reshaping our understanding of microbial community structure and the functional capacities of constituent members. This explosion in MAG numbers nevertheless presents new challenges. These large-scale analyses can generate genomes in numbers that match GenBank's large genome collection, which is derived from traditional techniques of sequencing experimentally isolated microbes. Such genome collections have taken decades to build and are managed by large data centres. Yet there is now a need for groups to routinely perform comparisons between new MAG collections and such large reference genome collections.
We propose to use a particular class of algorithm called MinHash, which rapidly estimates the similarity between two sets based on the number of shared entities, in our case short sequences. Most implementations of this approach have focused on the rapid comparison of one genome to another. In this proposal, we aim to use a range of computational techniques to enable the comparison of a large query dataset to a large reference database, with the aim of applying it to microbial genomes, MAG collections and metagenomic sequences. We will develop and apply this tool to a range of datasets, particularly those housed in MGnify, a leading database of metagenomic data. The key applications are: the identification of errors in MAGs that were introduced by the computational methods; data reduction by identifying duplicate MAGs between datasets; the rapid incorporation of MAGs into catalogues of genomes that have been found in a particular environment; taxonomic classification of MAGs (by converting similarity distances to evolutionary distances); and the profiling of metagenome datasets to determine which genomes are likely to be found. These profiles will also make it possible to identify datasets that are poorly characterised by MAG/genome collections and to prioritise them for analysis (i.e. MAG generation). The outputs of this proposal are manifold. The first is a suite of software tools and associated workflows that can be installed and run on the command line. The application of the tool will lead to multiple new data outputs (refined MAGs, improved catalogues and metagenomic profiles), which will be made available via MGnify's web interfaces. To provide rapid access to these MAG catalogues, we will also deploy new web interfaces (implementing the new tools) that allow users to compare their own MAGs against established collections. This will not only democratise scientific research but also reduce the need for data duplication.
We will also use specific use cases to demonstrate the utility of our tools and provide training and support for their use.
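As a rough illustration of the MinHash idea described above (not the project's actual tool), the sketch below estimates the Jaccard similarity of two sequences from their sets of short subsequences (k-mers) using bottom-s sketches. The function names, the k-mer length, the sketch size and the choice of BLAKE2b as the hash are all assumptions made for this example. The conversion to an evolutionary distance uses the formula popularised by the Mash tool, which the abstract does not name, and reverse-complement handling is omitted for brevity.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Stable 64-bit hash of a k-mer (Python's built-in hash() is salted per run)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, size=1000):
    """Bottom-s sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate Jaccard similarity from two bottom-s sketches.

    Take the `size` smallest hashes of the merged sketches and count
    how many of them occur in both inputs.
    """
    a, b = set(sketch_a), set(sketch_b)
    merged = sorted(a | b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def mash_distance(jaccard, k):
    """Mash-style distance: -(1/k) * ln(2j / (1 + j)); 1.0 when nothing is shared."""
    if jaccard <= 0.0:
        return 1.0
    return -math.log(2.0 * jaccard / (1.0 + jaccard)) / k


if __name__ == "__main__":
    # Toy sequences, short enough that each sketch holds every k-mer,
    # so the estimate equals the exact Jaccard similarity (2 shared / 8 total).
    a = sketch("ACGTACGTGG", k=3)
    b = sketch("ACGTTT", k=3)
    j = jaccard_estimate(a, b)
    print(j, mash_distance(j, k=3))
```

Because each genome is reduced to a fixed-size list of hashes, comparing two genomes costs time proportional to the sketch size rather than the genome length, which is what makes many-against-many comparisons plausible at the scale the abstract describes.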

Project outputs

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MGnify Genomes: A Resource for Biome-specific Microbial Genome Catalogues.
  • DOI:
    http://dx.doi.org/10.1016/j.jmb.2023.168016
  • Year published:
    2023
  • Journal:
  • Impact factor:
    5.6
  • Authors:
    Gurbich TA
  • Corresponding author:
    Gurbich TA
Ensembl 2024.
  • DOI:
    http://dx.doi.org/10.1093/nar/gkad1049
  • Year published:
    2024
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Harrison PW
  • Corresponding author:
    Harrison PW

Other publications by Robert Finn

2-BLOCKS WITH MINIMAL NONABELIAN DEFECT GROUPS
  • DOI:
  • Year published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    B. E. S. Ambale;F. A. M. Athematik;Paul Balmer;Robert Finn;Sorin Popa;Vyjayanthi Chari;Kefeng Liu;Jie Qing;Daryl Cooper;Jiang;Paul Yang;Silvio Levy
  • Corresponding author:
    Silvio Levy
The small GTPase Rab4A interacts with the central region of cytoplasmic dynein light intermediate chain-1.
Atomistic study of Urbach tail energies in (Al,Ga)N quantum well systems
Petersberg Papers on Afghanistan and the Region
  • DOI:
  • Year published:
    2009
  • Journal:
  • Impact factor:
    0
  • Authors:
    Wolfgang F. Danspeckgruber;Rangin Dadfar Spanta;Volker Stanzel;Rita Kieber;W. Maley;A. Wardak;A. Tarzi;Leanne Smith;A. Saikal;Susanne Schmeidl;M. Jansen;T. Ruttig;N. Banerjee;N. Bizhan;Zahir Tanin;Mahmoud Saikal;R. D. Mullen;V. Sahni;Carol Wang;Robert Finn
  • Corresponding author:
    Robert Finn
The shape of a pendant liquid drop

Other grants of Robert Finn

Enriching MGnify Genomes to capture the full spectrum of the microbiota and bolster taxonomic classifications
  • Grant number:
    BB/V01868X/1
  • Fiscal year:
    2022
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
SENSE - Screening of ENvironmental SEquences to discover novel protein functions using informatics target selection and high-throughput validation
  • Grant number:
    BB/T000902/1
  • Fiscal year:
    2020
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
EMERALD - Enriching MEtagenomics Results using Artificial intelligence and Literature Data
  • Grant number:
    BB/S009043/1
  • Fiscal year:
    2019
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
EBI Metagenomics - enabling the reconstruction of microbial populations
  • Grant number:
    BB/R015228/1
  • Fiscal year:
    2018
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
Bilateral NSF/BIO-BBSRC: A Metagenomics Exchange - enriching analysis by synergistic harmonisation of MG-RAST and the EBI Metagenomics Portal
  • Grant number:
    BB/N018354/1
  • Fiscal year:
    2017
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
Expanding Genome3D and disseminating the structural annotations via InterPro and PDBe
  • Grant number:
    BB/N019172/1
  • Fiscal year:
    2016
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
EBI Metagenomics Portal - Towards a better understanding of community metabolism
  • Grant number:
    BB/M011755/1
  • Fiscal year:
    2015
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
14 NSFBIO: Towards detailed and consistent function prediction from protein family databases
  • Grant number:
    BB/N00521X/1
  • Fiscal year:
    2015
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
Collaborative Research: Capillary Interfaces
  • Grant number:
    0103954
  • Fiscal year:
    2001
  • Funding amount:
    $ 635,700
  • Project category:
    Standard Grant
Proposal for Exploratory Research
  • Grant number:
    9729817
  • Fiscal year:
    1997
  • Funding amount:
    $ 635,700
  • Project category:
    Standard Grant

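Similar NSFC (National Natural Science Foundation of China) grants

Mechanism by which SYNJ1 protein fragments contribute to the onset of Parkinson's disease by promoting aggregation of the synaptic protein NSF
  • Grant number:
  • Year approved:
    2022
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Role and mechanism of GluA2-containing AMPA receptor membrane stability mediated by S-nitrosylation of the NSF protein in post-stroke depression
  • Grant number:
    82071300
  • Year approved:
    2020
  • Funding amount:
    CNY 550,000
  • Project category:
    General Program
Participation in the China-US (NSFC-NSF) biodiversity programme review meeting
  • Grant number:
  • Year approved:
    2019
  • Funding amount:
    CNY 20,000
  • Project category:
    International (Regional) Cooperation and Exchange Program
China-US (NSFC-NSF) EEID joint review meeting
  • Grant number:
  • Year approved:
    2019
  • Funding amount:
    CNY 26,000
  • Project category:
    International (Regional) Cooperation and Exchange Program
China-US (NSFC-NSF) EEID joint review meeting
  • Grant number:
  • Year approved:
    2019
  • Funding amount:
    CNY 12,000
  • Project category:
    International (Regional) Cooperation and Exchange Program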

Similar overseas grants

RII Track-4: NSF: Bio-inspired Solutions to Prevent Soil Erosion in Farmland and Scouring in Fluvial Regions
  • Grant number:
    2327384
  • Fiscal year:
    2024
  • Funding amount:
    $ 635,700
  • Project category:
    Standard Grant
NSF Postdoctoral Fellowship in Biology: Human Domestication of Maize as Bio-cultural Coevolution
  • Grant number:
    2305694
  • Fiscal year:
    2024
  • Funding amount:
    $ 635,700
  • Project category:
    Fellowship Award
22-BBSRC/NSF-BIO - Interpretable & Noise-robust Machine Learning for Neurophysiology
  • Grant number:
    BB/Y008758/1
  • Fiscal year:
    2024
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
  • Grant number:
    BB/Y001117/1
  • Fiscal year:
    2024
  • Funding amount:
    $ 635,700
  • Project category:
    Research Grant