EMERALD - Enriching MEtagenomics Results using Artificial intelligence and Literature Data

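Basic information

  • Grant number:
    BB/S009043/1
  • Principal investigator:
  • Amount:
    $772,500
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2019
  • Funding country:
    United Kingdom
  • Duration:
    2019 to (no data)
  • Project status:
    Completed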

Project abstract

Microbes such as bacteria and fungi inhabit diverse environments, including soil, water, and human body sites such as the mouth, skin and intestine. Ubiquitous in nature, they also show adaptation to extreme environments, such as acid mine drainage or hydrothermal vents. We have appreciated the potential of microbes for a long time: they are important for food and beverage manufacturing (e.g. cheese and beer), and are key players in bioremediation, as demonstrated by their pivotal role in breaking down complex oils following the Deepwater Horizon oil spill in the Gulf of Mexico. The field of metagenomics offers an exciting opportunity to examine these microbial communities and gain insights into various aspects of their existence, such as their interaction with humans and plants, their potential as disease reservoirs, and their potential as sources of novel enzymes with bioremediation or plastic recycling abilities.

Metagenomics studies microbial communities by sampling the environments directly, extracting and sequencing their genetic material (DNA), and applying computational methods to elucidate microbial composition and function. This sampling approach helps to characterise microbes that are unculturable, or as yet uncultured, in the laboratory. Metagenomics experimental data are typically large (10s to 100s of GB per sequencing run; 100s of runs per project), complex (comprising 100s to 1000s of different microbes) and variable due to the nature of the underlying experiments and (sub-)sampling of the dynamic populations.

Despite knowledge about fluxes within a microbial community (e.g. with time of year or day), metagenomic datasets typically contain poor descriptions (termed metadata) relating to the sample origin or the methods used to obtain the DNA and process the sequence data. To help interpret data across experiments and derive meaningful biological conclusions, it is crucial to know whether a difference between two metagenomics datasets is due to differences in underlying experimental techniques or the biological qualities of the sample. The lack of metadata has impeded our attempts to apply machine learning (ML) techniques to interpret new incoming data, and therefore our capacity to find novel biological applications.

To circumvent these issues, our proposal aims to employ different ML methodologies to enrich the currently available metadata and start elucidating new knowledge embedded in the sequence data. The text mining approach will focus on identifying research articles on metagenomics experiments to unearth and extract detailed descriptions, which will be used to enrich the metadata associated with the corresponding DNA sequences and generate new or improved classification systems. This dictionary of descriptor terms will also serve as the template for developing methods to discover previously unidentified metagenomics papers. We will train algorithms on this enriched metadata to progressively learn what criteria might be applied to incoming data with inadequate descriptions in order to determine sample origin and processing, as well as to decipher which experimental biases affect the results when comparing similar samples.

ML approaches will also be used for the discovery of new biological functions. Bacteria encode gene cassettes that are responsible for producing compounds of pharmaceutical and agricultural value. Functional descriptions for the genes constituting these cassettes are incomplete, while many cassettes still await discovery. By combining the ML and text mining approaches, we intend to better describe these cassettes and also focus on the detection of novel groups.

Data underpinning this work will originate from key EMBL-EBI databases, namely EBI Metagenomics and Europe PMC, as well as other resources (e.g. MIBiG). The developments proposed here will help resolve complexities underlying experimental data, enriching the metadata in the process and also laying the foundation for a new generation of reliable predictive models.

Project outputs

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Expansion of novel biosynthetic gene clusters from diverse environments using SanntiS
  • DOI:
    10.1101/2023.05.23.540769
  • Publication date:
    2023-10-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Santiago Sanchez;Joel D. Rogers;Alexander B Rogers;Maaly Nassar;J. Mcentyre;M. Welch;F. Hollfelder;R. Finn
  • Corresponding author:
    R. Finn
Europe PMC in 2023
  • DOI:
    10.1093/nar/gkad1085
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Rosonovski S
  • Corresponding author:
    Rosonovski S
A machine learning framework for discovery and enrichment of metagenomics metadata from open access publications.
  • DOI:
    10.1093/gigascience/giac077
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    9.2
  • Authors:
    Nassar M
  • Corresponding author:
    Nassar M
A machine learning framework for discovery and enrichment of metagenomics metadata from open access publications
  • DOI:
    10.1093/gigascience/giac077
  • Publication date:
    2022-08-11
  • Journal:
  • Impact factor:
    9.2
  • Authors:
    Maaly Nassar;Alexander B Rogers;Francesco Talo';Santiago Sanchez;Zunaira Shafique;R. Finn;J. Mcentyre
  • Corresponding author:
    J. Mcentyre

Other publications by Robert Finn

2-BLOCKS WITH MINIMAL NONABELIAN DEFECT GROUPS
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
    B. E. S. Ambale;F. A. M. Athematik;Paul Balmer;Robert Finn;Sorin Popa;Vyjayanthi Chari;Kefeng Liu;Jie Qing;Daryl Cooper;Jiang;Paul Yang;Silvio Levy
  • Corresponding author:
    Silvio Levy
The small GTPase Rab4A interacts with the central region of cytoplasmic dynein light intermediate chain-1.
Atomistic study of Urbach tail energies in (Al,Ga)N quantum well systems
F Ur Mathematik in Den Naturwissenschaften Leipzig Singular Solutions of the Capillary Problem
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Robert Finn;Robert Weston Neel
  • Corresponding author:
    Robert Weston Neel
BAVARD’S DUALITY THEOREM ON CONJUGATION-INVARIANT NORMS
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. O. K. Awasaki;Paul Balmer;Robert Finn;Sorin Popa;Vyjayanthi Chari;Kefeng Liu;Igor Pak;Paul Yang;Daryl Cooper;Jiang;Jie Qing;Silvio Levy
  • Corresponding author:
    Silvio Levy

Other grants held by Robert Finn

Enriching MGnify Genomes to capture the full spectrum of the microbiota and bolster taxonomic classifications
  • Grant number:
    BB/V01868X/1
  • Fiscal year:
    2022
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
2020BBSRC-NSF/BIO: REDEFINE - Development of efficient, large-scale metagenomics sequence comparison algorithms to facilitate novel genomic insights
  • Grant number:
    BB/W002965/1
  • Fiscal year:
    2022
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
SENSE - Screening of ENvironmental SEquences to discover novel protein functions using informatics target selection and high-throughput validation
  • Grant number:
    BB/T000902/1
  • Fiscal year:
    2020
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
EBI Metagenomics - enabling the reconstruction of microbial populations
  • Grant number:
    BB/R015228/1
  • Fiscal year:
    2018
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
Bilateral NSF/BIO-BBSRC:A Metagenomics Exchange - enriching analysis by synergistic harmonisation of MG-RAST and the EBI Metagenomics Portal
  • Grant number:
    BB/N018354/1
  • Fiscal year:
    2017
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
Expanding Genome3D and disseminating the structural annotations via InterPro and PDBe
  • Grant number:
    BB/N019172/1
  • Fiscal year:
    2016
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
EBI Metagenomics Portal - Towards a better understanding of community metabolism
  • Grant number:
    BB/M011755/1
  • Fiscal year:
    2015
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
14 NSFBIO:Towards detailed and consistent function prediction from protein family databases
  • Grant number:
    BB/N00521X/1
  • Fiscal year:
    2015
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
Collaborative Research: Capillary Interfaces
  • Grant number:
    0103954
  • Fiscal year:
    2001
  • Funding amount:
    $772,500
  • Project category:
    Standard Grant
Proposal for Exploratory Research
  • Grant number:
    9729817
  • Fiscal year:
    1997
  • Funding amount:
    $772,500
  • Project category:
    Standard Grant

Similar grants from the National Natural Science Foundation of China (NSFC)

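Molecular mechanism by which the leucine-rich repeat protein LRRC71 mediates androgen-independent growth of prostate cancer by promoting AR nuclear import
  • Grant number:
    82373031
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project category:
    General Program
Learning theory and control methods based on dynamic force-position images for contact-rich component assembly
  • Grant number:
    52375519
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program
Systematics of the orchid tribe Collabieae (Orchidaceae) and the causes of regional differences in its richness
  • Grant number:
    32360063
  • Approval year:
    2023
  • Funding amount:
    CNY 320,000
  • Project category:
    Regional Science Fund Project
Mechanism by which an enriched environment suppresses excitotoxicity in ischemic brain injury via the NKA/GLT-1 pathway
  • Grant number:
    82360455
  • Approval year:
    2023
  • Funding amount:
    CNY 320,000
  • Project category:
    Regional Science Fund Project
Mechanism by which an enriched environment improves learning and memory in AD mice by regulating the Foxd3/miR-135a-5p pathway through DNA methylation
  • Grant number:
    82371442
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project category:
    General Program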

Similar overseas grants

Enriching Exhibition Stories: Adding Voices to Quire
  • Grant number:
    AH/Y006011/1
  • Fiscal year:
    2023
  • Funding amount:
    $772,500
  • Project category:
    Research Grant
Enriching ECHO Cohorts with High-risk Pregnancies and Children with Disabilities (Enriching ECHO)
  • Grant number:
    10746674
  • Fiscal year:
    2023
  • Funding amount:
    $772,500
  • Project category:
Uncovering the Potential of Aural-Centric Pedagogies in Producing a More Enriching Learning Experience within the Teaching of GCSE-Level History in En
  • Grant number:
    2854498
  • Fiscal year:
    2023
  • Funding amount:
    $772,500
  • Project category:
    Studentship
CAREER: Enriching Conversational Information Retrieval via Mixed-Initiative Interactions
  • Grant number:
    2143434
  • Fiscal year:
    2022
  • Funding amount:
    $772,500
  • Project category:
    Continuing Grant
Collaborative Research: Elements: Enriching Scholarly Communication with Augmented Reality
  • Grant number:
    2209625
  • Fiscal year:
    2022
  • Funding amount:
    $772,500
  • Project category:
    Standard Grant