Increasing the Coverage and Accuracy of CATH for Comparative Genomics and Variant Interpretation

Basic information

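  • Grant number:
    BB/R014892/1
  • Principal investigator:
  • Amount:
    $791,600
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2018
  • Funding country:
    United Kingdom
  • Duration:
    2018 to (no data)
  • Project status:
    Completed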

Project abstract

Evolution has given rise to families of protein domains where relatives are linked through speciation events or duplication events in the same genome. Extensive domain duplication and shuffling give multi-domain proteins with varying functions depending on the domain composition.

The CATH classification takes the domain as the primary evolutionary unit and classifies relatives having significantly similar structures and sequence patterns. Currently there are 5500 CATH superfamilies containing 93 million domains. Previous funding allowed us to hugely increase the number of domains in CATH. We want to keep increasing this data; even bigger expansions are expected as new technologies make it easier to solve structures and capture sequence data. We will improve the accuracy of our domain data by working with other classification experts (Alexey Murzin of SCOP) to establish a shared domain recognition platform for new domains at the European Bioinformatics Institute, with difficult assignments jointly validated by CATH/SCOP experts. This data will be public and valuable for other resources (e.g. SCOPe, ECOD).

CATH has been established for 22 years and is renowned for providing accurate structural annotations for biological analyses. More recently it significantly increased its value to the biology community by providing functional predictions. Although the structural core of the superfamily is highly conserved, variations away from the core cause changes in function. CATH addresses this by grouping evolutionary relatives likely to have highly similar functions and structures into functional families (FunFams). Thus FunFams can accurately inherit information about structures and functions between relatives. This is important as <10% of domains have been experimentally characterised. We verified in silico that FunFams can accurately model structures of uncharacterised relatives, and the ability of FunFams to inherit functional information between relatives has been validated by an international competition, CAFA. We will make the FunFams much more comprehensive and increase the accuracy of FunFams for enzymes.

Extending our FunFam library will allow us to predict more accurate multi-domain annotations in genome sequences. This will help biologists comparing the genomes of organisms occupying different environmental niches, as identification of diverse domain combinations can hint at changes in the functional repertoires of the organisms and different abilities to exploit compounds in their environments.

Because relatives in FunFams are so structurally conserved, we can align and superpose them to extract the characteristics of this conserved structural core and use this information to build a '3D core-template'. These templates will help solve the structures of many more relatives, since powerful new structural biology techniques (e.g. cryo-EM) can use core libraries like these to model the structures of uncharacterised proteins from electron dispersion data.

In another exciting development for CATH, we will harness the structural data and the additional power that comes from 200-fold greater sequence data to find residue sites in the protein that are conserved throughout evolution for their functional importance. We will characterise these sites. We already predict functional sites well from conservation patterns in sequence data, but including structural data can help distinguish the type of site (e.g. a site binding a compound or another protein) and identify additional residues involved in the functional mechanism. This data is valuable for protein design and for understanding why mutations near these sites affect the protein and cause disease.

We will disseminate our data via webpages and other web mechanisms and develop e-videos and training material for the new features. We'll also build more efficient mechanisms for scanning our website and for biologists to install our tools on their own computers to analyse genome data.

Project outputs

Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
KinFams: De-Novo Classification of Protein Kinases Using CATH Functional Units.
  • DOI:
    http://dx.doi.org/10.3390/biom13020277
  • Publication year:
    2023
  • Journal:
  • Impact factor:
    5.5
  • Authors:
    Adeyelu T
  • Corresponding author:
    Adeyelu T
SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals.
  • DOI:
    http://dx.doi.org/10.1038/s41598-020-71936-5
  • Publication year:
    2020
  • Journal:
  • Impact factor:
    4.6
  • Authors:
    Lam SD
  • Corresponding author:
    Lam SD
Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds
CATH functional families predict functional sites in proteins.
PDBe: improved findability of macromolecular structure data in the PDB.
  • DOI:
    http://dx.doi.org/10.1093/nar/gkz990
  • Publication year:
    2020
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Armstrong DR
  • Corresponding author:
    Armstrong DR

Other grants held by Christine Orengo

Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods PID 7012435
  • Grant number:
    BB/X018563/1
  • Fiscal year:
    2024
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
  • Grant number:
    BB/Y001117/1
  • Fiscal year:
    2024
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
ProtFunAI: AI based methods for functional annotation of proteins in crop genomes
  • Grant number:
    BB/Y514044/1
  • Fiscal year:
    2024
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies
  • Grant number:
    BB/V014722/1
  • Fiscal year:
    2022
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
Transforming the Structural Landscape of CATH to Aid Variant Analyses in Human and Agricultural Organisms and their Pathogens
  • Grant number:
    BB/W018802/1
  • Fiscal year:
    2022
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
CATH-FunVar - Predicting Viral and Human Variants Affecting COVID-19 Susceptibility and Severity and Repurposing Therapeutics
  • Grant number:
    BB/W003368/1
  • Fiscal year:
    2021
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
BBSRC-NSF/BIO Expanding the fold library in the twilight zone to facilitate structure determination of macromolecular machines
  • Grant number:
    BB/S016007/1
  • Fiscal year:
    2020
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
Exploiting data driven computational approaches for understanding protein structure and function in InterPro and Pfam
  • Grant number:
    BB/S020039/1
  • Fiscal year:
    2020
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
SENSE - Screening of ENvironmental SEquences to discover novel protein functions, using informatics target selection and high-throughput validation
  • Grant number:
    BB/T002735/1
  • Fiscal year:
    2020
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
3D-Gateway - Gateway to protein structure and function
  • Grant number:
    BB/S020144/1
  • Fiscal year:
    2020
  • Funding amount:
    $791,600
  • Project type:
    Research Grant

Similar National Natural Science Foundation of China (NSFC) grants

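Research on wide color gamut theory and technology
  • Grant number:
    60802050
  • Approval year:
    2008
  • Funding amount:
    200,000 CNY
  • Project type:
    Young Scientists Fund
Study of the concentration dependence of polymer solution properties over the entire concentration range
  • Grant number:
    20474026
  • Approval year:
    2004
  • Funding amount:
    240,000 CNY
  • Project type:
    General Program
Influence of large-scale snow and ice cover on China's climate
  • Grant number:
    48870225
  • Approval year:
    1988
  • Funding amount:
    20,000 CNY
  • Project type:
    General Program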

Similar overseas grants

Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods PID 7012435
  • Grant number:
    BB/X018563/1
  • Fiscal year:
    2024
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods
  • Grant number:
    BB/X018660/1
  • Fiscal year:
    2024
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
Alzheimer Diagnosis in older Adults with Chronic Conditions ADACC Network
  • Grant number:
    10726511
  • Fiscal year:
    2023
  • Funding amount:
    $791,600
  • Project type:
Increasing the Coverage and Accuracy of CATH for Comparative Genomics and Variant Interpretation
  • Grant number:
    BB/R015201/1
  • Fiscal year:
    2019
  • Funding amount:
    $791,600
  • Project type:
    Research Grant
Effect of the resolution and accuracy of geodata on RF coverage prediction
  • Grant number:
    416087-2011
  • Fiscal year:
    2011
  • Funding amount:
    $791,600
  • Project type:
    University Undergraduate Student Research Awards