Transforming the Structural Landscape of CATH to Aid Variant Analyses in Human and Agricultural Organisms and their Pathogens
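Basic information
- Grant number: BB/W018802/1
- Principal investigator:
- Amount: $1.115 million
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2022
- Funding country: United Kingdom
- Duration: 2022 to no data
- Project status: Ongoing
- Source:
- Keywords:
Project abstract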
Proteins are Nature's molecular machines, involved in most biochemical processes in living systems. Mutations in proteins can affect their stability, shape or chemical properties, altering their function. Knowing the 3D structure of a protein can be extremely helpful in understanding whether and how mutations have this effect. Proteins are typically made up of multiple 'domains' (important functional modules), each associated with a distinct globular shape. Our CATH classification groups domains according to evolutionary ancestry. Relatives are recognised because they have similar structures in their core and often share functional features, though variations outside the core can modify function. We therefore sub-classify relatives into functional families if they have highly similar structures and functions.
Experimental techniques for determining protein structures are challenging, and <1% of known proteins have experimental structures. However, AI technologies for predicting structures have been improving immensely. The best of these use information from millions of protein sequences (1D strings of molecules called residues) to predict how proteins will fold up in 3D. The massive increase in sequence data (more than one billion sequences are now known), obtained by sampling diverse environments, has empowered new methods (DeepMind's AlphaFold2) to predict model structures that are as good as experimental structures. DeepMind will provide ~138 million protein structures in 2022, ~200 times more than exist now. We will transform knowledge in our CATH evolutionary classification by bringing in this vast 3D data, and we will also bring in the sequences involved in predicting the structures. This even vaster sequence data will reveal evolutionarily conserved sites that are highly likely to be linked to function. To handle this massive amount of data we will build powerful new methods. Our recent trials using a new approach (CATHe) correctly assigned domain sequences to their evolutionary family ~90% of the time. Where we have an AlphaFold2 structure for the domain, we will apply accurate structure comparisons to validate the classification.
A major aim will be to use this new 3D data and more accurately predicted functional sites to understand how mutations in pathogens (e.g. SARS-CoV-2) can lead to increased virulence or transmission. We'll do this through our CATH-FunVar platform, which examines where mutations lie on the protein structure. Proximity to functional sites means the mutation may damage or enhance the function. We have started using FunVar to analyse variants of concern in SARS-CoV-2. We will extend it to other organisms and pathogens linked to human health and well-being, e.g. crops like wheat and rice that are essential for food security and where knowledge of variant impacts can guide selection and engineering of hardier or faster-growing varieties. To improve FunVar we will improve the accuracy of our predicted functional families and the detection of conserved functional sites in them. To do this we will exploit the vast structure and sequence data and adapt our new AI methods to make them even more powerful for this challenging task. We will build tools to analyse structure-function relationships in these families and develop powerful new visualisations for displaying these insights.
Since we'll need to handle massive expansions in the data coming into CATH and many new methods for processing it, and since some new data is now captured in a way our computer programs can't read, we will completely re-engineer the existing pipelines for classifying domains in CATH. We have already built preliminary pipelines that brought over a quarter of a million AlphaFold2 models into CATH. This project will allow us to make these methods more robust and then apply them to bring in at least 100-fold more models, to expand FunVar and to determine the effects of variants that could impact human health and food security.
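The proximity idea in the abstract (a mutation lying close to a functional site is more likely to affect function) can be sketched in a few lines of Python. This is only an illustration and not the CATH-FunVar implementation: the function names, the 8 Å cutoff, the choice of C-alpha distances, and the toy coordinates are all assumptions made for the example.

```python
import math

# Toy C-alpha coordinates (in angstroms) keyed by residue number.
# These values are invented purely for illustration.
CA_COORDS = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    42: (4.0, 3.0, 0.0),
    97: (30.0, 12.0, 5.0),
}


def min_distance_to_sites(mutated_residue, site_residues, coords):
    """Smallest C-alpha distance from the mutated residue to any site residue."""
    origin = coords[mutated_residue]
    return min(math.dist(origin, coords[r]) for r in site_residues)


def is_near_functional_site(mutated_residue, site_residues, coords, cutoff=8.0):
    """Flag a mutation as 'near' if it lies within `cutoff` angstroms of a site residue.

    The 8.0 angstrom default is an assumed, illustrative threshold.
    """
    return min_distance_to_sites(mutated_residue, site_residues, coords) <= cutoff


functional_sites = [10, 11]  # hypothetical predicted functional-site residues
for residue in (42, 97):
    near = is_near_functional_site(residue, functional_sites, CA_COORDS)
    label = "near a functional site" if near else "distant from functional sites"
    print(f"residue {residue}: {label}")
```

In practice the coordinates would come from an experimental or AlphaFold2 model and the site residues from a functional-family analysis; a flag like this would only prioritise variants for closer inspection rather than establish their effect.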
Project outputs
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Large-scale clustering of AlphaFold2 3D models shines light on the structure and function of proteins.
- DOI: 10.1016/j.molcel.2023.10.039
- Publication date: 2023
- Journal:
- Impact factor: 16
- Authors: Bordin N
- Corresponding author: Bordin N
Understanding structural and functional diversity of ATP-PPases using protein domains and functional families in CATH database
- DOI: 10.1101/2023.10.12.562014
- Publication date: 2023-10-16
- Journal:
- Impact factor: 0
- Authors: V. Waman; Jialin Yin; Neeladri Sen; Mohd Firdaus; Su Datt Lam; C. Orengo
- Corresponding author: C. Orengo
FunPredCATH: An ensemble method for predicting protein function using CATH.
- DOI: 10.1016/j.bbapap.2023.140985
- Publication date: 2023-12-01
- Journal:
- Impact factor: 0
- Authors: Joseph Bonello; C. Orengo
- Corresponding author: C. Orengo
Other grants held by Christine Orengo
Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods PID 7012435
- Grant number: BB/X018563/1
- Fiscal year: 2024
- Funding amount: $1.115 million
- Project category: Research Grant
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Grant number: BB/Y001117/1
- Fiscal year: 2024
- Funding amount: $1.115 million
- Project category: Research Grant
ProtFunAI: AI based methods for functional annotation of proteins in crop genomes
- Grant number: BB/Y514044/1
- Fiscal year: 2024
- Funding amount: $1.115 million
- Project category: Research Grant
Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies
- Grant number: BB/V014722/1
- Fiscal year: 2022
- Funding amount: $1.115 million
- Project category: Research Grant
CATH-FunVar - Predicting Viral and Human Variants Affecting COVID-19 Susceptibility and Severity and Repurposing Therapeutics
- Grant number: BB/W003368/1
- Fiscal year: 2021
- Funding amount: $1.115 million
- Project category: Research Grant
BBSRC-NSF/BIO Expanding the fold library in the twilight zone to facilitate structure determination of macromolecular machines
- Grant number: BB/S016007/1
- Fiscal year: 2020
- Funding amount: $1.115 million
- Project category: Research Grant
Exploiting data driven computational approaches for understanding protein structure and function in InterPro and Pfam
- Grant number: BB/S020039/1
- Fiscal year: 2020
- Funding amount: $1.115 million
- Project category: Research Grant
SENSE - Screening of ENvironmental SEquences to discover novel protein functions, using informatics target selection and high-throughput validation
- Grant number: BB/T002735/1
- Fiscal year: 2020
- Funding amount: $1.115 million
- Project category: Research Grant
3D-Gateway - Gateway to protein structure and function
- Grant number: BB/S020144/1
- Fiscal year: 2020
- Funding amount: $1.115 million
- Project category: Research Grant
Increasing the Coverage and Accuracy of CATH for Comparative Genomics and Variant Interpretation
- Grant number: BB/R014892/1
- Fiscal year: 2018
- Funding amount: $1.115 million
- Project category: Research Grant
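Similar National Natural Science Foundation of China (NSFC) grants
Research on dynamical correction methods for structural errors in nonlinear models
- Grant number: 42375059
- Year approved: 2023
- Funding amount: CNY 510,000
- Project category: General Program
In situ cross-linking modification of non-structural carbohydrates in rubberwood and the stepwise protection mechanism
- Grant number: 32371791
- Year approved: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Water-sensitivity disaster mechanisms of complex structured special soils in high-latitude arid regions, and disaster prevention and control for major engineering projects
- Grant number: 42330708
- Year approved: 2023
- Funding amount: CNY 2.31 million
- Project category: Key Program
Mechanism and theory of thermal drainage consolidation of structured soft soil under cyclic temperature variation
- Grant number: 52378356
- Year approved: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Evolution of equipment structural performance and dynamic reliability analysis methods under multi-factor coupling in polar navigation
- Grant number: 52371361
- Year approved: 2023
- Funding amount: CNY 500,000
- Project category: General Program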
Similar overseas grants
K+ channel structural dynamics landscape: from selectivity to gating
- Grant number: 10663554
- Fiscal year: 2023
- Funding amount: $1.115 million
- Project category:
Sequence and Environmental Determinants of the Protein Energy Landscape
- Grant number: 10623527
- Fiscal year: 2023
- Funding amount: $1.115 million
- Project category: