ProtFunAI: AI based methods for functional annotation of proteins in crop genomes
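Basic Information
- Grant number: BB/Y514044/1
- Principal investigator:
- Amount: $324,300
- Host institution:
- Host institution country: United Kingdom
- Category: Research Grant
- Fiscal year: 2024
- Funding country: United Kingdom
- Project period: 2024 to (no data)
- Status: Ongoing
- Source:
- Keywords: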
Project Summary
Our project will 'build on existing links and deepen existing relationships' between the two groups pioneering the development of AI/Deep-Learning models for proteins (Rost Group, TUM) and the application of these models to protein domain families (Orengo Group, UCL). It will leverage world-leading expertise in protein Language Models (pLMs) to accelerate the scientific discovery of protein functions in the genomes of key agricultural crops important for food security. Nevertheless, our approaches will be generic and will be rolled out to all UniProt proteins through existing collaborations.

Synergies between both groups have evolved over several collaborations. Since 2019, ground-breaking work has tuned the pLMs developed in the Rost Group (e.g. the ProtTrans series, including ProtT5 and ProtTucker) with protein family and functional family data (CATH superfamilies and FunFams) generated and maintained by the Orengo Group. The proposed partnership would allow researchers in the Rost and Orengo Groups to intensify exchanges by visiting each other's labs and interacting more comprehensively, in order to design more effective protocols that enhance (1) protein homologue detection, (2) protein function prediction and (3) protein functional site prediction.

The Orengo and Rost Groups began collaborating in 2000, working together on protein family analysis for target identification in the NIH-funded US Structural Genomics initiative (PSI), which ended in 2015 [21-23]. Subsequently, funding from the German BMBF (Federal Ministry of Education and Research) and DFG (German Research Foundation) supported visits of PhD and Masters students between the two groups and led to the development of new approaches for protein function prediction [14,15]. This application seeks funds to continue these collaborations and to leverage the latest advances in AI/Deep Learning.

The Rost Group has recently made significant improvements to its pLMs (ProstT5 [18]), and the funding would allow us to apply ProstT5 to exploit the hugely expanded CATH classification, which is currently integrating hundreds of millions of predicted protein structures from the AlphaFold Database (AFDB).

The application is very timely, as it addresses key BBSRC strategic priorities around data-intensive biology and AI, and the important challenge of food security. We will apply improved function prediction methods to significantly increase the functional annotation of plant genomes. This will bring 'new knowledge about key biological principles and mechanisms using AI-based approaches', advance 'AI in sustainable agriculture and food', and enable 'smart agriculture' by identifying genes implicated in biological systems associated with growth and stress resistance (e.g. drought and antimicrobial resistance). Most genes (typically >90%) in plants valuable as crops (e.g. wheat, maize, rice, sorghum) are experimentally uncharacterized or very poorly annotated. Our state-of-the-art methods will help guide experimental validation accurately.

We will disseminate the annotations through our established web-based CATH resource, which is accessed by over 27,000 users/month. Since CATH data are also disseminated by PDB, UniProt and InterPro, the predictions will be accessible to more than 900,000 users/month. We will also work closely with UK collaborators researching plant genomes to obtain feedback and, where possible, experimental validation.

The project will significantly enhance the AI/ML skills of UK-based researchers in the Orengo Group, whose prior training was largely in biology. Conversely, the more AI-focused members of the Rost Group will deepen their understanding of individual proteins, organisms and evolution. German scholars will also gain deeper insight into the workings of UK-based resources.
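The summary names pLM-based homologue detection and function prediction as aims but does not describe the algorithms. Purely as a generic illustration, and not as the project's method, the sketch below shows embedding-based annotation transfer: a query protein inherits the functional label of its nearest annotated neighbour in embedding space (by cosine distance), provided that neighbour is close enough. Everything here is an assumption: the 4-dimensional toy embeddings, the protein IDs and labels, the 0.2 cut-off and the `transfer_annotation` helper are hypothetical; in a real setting the vectors would presumably come from a pLM such as ProtT5 and the reference labels from annotated families such as FunFams.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Toy reference set (HYPOTHETICAL): protein ID -> (embedding, function label).
# Real pLM per-protein embeddings have far more dimensions than these 4-d vectors.
REFERENCE: Dict[str, Tuple[List[float], str]] = {
    "ref_A": ([0.9, 0.1, 0.0, 0.2], "function_A"),
    "ref_B": ([0.0, 0.8, 0.5, 0.1], "function_B"),
    "ref_C": ([0.1, 0.0, 0.9, 0.7], "function_C"),
}


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def transfer_annotation(
    query: Sequence[float],
    reference: Dict[str, Tuple[List[float], str]],
    max_distance: float = 0.2,  # hypothetical cut-off, not a tuned value
) -> Tuple[Optional[str], Optional[str], float]:
    """Copy the label of the nearest reference protein in embedding space.

    Returns (label, nearest_id, distance); label is None when even the
    nearest neighbour is farther than max_distance.
    """
    best_id: Optional[str] = None
    best_dist = math.inf
    # Linear scan over all reference embeddings to find the nearest one.
    for ref_id, (embedding, _label) in reference.items():
        dist = cosine_distance(query, embedding)
        if dist < best_dist:
            best_id, best_dist = ref_id, dist
    if best_id is None or best_dist > max_distance:
        return None, best_id, best_dist
    return reference[best_id][1], best_id, best_dist


if __name__ == "__main__":
    # A query close to ref_A inherits its label; a distant one gets none.
    print(transfer_annotation([0.85, 0.15, 0.05, 0.25], REFERENCE))
    print(transfer_annotation([0.5, 0.5, 0.5, 0.5], REFERENCE))
```

The threshold is what separates "transfer the label" from "leave unannotated"; a real pipeline would need a calibrated cut-off and an indexed nearest-neighbour search over millions of embeddings rather than this linear scan.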
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Christine Orengo
Globalization: Approaches to Diversities
- DOI:
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: Benoit H Dessailly; Natalie L Dawson; Kenji Mizuguchi; Christine Orengo; Hector Cuadra-Montiel
- Corresponding author: Hector Cuadra-Montiel
Other grants held by Christine Orengo
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Grant number: BB/Y001117/1
- Fiscal year: 2024
- Amount: $324,300
- Category: Research Grant
Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods PID 7012435
- Grant number: BB/X018563/1
- Fiscal year: 2024
- Amount: $324,300
- Category: Research Grant
Transforming the Structural Landscape of CATH to Aid Variant Analyses in Human and Agricultural Organisms and their Pathogens
- Grant number: BB/W018802/1
- Fiscal year: 2022
- Amount: $324,300
- Category: Research Grant
Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies
- Grant number: BB/V014722/1
- Fiscal year: 2022
- Amount: $324,300
- Category: Research Grant
CATH-FunVar - Predicting Viral and Human Variants Affecting COVID-19 Susceptibility and Severity and Repurposing Therapeutics
- Grant number: BB/W003368/1
- Fiscal year: 2021
- Amount: $324,300
- Category: Research Grant
3D-Gateway - Gateway to protein structure and function
- Grant number: BB/S020144/1
- Fiscal year: 2020
- Amount: $324,300
- Category: Research Grant
Exploiting data driven computational approaches for understanding protein structure and function in InterPro and Pfam
- Grant number: BB/S020039/1
- Fiscal year: 2020
- Amount: $324,300
- Category: Research Grant
SENSE - Screening of ENvironmental SEquences to discover novel protein functions, using informatics target selection and high-throughput validation
- Grant number: BB/T002735/1
- Fiscal year: 2020
- Amount: $324,300
- Category: Research Grant
BBSRC-NSF/BIO Expanding the fold library in the twilight zone to facilitate structure determination of macromolecular machines
- Grant number: BB/S016007/1
- Fiscal year: 2020
- Amount: $324,300
- Category: Research Grant
Increasing the Coverage and Accuracy of CATH for Comparative Genomics and Variant Interpretation
- Grant number: BB/R014892/1
- Fiscal year: 2018
- Amount: $324,300
- Category: Research Grant
Similar National Natural Science Foundation of China (NSFC) grants
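Research on optimization algorithms for automatic placement and routing based on domestically produced AI chips
- Grant number: 62306286
- Year approved: 2023
- Amount: CNY 300,000
- Category: Young Scientists Fund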
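Identification and analysis of cotton phenotypic information based on "AI algorithms + high-precision remote sensing data"
- Grant number: 32360436
- Year approved: 2023
- Amount: CNY 320,000
- Category: Regional Science Fund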
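Molecular design of extractive distillation solvents based on atomic contributions and artificial intelligence
- Grant number: 22308037
- Year approved: 2023
- Amount: CNY 300,000
- Category: Young Scientists Fund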
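CT image reconstruction based on deep progressive learning and a multi-task collaborative AI-assisted diagnosis model
- Grant number: 62371190
- Year approved: 2023
- Amount: CNY 490,000
- Category: General Program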
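Humans and AI in a creative dance: the creativity paradox of AI involvement and a resolution mechanism based on an integrated 4D model of creativity
- Grant number: 72372045
- Year approved: 2023
- Amount: CNY 420,000
- Category: General Program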
Similar overseas grants
I-Corps: Centralized, Cloud-Based, Artificial Intelligence (AI) Video Analysis for Enhanced Intubation Documentation and Continuous Quality Control
- Grant number: 2405662
- Fiscal year: 2024
- Amount: $324,300
- Category: Standard Grant
CRII: CPS: FAICYS: Model-Based Verification for AI-Enabled Cyber-Physical Systems Through Guided Falsification of Temporal Logic Properties
- Grant number: 2347294
- Fiscal year: 2024
- Amount: $324,300
- Category: Standard Grant
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Grant number: BB/Y000455/1
- Fiscal year: 2024
- Amount: $324,300
- Category: Research Grant
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Grant number: BB/Y001117/1
- Fiscal year: 2024
- Amount: $324,300
- Category: Research Grant
I(eye)-SCREEN: A real-world AI-based infrastructure for screening and prediction of progression in age-related macular degeneration (AMD) providing accessible shared care
- Grant number: 10102692
- Fiscal year: 2024
- Amount: $324,300
- Category: EU-Funded