Integrative deep learning algorithms for understanding protein sequence-structure-function relationships: representation, prediction, and discovery
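Basic information
- Grant number: 10712082
- Principal investigator:
- Amount: $362,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-05 to 2028-07-31
- Project status: Ongoing
- Source: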
- Keywords: 3-Dimensional; Acceleration; Algorithms; Amino Acid Sequence; Amino Acids; Area; Artificial Intelligence; Biological; Biological Sciences; Biology; Biomedical Engineering; Biotechnology; Computer software; Computing Methodologies; Data; Data Set; Directed Molecular Evolution; Disease; Goals; Health; Heterogeneity; Human; Intelligence; Knowledge; Learning; Measurement; Mechanics; Modeling; Noise; Ontology; Property; Protein Analysis; Protein Engineering; Proteins; Research; Shapes; Structure; Structure-Activity Relationship; Systems Integration; Uncertainty; Variant; analytical tool; artificial intelligence algorithm; artificial intelligence method; computational platform; computer framework; data resource; deep learning; deep learning algorithm; design; drug discovery; functional genomics; genome-wide; high dimensionality; human disease; improved; large scale data; machine learning method; next generation sequencing; novel; personalized diagnostics; personalized medicine; precision medicine; protein function; protein structure; protein structure prediction; statistical learning; structural genomics; synthetic biology; three dimensional structure; vaccine development
Project abstract
Understanding the sequence-structure-function relationship of proteins is of vital importance to protein biology,
biomedicine, and bioengineering. Recent advances in biotechnology have been generating rich datasets to
characterize proteins, such as next-generation sequencing data, three-dimensional (3D) structures, ontology
annotations, and measurements of functional activities, yet how to computationally operationalize these datasets
to fully unveil the structural or functional mechanisms of proteins remains a significant challenge. Existing
computational methods often struggle with the size, high-dimensionality, heterogeneity, incompleteness, and
intrinsic noise of those data, limiting our ability to study protein biology in a holistic and integrated system view.
The goal of this research is to develop new artificial intelligence (AI) methods for effectively integrating and
intelligently modeling heterogeneous protein-related datasets and to advance our understanding of the
mechanistic connections between proteins’ sequence, structure, and function. This project not only represents
timely research that leverages the unprecedented opportunities offered by recent AI breakthroughs such as
AlphaFold, but also goes beyond these efforts from protein structure prediction to systematic analyses of protein
biology and unlocks new analytic frameworks that could not be realized previously. Specifically, we will first
develop novel machine learning methods to learn statistical representations that are grounded in the sequence
and structure of proteins and reflect their functional properties. The learned representations will allow us to
characterize how the composition of amino acids and the 3D shape of protein structure determine the function
of a protein. Second, we will develop unified, biology-guided deep learning frameworks to integrate domain
knowledge, such as structural properties and evolutionary relationships, and study several key problems for
characterizing protein functions, including genome-scale function annotation and variant effect prediction. These
efforts will shift the classic sequence-first paradigm of previous studies to a new integrative paradigm and provide
accurate, robust, and interpretable predictions of protein functions. Finally, we will develop a computational
platform that combines data-efficient AI models, uncertainty-guided exploration algorithms, and deep learning-
based generative models for AI-aided directed evolution and sequence-structure co-design of proteins, which
will assist and accelerate the discovery and design of functional proteins. Overall, this proposal will study the
sequence-structure-function relationship of proteins from an integrative perspective, provide new state-of-the-art
AI algorithms with applications in fundamental problems for understanding protein function and human disease,
and generate new actionable biological hypotheses for the discovery and design of novel functional proteins.
The resulting software and data resources will be publicly available through open-access platforms.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Yunan Luo
Similar National Natural Science Foundation of China (NSFC) grants
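Accelerated splitting algorithms based on augmented Lagrangian functions and their applications
- Grant number: 12371300
- Approval year: 2023
- Funding amount: CNY 435,000
- Project category: General Program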
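Hardware acceleration techniques for quantum information processing algorithms based on arbitrary-precision computing architectures
- Grant number: 62304037
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund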
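Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex nonsmooth optimization problems
- Grant number: 12371308
- Approval year: 2023
- Funding amount: CNY 435,000
- Project category: General Program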
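Accelerated stochastic gradient algorithms for large-scale sparse optimization problems under nonconvex constraints
- Grant number: 12301405
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund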
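Physical mechanisms of disturbances in accelerator magnet excitation power supplies and observation-based suppression algorithms
- Grant number: 12305170
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund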
Similar overseas grants
A data science framework for transforming electronic health records into real-world evidence
- Grant number: 10664706
- Fiscal year: 2023
- Funding amount: $362,900
- Project category:
High-Resolution Lymphatic Mapping of the Upper Extremities with MRI
- Grant number: 10663718
- Fiscal year: 2023
- Funding amount: $362,900
- Project category:
A computational model for prediction of morphology, patterning, and strength in bone regeneration
- Grant number: 10727940
- Fiscal year: 2023
- Funding amount: $362,900
- Project category:
Hybrid Model-Based and Data-Driven Frameworks for High-Resolution Tomographic Imaging
- Grant number: 10714540
- Fiscal year: 2023
- Funding amount: $362,900
- Project category:
Deep-Learning-Augmented Quantitative Gradient Recalled Echo (DLA-qGRE) MRI for in vivo Clinical Evaluation of Brain Microstructural Neurodegeneration in Alzheimer Disease
- Grant number: 10659833
- Fiscal year: 2023
- Funding amount: $362,900
- Project category: