Collaborative Research: IIS: III: MEDIUM: Learning Protein-ish: Foundational Insight on Protein Language Models for Better Understanding, Democratized Access, and Discovery
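Basic information
- Award number: 2310113
- Principal investigator:
- Amount: $599,900
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-08-01 to 2026-07-31
- Status: Ongoing
- Source:
- Keywords:
Project abstract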
Large language models are massive neural networks that learn rich contextual representations of words and use such representations to address a variety of tasks in natural language processing (NLP). These models are a prominent example of generative artificial intelligence and are emerging as promising approaches for distilling and organizing the content of massive biological databases and for predicting a wide range of molecular bio-properties. Yet, we know surprisingly little about what these models capture in their learned representations, why they perform well on some tasks and not on others, and how they can produce deep insight into the relationships describing the biological space. If progress in NLP is any indication, the current trend of improving the performance of language models by drastically increasing the number of their trainable parameters is unsustainable both for our carbon footprint and for ensuring equity/accessibility of research and scholarship in the academic setting. This project advances algorithmic research at the intersection of information integration and informatics using principled protein language models (PLMs) as computational vehicles for deeper insight into the structural, functional, and evolutionary organization across protein space at varying levels of detail and scale. It also aims to do so in a way that is resource-aware, sustainable, and accessible to all researchers. The research activities are organized in three thrusts: (1) encoding prior biological knowledge in PLMs for joint and resource-aware learning in composite spaces, (2) revealing fundamental properties and organizing the learned representation space to inform and connect what is captured with properties of interest, and (3) enabling PLMs to capture diverse contexts for deeper exploration of the structural, functional, and evolutionary organization across protein space. 
This interdisciplinary approach contributes to the fields of machine learning, bioinformatics, and molecular biology and provides opportunities at the interface of these disciplines for training under-represented students of all levels. The investigators are determined to bridge communities and disciplines, and they have planned activities to build and galvanize a trans-disciplinary community to further advance their research. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Amarda Shehu
On the characterization of protein native state ensembles.
- DOI: 10.1529/biophysj.106.094409
- Published: 2007
- Journal:
- Impact factor: 3.4
- Authors: Amarda Shehu; L. Kavraki; C. Clementi
- Corresponding author: C. Clementi
From Optimization to Mapping: An Evolutionary Algorithm for Protein Energy Landscapes
- DOI: 10.1109/tcbb.2016.2628745
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Emmanuel Sapin; K. De Jong; Amarda Shehu
- Corresponding author: Amarda Shehu
Reconstructing and mining protein energy landscape to understand disease
- DOI: 10.1109/bibm.2017.8217619
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Wanli Qiao; T. Maximova; X. Fang; E. Plaku; Amarda Shehu
- Corresponding author: Amarda Shehu
Molecules in motion: Computing structural flexibility
- DOI:
- Published: 2008
- Journal:
- Impact factor: 0
- Authors: Amarda Shehu
- Corresponding author: Amarda Shehu
An Evolutionary Search Algorithm to Guide Stochastic Search for Near-Native Protein Conformations with Multiobjective Analysis
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: Brian S. Olson; Amarda Shehu
- Corresponding author: Amarda Shehu
Other grants by Amarda Shehu
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
- Award number: 2411529
- Fiscal year: 2024
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIBR: Innovation: Bioinformatics: Linking Chemical and Biological Space: Deep Learning and Experimentation for Property-Controlled Molecule Generation
- Award number: 2318829
- Fiscal year: 2023
- Amount: $599,900
- Award type: Continuing Grant
Intergovernmental Personnel Act
- Award number: 1948645
- Fiscal year: 2019
- Amount: $599,900
- Award type: Intergovernmental Personnel Award
Collaborative: SI2-SSE - A Plug-and-Play Software Platform of Robotics-Inspired Algorithms for Modeling Biomolecular Structures and Motions
- Award number: 1440581
- Fiscal year: 2015
- Amount: $599,900
- Award type: Standard Grant
Travel Awards for 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM-2015)
- Award number: 1543744
- Fiscal year: 2015
- Amount: $599,900
- Award type: Standard Grant
CCF: AF: Small: Novel Stochastic Optimization Algorithms to Advance the Treatment of Dynamic Molecular Systems
- Award number: 1421001
- Fiscal year: 2014
- Amount: $599,900
- Award type: Standard Grant
Workshop: 2014 NSF CISE CAREER Proposal Writing Workshop
- Award number: 1415210
- Fiscal year: 2013
- Amount: $599,900
- Award type: Standard Grant
CAREER: Probabilistic Methods for Addressing Complexity and Constraints in Protein Systems
- Award number: 1144106
- Fiscal year: 2012
- Amount: $599,900
- Award type: Continuing Grant
AF: Small: A Unified Computational Framework to Enhance the Ab-Initio Sampling of Native-Like Protein Conformations
- Award number: 1016995
- Fiscal year: 2010
- Amount: $599,900
- Award type: Standard Grant
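Similar grants from the National Natural Science Foundation of China (NSFC)
Mechanism of anti-osteoporosis therapy by icariside II-loaded exosomes targeting osteogenic differentiation of bone marrow mesenchymal stem cells via the Wnt pathway
- Award number: 82374164
- Approval year: 2023
- Amount: CNY 490,000
- Award type: General Program
Molecular mechanism by which the chloroplast protein SQE1 regulates non-photochemical quenching and photosystem II damage repair in rice
- Award number: 32370260
- Approval year: 2023
- Amount: CNY 500,000
- Award type: General Program
Mechanism by which KIF17-mediated plasma-membrane localization of MHC-II in tumor cells promotes antigen presentation and immune response in breast cancer
- Award number: 82372781
- Approval year: 2023
- Amount: CNY 460,000
- Award type: General Program
Two-step synthesis and low-temperature nucleation mechanism of II-VI colloidal semiconductor quantum dots
- Award number: 22305162
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Mechanism by which tolerogenic DCs regulate MHC-II+CD8+ Tregs via the Wnt5a signaling pathway to induce immune tolerance
- Award number: 82370665
- Approval year: 2023
- Amount: CNY 490,000
- Award type: General Program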
Similar overseas grants
Collaborative Research: DESC: Type II: REFRESH: Revisiting Expanding FPGA Real-estate for Environmentally Sustainability Heterogeneous-Systems
- Award number: 2324865
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIS Core: Small: World Values of Conversational AI and the Consequences for Human-AI Interaction
- Award number: 2230466
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: Enhancing Chemoselectivity and Efficiency Through Control of Axial Coordination in Rh(II) Complexes: An Experimental and Computational Approach
- Award number: 2247836
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IIS Core: Small: World Values of Conversational AI and the Consequences for Human-AI Interaction
- Award number: 2230467
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant
Collaborative Research: IRES Track II: Short Courses on Manufacturing Frontiers Leveraging Unique Facilities in Italy
- Award number: 2246809
- Fiscal year: 2023
- Amount: $599,900
- Award type: Standard Grant