GOALI: Frameworks: At-Scale Heterogeneous Data based Adaptive Development Platform for Machine-Learning Models for Material and Chemical Discovery
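Basic Information
- Award number: 2311632
- Principal investigator:
- Amount: $4.5 million
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-10-01 to 2028-09-30
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract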
This project seeks to establish a new technological paradigm and the software infrastructure necessary for the development of Machine Learning (ML) models capable of predicting the properties of unseen molecular and materials systems/structures, thus enabling modeling of atomic behavior and the computational discovery of new molecules and materials at significantly higher throughput than afforded by existing first principles (quantum) methods. ML-enabled materials discovery is poised to play a critical role in addressing modern societal challenges such as energy sustainability and, as such, the technology and infrastructure developed by this project are expected to have a transformative impact across many scientific and engineering domains. The platform facilitates access, sharing, and discovery of vast amounts of first principles and experimental data, removing inefficiencies and accelerating scientific discovery by enabling the development of ML models on a scale previously inaccessible. To achieve these goals, this project is carried out in partnership with Amazon Web Services (AWS), providing the necessary know-how for the development of specialized open-source tools for training ML models at scale. This project is committed to the advancement of diversity, equity and inclusiveness in higher education, and as such it incorporates a variety of mechanisms to include underrepresented and low-income students (high-school and undergraduate) in its research activities across the four participating universities (New York University, University of Minnesota, University of Florida, and Brigham Young University), in addition to the mentoring of graduate students, the development of teaching materials, and workshops aimed at industrial outreach and training. 
To assure alignment between the platform/software and community needs, this project is supported by an Advisory Board of experts in cyberinfrastructure development, machine learning, material and chemical sciences, and STEM outreach who evaluate and provide strategic advice to the PIs.
The key technological advance that serves as the basis of this work is the "foundation model", an approach for building ML systems in which a model trained on extremely large amounts of diverse and easily available data can be adapted to diverse applications with a small amount of additional model fitting (fine-tuning). This project thus focuses on the development of a foundation model, called FERMat, for molecular and material property prediction, and ML interatomic potentials for modeling atomic behavior. FERMat is to be delivered via an integrated adaptive platform in the form of a software package and an online framework for developing and deploying specialized ML models for materials and chemistry applications, called "FERMat Apps". In collaboration with AWS, this project seeks to develop open-source software for training foundation models like FERMat at scale on large amounts of highly heterogeneous and multi-modal data. The high data needs will be met by leveraging and significantly expanding the ColabFit Exchange, an online repository of first principles and experimental data optimized for training of ML models, in cooperation with a large number of materials and molecular data repositories, standards organizations, and existing cyberinfrastructures. FERMat and any ML model derived from it are designed to support uncertainty quantification (based on information geometry, Bayesian, and frequentist approaches) to ensure the robustness of predictions.
As guiding target applications, this project considers two problems of scientific interest: 2D material driven catalysis and the prediction of molecular crystal polymorphs.
This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Materials Research within the Directorate for Mathematical and Physical Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Stefano Martiniani
Energy landscapes for machine learning
- DOI: 10.1039/c7cp01108c
- Published: 2017-04
- Journal:
- Impact factor: 3.3
- Authors: Andrew J. Ballard; Ritankar Das; Stefano Martiniani; Dhagash Mehta; Levent Sagun; Jacob D. Stevenson; David J. Wales
- Corresponding author: David J. Wales
Correlation Lengths in the Language of Computable Information.
- DOI: 10.1103/physrevlett.125.170601
- Published: 2020-04-07
- Journal:
- Impact factor: 8.6
- Authors: Stefano Martiniani; Yuval Lemberg; P. Chaikin; D. Levine
- Corresponding author: D. Levine
Energy landscapes for machine learning.
- DOI: 10.1039/c7cp01108c
- Published: 2017-05-24
- Journal:
- Impact factor: 0
- Authors: A. J. Ballard; R. Das; Stefano Martiniani; D. Mehta; Levent Sagun; J. Stevenson; D. Wales
- Corresponding author: D. Wales
Exploiting the potential energy landscape to sample free energy
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: A. J. Ballard; Stefano Martiniani; J. Stevenson; Sandeep Somani; D. Wales
- Corresponding author: D. Wales
Transport and Energetics of Bacterial Rectification
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Satyam Anand; Xiaolei Ma; Shuo Guo; Stefano Martiniani; Xiang Cheng
- Corresponding author: Xiang Cheng
Other grants by Stefano Martiniani
EAGER: Quantifying the error landscape of deep neural networks
- Award number: 2226387
- Fiscal year: 2022
- Amount: $4.5 million
- Project category: Standard Grant
EAGER: Quantifying the error landscape of deep neural networks
- Award number: 2132995
- Fiscal year: 2021
- Amount: $4.5 million
- Project category: Standard Grant
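Similar National Natural Science Foundation of China (NSFC) Grants
Construction and empirical study of an assessment framework for equity in educational outcomes based on large-scale monitoring
- Award number: 72304037
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund
Research on cloud-edge-device collaborative parallel processing frameworks and architectures for large-scale deep neural networks
- Award number:
- Year approved: 2022
- Amount: CNY 300,000
- Project category: Young Scientists Fund
Research on compositional symbolic analysis of large-scale framework-based software
- Award number:
- Year approved: 2021
- Amount: CNY 580,000
- Project category: General Program
Construction of a large-scale remote sensing image sample library and research on open-source remote sensing deep network framework models
- Award number: 92038301
- Year approved: 2020
- Amount: CNY 3,000,000
- Project category: Major Research Plan
Research on a general lossless compression framework for large-scale graph query problems
- Award number:
- Year approved: 2020
- Amount: CNY 240,000
- Project category: Young Scientists Fund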
Similar Overseas Grants
Frameworks: arXiv as an accessible large-scale open research platform
- Award number: 2311521
- Fiscal year: 2024
- Amount: $4.5 million
- Project category: Standard Grant
CAREER: Novel Parallelization Frameworks for Large-Scale Network Optimization with Combinatorial Requirements: Solution Methods and Applications
- Award number: 2338641
- Fiscal year: 2024
- Amount: $4.5 million
- Project category: Standard Grant
Collaborative Research: Frameworks: Scalable Performance and Accuracy analysis for Distributed and Extreme-scale systems (SPADE)
- Award number: 2311709
- Fiscal year: 2023
- Amount: $4.5 million
- Project category: Standard Grant
Multiscale computational frameworks for integrating large-scale cortical dynamics, connectivity, and behavior
- Award number: 10840682
- Fiscal year: 2023
- Amount: $4.5 million
- Project category:
Collaborative Research: Frameworks: Scalable Performance and Accuracy analysis for Distributed and Extreme-scale systems (SPADE)
- Award number: 2311708
- Fiscal year: 2023
- Amount: $4.5 million
- Project category: Standard Grant