CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
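Basic information
- Award number: 2340011
- Principal investigator:
- Amount: $599,700
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2024
- Funding country: United States
- Period: 2024-07-01 to 2029-06-30
- Project status: Ongoing
- Source:
- Keywords: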
Project abstract
Deep learning (DL) methods, especially large foundational models, enable exciting new approaches to problems in many science and engineering disciplines, such as genomics, bioinformatics, meteorology, and natural language processing. Training foundational models at extreme scales is time-consuming, prone to low utilization and limited scalability, and demanding of human effort. This NSF CAREER project addresses the convergence, performance, and scalability gaps of large foundational model pre-training on supercomputers with innovative algorithms, systems, and interface design. In addition to the algorithm and computer system innovation, this project contributes to translational computer science by lowering the barrier to training sizeable foundational models and reducing the time cost of scientific deep learning, thus enabling significantly more scientific research to embrace large foundational models. The research results will be publicly available as open-source software to the broader community, with comprehensive documentation on the design and usage to help users from all domains.
Technically, this NSF CAREER project has four research and educational thrusts. The first thrust focuses on new optimization techniques, such as first-, second-, and mixed-order optimizers with potential approximation techniques, to improve time-to-convergence. The second thrust aims to enhance scaling efficiency by designing novel sparsification algorithms that leverage the spatial and temporal patterns of gradients. The third thrust considers a new complex parallelism abstraction that transparently deploys large models across processors with near-optimal performance given the present capability of compute, interconnect, and I/O on a supercomputer. The fourth thrust designs educational activities, including a distributed DL system course, a DL tutorial, and a DL bootcamp targeting students and practitioners with different levels of expertise.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
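The gradient sparsification mentioned in the second thrust can be illustrated with a generic top-k scheme with error feedback. This is a minimal sketch of a well-known baseline, not the project's algorithm, which the abstract does not specify. The function name `topk_sparsify` and the toy gradient values are hypothetical.

```python
from typing import List, Tuple


def topk_sparsify(
    grad: List[float], residual: List[float], k: int
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Keep the k largest-magnitude entries of grad + residual.

    Returns the sparse (index, value) pairs a worker would transmit and
    the new residual holding everything that was not sent.
    Hypothetical helper for illustration only.
    """
    if not 0 < k <= len(grad):
        raise ValueError("k must be between 1 and len(grad)")
    # Error feedback: add back what earlier steps left unsent.
    corrected = [g + r for g, r in zip(grad, residual)]
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)
    kept = set(order[:k])
    sparse = [(i, corrected[i]) for i in sorted(kept)]
    # Unsent entries stay in the residual; sent entries are cleared.
    new_residual = [0.0 if i in kept else v for i, v in enumerate(corrected)]
    return sparse, new_residual


grad = [0.5, -2.0, 0.25, 1.5]          # toy gradient, assumed values
residual = [0.0] * len(grad)
sparse, residual = topk_sparsify(grad, residual, k=2)
print(sparse)    # [(1, -2.0), (3, 1.5)]
print(residual)  # [0.5, 0.0, 0.25, 0.0]
```

Only k index-value pairs are communicated per step, and the residual ensures that small gradient entries are delayed rather than discarded. Schemes that exploit spatial and temporal gradient patterns, as the abstract describes, would replace the plain magnitude ranking used here.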
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Zhao Zhang
The genomic history of the Iberian Peninsula over the past 8000 years
- DOI: 10.4236/jbbs.2019.96018
- Published: 2024-09-14
- Journal:
- Impact factor: 0
- Authors: I. Olalde;Swapan Mallick;Nick Patterson;N. Rohl;Mouco;Marina Silva;Katharina Dulias;C. Edwards;Francesca G;ini;ini;Maria;Pala;Pedro;Soares;Manuel;Ferr;o;o;Nicole;Adamski;Broom;khoshbacht;khoshbacht;O. Cheronet;B. Culleton;Daniel Fern;es;es;Marie Lawson;Matthew Mah;Jonas Oppenheimer;Kristin Stewardson;Zhao Zhang;Juan Manuel Jiménez Arenas;Isidro Jorge Toro Moyano;Domingo C. Salazar;P. Castanyer;Marta Santos;J. Tremoleda;Marina Lozano;Pablo García;Borja;J. Fernández;J. A. Mujika;Cecilio Barroso;J. Bermúdez;E. Mínguez;Josep Burch;Neus Coromina;David Vivó;A. Cebrià;Josep Maria Fullola;Oreto García‐Puchol;J. I. Morales;F. Xavier;12;Oms;Tona;Majó;Josep;Vergés;Antònia;Díaz;Imma;13;Castanyer;F. J. López;A. M. Silva;C. Alonso;Germán;Delibes;de;Castro;Javier;Jiménez;Echevarría;Adolfo;Moreno;Guillermo Pascual Berlanga;Pablo Ramos;José Ramos Muñoz;E. Vij;e;e;16;Vila;Gustau Aguilella Arzo;Ángel Esparza Arroyo;K. Lillios;Jennifer Mack;J. Velasco;A. Waterman;Luis Benítez de Lugo Enrich;María Benito;18;Sánchez;B. Agustí;F. Codina;Gabriel de Prado;A. Estalrrich;Álvaro;Fernández;Flores;Clive;Finlayson;Geraldine;Stewart;20;Francisco Giles;Antonio Rosas;V. González;Gabriel García Atiénzar;M. S. H. Pérez;Arm;o Llanos;o;Carrión Marco;Isabel Beneyto;David López;Mar Tormo;A. C. Valera;C. Blasco;Corina Liesau;Patricia Ríos;Joan Daura;Jesús de Pedro Michó;Agustín A Diez Castillo;R. F. Fernández;R. Garrido;V. S. Gonçalves;E. Guerra;Ana Mercedes;26;Herrero;Joaquim Juan;Dani López;S. McClure;Merino Pérez;Arturo Oliver Foix;Montse Borràs;A. Sousa;Manuel Vidal Encinas;D. Kennett;Martin B. Richards;K. Alt;W. Haak;R. Pinhasi;C. Lalueza;David Reich
- Corresponding author: David Reich
Hawkeye: Change-targeted Testing for Android Apps based on Deep Reinforcement Learning
- DOI: 10.1145/3639477.3639749
- Published: 2023-09-04
- Journal:
- Impact factor: 0
- Authors: Chao Peng;Zhengwei Lv;Jiarong Fu;Jiayuan Liang;Zhao Zhang;Ajitha Rajan;Ping Yang
- Corresponding author: Ping Yang
Identification of microenvironment‐related genes with prognostic value in clear cell renal cell carcinoma
- DOI: 10.1002/jcb.29654
- Published: 2020-01-21
- Journal:
- Impact factor: 4
- Authors: Zhao Zhang;Zeyan Li;Zhao Liu;Xiang Zhang;Nengwang Yu;Zhonghua Xu
- Corresponding author: Zhonghua Xu
A performance comparison of DRAM memory system optimizations for SMT processors
- DOI: 10.1109/hpca.2005.2
- Published: 2005-02-12
- Journal:
- Impact factor: 0
- Authors: Zhichun Zhu;Zhao Zhang
- Corresponding author: Zhao Zhang
Association Between Sex and Immune-Related Adverse Events During Immune Checkpoint Inhibitor Therapy.
- DOI: 10.1093/jnci/djab035
- Published: 2021-03-10
- Journal:
- Impact factor: 0
- Authors: Ying Jing;Yongchang Zhang;Jing Wang;Kunyan Li;Xue Chen;Jianfu Heng;Qian Gao;Youqiong Ye;Zhao Zhang;Yaoming Liu;Y. Lou;Steven H. Lin;L. Diao;Hong Liu;Xiang Chen;G. Mills;Leng Han
- Corresponding author: Leng Han
Other grants by Zhao Zhang
Collaborative Research: Frameworks: hpcGPT: Enhancing Computing Center User Support with HPC-enriched Generative AI
- Award number: 2411294
- Fiscal year: 2024
- Amount: $599,700
- Project type: Standard Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
- Award number: 2401245
- Fiscal year: 2023
- Amount: $599,700
- Project type: Standard Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
- Award number: 2311766
- Fiscal year: 2023
- Amount: $599,700
- Project type: Standard Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
- Award number: 2312689
- Fiscal year: 2023
- Amount: $599,700
- Project type: Continuing Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
- Award number: 2401246
- Fiscal year: 2023
- Amount: $599,700
- Project type: Standard Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
- Award number: 2401244
- Fiscal year: 2023
- Amount: $599,700
- Project type: Continuing Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
- Award number: 2106661
- Fiscal year: 2021
- Amount: $599,700
- Project type: Standard Grant
Collaborative Research: OAC Core: Small: Efficient and Policy-driven Burst Buffer Sharing
- Award number: 2008388
- Fiscal year: 2020
- Amount: $599,700
- Project type: Standard Grant
SHF: Medium: Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
- Award number: 1643271
- Fiscal year: 2016
- Amount: $599,700
- Project type: Continuing Grant
SHF: Medium: Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
- Award number: 1514229
- Fiscal year: 2015
- Amount: $599,700
- Project type: Continuing Grant
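Similar NSFC grants
Research on efficient and scalable deep learning algorithms based on randomization
- Award number: 62376131
- Approval year: 2023
- Amount: 510,000 CNY
- Project type: General Program
Research on efficient algorithms for scalable blockchain storage and high-frequency computation
- Award number: 62072326
- Approval year: 2020
- Amount: 560,000 CNY
- Project type: General Program
Research on efficient and scalable parallel computing techniques for global numerical weather prediction spectral models
- Award number: 41875121
- Approval year: 2018
- Amount: 620,000 CNY
- Project type: General Program
Research on efficient, scalability-enhanced defense methods against complex DoS attacks in large-scale networks
- Award number: 61601107
- Approval year: 2016
- Amount: 190,000 CNY
- Project type: Young Scientists Fund
Several classes of efficient parallel adaptive combined GAMG methods with good scalability
- Award number: 11571293
- Approval year: 2015
- Amount: 500,000 CNY
- Project type: General Program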
Similar overseas grants
CAREER: Multi-Dimensional Photonic Accelerators for Scalable and Efficient Computing
- Award number: 2337674
- Fiscal year: 2024
- Amount: $599,700
- Project type: Continuing Grant
CAREER: Scalable and Adaptable Sparsity-driven Methods for more Efficient AI Systems
- Award number: 2238291
- Fiscal year: 2023
- Amount: $599,700
- Project type: Continuing Grant
CAREER: Towards Efficient and Scalable Zero-Knowledge Proofs
- Award number: 2401481
- Fiscal year: 2023
- Amount: $599,700
- Project type: Continuing Grant
CAREER: Towards Efficient and Scalable Zero-Knowledge Proofs
- Award number: 2144625
- Fiscal year: 2022
- Amount: $599,700
- Project type: Continuing Grant
CAREER: System Support for Scalable, Fast, and Power-Efficient Genome Sequencing
- Award number: 2143120
- Fiscal year: 2022
- Amount: $599,700
- Project type: Continuing Grant