CAREER: Efficient Large Language Model Inference Through Codesign: Adaptable Software Partitioning and FPGA-based Distributed Hardware
Basic Information
- Award Number: 2339084
- Principal Investigator: Mohamed Abdelfattah
- Amount: $883,100
- Host Institution:
- Host Institution Country: United States
- Project Type: Continuing Grant
- Fiscal Year: 2024
- Funding Country: United States
- Project Period: 2024-05-01 to 2029-04-30
- Project Status: Ongoing
- Source:
- Keywords:
Project Abstract
Artificial intelligence (AI) has entered the "age of scale": huge amounts of training data are being used to train enormous deep neural networks (DNNs) on large-scale computers, as epitomized by the rise of large language models (LLMs). The extremely high demand for this technology is evident, as recently exemplified by ChatGPT, an LLM chatbot that garnered 100 million active users merely two months after release, setting a world record. However, deploying LLMs can be quite costly, given that their memory footprint can extend to terabytes of data while also demanding substantial computational resources. Consequently, large-scale distributed computers have become essential, particularly to meet the performance required for interactive applications. To improve efficiency, this project tackles challenges that are specific to LLMs, including their large memory footprint, varying computational demands, and reliance on distributed computing. Addressing these challenges is critical to making LLMs more accessible and sustainable for widespread use. Concurrently, this award seeks to develop a diverse AI workforce proficient in algorithms, hardware, and software through a large-scale AI course for a diverse student population at public universities, comprehensive curriculum integration, and student mentorship at both the graduate and undergraduate levels.

This project will enable the codesign of LLMs and distributed computing platforms through three major thrusts that correspond to three levels of the computing stack: software, hardware, and algorithms. Initially, the project will focus on automated partitioning and mapping algorithms, as these form the foundation by which LLMs can be deployed and optimized on both existing and new distributed computing platforms. Key to this research thrust is the development of an extensible hardware performance estimator that can model current GPU-based systems alongside new distributed computing approaches. The second thrust investigates the use of in-network and near-storage FPGAs within distributed systems to speed up LLM inference. The final thrust investigates platform-aware compression for LLMs, including mixed-precision quantization and low-rank approximation. In addition to improving LLM efficiency across the computing stack, this project will develop a research framework to synergistically co-optimize LLMs and distributed hardware platforms, resulting in new optimized LLM computing systems and implementation methodologies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
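To make the software thrust more concrete, the sketch below shows one plausible shape for an automated partitioner driven by a hardware performance estimator: a roofline-style analytical model estimates per-layer latency on a device, and a greedy pass assigns contiguous transformer layers to devices so that each device receives roughly an equal share of the estimated latency. This is a minimal illustrative sketch, not the project's actual framework; the names (DeviceSpec, LayerCost, estimate_layer_latency, partition_layers) and all numbers are hypothetical.

```python
# Hypothetical sketch: greedy layer partitioning driven by a roofline-style
# performance estimator. Names and numbers are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class DeviceSpec:
    peak_flops: float      # peak throughput in FLOP/s
    mem_bandwidth: float   # memory bandwidth in bytes/s

@dataclass
class LayerCost:
    flops: float           # FLOPs per token for this layer
    weight_bytes: float    # bytes of weights that must be streamed per token

def estimate_layer_latency(layer: LayerCost, dev: DeviceSpec) -> float:
    """Roofline-style estimate: latency is bounded by compute or memory traffic."""
    compute_time = layer.flops / dev.peak_flops
    memory_time = layer.weight_bytes / dev.mem_bandwidth
    return max(compute_time, memory_time)

def partition_layers(layers: list[LayerCost], devices: list[DeviceSpec]) -> list[list[int]]:
    """Greedily assign contiguous layers to devices, balancing estimated latency."""
    total = sum(estimate_layer_latency(l, devices[0]) for l in layers)
    target = total / len(devices)  # ideal per-device share (rough heuristic)
    partitions, current, acc, d = [], [], 0.0, 0
    for i, layer in enumerate(layers):
        current.append(i)
        acc += estimate_layer_latency(layer, devices[d])
        # close this partition once it reaches its share, keeping devices for the rest
        if acc >= target and d < len(devices) - 1:
            partitions.append(current)
            current, acc, d = [], 0.0, d + 1
    partitions.append(current)
    return partitions

# Example: 8 identical decoder layers split across 2 identical GPUs.
gpu = DeviceSpec(peak_flops=300e12, mem_bandwidth=2e12)
layers = [LayerCost(flops=2e9, weight_bytes=1e8) for _ in range(8)]
print(partition_layers(layers, [gpu, gpu]))  # -> [[0, 1, 2, 3], [4, 5, 6, 7]]
```

A real estimator would also have to model inter-device communication and key-value cache traffic, which often dominate distributed LLM inference and which this toy omits.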
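Similarly, for the compression thrust, the following toy sketch shows the two primitives named in the abstract, low-rank approximation via truncated SVD and uniform quantization at a selectable bit width, applied to a single weight matrix with NumPy. It is an illustrative assumption of how such primitives behave, not the project's method; the choice of rank 64 and 4 bits is arbitrary.

```python
# Toy sketch of the two compression primitives named in the abstract:
# low-rank approximation (truncated SVD) and uniform quantization at a
# chosen bit width. Illustrative only; ranks and bit widths are arbitrary.
import numpy as np

def low_rank_approx(w: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Factor w ~= a @ b with a: (m, rank), b: (rank, n) via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # absorb singular values into the left factor
    b = vt[:rank, :]
    return a, b

def quantize_uniform(w: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor uniform quantization to signed integers of `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8 if bits <= 8 else np.int16)
    return q, scale

# Example: compress one weight matrix two ways and compare reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)

a, b = low_rank_approx(w, rank=64)
q, scale = quantize_uniform(w, bits=4)

err_lowrank = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
err_quant = np.linalg.norm(w - q * scale) / np.linalg.norm(w)
print(f"rank-64 relative error: {err_lowrank:.3f}")
print(f"4-bit relative error:   {err_quant:.3f}")
```

In a platform-aware setting, the per-layer rank and bit width would be chosen jointly with the hardware performance estimator rather than fixed by hand, which is the mixed-precision aspect the abstract refers to.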
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Mohamed Abdelfattah
Addressing rural health disparity with a novel hospital sleep apnea screening: Precision of a high-resolution pulse oximeter in screening for sleep-disordered breathing
- DOI: 10.1007/s11325-021-02559-x
- Publication Year: 2022
- Journal:
- Impact Factor: 2.5
- Authors: R. Stansbury; V. Badami; E. Rojas; S. Naqvi; Joshua Easterling; Mohamed Abdelfattah; S. Quan; Sunil Sharma
- Corresponding Author: Sunil Sharma
Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel
- DOI: 10.48550/arxiv.2402.13536
- Publication Year: 2024
- Journal:
- Impact Factor: 0
- Authors: Jordan Dotzel; Bahaa Kotb; James Dotzel; Mohamed Abdelfattah; Zhiru Zhang
- Corresponding Author: Zhiru Zhang
A Novel Hybrid Binarization Technique for Images of Historical Arabic Manuscripts
- DOI: 10.24846/v24i3y201504
- Publication Year: 2015
- Journal:
- Impact Factor: 1.6
- Authors: A. Hassanien; Mohamed Abdelfattah; K. M. Amin; Sherihan Mohamed
- Corresponding Author: Sherihan Mohamed
New glycosylated phenazine derivatives from the actinomycete CKK613
- DOI:
- Publication Year: 2010
- Journal:
- Impact Factor: 0
- Authors: 安川智之; et al.; Mohamed Abdelfattah
- Corresponding Author: Mohamed Abdelfattah
Izumiphenazine A, B and C: novel phenazine derivatives isolated from Streptomyces sp. IFM 11204
- DOI:
- Publication Year: 2010
- Journal:
- Impact Factor: 5.1
- Authors: 安川智之; et al.; Mohamed Abdelfattah
- Corresponding Author: Mohamed Abdelfattah
Other Grants by Mohamed Abdelfattah
SHF: Small: Domain-Specific FPGAs to Accelerate Unrolled DNNs with Fine-Grained Unstructured Sparsity and Mixed Precision
- Award Number: 2303626
- Fiscal Year: 2023
- Funding Amount: $883,100
- Project Type: Standard Grant
Similar NSFC Grants (National Natural Science Foundation of China)
Construction of Pt-metallacycle-based two-dimensional supramolecular assemblies for efficient loading and delivery of the traditional Chinese medicine Huachansu (cinobufacini) and combination liver cancer therapy
- Award Number: 82304889
- Approval Year: 2023
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
Highly efficient triplet-triplet annihilation upconversion luminescence with a large anti-Stokes shift
- Award Number: 22303056
- Approval Year: 2023
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
Efficient reliability assessment of wind-induced train operation safety on long-span bridges in mountainous terrain based on adaptive surrogate models
- Award Number:
- Approval Year: 2022
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
C-C coupling mechanisms for high-yield, high-efficiency electroreduction of CO2 to multi-carbon products
- Award Number: 22272078
- Approval Year: 2022
- Funding Amount: CNY 540,000
- Project Type: General Program
Construction of amino-acid-modified MOF(Zn) biomimetic recognition materials and efficient separation of ACE-inhibitory peptides from Qinzhou oysters
- Award Number:
- Approval Year: 2022
- Funding Amount: CNY 330,000
- Project Type: Regional Science Fund Project
Similar Overseas Grants
CAREER: A Multi-faceted Framework to Enable Computationally Efficient Evaluation and Automatic Design for Large-scale Economics-driven Transmission Planning
- Award Number: 2339956
- Fiscal Year: 2024
- Funding Amount: $883,100
- Project Type: Continuing Grant
CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
- Award Number: 2340011
- Fiscal Year: 2024
- Funding Amount: $883,100
- Project Type: Standard Grant
CAREER: Algorithm-Hardware Co-design of Efficient Large Graph Machine Learning for Electronic Design Automation
- Award Number: 2340273
- Fiscal Year: 2024
- Funding Amount: $883,100
- Project Type: Continuing Grant
CAREER: Toward Hierarchical Game Theory and Hybrid Learning Framework for Safe, Efficient Large-scale Multi-agent Systems
- Award Number: 2144646
- Fiscal Year: 2022
- Funding Amount: $883,100
- Project Type: Continuing Grant
CAREER: A Parallel and Efficient Computational Framework for Unified Volumetric Meshing in Large-Scale 3D/4D Anisotropy
- Award Number: 1845962
- Fiscal Year: 2019
- Funding Amount: $883,100
- Project Type: Continuing Grant