CAREER: Efficient Large Language Model Inference Through Codesign: Adaptable Software Partitioning and FPGA-based Distributed Hardware
Basic Information
- Award number: 2339084
- Principal investigator:
- Amount: $883,100
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Duration: 2024-05-01 to 2029-04-30
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract
Artificial intelligence (AI) has entered the "age of scale". Huge amounts of training data are being used to train enormous deep neural networks (DNNs) on large-scale computers, as epitomized by the rise of large language models (LLMs). The extremely high demand for this technology is clearly evident, as recently exemplified by ChatGPT: an LLM chatbot that garnered 100 million active users merely two months post-release, setting a new world record. However, deploying LLMs can be quite costly, given that their memory footprint can extend to terabytes of data while also demanding high computational resources. Consequently, large-scale distributed computers have become essential, particularly to meet the performance required for interactive applications. To improve efficiency, this project tackles new challenges that are specific to LLMs, including their large memory footprint, varying computational demands, and distributed computing. This is critical to making LLMs more accessible and sustainable for widespread use. Concurrently, this award seeks to develop a diverse AI workforce proficient in algorithms, hardware, and software, achieved through a large-scale AI course for a diverse student population at public universities, comprehensive curriculum integration, and student mentorship at both the graduate and undergraduate levels.

This project will enable the codesign of LLMs and distributed computing platforms, divided into three major thrusts that correspond to three levels of the computing stack: software, hardware, and algorithms. Initially, the project will focus on automated partitioning and mapping algorithms, as these form the foundations by which LLMs can be deployed and optimized on both existing and new distributed computing platforms. Key to this research thrust is the development of an extensible hardware performance estimator that can model current GPU-based systems alongside new distributed computing approaches.
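To illustrate the kind of cost modeling an automated partitioner relies on, the toy sketch below brute-forces a pipeline split of a layer sequence across devices to minimize the bottleneck stage. All names, latency numbers, and the fixed transfer cost are hypothetical; the project's actual estimator and mapping algorithms are far more detailed.

```python
from itertools import combinations

def best_split(layer_ms, n_devices, comm_ms):
    """Partition a layer sequence into contiguous stages, one per device,
    minimizing the slowest stage (the pipeline bottleneck), by brute force."""
    n = len(layer_ms)
    best = (float("inf"), None)
    # Choose n_devices - 1 cut points between layers.
    for cuts in combinations(range(1, n), n_devices - 1):
        bounds = [0, *cuts, n]
        stages = [sum(layer_ms[a:b]) for a, b in zip(bounds, bounds[1:])]
        # Every stage but the last pays a transfer cost to the next device.
        bottleneck = max(t + (comm_ms if i < len(stages) - 1 else 0)
                         for i, t in enumerate(stages))
        if bottleneck < best[0]:
            best = (bottleneck, bounds)
    return best

# Made-up per-layer latencies in milliseconds.
layers = [4.0, 1.0, 3.0, 2.0, 5.0, 1.0]
print(best_split(layers, n_devices=2, comm_ms=0.5))  # → (8.5, [0, 3, 6])
```

Even this toy version shows why the estimator matters: the best cut point shifts as soon as the communication cost changes, so the partitioner is only as good as the latency model feeding it.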
In particular, the second thrust investigates the use of in-network and near-storage FPGAs within distributed systems to speed up LLM inference. The final thrust investigates platform-aware compression for LLMs, including mixed-precision quantization and low-rank approximation. In addition to improving LLM efficiency across the computing stack, this project will develop a research framework to synergistically co-optimize LLMs and distributed hardware platforms, resulting in new optimized LLM computing systems and implementation methodologies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
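To make the two compression techniques named above concrete, the following sketch applies symmetric uniform ("fake") quantization and a truncated-SVD low-rank approximation to a random weight matrix. The bit widths, rank, and function names are illustrative assumptions, not the project's actual methods.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization of a weight tensor to the given bit width,
    returned in dequantized ("fake-quantized") form for easy error measurement."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

def low_rank(w, rank):
    """Truncated-SVD low-rank approximation: W ≈ U_r diag(s_r) V_r^T."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

for bits in (8, 4):
    err = np.abs(quantize(w, bits) - w).mean()
    print(f"int{bits} mean abs error: {err:.4f}")

approx = low_rank(w, rank=64)
print("rank-64 relative error:",
      np.linalg.norm(w - approx) / np.linalg.norm(w))
```

Mixed precision generalizes the first function by assigning different bit widths to different layers or tensors; the platform-aware part of the thrust is deciding those assignments (and the rank) from hardware cost feedback rather than accuracy alone.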
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Mohamed Abdelfattah
Addressing rural health disparity with a novel hospital sleep apnea screening: Precision of a high-resolution pulse oximeter in screening for sleep-disordered breathing
- DOI: 10.1007/s11325-021-02559-x
- Published: 2022
- Journal:
- Impact factor: 2.5
- Authors: R. Stansbury; V. Badami; E. Rojas; S. Naqvi; Joshua Easterling; Mohamed Abdelfattah; S. Quan; Sunil Sharma
- Corresponding author: Sunil Sharma
Investigation and monitoring of rotational landslides in El Mokkattam plateau Egypt, using integrated geological and geophysical techniques
- DOI: 10.1016/j.heliyon.2024.e36545
- Published: 2024-09-15
- Journal:
- Impact factor:
- Authors: Mohamed A. Gamal; Mohamed Abdelfattah; George Maher
- Corresponding author: George Maher
Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel
- DOI: 10.48550/arXiv.2402.13536
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Jordan Dotzel; Bahaa Kotb; James Dotzel; Mohamed Abdelfattah; Zhiru Zhang
- Corresponding author: Zhiru Zhang
Izumiphenazine A, B and C: novel phenazine derivatives isolated from Streptomyces sp. IFM 11204
- DOI:
- Published: 2010
- Journal:
- Impact factor: 5.1
- Authors: 安川智之 et al.; Mohamed Abdelfattah
- Corresponding author: Mohamed Abdelfattah
Versatility of the Osseodensified Crestal Sinus Lifting Technique as Alternative Procedure for the Lateral Sinus Technique with Simultaneous Implant Placement
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Elhussieny Mohammed Ahmed Ahmed; A. Abdullah; Ashraf Mahmoud; Noha Abd El Aziz; Mohamed Abdelfattah
- Corresponding author: Mohamed Abdelfattah
Other Grants by Mohamed Abdelfattah
SHF: Small: Domain-Specific FPGAs to Accelerate Unrolled DNNs with Fine-Grained Unstructured Sparsity and Mixed Precision
- Award number: 2303626
- Fiscal year: 2023
- Amount: $883,100
- Project type: Standard Grant
Similar NSFC Grants
Research on highly efficient triplet-triplet annihilation upconversion luminescence with large anti-Stokes shifts
- Award number: 22303056
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Construction of Pt-metallacycle-based two-dimensional supramolecular assemblies for efficient loading and delivery of the traditional Chinese medicine cinobufacini, and research on combination therapy for liver cancer
- Award number: 82304889
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
High-efficiency, long-lived quantum memory based on large optical depth and optical-lattice techniques
- Award number: 62305104
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Research on efficient real-time reliability assessment of track regularity for long-span cable-stayed bridges with ballastless track, based on active-learning surrogate models
- Award number: 52308223
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Research on key scientific problems of high-efficiency, high-capacity multiphase permanent-magnet direct-drive electric propulsion systems
- Award number: U22A20219
- Approval year: 2022
- Amount: CNY 2,550,000
- Project type: Joint Fund Project
Similar Overseas Grants
CAREER: A Multi-faceted Framework to Enable Computationally Efficient Evaluation and Automatic Design for Large-scale Economics-driven Transmission Planning
- Award number: 2339956
- Fiscal year: 2024
- Amount: $883,100
- Project type: Continuing Grant
CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
- Award number: 2340011
- Fiscal year: 2024
- Amount: $883,100
- Project type: Standard Grant
CAREER: Algorithm-Hardware Co-design of Efficient Large Graph Machine Learning for Electronic Design Automation
- Award number: 2340273
- Fiscal year: 2024
- Amount: $883,100
- Project type: Continuing Grant
CAREER: Toward Hierarchical Game Theory and Hybrid Learning Framework for Safe, Efficient Large-scale Multi-agent Systems
- Award number: 2144646
- Fiscal year: 2022
- Amount: $883,100
- Project type: Continuing Grant
CAREER: A Parallel and Efficient Computational Framework for Unified Volumetric Meshing in Large-Scale 3D/4D Anisotropy
- Award number: 1845962
- Fiscal year: 2019
- Amount: $883,100
- Project type: Continuing Grant