CAREER: Efficient Large Language Model Inference Through Codesign: Adaptable Software Partitioning and FPGA-based Distributed Hardware
Basic Information
- Award Number: 2339084
- Principal Investigator: Mohamed Abdelfattah
- Amount: $883,100
- Host Institution:
- Host Institution Country: United States
- Project Type: Continuing Grant
- Fiscal Year: 2024
- Funding Country: United States
- Project Period: 2024-05-01 to 2029-04-30
- Project Status: Ongoing
- Source:
- Keywords:
Project Abstract
Artificial intelligence (AI) has entered the "age of scale": huge amounts of training data are being used to train enormous deep neural networks (DNNs) on large-scale computers, as epitomized by the rise of large language models (LLMs). The extremely high demand for this technology is evident, as recently exemplified by ChatGPT, an LLM chatbot that garnered 100 million active users merely two months after release, setting a world record. However, deploying LLMs can be quite costly, given that their memory footprint can extend to terabytes of data while also demanding substantial computational resources. Consequently, large-scale distributed computers have become essential, particularly to meet the performance required for interactive applications. To improve efficiency, this project tackles challenges that are specific to LLMs, including their large memory footprint, varying computational demands, and reliance on distributed computing. Addressing these challenges is critical to making LLMs more accessible and sustainable for widespread use. Concurrently, this award seeks to develop a diverse AI workforce proficient in algorithms, hardware, and software through a large-scale AI course for a diverse student population at public universities, comprehensive curriculum integration, and student mentorship at both the graduate and undergraduate levels.

This project will enable the codesign of LLMs and distributed computing platforms through three major thrusts that correspond to three levels of the computing stack: software, hardware, and algorithms. Initially, the project will focus on automated partitioning and mapping algorithms, as these form the foundation by which LLMs can be deployed and optimized on both existing and new distributed computing platforms. Key to this research thrust is the development of an extensible hardware performance estimator that can model current GPU-based systems alongside new distributed computing approaches. The second thrust investigates the use of in-network and near-storage FPGAs within distributed systems to speed up LLM inference. The final thrust investigates platform-aware compression for LLMs, including mixed-precision quantization and low-rank approximation. In addition to improving LLM efficiency across the computing stack, this project will develop a research framework to synergistically co-optimize LLMs and distributed hardware platforms, resulting in new optimized LLM computing systems and implementation methodologies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
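To make the software thrust more concrete, the sketch below shows one plausible shape for an automated partitioner driven by a hardware performance estimator: a roofline-style analytical model estimates per-layer latency on a device, and a greedy pass assigns contiguous transformer layers to devices so that each device receives roughly an equal share of the estimated latency. This is a minimal illustrative sketch, not the project's actual framework; the names (DeviceSpec, LayerCost, estimate_layer_latency, partition_layers) and all numbers are hypothetical.

```python
# Hypothetical sketch: greedy layer partitioning driven by a roofline-style
# performance estimator. Names and numbers are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class DeviceSpec:
    peak_flops: float      # peak throughput in FLOP/s
    mem_bandwidth: float   # memory bandwidth in bytes/s

@dataclass
class LayerCost:
    flops: float           # FLOPs per token for this layer
    weight_bytes: float    # bytes of weights that must be streamed per token

def estimate_layer_latency(layer: LayerCost, dev: DeviceSpec) -> float:
    """Roofline-style estimate: latency is bounded by compute or memory traffic."""
    compute_time = layer.flops / dev.peak_flops
    memory_time = layer.weight_bytes / dev.mem_bandwidth
    return max(compute_time, memory_time)

def partition_layers(layers: list[LayerCost], devices: list[DeviceSpec]) -> list[list[int]]:
    """Greedily assign contiguous layers to devices, balancing estimated latency."""
    total = sum(estimate_layer_latency(l, devices[0]) for l in layers)
    target = total / len(devices)  # ideal per-device share (rough heuristic)
    partitions, current, acc, d = [], [], 0.0, 0
    for i, layer in enumerate(layers):
        current.append(i)
        acc += estimate_layer_latency(layer, devices[d])
        # close this partition once it reaches its share, keeping devices for the rest
        if acc >= target and d < len(devices) - 1:
            partitions.append(current)
            current, acc, d = [], 0.0, d + 1
    partitions.append(current)
    return partitions

# Example: 8 identical decoder layers split across 2 identical GPUs.
gpu = DeviceSpec(peak_flops=300e12, mem_bandwidth=2e12)
layers = [LayerCost(flops=2e9, weight_bytes=1e8) for _ in range(8)]
print(partition_layers(layers, [gpu, gpu]))  # -> [[0, 1, 2, 3], [4, 5, 6, 7]]
```

A real estimator would also have to model inter-device communication and key-value cache traffic, which often dominate distributed LLM inference and which this toy omits.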
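Similarly, for the compression thrust, the following toy sketch shows the two primitives named in the abstract, low-rank approximation via truncated SVD and uniform quantization at a selectable bit width, applied to a single weight matrix with NumPy. It is an illustrative assumption of how such primitives behave, not the project's method; the choice of rank 64 and 4 bits is arbitrary.

```python
# Toy sketch of the two compression primitives named in the abstract:
# low-rank approximation (truncated SVD) and uniform quantization at a
# chosen bit width. Illustrative only; ranks and bit widths are arbitrary.
import numpy as np

def low_rank_approx(w: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Factor w ~= a @ b with a: (m, rank), b: (rank, n) via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # absorb singular values into the left factor
    b = vt[:rank, :]
    return a, b

def quantize_uniform(w: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor uniform quantization to signed integers of `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8 if bits <= 8 else np.int16)
    return q, scale

# Example: compress one weight matrix two ways and compare reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)

a, b = low_rank_approx(w, rank=64)
q, scale = quantize_uniform(w, bits=4)

err_lowrank = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
err_quant = np.linalg.norm(w - q * scale) / np.linalg.norm(w)
print(f"rank-64 relative error: {err_lowrank:.3f}")
print(f"4-bit relative error:   {err_quant:.3f}")
```

In a platform-aware setting, the per-layer rank and bit width would be chosen jointly with the hardware performance estimator rather than fixed by hand, which is the mixed-precision aspect the abstract refers to.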
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Mohamed Abdelfattah
Addressing rural health disparity with a novel hospital sleep apnea screening: Precision of a high-resolution pulse oximeter in screening for sleep-disordered breathing
- DOI: 10.1007/s11325-021-02559-x
- Publication Year: 2022
- Journal:
- Impact Factor: 2.5
- Authors: R. Stansbury; V. Badami; E. Rojas; S. Naqvi; Joshua Easterling; Mohamed Abdelfattah; S. Quan; Sunil Sharma
- Corresponding Author: Sunil Sharma
Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel
- DOI: 10.48550/arxiv.2402.13536
- Publication Year: 2024
- Journal:
- Impact Factor: 0
- Authors: Jordan Dotzel; Bahaa Kotb; James Dotzel; Mohamed Abdelfattah; Zhiru Zhang
- Corresponding Author: Zhiru Zhang
A Novel Hybrid Binarization Technique for Images of Historical Arabic Manuscripts
- DOI: 10.24846/v24i3y201504
- Publication Year: 2015
- Journal:
- Impact Factor: 1.6
- Authors: A. Hassanien; Mohamed Abdelfattah; K. M. Amin; Sherihan Mohamed
- Corresponding Author: Sherihan Mohamed
New glycosylated phenazine derivatives from the actinomycete CKK613
- DOI:
- Publication Year: 2010
- Journal:
- Impact Factor: 0
- Authors: 安川智之; et al.; Mohamed Abdelfattah
- Corresponding Author: Mohamed Abdelfattah
Izumiphenazine A, B and C: novel phenazine derivatives isolated from Streptomyces sp. IFM 11204
- DOI:
- Publication Year: 2010
- Journal:
- Impact Factor: 5.1
- Authors: 安川智之; et al.; Mohamed Abdelfattah
- Corresponding Author: Mohamed Abdelfattah
Other Grants by Mohamed Abdelfattah
SHF: Small: Domain-Specific FPGAs to Accelerate Unrolled DNNs with Fine-Grained Unstructured Sparsity and Mixed Precision
- Award Number: 2303626
- Fiscal Year: 2023
- Funding Amount: $883,100
- Project Type: Standard Grant
Similar NSFC Grants (National Natural Science Foundation of China)
Construction of Pt-metallacycle-based two-dimensional supramolecular assemblies for efficient loading and delivery of the traditional Chinese medicine Huachansu (cinobufacini) and combination liver cancer therapy
- Award Number: 82304889
- Approval Year: 2023
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
Highly efficient triplet-triplet annihilation upconversion luminescence with a large anti-Stokes shift
- Award Number: 22303056
- Approval Year: 2023
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
Efficient reliability assessment of wind-induced train operation safety on long-span bridges in mountainous terrain based on adaptive surrogate models
- Award Number:
- Approval Year: 2022
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
C-C coupling mechanisms for high-yield, high-efficiency electroreduction of CO2 to multi-carbon products
- Award Number: 22272078
- Approval Year: 2022
- Funding Amount: CNY 540,000
- Project Type: General Program
Construction of amino-acid-modified MOF(Zn) biomimetic recognition materials and efficient separation of ACE-inhibitory peptides from Qinzhou oysters
- Award Number:
- Approval Year: 2022
- Funding Amount: CNY 330,000
- Project Type: Regional Science Fund Project
Similar Overseas Grants
CAREER: A Multi-faceted Framework to Enable Computationally Efficient Evaluation and Automatic Design for Large-scale Economics-driven Transmission Planning
- Award Number: 2339956
- Fiscal Year: 2024
- Funding Amount: $883,100
- Project Type: Continuing Grant
CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
- Award Number: 2340011
- Fiscal Year: 2024
- Funding Amount: $883,100
- Project Type: Standard Grant
CAREER: Algorithm-Hardware Co-design of Efficient Large Graph Machine Learning for Electronic Design Automation
- Award Number: 2340273
- Fiscal Year: 2024
- Funding Amount: $883,100
- Project Type: Continuing Grant
CAREER: Toward Hierarchical Game Theory and Hybrid Learning Framework for Safe, Efficient Large-scale Multi-agent Systems
- Award Number: 2144646
- Fiscal Year: 2022
- Funding Amount: $883,100
- Project Type: Continuing Grant
CAREER: A Parallel and Efficient Computational Framework for Unified Volumetric Meshing in Large-Scale 3D/4D Anisotropy
- Award Number: 1845962
- Fiscal Year: 2019
- Funding Amount: $883,100
- Project Type: Continuing Grant