Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
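Basic Information
- Award number: 2230945
- Principal investigator:
- Amount: $300,000
- Host institution:
- Country of host institution: United States
- Program type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-10-01 to 2023-11-30
- Status: Completed
- Source:
- Keywords: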
Project Abstract
As Machine Learning (ML) workloads, and especially Deep Neural Network (DNN) workloads, have rapidly become prominent, many existing architectures have been extended with instructions and/or processing capabilities that target them. Examples include Intel's AMX instructions, NVIDIA's Tensor Cores, and AMD's DOT instructions, among many others. The emergence of such tensorized instructions raises a set of common, related challenges about how they can be used for production-level modern DNNs. The current state of the art in exploiting these instruction sets for DNN workloads is very limited: existing systems either ignore these instructions entirely, do not address global optimizations for complex DNNs, or are limited in other ways. The premise of our work is that a compilation system that is cognizant of the latest DNN trends and can optimize across different tensorized instruction sets will provide large efficiency gains for modern ML computations. This agenda is likely to have significant technical, economic, and societal impact. On the technical side, the work affects areas such as High-Performance Computing (HPC), compilers, and systems supporting AI/ML workloads. Because DNNs are becoming an integral part of applications that most people use, this work is poised to have a large economic and societal impact.
On the education side, the research at the intersection of systems and ML will be incorporated into multiple courses and will help increase diversity at all levels of computing education and research, particularly by involving members of underrepresented groups.
This project addresses the following challenges associated with modern DNNs and recent and emerging tensorized instructions:
1) Local Instruction Selection for Dense Models: To improve the execution efficiency of each operator, a critical first issue is selecting tensorized instructions (and the associated data layouts); this will be addressed for operators of arbitrary shapes.
2) Global Optimizations for DNNs: After local operator optimization, each operator may prefer its own tensorized instruction and data layout, which incurs significant data layout transformation costs when an entire DNN executes. This project formulates and solves a global optimization problem that chooses the right trade-off between local operator execution costs and data transformation costs.
3) Optimizations for Dynamic DNNs: This project also considers various forms of dynamism in modern DNN models, including dynamic input shapes, dynamic control flow, and dynamic data structures. It proposes new optimizations, such as effective memory management, and revisits others, such as local and global instruction selection, in the presence of these forms of dynamism.
4) Mapping Sparse Models to Emerging Instructions: This project also plans to improve the efficiency of using various types of tensorized instructions when sparsity is involved, building on earlier work on optimizing kernels such as SpMM (and other sparse computations) on GPUs and SIMD instruction sets.
5) (Semi-)Automatic Support for New Instructions: To minimize optimization and programming effort, this project also introduces a module that automatically optimizes DNN computations for new tensorized instructions or features.
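Challenge 2 can be pictured, in a much simplified form, as a shortest-path problem over per-operator layout choices. The Python sketch here is illustrative only and is not the project's formulation. It assumes a linear chain of operators (real DNN graphs are DAGs), a hypothetical cost table `exec_cost[i][layout]` giving operator i's run time with the instruction tied to that layout, and a hypothetical `transform_cost[(a, b)]` for converting a tensor between layouts. The layout names `plain` and `tiled` and all numbers are made up. A dynamic program picks one layout per operator to minimize execution plus conversion cost.

```python
from typing import Dict, List, Tuple

# Hypothetical cost model (not from the project):
#   exec_cost[i][layout]   -> run time of operator i in that layout
#   transform_cost[(a, b)] -> cost of converting a tensor from layout a to b
def choose_layouts(
    exec_cost: List[Dict[str, float]],
    transform_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Pick one layout per operator of a linear chain, minimizing total
    execution cost plus layout-conversion cost (Viterbi-style DP)."""

    def convert(src: str, dst: str) -> float:
        # Keeping the same layout between consecutive operators is free.
        return 0.0 if src == dst else transform_cost[(src, dst)]

    # best[layout] = (cheapest cost of ops 0..i ending in `layout`, layouts chosen)
    best = {lay: (cost, [lay]) for lay, cost in exec_cost[0].items()}
    for op_costs in exec_cost[1:]:
        new_best = {}
        for lay, cost in op_costs.items():
            # Cheapest predecessor once the conversion into `lay` is paid.
            prev_total, prev_path = min(
                (total + convert(prev_lay, lay), path)
                for prev_lay, (total, path) in best.items()
            )
            new_best[lay] = (prev_total + cost, prev_path + [lay])
        best = new_best
    return min(best.values())


# Toy three-operator chain with made-up costs.
ops = [
    {"plain": 5.0, "tiled": 3.0},
    {"plain": 4.0, "tiled": 4.5},  # locally prefers "plain"
    {"plain": 6.0, "tiled": 2.0},
]
xform = {("plain", "tiled"): 1.5, ("tiled", "plain"): 1.5}
total, layouts = choose_layouts(ops, xform)
print(total, layouts)  # 9.5 ['tiled', 'tiled', 'tiled']
```

In this toy example, operator 1 locally prefers `plain`, but switching to it and back would add two conversions, so the sketch keeps `tiled` throughout (total 9.5, versus 12.0 for the greedy per-operator choice).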
Besides addressing the above problems, a critical component of this project is incorporating their implementations, together with code generation for multiple back-ends, into a reusable system. This system will take a computational graph representation as input and output tensor IR and LLVM IR, thus building around three representations widely used in industry.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Gagan Agrawal
Organizing Records for Retrieval in Multi-Dimensional Range Searchable Encryption
- Year: 2024
- Impact factor: 0
- Authors: Mahdieh Heidaripour; Ladan Kian; Maryam Rezapour; Mark Holcomb; Benjamin Fuller; Gagan Agrawal; Hoda Maleki
- Corresponding author: Hoda Maleki
CML-062 Define the Vulnerable - Social Determinants of Health Impact on Hematological Malignancies Affecting Children, Adolescents, and Young Adults: Systematic Review and Meta-Analysis
- DOI: 10.1016/s2152-2650(23)01122-9
- Published: 2023-09-01
- Authors: Muhannad Sharara; Kellen Cristine Tjioe; Marisol Miranda Galvis; Gagan Agrawal; Andrew Balas; Jorge Cortes
- Corresponding author: Jorge Cortes
MMIS-07, 08: Mining Multiple Information Sources Workshop Report
- Impact factor: 0
- Authors: Xingquan Zhu (朱兴全); Gagan Agrawal; Yuri Breitbart; Ruoming Jin
- Corresponding author: Ruoming Jin
POSTER: MDS-044 Cancer Disparities in Survival of Patients With Hematologic Malignancies in the Context of Social Determinants of Health: A Systematic Review
- DOI: 10.1016/s2152-2650(23)00577-3
- Published: 2023-09-01
- Authors: Marisol Miranda-Galvis; Kellen Tjioe; Andrew Balas; Gagan Agrawal; Jorge Cortes
- Corresponding author: Jorge Cortes
Middleware for data mining applications on clusters and grids
- DOI: 10.1016/j.jpdc.2007.06.007
- Published: 2008-01-01
- Authors: Leonid Glimcher; Ruoming Jin; Gagan Agrawal
- Corresponding author: Gagan Agrawal
Other Grants by Gagan Agrawal
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
- Award number: 2341378
- Fiscal year: 2023
- Amount: $300,000
- Program type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2333899
- Fiscal year: 2023
- Amount: $300,000
- Program type: Standard Grant
SHF: Small: K-Way Speculation for Mapping Applications with Dependencies on Modern HPC Systems
- Award number: 2334273
- Fiscal year: 2023
- Amount: $300,000
- Program type: Standard Grant
Collaborative Research: SHF:SMALL: Compile-Parallelize-Schedule-Retarget-Repeat (EASER) Paradigm for Dealing with Extreme Heterogeneity
- Award number: 2333895
- Fiscal year: 2023
- Amount: $300,000
- Program type: Standard Grant
Collaborative Research: SHF:SMALL: Compile-Parallelize-Schedule-Retarget-Repeat (EASER) Paradigm for Dealing with Extreme Heterogeneity
- Award number: 2146852
- Fiscal year: 2022
- Amount: $300,000
- Program type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2007775
- Fiscal year: 2020
- Amount: $300,000
- Program type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2034850
- Fiscal year: 2020
- Amount: $300,000
- Program type: Standard Grant
SHF: Small: K-Way Speculation for Mapping Applications with Dependencies on Modern HPC Systems
- Award number: 2007793
- Fiscal year: 2020
- Amount: $300,000
- Program type: Standard Grant
II-New: Infrastructure for Energy-Aware High Performance Computing (HPC) and Data Analytics on Heterogeneous Systems
- Award number: 1513120
- Fiscal year: 2015
- Amount: $300,000
- Program type: Standard Grant
SI2-SSE: Collaborative Research: Software Elements for Transfer and Analysis of Large-Scale Scientific Data
- Award number: 1339757
- Fiscal year: 2013
- Amount: $300,000
- Program type: Standard Grant
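Similar NSFC (National Natural Science Foundation of China) Grants
Role and mechanism of IL-17A in suppressing regulatory T cell function by affecting CNS2 region methylation via STAT5 in the pathogenesis of psoriasis
- Award number: 82304006
- Year approved: 2023
- Amount: CNY 300,000
- Program type: Young Scientists Fund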
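miR-20a promotes the onset of CNS inflammatory demyelinating disease by regulating CD4+ T cell pyroptosis: a mechanistic study
- Award number: 82201491
- Year approved: 2022
- Amount: CNY 300,000
- Program type: Young Scientists Fund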
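Oligomeric phosphorylated α-synuclein in plasma CNS-derived exosomes as an indicator of Parkinson's disease (PD) progression
- Award number: 82101506
- Year approved: 2021
- Amount: CNY 300,000
- Program type: Young Scientists Fund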
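Role and mechanism of pathogenicity island 4 in Listeria monocytogenes CNS inflammation, based on a brain microvascular endothelial cell model
- Award number: 32160834
- Year approved: 2021
- Amount: CNY 350,000
- Program type: Regional Science Fund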
Similar Overseas Grants
Collaborative Research: CNS Core: Medium: Movement of Computation and Data in Splitkernel-disaggregated, Data-intensive Systems
- Award number: 2406598
- Fiscal year: 2023
- Amount: $300,000
- Program type: Continuing Grant
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
- Award number: 2418188
- Fiscal year: 2023
- Amount: $300,000
- Program type: Standard Grant
Collaborative Research: CNS Core: Medium: Reconfigurable Kernel Datapaths with Adaptive Optimizations
- Award number: 2345339
- Fiscal year: 2023
- Amount: $300,000
- Program type: Standard Grant
Collaborative Research: NSF-AoF: CNS Core: Small: Towards Scalable and AI-based Solutions for Beyond-5G Radio Access Networks
- Award number: 2225578
- Fiscal year: 2023
- Amount: $300,000
- Program type: Standard Grant
Collaborative Research: CNS Core: Small: Creating An Extensible Internet Through Interposition
- Award number: 2242503
- Fiscal year: 2023
- Amount: $300,000
- Program type: Standard Grant