Collaborative Research: SHF:SMALL: Compile-Parallelize-Schedule-Retarget-Repeat (EASER) Paradigm for Dealing with Extreme Heterogeneity

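Basic Information

  • Award number:
    2146852
  • Principal investigator:
  • Amount:
    $250,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-06-15 to 2023-07-31
  • Project status:
    Completed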

Project Abstract

Heterogeneity in computing refers to having a variety of devices present within one computing system or even within one node of a cluster. A number of technological trends are making a high degree of heterogeneity inevitable in High Performance Computing (HPC), leading to research along many directions. The traditional scheduling problem, which refers to taking a set of programs to be executed and mapping them to the available resources, becomes more complicated in the presence of such heterogeneity, as the schedulers need to interact with the compiler also. The goal of this project is to consider new paradigms for application execution in view of these developments and conduct research in developing predictions of execution times, compilation, parallelization, and scheduling. Traditionally, deciding (likely manually) how an application is to be parallelized, compilation, and cluster-level scheduling are done sequentially and independently. The investigators posit that their isolated treatment is not going to be acceptable when one tries to optimize for multi-tenant heterogeneous clusters. Instead, the investigators envision a requirement that can be referred to as EASER -- compilE-pArallelize-Schedule-rEtarget-Repeat. To elaborate on the vision, in the EASER paradigm the compiler first maps the core functions to a specific device, generating predictions of execution time that are input to the parallelization approach selection module, and together they produce a final executable. Subsequently, this binary is presented to the scheduler, which assesses the job queue and might suggest alternative configuration(s)/device(s). If so, a retargeting module is to be invoked, leading to a potential repetition of the above steps. This project develops, supports, and evaluates the EASER framework in the context of a cluster that executes emerging machine learning (ML) workloads. 
Research is proposed in the following areas: 1) Compiler-Driven Performance Prediction -- It includes a novel strategy that comprises a general model for predicting SIMD/VLIW performance and an operator-classification-based approach to developing a memory hierarchy performance model. 2) Integrated Job Scheduling and Parallelization Strategy Selection -- Building on the performance prediction models, these two (conventionally independent) modules are integrated by including parameterized and incremental parallelization strategy selection methods and aggressively reducing the search space in scheduling methods. 3) Retargeting Compiler -- By classifying optimizations as either architecture-dependent or architecture-independent, a retargeting compiler for ML workloads will be developed. This project will also make several contributions to education and human resource development. Both investigators will be introducing course(s) (material) at the intersection of computer systems and machine learning, bringing attention to ML-related workloads in computer systems education. A majority of funds at each University will be used to support Ph.D. students in their research, who will be trained to work across traditional (sub-) areas. Both investigators are strongly committed to increasing diversity in computing fields and have a strong record of supervising members of underrepresented groups in their research programs. Building on their Universities' existing connections, they will be further working on improving diversity at all levels. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
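The compile, schedule, retarget, repeat loop described in the abstract can be illustrated with a toy sketch. This is not the project's implementation: the device names, the `PREDICTED_TIME` table, and the functions `compile_and_parallelize`, `schedule`, and `easer` are all hypothetical, and the "performance model" is a fixed lookup table standing in for the compiler-driven predictions the abstract proposes.

```python
from dataclasses import dataclass

# Hypothetical per-device execution-time predictions (seconds) for one job.
# In the abstract these come from compiler-driven performance models.
PREDICTED_TIME = {"gpu": 10.0, "cpu": 40.0, "fpga": 25.0}


@dataclass
class Binary:
    job: str
    device: str
    predicted_time: float


def compile_and_parallelize(job, device):
    """Compile step: target one device and attach its predicted execution time."""
    return Binary(job, device, PREDICTED_TIME[device])


def schedule(binary, queue_wait):
    """Scheduler step: suggest another device if it would finish sooner, else None."""
    def finish(dev):
        return queue_wait[dev] + PREDICTED_TIME[dev]

    best = min(queue_wait, key=finish)
    return best if finish(best) < finish(binary.device) else None


def easer(job, queue_wait, max_rounds=4):
    """Compile, ask the scheduler, retarget if advised, and repeat."""
    # Start from the device that is fastest when queueing is ignored.
    device = min(PREDICTED_TIME, key=PREDICTED_TIME.get)
    binary = compile_and_parallelize(job, device)
    for _ in range(max_rounds):
        suggestion = schedule(binary, queue_wait)
        if suggestion is None:
            break  # the scheduler accepts the current binary
        binary = compile_and_parallelize(job, suggestion)  # retarget
    return binary
```

With hypothetical queue waits of 60 s on the GPU, 0 s on the CPU, and 5 s on the FPGA, `easer("train", {"gpu": 60.0, "cpu": 0.0, "fpga": 5.0})` first compiles for the GPU and is then retargeted to the FPGA, because the queueing delay outweighs the GPU's shorter predicted run time.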

Project Outcomes

Journal papers (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
End-to-End LU Factorization of Large Matrices on GPUs
  • DOI:
  • Publication date:
    2023-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yang Xia; Peng Jiang
  • Corresponding author:
    Peng Jiang
GPU Adaptive In-situ Parallel Analytics (GAP)
  • DOI:
  • Publication date:
    2022-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Haoyuan Xing; Gagan Agrawal
  • Corresponding author:
    Gagan Agrawal

Other Publications by Gagan Agrawal

Organizing Records for Retrieval in Multi-Dimensional Range Searchable Encryption
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mahdieh Heidaripour; Ladan Kian; Maryam Rezapour; Mark Holcomb; Benjamin Fuller; Gagan Agrawal; Hoda Maleki
  • Corresponding author:
    Hoda Maleki
SecFob: A Remote Keyless Entry Security Solution
Scalable Deep Graph Clustering with Random-walk based Self-supervised Learning
  • DOI:
  • Publication date:
    2024-09-14
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xiang Li; Dongxu Li; Ruoming Jin; Gagan Agrawal; R. Ramnath
  • Corresponding author:
    R. Ramnath
Effect of insulation thickness on pressure evolution and thermal stratification in a cryogenic tank
  • DOI:
    10.1016/j.applthermaleng.2016.07.015
  • Publication date:
    2017-01-25
  • Journal:
  • Impact factor:
    6.4
  • Authors:
    Jeswin Joseph; Gagan Agrawal; Deepak Kumar Agarwal; J. C. Pisharady; S. Sunil Kumar
  • Corresponding author:
    S. Sunil Kumar
2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, May 19-23, 2014
  • DOI:
    10.1109/ipdps30335.2014
  • Publication date:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yutong Lu; Mehmet Deveci; S. Rajamanickam; V. Leung; Kevin Pedretti; Stephen L. Olivier; David P. Bunde; Umit V. Çatalyürek; L. Peh; Gagan Agrawal; Marcelo Veiga Neves; César A.F. De Rose; K. Katrinis; Hubertus Franke; Yi Yang; Ping Xiang; Michael Mantor; Norman Rubin; Lisa Hsu; Qunfeng Dong; Yuki Abe; Hiroshi Sasaki; Shinpei Kato; Koji Inoue; Alex Ramirez; Jian Huang; Xuechen Zhang; G. Eisenhauer; Karsten Schwan; Matt Wolf; Stephane Ethier; B. Ravindran
  • Corresponding author:
    B. Ravindran

Other Grants Held by Gagan Agrawal

SHF: Small: K-Way Speculation for Mapping Applications with Dependencies on Modern HPC Systems
  • Award number:
    2334273
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
  • Award number:
    2230945
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: SHF:SMALL: Compile-Parallelize-Schedule-Retarget-Repeat (EASER) Paradigm for Dealing with Extreme Heterogeneity
  • Award number:
    2333895
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
  • Award number:
    2333899
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
  • Award number:
    2341378
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
  • Award number:
    2007775
  • Fiscal year:
    2020
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
SHF: Small: K-Way Speculation for Mapping Applications with Dependencies on Modern HPC Systems
  • Award number:
    2007793
  • Fiscal year:
    2020
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
  • Award number:
    2034850
  • Fiscal year:
    2020
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
II-New: Infrastructure for Energy-Aware High Performance Computing (HPC) and Data Analytics on Heterogeneous Systems
  • Award number:
    1513120
  • Fiscal year:
    2015
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
SI2-SSE: Collaborative Research: Software Elements for Transfer and Analysis of Large-Scale Scientific Data
  • Award number:
    1339757
  • Fiscal year:
    2013
  • Award amount:
    $250,000
  • Project type:
    Standard Grant

Similar Grants from the National Natural Science Foundation of China (NSFC)

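Research on the Dissipation Mechanism and Dissipation Stability of Ultra-High-Frequency FBARs for 5G Communications
  • Award number:
    12302200
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Function and Mechanism of the Adaptor Protein SHF in Negatively Regulating EGFR/EGFRvIII Recycling and Stability in Glioblastoma
  • Award number:
    82302939
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Research on Architecture, Topology, and Control Strategies for Ultra-High-Frequency Inverter Systems with a Wide Operating Range
  • Award number:
    52377175
  • Approval year:
    2023
  • Award amount:
    CNY 500,000
  • Project type:
    General Program
Research on Key Technologies for Efficiency Optimization of Ultra-High-Frequency Synchronous-Rectification DC-DC Converters
  • Award number:
    62301375
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Research on Improving the Axial Resolution of Photoacoustic Microscopy through Progressive Modulation of the Ultra-High-Frequency Photoacoustic Spectrum
  • Award number:
    62265011
  • Approval year:
    2022
  • Award amount:
    CNY 340,000
  • Project type:
    Fund for Less Developed Regions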

Similar Overseas Grants

Collaborative Research: SHF: Medium: Enabling Graphics Processing Unit Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402804
  • Fiscal year:
    2024
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: SHF: Small: LEGAS: Learning Evolving Graphs At Scale
  • Award number:
    2331301
  • Fiscal year:
    2024
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: SHF: Medium: Enabling GPU Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402806
  • Fiscal year:
    2024
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: SHF: Small: Efficient and Scalable Privacy-Preserving Neural Network Inference based on Ciphertext-Ciphertext Fully Homomorphic Encryption
  • Award number:
    2412357
  • Fiscal year:
    2024
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: SHF: Medium: Enabling GPU Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402805
  • Fiscal year:
    2024
  • Award amount:
    $250,000
  • Project type:
    Standard Grant