Collaborative Research: SHF: MEDIUM: Smart Integrated Tuning of Parallel Code for Multicore and Manycore Systems

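Basic Information

Award Number:
2211982
Principal Investigator:
Ali Jannesari
Amount:
$502,300
Host Institution:
Iowa State University
Host Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2022
Funding Country:
United States
Start and End Dates:
2022-10-01 to 2025-09-30
Project Status:
Ongoing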

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2211982&HistoricalAwards=false
Keywords:
Collaborative Research SHF MEDIUM Smart

Project Abstract

High Performance Computing (HPC) entails executing code on multicore and manycore architectures. Parallel programming models have emerged to make better use of these architectures, but using them naively often fails to scratch the surface of the performance gains such systems can provide. A common technique for improving performance is to add more hardware resources. However, this is expensive, and system integration is usually an onerous task. To this end, the investigators propose a framework for improving performance by better utilizing the available resources and identifying near-optimal configurations. These configurations can take the form of code optimizations as well as intelligent resource mapping and utilization. Specifically, this project is concerned with identifying code optimizations and runtime configurations that can potentially speed up executions manifold. Faster executions can also implicitly lead to reduced power consumption. Additionally, where existing execution performance is acceptable, the proposed approach can be extended to optimize for other metrics such as power. Power consumption is usually a major bottleneck for HPC systems and a source of both fiscal and environmental concern for organizations that deploy them. The investigators posit that the framework outlined in this project can also be extended to optimize for power consumption without compromising execution performance.

The investigators’ aim is to provide an AI-assisted framework that can automatically configure parallel code with the underlying hardware architecture in mind. The steps necessary to build such a framework lie at the convergence of compiler technologies, performance analysis and modeling, and deep learning. A primary driver of this project will be developing a program representation technique targeted towards parallel code. Existing representations target mostly serial code and cannot fully capture the interactions and complexities of parallel code. Such a code representation technique is highly suited to analyses using deep learning. A means of representing parallel code in a machine-learning-friendly format will be very beneficial to the overall program analysis community. The proposed code representation will take the form of a graph, in order to correctly reflect the inherent structure present in code. The investigators propose modeling this code representation using state-of-the-art Graph Neural Network (GNN) techniques. The modeled embeddings will be used in conjunction with task-specific features in order to identify near-optimal configurations for improved performance. The overall scale of this project will span the entire “source code to execution” pipeline that most HPC workloads follow. The aim of this project is to optimize each optimizable step in the pipeline. A sample optimization pipeline can take the following form: given a parallel code, the GNN-based code optimization model will predict the best optimizations for that code, followed by identifying the best device (CPU, GPU, and others) for executing the optimized code. Further downstream, the framework will identify the optimal runtime configurations for the device under consideration. The ideas presented in this project have the potential to increase hardware utilization and reduce future hardware commissioning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
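The graph-based code representation mentioned in the abstract can be illustrated with a very small sketch. The abstract does not describe the investigators' actual graph schema or GNN model, so everything below is an assumption made for illustration: the node kinds, the edge list, the one-hot features, and the mean-aggregation update are hypothetical stand-ins for one round of message passing, written in plain Python with no external libraries.

```python
from collections import defaultdict

# Hypothetical node kinds; the abstract does not specify a schema.
NODE_KINDS = ["parallel_region", "loop", "load", "compute", "store"]


def one_hot(kind):
    """Encode a node kind as a one-hot feature vector."""
    return [1.0 if k == kind else 0.0 for k in NODE_KINDS]


def message_pass(features, edges):
    """One round of mean-aggregation message passing.

    Each node's new vector is the average of its own vector and the
    mean of the vectors of its incoming neighbors (zeros if it has none).
    """
    incoming = defaultdict(list)
    for src, dst in edges:
        incoming[dst].append(src)

    updated = {}
    for node, vec in features.items():
        neighbors = incoming.get(node, [])
        if neighbors:
            agg = [
                sum(features[n][i] for n in neighbors) / len(neighbors)
                for i in range(len(vec))
            ]
        else:
            agg = [0.0] * len(vec)
        updated[node] = [(own + msg) / 2.0 for own, msg in zip(vec, agg)]
    return updated


def graph_embedding(features):
    """Mean-pool all node vectors into a single graph-level vector."""
    count = len(features)
    dim = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) / count for i in range(dim)]


# Assumed toy graph for a parallel loop that loads, computes, and stores.
nodes = {
    "n0": "parallel_region",
    "n1": "loop",
    "n2": "load",
    "n3": "compute",
    "n4": "store",
}
edges = [("n0", "n1"), ("n1", "n2"), ("n2", "n3"), ("n3", "n4")]

features = {name: one_hot(kind) for name, kind in nodes.items()}
updated = message_pass(features, edges)
embedding = graph_embedding(updated)
```

In a pipeline like the one the abstract outlines, a vector such as `embedding` would presumably be combined with task-specific features and passed to a downstream predictor; that predictor is not sketched here because the abstract gives no details about it.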

Project Outcomes

Journal papers (2)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Power Constrained Autotuning using Graph Neural Networks

DOI:
10.1109/ipdps54959.2023.00060
Publication Date:
2023
Journal:
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Impact Factor:
0
Authors:
Dutta, Akash;Choi, Jee;Jannesari, Ali
Corresponding Author:
Jannesari, Ali

Performance Optimization using Multimodal Modeling and Heterogeneous GNN

DOI:
10.1145/3588195.3592984
Publication Date:
2023-04
Journal:
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
Impact Factor:
0
Authors:
Akashnil Dutta;J. Alcaraz;Ali TehraniJamsaz;Eduardo César;A. Sikora;A. Jannesari
Corresponding Author:
Akashnil Dutta;J. Alcaraz;Ali TehraniJamsaz;Eduardo César;A. Sikora;A. Jannesari