SHF: Small: Empirical Autotuning of Parallel Computation for Scalable Hybrid Systems

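Basic Information

  • Award number:
    1527706
  • Principal investigator:
  • Amount:
    $450,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Standard Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Period:
    2015-07-15 to 2019-06-30
  • Project status:
    Completed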

Project Abstract

Today, scientific and engineering computing is synonymous with parallel computing. Applications such as climate modeling, drug design, and aircraft design run on very large supercomputer installations whose power consumption is measured in megawatts and whose electricity costs are measured in millions of dollars. At the same time, every parallel application requires some level of tuning to ensure that the software is mapped appropriately to the hardware; otherwise, suboptimal performance leads to lost cycles, kilowatt-hours, and, ultimately, dollars. Tuning an application by making repeated runs is itself wasteful at very large scale.

The DARE project addresses this problem by tuning the application through modeling and simulation of its behavior at very large scale, rather than by actually running it. As a result, the resources required for tuning are marginal compared to those consumed in production runs. DARE is based on the observation that the approach that replaces a wind tunnel with a computer simulation of an airfoil can be applied to the software itself. Two aspects of today's high-end computing landscape make the DARE work unique: 1) the prevalence of hardware accelerators, such as Graphics Processing Units (GPUs) and Xeon Phi co-processors, and 2) the adoption of dynamic, task-based work-scheduling systems as an alternative to traditional, lock-step parallel programming models.

DARE combines three components into a refinement loop: a hardware analysis component, a kernel modeling component, and a workload simulation component. The role of the hardware analysis component is to extract basic hardware information, such as processing power and data link speed. The role of the kernel modeling component is to provide performance models of the serial kernels that constitute the building blocks of the parallel program.
Finally, the role of the workload simulation component is to simulate large-scale parallel workloads.

The hardware analysis component gathers basic knowledge about the system, such as: the number of CPU sockets per shared-memory node; the number of CPU cores in each socket; the cache hierarchy; the presence of hyper-threading; the number of NUMA nodes and the proximity of CPUs to NUMA nodes; the number of GPU accelerators or Xeon Phi co-processors and the capacities of their device memories; and the topology and bandwidth of data links, both within each node (buses) and between nodes (network switches). Part of this knowledge can be gathered through appropriate query APIs, such as hwloc, netloc, and PAPI, and those provided in the CUDA SDK, OpenCL SDK, and Xeon Phi SDK. Synthetic tests can be used for parameters that cannot be established in this manner.

Kernels are the serial building blocks of parallel programs. Although kernels are usually characterized by serial control flow, most of them already rely on a high degree of data parallelism: today's CPUs get most of their performance from SIMD parallelism, and GPUs get theirs from massive SIMT parallelism. The role of the kernel modeling component is twofold: 1) to tune kernels for maximum performance at a given granularity, and 2) to provide the kernel performance model as a function of granularity, since the granularity changes to accommodate parallel execution.

DARE turns to a stochastic time-stepping simulation to predict the performance of a dynamic runtime scheduler for two fundamental reasons. 1) Building good performance models by benchmarking actual parallel runs requires a significant number of runs with significant problem sizes, which is simply too time-consuming. 2) The impact of many tuning parameters is too complex to be modeled by sparsely sampling the tuning space and fitting simple curves or surfaces to the sample points.
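As a hedged illustration of a "synthetic test" for a parameter that query APIs such as hwloc do not report directly, the sketch below estimates sustained memory-copy bandwidth by timing large buffer copies, and reads the logical CPU count from Python's standard library. It is not DARE's tooling; the function names, buffer size, and repeat count are illustrative assumptions, and a real hardware analysis would rely on hwloc, PAPI, or the vendor SDKs for everything they can report.

```python
import os
import time


def logical_cpu_count() -> int:
    """Logical CPUs reported by the OS (hyper-threads are counted separately)."""
    return os.cpu_count() or 1


def copy_bandwidth_gbs(nbytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Synthetic test (hypothetical helper): estimate memory-copy bandwidth in GB/s.

    Each copy reads nbytes and writes nbytes, so 2 * nbytes bytes are moved.
    The fastest of several repeats is kept to reduce timing noise.
    """
    src = bytearray(nbytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # full copy of the buffer
        best = min(best, time.perf_counter() - start)
        del dst
    # Guard against a zero timer reading on very small buffers.
    return 2 * nbytes / max(best, 1e-9) / 1e9
```

Calling `copy_bandwidth_gbs()` with a buffer well above the last-level cache size measures main-memory bandwidth rather than cache bandwidth; shrinking `nbytes` below the cache size would instead probe a cache level, which is one way a synthetic test can complement the topology that hwloc reports.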
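One hedged way to express a kernel performance model "as a function of granularity" is a two-parameter fit. The sketch assumes a hypothetical tile kernel whose work grows with the cube of the tile size nb (as for a dense matrix-multiply tile), so that t(nb) = alpha + beta * nb^3, where alpha is a fixed per-call overhead and beta is the time per unit of work. Both are fitted by ordinary least squares from timing samples. The model form and the function names are assumptions for illustration, not the project's actual models.

```python
from typing import Sequence, Tuple


def fit_cubic_kernel_model(tile_sizes: Sequence[int],
                           times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of the assumed model t(nb) = alpha + beta * nb**3.

    alpha: fixed per-call overhead in seconds.
    beta:  seconds per unit of nb**3 work.
    """
    if len(tile_sizes) != len(times) or len(tile_sizes) < 2:
        raise ValueError("need at least two (tile_size, time) samples")
    xs = [float(nb) ** 3 for nb in tile_sizes]  # regressor: work per call
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("tile sizes must not all be equal")
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
    beta = sxt / sxx
    alpha = mean_t - beta * mean_x
    return alpha, beta


def predict(alpha: float, beta: float, nb: int) -> float:
    """Predicted kernel time for tile size nb under the fitted model."""
    return alpha + beta * nb ** 3
```

In a tuning loop, the samples would come from timing the tuned kernel at a handful of tile sizes; the fitted model then lets the simulator price any granularity without rerunning the kernel. A richer model (for example, piecewise fits around cache-size boundaries) would follow the same pattern.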
DARE's answer to both problems is to replace the actual run with a time-stepping simulation. A given task-based scheduler is still used to assign tasks to cores, but instead of invoking the actual kernel tasks, control is passed to a progress-tracking simulation system, which relies on the kernel performance models to simulate the execution of the tasks and produce a virtual trace of the simulated execution. The performance advantage is twofold: 1) simulating a single run is much faster than actually making that run, and 2) many simulations can be run in parallel, allowing fast sweeps through a large parameter search space.

DARE replaces the standard waterfall autotuning process with one that is incremental and iterative. The power of the DARE approach lies in the mutual refinement loop, in which each of the three phases can massively prune the search space for the other two. As a result, very high-quality models can be built for a particular workload, because time is spent refining the model for the conditions that actually apply, rather than sampling regions of the search space that are never touched at run time.
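The following is a minimal sketch of the simulation idea, not DARE's implementation. A greedy FIFO list scheduler assigns ready tasks of a dependency graph to idle cores, but instead of executing kernels it draws each task's duration from a kernel performance model perturbed by multiplicative Gaussian noise (the stochastic element), then advances a virtual clock to the next task completion. This is an event-driven way of stepping time; the function name `simulate`, the FIFO policy, and the noise model are assumptions for illustration. The output is the predicted makespan plus a virtual trace of (task, core, start, end) records.

```python
import heapq
import random
from typing import Callable, Dict, List, Tuple

# One virtual-trace record: (task, core, start_time, end_time).
Trace = List[Tuple[str, int, float, float]]


def simulate(deps: Dict[str, List[str]],
             model: Callable[[str], float],
             n_cores: int,
             noise: float = 0.05,
             seed: int = 0) -> Tuple[float, Trace]:
    """Simulate a greedy FIFO task scheduler without running any kernels.

    deps maps every task to the list of tasks it depends on (every task,
    including sources, must be a key). model(task) returns the predicted
    kernel time; each simulated duration is scaled by (1 + N(0, noise)).
    Returns (makespan, virtual_trace).
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    rng = random.Random(seed)
    waiting = {t: len(d) for t, d in deps.items()}  # unmet dependencies
    children: Dict[str, List[str]] = {t: [] for t in deps}
    for t, d in deps.items():
        for parent in d:
            children[parent].append(t)
    ready = [t for t in deps if waiting[t] == 0]
    idle_cores = list(range(n_cores))
    running: List[Tuple[float, int, str]] = []  # min-heap of (end, core, task)
    now = 0.0
    trace: Trace = []
    while ready or running:
        # Hand ready tasks to idle cores in FIFO order.
        while ready and idle_cores:
            task, core = ready.pop(0), idle_cores.pop(0)
            duration = max(0.0, model(task) * (1.0 + rng.gauss(0.0, noise)))
            heapq.heappush(running, (now + duration, core, task))
            trace.append((task, core, now, now + duration))
        # Advance the virtual clock to the earliest completion event.
        now, core, task = heapq.heappop(running)
        idle_cores.append(core)
        for child in children[task]:
            waiting[child] -= 1
            if waiting[child] == 0:
                ready.append(child)
    if len(trace) != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return now, trace


# Usage: a diamond-shaped graph A -> {B, C} -> D with unit-time kernels.
diamond = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
makespan, trace = simulate(diamond, lambda task: 1.0, n_cores=2, noise=0.0)
```

With `noise=0.0` the diamond finishes in 3 time units on two cores (B and C overlap) and in 4 on one core. Running many seeds with `noise > 0` yields a distribution of makespans rather than a single number, which is what makes the simulation stochastic.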
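To illustrate the two claimed advantages, fast concurrent sweeps and pruning of the search space, the sketch below runs one cheap "simulation" per candidate tile size and keeps only the best few candidates for the next refinement round. The makespan formula is a deliberately crude, hypothetical stand-in: independent tile tasks spread evenly over cores, with an assumed kernel model t(nb) = alpha + beta * nb^3 and invented parameter values. In practice each call would be a full scheduler simulation. A thread pool keeps the example portable; CPU-bound Python simulations would more realistically use `ProcessPoolExecutor` or separate cluster jobs.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def predicted_makespan(nb: int, n: int = 4096, cores: int = 16,
                       alpha: float = 2e-5, beta: float = 1e-10) -> float:
    """Hypothetical stand-in for one simulation run.

    An n x n x n workload is cut into (n/nb)**3 independent tile tasks,
    each costing alpha + beta * nb**3 (assumed kernel model). Tasks are
    spread over `cores`, so the makespan is ceil(tasks / cores) * t_tile.
    Small tiles pay overhead; large tiles leave cores idle.
    """
    tasks = math.ceil(n / nb) ** 3
    t_tile = alpha + beta * nb ** 3
    return math.ceil(tasks / cores) * t_tile


def sweep(candidates: Sequence[int], workers: int = 4) -> Dict[int, float]:
    """Run one (simulated) evaluation per candidate tile size concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(predicted_makespan, candidates))
    return dict(zip(candidates, results))


def prune(results: Dict[int, float], keep: int) -> List[int]:
    """Keep the `keep` most promising candidates for the next refinement round."""
    return sorted(results, key=lambda nb: results[nb])[:keep]


results = sweep([64, 128, 256, 512, 1024, 2048])
survivors = prune(results, keep=2)
```

Under these invented parameters the survivors are tile sizes 1024 and 512: smaller tiles lose to per-task overhead, and 2048 produces only 8 tasks for 16 cores. In the spirit of the refinement loop, a next round would re-model the kernel only near the survivors instead of across the whole range.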

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Jack Dongarra

hipMAGMA v1.0
  • DOI:
  • Published:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cade Brown;Ahmad Abdelfattah;Stanimire Tomov;Jack Dongarra
  • Corresponding author:
    Jack Dongarra
Special section: Cluster and computational grids for scientific computing
  • DOI:
    10.1016/j.future.2007.03.005
  • Published:
    2008-01-01
  • Journal:
  • Impact factor:
  • Authors:
    Jack Dongarra;Bernard Tourancheau
  • Corresponding author:
    Bernard Tourancheau
Special section: Grid computing and the message passing interface
  • DOI:
    10.1016/j.future.2007.06.002
  • Published:
    2008-02-01
  • Journal:
  • Impact factor:
  • Authors:
    Beniamino Di Martino;Dieter Kranzlmüller;Jack Dongarra
  • Corresponding author:
    Jack Dongarra
The eigenvalue problem for Hermitian matrices with time reversal symmetry
  • DOI:
    10.1016/0024-3795(84)90068-5
  • Published:
    1984
  • Journal:
  • Impact factor:
    1.1
  • Authors:
    Jack Dongarra;J. R. Gabriel;D. D. Koelling;James Hardy Wilkinson
  • Corresponding author:
    James Hardy Wilkinson
Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU clusters
  • DOI:
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ichitaro Yamazaki;Ahmad Abdelfattah;Akihiro Ida;Satoshi Ohshima;Stanimire Tomov;Rio Yokota;Jack Dongarra
  • Corresponding author:
    Jack Dongarra

Other grants by Jack Dongarra

Travel: Workshop on Clusters, Clouds, and Data Analytics for Scientific Computing 2024
  • Award number:
    2336813
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project category:
    Standard Grant
Workshop on Clusters, Clouds, and Data Analytics for Scientific Computing
  • Award number:
    2001329
  • Fiscal year:
    2020
  • Amount:
    $450,000
  • Project category:
    Standard Grant
Workshop on Clusters, Clouds, and Data Analytics in Scientific Computing
  • Award number:
    1800946
  • Fiscal year:
    2018
  • Amount:
    $450,000
  • Project category:
    Standard Grant
Toward a common digital continuum platform for big data and extreme-scale computing (BDEC2)
  • Award number:
    1849625
  • Fiscal year:
    2018
  • Amount:
    $450,000
  • Project category:
    Standard Grant
Collaborative Research: ACI-CDS&E: Highly Parallel Algorithms and Architectures for Convex Optimization for Realtime Embedded Systems (CORES)
  • Award number:
    1709069
  • Fiscal year:
    2017
  • Amount:
    $450,000
  • Project category:
    Standard Grant
Workshop on Clusters, Clouds and Data Analytics in Scientific Computing
  • Award number:
    1606551
  • Fiscal year:
    2016
  • Amount:
    $450,000
  • Project category:
    Standard Grant
Collaborative Research: EMBRACE: Evolvable Methods for Benchmarking Realism through Application and Community Engagement
  • Award number:
    1535025
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Project category:
    Standard Grant
SI2-SSI: Collaborative Proposal: Performance Application Programming Interface for Extreme-Scale Environments (PAPI-EX)
  • Award number:
    1450429
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Project category:
    Standard Grant
CSR:Medium:Collaborative Research: SparseKaffe: high-performance, auto-tuned, energy-aware algorithms for sparse direct methods on modern heterogeneous architectures
  • Award number:
    1514286
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Project category:
    Continuing Grant
EAGER: Collaborative Research: Memristive Accelerator for Extreme Scale Linear Solvers
  • Award number:
    1548093
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Project category:
    Standard Grant

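Similar National Natural Science Foundation of China (NSFC) grants

Operating Risks of Small and Micro Enterprises under the COVID-19 Pandemic and Evaluation of Public Policy Effects: Empirical Evidence from Catering Enterprises
  • Award number:
    72204045
  • Approval year:
    2022
  • Amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Operating Risks of Small and Micro Enterprises under the COVID-19 Pandemic and Evaluation of Public Policy Effects: Empirical Evidence from Catering Enterprises
  • Award number:
  • Approval year:
    2022
  • Amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Research on Multi-Directional, Multi-Modal Acoustic Emission Time-Frequency Localization of Leaks in Fluid Pipe Networks Based on the Empirical Wavelet Transform
  • Award number:
    61703066
  • Approval year:
    2017
  • Amount:
    CNY 180,000
  • Project category:
    Young Scientists Fund
Research on Quantitative Monitoring of Fatigue Cracks in Steel Bridge Decks Using Improved Empirical Wavelet Analysis of Acoustic Emission Signals
  • Award number:
    51708164
  • Approval year:
    2017
  • Amount:
    CNY 230,000
  • Project category:
    Young Scientists Fund
Research on Empirical Wavelet Transform Theory and Its Application to Mechanical Fault Diagnosis
  • Award number:
    51505002
  • Approval year:
    2015
  • Amount:
    CNY 200,000
  • Project category:
    Young Scientists Fund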

Similar overseas grants

Empirical Research on Formation of new HR-Practices in German Firms
  • Award number:
    22K01719
  • Fiscal year:
    2022
  • Amount:
    $450,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
The Empirical Study of Gender (EGEN) Research Network: Small Research Prizes to Graduate Students and Early Career Faculty
  • Award number:
    2215500
  • Fiscal year:
    2022
  • Amount:
    $450,000
  • Project category:
    Standard Grant
An Attempt to Improve Empirical Research in Economics Focusing on Statistical Hypothesis Testing
  • Award number:
    22K18530
  • Fiscal year:
    2022
  • Amount:
    $450,000
  • Project category:
    Grant-in-Aid for Challenging Research (Exploratory)
Comparative Empirical Research on the Economic Effects of the Lehman Brothers Collapse and COVID-19 Pandemic
  • Award number:
    21K01590
  • Fiscal year:
    2021
  • Amount:
    $450,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Empirical Studies on Inclusiveness and Exclusiveness of Sharing of Technologies in East African Small and Medium-sized Manufacturers
  • Award number:
    21H03706
  • Fiscal year:
    2021
  • Amount:
    $450,000
  • Project category:
    Grant-in-Aid for Scientific Research (B)