SHF: Small: Enabling and Analyzing Accuracy-aware Reliable GPU Computing
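Basic Information
- Award number: 1717532
- Principal investigator:
- Amount: $450,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Duration: 2017-08-01 to 2021-07-31
- Status: Completed
- Source:
- Keywords:
Project Abstract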
Graphics Processing Units (GPUs) are becoming the default choice for general-purpose hardware acceleration because they enable orders-of-magnitude faster and more energy-efficient execution of large-scale high-performance computing (HPC) applications. Since the majority of such applications executing on large-scale HPC systems are long-running, it is very important that they cope with a variety of hardware- and software-based faults. Many prior works have shown that real HPC systems are vulnerable to soft errors. An absence of essential protection and checkpointing mechanisms can lead to lower scientific productivity, reduced operational efficiency, and even monetary loss. However, these protection mechanisms (e.g., error correction codes) are themselves not free: they incur very high performance, energy, and area costs. This project takes a holistic approach to reducing these protection overheads by taking advantage of the fact that not all errors lead to an unacceptable loss in the accuracy of application output. Prior results show that GPGPU applications are amenable to such accuracy-aware optimizations. To enable these optimizations, this project will address three major research questions: a) What hardware/software support and tools are necessary to determine which instructions are not vulnerable to soft errors? b) Based on this analysis, which hardware component(s) need not be protected and for how long, without sacrificing application quality beyond the user's quality requirements? c) What optimizations in resource management and scheduling are necessary to make low-overhead but reliable computation more effective and efficient? These questions will be explored via a variety of GPGPU applications emerging from the areas of HPC, big-data analytics, machine learning, and graphics.
If successful, this project will generate several novel research insights that will play an important role in enabling low-cost reliable GPU computing. The results of this project will be integrated into existing and new undergraduate and graduate courses on computer architecture and reliability, which will help train students, including women and students from diverse backgrounds and minority groups.
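The abstract's central observation, that not every soft error produces an unacceptable loss in output accuracy, can be illustrated with a small fault-injection sketch. This is not the project's methodology or tooling; it is a CPU-only toy under stated assumptions. The names `flip_bit`, `classify`, and `run_campaign`, the use of a simple mean as the "application", and the 1% quality threshold are all hypothetical choices made for illustration.

```python
import math
import struct
from collections import Counter


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the IEEE-754 float32 encoding of `value` (hypothetical helper)."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 32)")
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped


def kernel(values):
    """Stand-in for an application: the arithmetic mean of its inputs."""
    return sum(values) / len(values)


def classify(golden: float, faulty: float, threshold: float) -> str:
    """Label a faulty output relative to the fault-free (golden) output."""
    if faulty == golden:
        return "masked"  # the fault had no visible effect on the output
    if math.isnan(faulty) or math.isinf(faulty):
        return "unacceptable"
    rel_err = abs(faulty - golden) / abs(golden)  # assumes golden != 0
    return "tolerable" if rel_err <= threshold else "unacceptable"


def run_campaign(values, threshold=0.01):
    """Inject every single-bit flip into every input and count the outcomes."""
    golden = kernel(values)
    outcomes = Counter()
    for i, v in enumerate(values):
        for bit in range(32):
            faulty_inputs = list(values)
            faulty_inputs[i] = flip_bit(v, bit)
            outcomes[classify(golden, kernel(faulty_inputs), threshold)] += 1
    return outcomes


if __name__ == "__main__":
    # Inputs chosen to be exactly representable in float32.
    print(run_campaign([1.0, 2.0, 3.0, 4.0], threshold=0.01))
```

In this toy setup, flips in low-order mantissa bits tend to stay within the quality threshold, while flips in the sign or exponent bits tend to exceed it. A real study would inject faults into GPU hardware structures or instructions rather than into Python floats.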
Project Outcomes
Journal papers (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
SSD failures in the field: symptoms, causes, and prediction models
- DOI: 10.1145/3295500.3356172
- Published: 2019-11
- Journal:
- Impact factor: 0
- Authors: J. Alter; Ji Xue; Alma Dimnaku; E. Smirni
- Corresponding author: J. Alter; Ji Xue; Alma Dimnaku; E. Smirni
Characterizing Accuracy-Aware Resilience of GPGPU Applications
- DOI: 10.1109/ccgrid49817.2020.00-82
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Nie, Bin; Jog, Adwait; Smirni, Evgenia
- Corresponding author: Smirni, Evgenia
Fault Site Pruning for Practical Reliability Analysis of GPGPU Applications
- DOI: 10.1109/micro.2018.00066
- Published: 2018-10
- Journal:
- Impact factor: 0
- Authors: Bin Nie; Lishan Yang; Adwait Jog; E. Smirni
- Corresponding author: Bin Nie; Lishan Yang; Adwait Jog; E. Smirni
Enabling Software Resilience in GPGPU Applications via Partial Thread Protection
- DOI: 10.1109/icse43902.2021.00114
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Yang, Lishan; Nie, Bin; Jog, Adwait; Smirni, Evgenia
- Corresponding author: Smirni, Evgenia
BCoal: Bucketing-based Memory Coalescing for Efficient and Secure GPUs
- DOI: 10.1109/hpca47549.2020.00053
- Published: 2020-01-01
- Journal:
- Impact factor: 0
- Authors: Kadam, Gurunath; Zhang, Danfeng; Jog, Adwait
- Corresponding author: Jog, Adwait
Other publications by Adwait Jog
Exploiting Core Criticality for Enhanced GPU Performance
- DOI: 10.1145/2896377.2901468
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Adwait Jog; Onur Kayiran; Ashutosh Pattnaik; M. Kandemir; O. Mutlu; R. Iyer; C. Das
- Corresponding author: C. Das
A case for core-assisted bottleneck acceleration in GPUs
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Nandita Vijaykumar; Gennady Pekhimenko; Adwait Jog; A. Bhowmick; Rachata Ausavarungnirun; Chita R. Das; Mahmut Kandemir; T. Mowry; O. Mutlu
- Corresponding author: O. Mutlu
Accelerating DNN Architecture Search at Scale Using Selective Weight Transfer
- DOI: 10.1109/cluster48925.2021.00051
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Hongyuan Liu; Bogdan Nicolae; S. Di; F. Cappello; Adwait Jog
- Corresponding author: Adwait Jog
Other grants by Adwait Jog
Collaborative Research: SHF: Medium: Enabling GPU Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
- Award number: 2402805
- Fiscal year: 2024
- Amount: $450,000
- Award type: Standard Grant
CAREER: Addressing Scalability Challenges in Designing Next-generation GPU-Based Heterogeneous Architectures
- Award number: 2316694
- Fiscal year: 2023
- Amount: $450,000
- Award type: Continuing Grant
CAREER: Addressing Scalability Challenges in Designing Next-generation GPU-Based Heterogeneous Architectures
- Award number: 1750667
- Fiscal year: 2018
- Amount: $450,000
- Award type: Continuing Grant
CRII: SHF: Design and Analysis of Processing-Near-Memory Enabled GPU Architecture
- Award number: 1657336
- Fiscal year: 2017
- Amount: $450,000
- Award type: Standard Grant
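Similar NSFC (National Natural Science Foundation of China) grants
Screening of small-molecule inhibitors targeting Treg-FOXP3 and their role and mechanism in lung cancer immunotherapy
- Award number: 32370966
- Approval year: 2023
- Amount: CNY 500,000
- Award type: General Program
Epigenetic mechanisms by which chemical small-molecule activation of YAP induces chromatin plasticity to promote cardiac progenitor cell reprogramming
- Award number: 82304478
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Construction and mechanism of action of microglia-targeting biomimetic glycyrrhizic acid nanoparticles: a new therapeutic strategy for sepsis-associated encephalopathy
- Award number: 82302422
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Role and mechanism of microglial pyroptosis mediated by the HMGB1/TLR4/Cathepsin B pathway in hypoxic-ischemic encephalopathy in neonatal rats
- Award number: 82371712
- Approval year: 2023
- Amount: CNY 490,000
- Award type: General Program
Role and mechanism of small cysteine-free proteins in regulating the insecticidal activity of biocontrol fungi
- Award number: 32372613
- Approval year: 2023
- Amount: CNY 500,000
- Award type: General Program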
Similar overseas grants
Collaborative Research: SHF: Small: Enabling Efficient 3D Perception: An Architecture-Algorithm Co-Design Approach
- Award number: 2334624
- Fiscal year: 2023
- Amount: $450,000
- Award type: Standard Grant
Collaborative Research: SHF: Small: Architecture Innovations for Enabling Simultaneous Translation at the Edge
- Award number: 2223484
- Fiscal year: 2022
- Amount: $450,000
- Award type: Standard Grant
Collaborative Research: SHF: Small: Enabling Caches and GPUs for Energy Harvesting Systems
- Award number: 2153749
- Fiscal year: 2022
- Amount: $450,000
- Award type: Standard Grant
Collaborative Research: SHF: Small: Architecture Innovations for Enabling Simultaneous Translation at the Edge
- Award number: 2223483
- Fiscal year: 2022
- Amount: $450,000
- Award type: Standard Grant
Collaborative Research: SHF: Small: Enabling Caches and GPUs for Energy Harvesting Systems
- Award number: 2153748
- Fiscal year: 2022
- Amount: $450,000
- Award type: Standard Grant