SHF: Small: Program Analysis-based Makeover for HPC Application Resilience

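Basic information

  • Award number:
    1722710
  • Principal investigator:
  • Amount:
    $420,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Standard Grant
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Project period:
    2016-01-15 to 2021-08-31
  • Project status:
    Completed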

Project abstract

HPC resilience in the presence of increased system failures is a major technical hurdle for realizing the vision of the US National Research Council for conducting exascale science. Existing techniques, based primarily on checkpoint and replay, are no longer effective for emerging systems with orders-of-magnitude more hardware and software components. This project aims to overcome the main limitation of existing techniques: the detection and mitigation of silent errors by developing and leveraging automated software analysis and synthesis techniques. The new methods under development can compile a tunable degree of resilience into the application software code, and have potential to transform the development of future generations of HPC applications. By treating the software code as white-boxes, as opposed to black-boxes, these new methods can provide significantly more economical solutions to the HPC resilience problem compared to existing techniques. The project will help realize the US NRC's vision of conducting exascale science, which is crucial for addressing the nation's urgent needs in frontiers such as new energy, health care, and national security.

This project develops automated program analysis techniques for identifying invariants from software code, and leveraging these invariants to detect and mitigate silent errors at run time. By treating the application software code as white-boxes, it seeks to generate invariants that capture the expected program behavior. By leveraging the invariants as correctness conditions, it overcomes the major hurdle in detecting silent errors, which is the lack of visible symptoms. In addition to detecting errors, the invariants are also used by runtime monitors to intelligently perturb the execution order or memory state to proactively avoid failures at run time. When the rollback recovery becomes inevitable, the invariants are used as guidance to minimize the re-execution overhead. The proposed methods and software tools are evaluated on real applications from the research community as well as sources such as SciDAC.
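
The idea of using an invariant as a correctness condition for silent errors can be illustrated with a small sketch. It is not the project's tooling: the checksum invariant, the function names (`flip_bit`, `matvec`, `checksum_invariant_holds`), the tolerance, and the simulated bit flip are all assumptions chosen for illustration. The invariant here is a hand-written one for a matrix-vector product (the sum of the result must equal the column sums of the matrix dotted with the input vector), whereas the abstract describes invariants derived automatically by program analysis.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 double encoding flipped."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def matvec(a, x):
    """Plain matrix-vector product on lists of floats."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def checksum_invariant_holds(a, x, y, tol=1e-9):
    """Hypothetical invariant: sum(y) == (column sums of a) . x, up to rounding."""
    col_sums = [sum(col) for col in zip(*a)]
    expected = sum(c * x_j for c, x_j in zip(col_sums, x))
    return abs(sum(y) - expected) <= tol * max(1.0, abs(expected))


a = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
x = [1.0, 2.0, 3.0]

# Fault-free run: the invariant is satisfied.
y = matvec(a, x)
clean_ok = checksum_invariant_holds(a, x, y)

# Simulated silent error: one exponent bit of one result element flips.
# Nothing crashes, so without the invariant there is no visible symptom.
corrupted = list(y)
corrupted[1] = flip_bit(corrupted[1], 52)
corrupt_ok = checksum_invariant_holds(a, x, corrupted)

# Recovery in this sketch is simply re-execution of the affected step.
recovered = corrupted if corrupt_ok else matvec(a, x)

print(clean_ok, corrupt_ok, recovered == y)
```

In this sketch the check costs one extra pass over the matrix, which is the kind of tunable overhead-versus-coverage trade-off the abstract alludes to; a stricter or looser tolerance changes which corruptions are flagged.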

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Chao Wang

A new decomposition method based on the coherency matrix
The mechanical behavior and collapse of graphene-assembled hollow nanospheres under compression
  • DOI:
    10.1016/j.carbon.2020.11.040
  • Publication date:
    2021-03
  • Journal:
  • Impact factor:
    10.9
  • Authors:
    Yifan Zhao;Yushun Zhao;Fan Wu;Yue Zhao;Yaming Wang;Chao Sui;Xiaodong He;Chao Wang;Huifeng Tan;Chao Wang
  • Corresponding author:
    Chao Wang
Understanding of the Effect of Climate Change on Tropical Cyclone Intensity: A Review
  • DOI:
    10.1007/s00376-021-1026-x
  • Publication date:
    2022-01
  • Journal:
  • Impact factor:
    5.8
  • Authors:
    Liguang Wu;Haikun Zhao;Chao Wang;Jian Cao;Jia Liang
  • Corresponding author:
    Jia Liang
Evolution and Removal of Surface Scratches by magnetorheological finishing (MRF)
  • DOI:
    10.1117/1.oe.58.5.055102
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    1.3
  • Authors:
    Jianwei Ji;Wei Gao;Chao Wang;Yunfei Zhang;Wei Fan;Min Xu;Fang Ji
  • Corresponding author:
    Fang Ji
Design and optimization of electromagnetic tomography and electrical resistance tomography dual-modality sensor
  • DOI:
    10.1088/1361-6501/ac8146
  • Publication date:
    2022-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Chao Wang;Ruichang Wang;Xiao Liang;Jiamin Ye;Xueyong Chen
  • Corresponding author:
    Xueyong Chen

Other grants of Chao Wang

Collaborative Research: FW-HTF-R: Wearable Safety Sensing and Assistive Robot-Worker Collaboration for an Augmented Workforce in Construction
  • Award number:
    2222881
  • Fiscal year:
    2022
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
Collaborative Research: FMitF: Track I: A Principled Approach to Modeling and Analysis of Hardware Fault Attacks on Embedded Software
  • Award number:
    2220345
  • Fiscal year:
    2022
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
NSF-BSF: Synchronous electro-optical DNA detection using low-noise dielectric nanopores on sapphire
  • Award number:
    2020464
  • Fiscal year:
    2020
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
FW-HTF-P: Collaborative Research: Wearable Safety and Health Assistive Robot Collaboration for Skilled Construction Workers
  • Award number:
    2026575
  • Fiscal year:
    2020
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
Photochemically Induced, Polymer-Assisted Deposition for 3D Printing of Micrometer-Wide and Nanometer-Thin Silver Structures
  • Award number:
    1947753
  • Fiscal year:
    2020
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
CAREER: Integrated Optofluidic Chips towards Label-Free Detection of Exosomal MicroRNA Biomarkers
  • Award number:
    1847324
  • Fiscal year:
    2019
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
Low-Profile Ultra-Wideband Wide-Scanning Multi-Function Beam-Steerable Array Antennas
  • Award number:
    EP/S005625/1
  • Fiscal year:
    2019
  • Funding amount:
    $420,000
  • Project category:
    Research Grant
Enhancing CO2 Reduction by Controlling the Ensemble of Active Sites
  • Award number:
    1930013
  • Fiscal year:
    2019
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
Interplay of Mass Transport and Chemical Kinetics in the Electroreduction CO2
  • Award number:
    1803482
  • Fiscal year:
    2018
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
CSR: Small: Collaborative Research: Safety Guard: A Formal Approach to Safety Enforcement in Embedded Control Systems
  • Award number:
    1813117
  • Fiscal year:
    2018
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant

Similar grants from the National Natural Science Foundation of China

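Construction of hydrogel materials for programmed regulation of microglia/macrophage polarization and their use in promoting spinal cord injury repair
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 540,000
  • Project category:
    General Program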
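Methods for detecting illicit leakage of user privacy data in mini programs
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 540,000
  • Project category:
    General Program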
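Role and mechanism of CaSR-regulated programmed necrosis of renal tubular epithelial cells in tubulointerstitial fibrosis
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 530,000
  • Project category:
    General Program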

Similar overseas grants

SHF: Small: Practical Dynamic Program Reasoning Across Language Boundaries
  • Award number:
    2146233
  • Fiscal year:
    2022
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
SHF: Small: Tackling Mapping and Scheduling Problems for Quantum Program Compilation
  • Award number:
    2129872
  • Fiscal year:
    2021
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
SHF: Small: Program Analysis for Dependable Clustering
  • Award number:
    2007730
  • Fiscal year:
    2020
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
SHF: SMALL: Automated Discovery of Cross-Language Program Behavior Inconsistency
  • Award number:
    2006947
  • Fiscal year:
    2020
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant
SHF: Small: AI Model Debugging by Analyzing Model Internals with Python Program Analysis
  • Award number:
    1910300
  • Fiscal year:
    2019
  • Funding amount:
    $420,000
  • Project category:
    Standard Grant