SHF: Small: Program Analysis-based Makeover for HPC Application Resilience
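Basic information
- Award number: 1722710
- Principal investigator:
- Amount: $420,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Period: 2016-01-15 to 2021-08-31
- Project status: Completed
- Source:
- Keywords: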
Project abstract
HPC resilience in the presence of increased system failures is a major technical hurdle for realizing the vision of the US National Research Council (NRC) for conducting exascale science. Existing techniques, based primarily on checkpoint and replay, are no longer effective for emerging systems with orders of magnitude more hardware and software components. This project aims to overcome the main limitation of existing techniques, the detection and mitigation of silent errors, by developing and leveraging automated software analysis and synthesis techniques. The new methods under development can compile a tunable degree of resilience into the application software code, and have the potential to transform the development of future generations of HPC applications. By treating the software code as a white box rather than a black box, these new methods can provide significantly more economical solutions to the HPC resilience problem than existing techniques. The project will help realize the US NRC's vision of conducting exascale science, which is crucial for addressing the nation's urgent needs in frontiers such as new energy, health care, and national security.
This project develops automated program analysis techniques for identifying invariants from software code, and leverages these invariants to detect and mitigate silent errors at run time. By treating the application software code as a white box, it seeks to generate invariants that capture the expected program behavior. By using the invariants as correctness conditions, it overcomes the major hurdle in detecting silent errors, which is the lack of visible symptoms. In addition to detecting errors, the invariants are also used by runtime monitors to intelligently perturb the execution order or memory state to proactively avoid failures at run time. When rollback recovery becomes unavoidable, the invariants are used as guidance to minimize the re-execution overhead. The proposed methods and software tools are evaluated on real applications from the research community as well as sources such as SciDAC.
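As a rough illustration of the idea in the abstract (an invariant used as a correctness condition to catch a silent error, followed by rollback to a checkpoint), the sketch below may help. It is not the project's tooling: the invariant here is a hand-written mass-conservation check on a toy 1D diffusion kernel, whereas the project aims to infer invariants automatically from the code. All names (`flip_bit`, `diffuse`, `mass_conserved`, `run`) and parameters are hypothetical.

```python
import struct


def flip_bit(x, bit):
    """Simulate a silent error: flip one bit of a float64 value."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))[0]


def diffuse(u, alpha=0.25):
    """One explicit diffusion step on a periodic 1D grid (conserves sum(u))."""
    n = len(u)
    return [u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
            for i in range(n)]


def mass_conserved(u, mass0, tol=1e-9):
    """Assumed invariant: total mass stays equal to its initial value."""
    return abs(sum(u) - mass0) <= tol


def run(u0, steps, fault_at=None, checkpoint_every=5):
    """Run the kernel, checking the invariant after every step.

    On a violation, roll back to the last checkpoint that satisfied the
    invariant and re-execute from there. Returns (final_state, detections).
    """
    u = list(u0)
    mass0 = sum(u)
    checkpoint = (0, list(u))
    detected = 0
    step = 0
    while step < steps:
        u = diffuse(u)
        step += 1
        if step == fault_at:
            u[2] = flip_bit(u[2], 62)  # inject a transient fault once
            fault_at = None
        if not mass_conserved(u, mass0):
            detected += 1
            step, u = checkpoint[0], list(checkpoint[1])
            continue
        if step % checkpoint_every == 0:
            checkpoint = (step, list(u))
    return u, detected


u0 = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
clean, clean_detections = run(u0, 12)
recovered, fault_detections = run(u0, 12, fault_at=7)
```

In this sketch the corrupted value produces no crash or exception, so without the invariant check the error would stay silent. Checking the invariant before taking a checkpoint keeps corrupted state out of the checkpoint, so re-execution only covers the steps since the last verified state.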
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Chao Wang
A new decomposition method based on the coherency matrix
- DOI: 10.1109/apsar.2015.7306255
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Jianbo Wang; Chao Wang; Hong Zhang; Fan Wu; Bo Zhang
- Corresponding author: Bo Zhang
The mechanical behavior and collapse of graphene-assembled hollow nanospheres under compression
- DOI: 10.1016/j.carbon.2020.11.040
- Published: 2021-03
- Journal:
- Impact factor: 10.9
- Authors: Yifan Zhao; Yushun Zhao; Fan Wu; Yue Zhao; Yaming Wang; Chao Sui; Xiaodong He; Chao Wang; Huifeng Tan; Chao Wang
- Corresponding author: Chao Wang
Understanding of the Effect of Climate Change on Tropical Cyclone Intensity: A Review
- DOI: 10.1007/s00376-021-1026-x
- Published: 2022-01
- Journal:
- Impact factor: 5.8
- Authors: Liguang Wu; Haikun Zhao; Chao Wang; Jian Cao; Jia Liang
- Corresponding author: Jia Liang
Evolution and Removal of Surface Scratches by magnetorheological finishing (MRF)
- DOI: 10.1117/1.oe.58.5.055102
- Published: 2019
- Journal:
- Impact factor: 1.3
- Authors: Jianwei Ji; Wei Gao; Chao Wang; Yunfei Zhang; Wei Fan; Min Xu; Fang Ji
- Corresponding author: Fang Ji
Design and optimization of electromagnetic tomography and electrical resistance tomography dual-modality sensor
- DOI: 10.1088/1361-6501/ac8146
- Published: 2022-07
- Journal:
- Impact factor: 0
- Authors: Chao Wang; Ruichang Wang; Xiao Liang; Jiamin Ye; Xueyong Chen
- Corresponding author: Xueyong Chen
Other grants by Chao Wang
Collaborative Research: FW-HTF-R: Wearable Safety Sensing and Assistive Robot-Worker Collaboration for an Augmented Workforce in Construction
- Award number: 2222881
- Fiscal year: 2022
- Amount: $420,000
- Project type: Standard Grant
Collaborative Research: FMitF: Track I: A Principled Approach to Modeling and Analysis of Hardware Fault Attacks on Embedded Software
- Award number: 2220345
- Fiscal year: 2022
- Amount: $420,000
- Project type: Standard Grant
NSF-BSF: Synchronous electro-optical DNA detection using low-noise dielectric nanopores on sapphire
- Award number: 2020464
- Fiscal year: 2020
- Amount: $420,000
- Project type: Standard Grant
FW-HTF-P: Collaborative Research: Wearable Safety and Health Assistive Robot Collaboration for Skilled Construction Workers
- Award number: 2026575
- Fiscal year: 2020
- Amount: $420,000
- Project type: Standard Grant
Photochemically Induced, Polymer-Assisted Deposition for 3D Printing of Micrometer-Wide and Nanometer-Thin Silver Structures
- Award number: 1947753
- Fiscal year: 2020
- Amount: $420,000
- Project type: Standard Grant
CAREER: Integrated Optofluidic Chips towards Label-Free Detection of Exosomal MicroRNA Biomarkers
- Award number: 1847324
- Fiscal year: 2019
- Amount: $420,000
- Project type: Standard Grant
Low-Profile Ultra-Wideband Wide-Scanning Multi-Function Beam-Steerable Array Antennas
- Award number: EP/S005625/1
- Fiscal year: 2019
- Amount: $420,000
- Project type: Research Grant
Enhancing CO2 Reduction by Controlling the Ensemble of Active Sites
- Award number: 1930013
- Fiscal year: 2019
- Amount: $420,000
- Project type: Standard Grant
Interplay of Mass Transport and Chemical Kinetics in the Electroreduction CO2
- Award number: 1803482
- Fiscal year: 2018
- Amount: $420,000
- Project type: Standard Grant
CSR: Small: Collaborative Research: Safety Guard: A Formal Approach to Safety Enforcement in Embedded Control Systems
- Award number: 1813117
- Fiscal year: 2018
- Amount: $420,000
- Project type: Standard Grant
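Similar grants from the National Natural Science Foundation of China
Role and mechanism of CaSR-regulated necroptosis of renal tubular epithelial cells in tubulointerstitial fibrosis
- Award number:
- Year approved: 2022
- Amount: CNY 530,000
- Project type: General Program
Role and mechanism of CaSR-regulated necroptosis of renal tubular epithelial cells in tubulointerstitial fibrosis
- Award number: 82270749
- Year approved: 2022
- Amount: CNY 530,000
- Project type: General Program
Methods for detecting illicit leakage of user privacy data in mini-programs
- Award number:
- Year approved: 2022
- Amount: CNY 540,000
- Project type: General Program
Study of how pattern recognition receptors contribute to the pathogenesis of anti-neutrophil cytoplasmic antibody-associated small-vessel vasculitis by regulating lipid metabolism to influence programmed neutrophil death
- Award number:
- Year approved: 2022
- Amount: CNY 530,000
- Project type: General Program
Methods for detecting illicit leakage of user privacy data in mini-programs
- Award number: 62272377
- Year approved: 2022
- Amount: CNY 540,000
- Project type: General Program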
Similar overseas grants
SHF: Small: Practical Dynamic Program Reasoning Across Language Boundaries
- Award number: 2146233
- Fiscal year: 2022
- Amount: $420,000
- Project type: Standard Grant
SHF: Small: Tackling Mapping and Scheduling Problems for Quantum Program Compilation
- Award number: 2129872
- Fiscal year: 2021
- Amount: $420,000
- Project type: Standard Grant
SHF: Small: Program Analysis for Dependable Clustering
- Award number: 2007730
- Fiscal year: 2020
- Amount: $420,000
- Project type: Standard Grant
SHF: SMALL: Automated Discovery of Cross-Language Program Behavior Inconsistency
- Award number: 2006947
- Fiscal year: 2020
- Amount: $420,000
- Project type: Standard Grant
SHF: Small: AI Model Debugging by Analyzing Model Internals with Python Program Analysis
- Award number: 1910300
- Fiscal year: 2019
- Amount: $420,000
- Project type: Standard Grant