SHF: Small: Collaborative Research: ALETHEIA: A Framework for Automatic Detection/Correction of Corruptions in Extreme Scale Scientific Executions
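Basic Information
- Award number: 1619253
- Principal investigator:
- Amount: $244,200
- Host institution:
- Host institution country: United States
- Program type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Duration: 2016-06-15 to 2021-05-31
- Project status: Completed
- Source:
- Keywords: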
Project Abstract
Trusting scientific applications requires guaranteeing the validity of computed results. Unfortunately, many scientific computations have produced incorrect results, sometimes with catastrophic consequences. Currently known validation techniques cover only a fraction of the corruptions that numerical simulation and data analytics applications may suffer during execution. As scientific processes grow in size and complexity, the reliability and validity of their constituent steps are increasingly difficult to ascertain. Assessing validity in the presence of potential data corruption is a serious and insufficiently recognized problem. Corruption may occur at every level of computing, from the hardware to the application. An important aspect of these corruptions is that, until they are discovered, every execution is at risk of being silently corrupted. In some documented cases, months elapsed between the discovery of a corruption and the notification of users. In the meantime, a potentially large number of executions may be corrupted, and incorrect conclusions may be drawn. It may also be difficult, after the fact, to determine whether an execution was actually corrupted, so even corruptions that do not lead to mistakes can cause significant productivity losses. Virtually all simulations that produce very large results must reduce their data volume before saving it; one common technique is lossy compression. This project aims to validate the end result of a simulation coupled with lossy compression. The approach is useful for scientific simulations in diverse areas such as climate, cosmology, fluid dynamics, weather, and astrophysics, which are the drivers of this project. This collaborative project applies the principle of an external algorithmic observer (EAO), in which the output of a scientific application is compared with that of a surrogate function of much lower complexity.
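The external-observer principle can be illustrated with a minimal Python/NumPy sketch. Everything in it is a hypothetical stand-in chosen for illustration, not the project's implementation: the "application" is a toy 1D heat-diffusion step (`heat_step`), the low-cost surrogate (`surrogate_step`) is a linear extrapolation in time from the two previous states, and the tolerance `tol` is picked by hand.

```python
import numpy as np

def heat_step(u, alpha=0.1):
    """Hypothetical 'application' kernel: one explicit 1D heat-diffusion
    update with fixed (Dirichlet) boundary values."""
    v = u.copy()
    v[1:-1] = u[1:-1] + alpha * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return v

def surrogate_step(u_prev, u_curr):
    """Hypothetical low-cost surrogate: linear extrapolation in time from the
    two most recent states; it never evaluates the application's stencil."""
    return 2.0 * u_curr - u_prev

def eao_check(app_out, surrogate_out, tol):
    """External-observer check: flag every point where the application result
    and the surrogate prediction differ by more than tol."""
    return np.abs(app_out - surrogate_out) > tol

# Usage: inject one silent corruption into the third state and detect it.
x = np.linspace(0.0, 1.0, 101)
u0 = np.exp(-((x - 0.5) / 0.1) ** 2)
u1 = heat_step(u0)
u2 = heat_step(u1)
u2[50] += 5.0  # simulated silent data corruption at one grid point
flags = eao_check(u2, surrogate_step(u0, u1), tol=1e-3)
print(np.flatnonzero(flags))  # → [50]
```

Linear extrapolation is only one possible surrogate; what matters for the EAO idea is that its cost is a few arithmetic operations per point, independent of the application's complexity. The tolerance must exceed the surrogate's normal prediction error, otherwise clean points are flagged as corrupted.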
Corruptions are corrected using a variation of triple modular redundancy: if a corruption is detected, a second surrogate function is executed, and the correct value is chosen from the two of the three results (application, first surrogate, second surrogate) that agree most closely. This new online detection/correction approach relies on approximate comparison of the lossy-compressed results of the scientific application and the surrogate function. The project explores the detection performance of surrogate functions, lossy compressors, and approximate comparison techniques. It also explores how to select the surrogate, lossy compression, and approximate comparison functions to optimize the objectives and constraints set by users. The evaluation considers five applications that span different computational methods, produce large datasets with I/O bottlenecks, and cover a variety of science domains relevant to the NSF. In addition to serving the needs of scientists in the fields listed above, this project will enhance the research experience of undergraduate students. A summer school focused on resilience is planned for summer 2016, with corruption detection/correction as a major topic. The project is also organizing tutorials at major science conferences that include online detection/correction of numerical simulations.
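The correction step can be sketched in the same hedged way. Here `quantize` is a hypothetical stand-in for an error-bounded lossy compressor (uniform scalar quantization with absolute error bound `eb`), and the voting rule in `vote` (keep the application's value whenever it belongs to a most-agreeing pair, otherwise fall back to the mean of the two surrogates) is one possible reading of the abstract, not the project's published rule.

```python
import numpy as np

def quantize(x, eb):
    """Hypothetical stand-in for an error-bounded lossy compressor: uniform
    scalar quantization. Reconstructing q * 2 * eb differs from x by at most eb."""
    return np.round(x / (2.0 * eb)).astype(np.int64)

def vote(app, s1, s2, eb):
    """TMR-style correction performed on quantized (compressed) values.

    For each point, compare how closely the application agrees with either
    surrogate against how closely the two surrogates agree with each other.
    If the surrogate pair is strictly closer, the application value is replaced
    by the surrogates' mean; otherwise it is kept. In a real pipeline this would
    run only on points that the first surrogate check flagged.
    """
    qa, q1, q2 = quantize(app, eb), quantize(s1, eb), quantize(s2, eb)
    d_app = np.minimum(np.abs(qa - q1), np.abs(qa - q2))
    d_sur = np.abs(q1 - q2)
    surrogates_agree_better = d_sur < d_app
    return np.where(surrogates_agree_better, 0.5 * (s1 + s2), app)

# Usage: the application value at index 1 is corrupted; both surrogates agree.
app = np.array([1.0, 9.0, 3.0])
s1 = np.array([1.01, 2.02, 3.01])
s2 = np.array([0.99, 1.98, 2.99])
print(vote(app, s1, s2, eb=0.05))
```

Comparing quantized integers rather than raw floating-point values is what makes the comparison approximate: values that fall into the same quantization bin are treated as identical, so the error bound `eb` directly sets the sensitivity of the detector.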
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Tom Peterka
The global lambda visualization facility: An international ultra-high-definition wide-area visualization collaboratory
- DOI: 10.1016/j.future.2006.03.009
- Published: 2006-10-01
- Journal:
- Impact factor:
- Authors: Jason Leigh; Luc Renambot; Andrew Johnson; Byungil Jeong; Ratko Jagodic; Nicholas Schwarz; Dmitry Svistula; Rajvikram Singh; Julieta Aguilera; Xi Wang; Venkatram Vishwanath; Brenda Lopez; Dan Sandin; Tom Peterka; Javier Girado; Robert Kooima; Jinghua Ge; Lance Long; Alan Verlo; Thomas A. DeFanti
- Corresponding author: Thomas A. DeFanti
Extreme-scale workflows: A perspective from the JLESC international community
- DOI: 10.1016/j.future.2024.07.041
- Published: 2024-12-01
- Journal:
- Impact factor:
- Authors: Orcun Yildiz; Amal Gueroudji; Julien Bigot; Bruno Raffin; Rosa M. Badia; Tom Peterka
- Corresponding author: Tom Peterka
Personal Varrier: Autostereoscopic virtual reality display for distributed scientific visualization
- DOI: 10.1016/j.future.2006.03.011
- Published: 2006-10-01
- Journal:
- Impact factor:
- Authors: Tom Peterka; Daniel J. Sandin; Jinghua Ge; Javier Girado; Robert Kooima; Jason Leigh; Andrew Johnson; Marcus Thiebaux; Thomas A. DeFanti
- Corresponding author: Thomas A. DeFanti
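Similar NSFC (National Natural Science Foundation of China) Grants
Research and Application of Key Technologies for Cooperative Swarms of Small and Micro Unmanned Systems Based on Ultra-Wideband Technology
- Approval number:
- Approval year: 2020
- Amount: 570,000 CNY
- Program type: General Program
Research on Interference Coordination Based on Cooperative Precoding in Heterogeneous Cloud Small-Cell Networks
- Approval number: 61661005
- Approval year: 2016
- Amount: 300,000 CNY
- Program type: Regional Science Fund Program
Research on Novel Access Theory and Techniques for Dense Small-Cell Base Station Systems
- Approval number: 61301143
- Approval year: 2013
- Amount: 240,000 CNY
- Program type: Young Scientists Fund
Experimental Study of ScFVCD3-9R-Loaded Bcl-6-Targeting Small Interfering RNA for the Treatment of EAMG
- Approval number: 81072465
- Approval year: 2010
- Amount: 310,000 CNY
- Program type: General Program
Research on Sensor Networks Based on Small-World Networks
- Approval number: 60472059
- Approval year: 2004
- Amount: 210,000 CNY
- Program type: General Program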
Similar International Grants
Collaborative Research: SHF: Small: LEGAS: Learning Evolving Graphs At Scale
- Award number: 2331302
- Fiscal year: 2024
- Amount: $244,200
- Program type: Standard Grant
Collaborative Research: SHF: Small: LEGAS: Learning Evolving Graphs At Scale
- Award number: 2331301
- Fiscal year: 2024
- Amount: $244,200
- Program type: Standard Grant
Collaborative Research: SHF: Small: Efficient and Scalable Privacy-Preserving Neural Network Inference based on Ciphertext-Ciphertext Fully Homomorphic Encryption
- Award number: 2412357
- Fiscal year: 2024
- Amount: $244,200
- Program type: Standard Grant
Collaborative Research: SHF: Small: Quasi Weightless Neural Networks for Energy-Efficient Machine Learning on the Edge
- Award number: 2326895
- Fiscal year: 2023
- Amount: $244,200
- Program type: Standard Grant
Collaborative Research: SHF: Small: Enabling Efficient 3D Perception: An Architecture-Algorithm Co-Design Approach
- Award number: 2334624
- Fiscal year: 2023
- Amount: $244,200
- Program type: Standard Grant