SHF: Small: Software-Driven Hardware Resiliency

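Basic Information

  • Award number:
    1320941
  • Principal investigator:
  • Amount:
    $450,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2013
  • Funding country:
    United States
  • Duration:
    2013-09-01 to 2017-08-31
  • Project status:
    Completed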

Project Abstract

Moore's law continues to provide abundant devices on chip, but they are increasingly subject to failures from many sources. The hardware reliability problem is expected to be pervasive, affecting markets from embedded systems to high-performance computing. There is an urgent need for research to address this problem with extremely low overheads in area, performance, and power (precluding traditional redundancy-based solutions). Recently, researchers have proposed a software-driven hardware reliability solution that handles only the device faults that become visible to software and cause anomalous software behavior. This line of work has been quite successful in detecting most faults at extremely low cost. Unfortunately, some hardware faults escape detection by the proposed anomaly monitors, resulting in silent data corruptions (SDCs). These remaining few SDCs have been the Achilles heel of the software-driven hardware resiliency approach and a hindrance to widespread adoption. The proposed research seeks to overcome this obstacle. The research includes methodological innovations that can determine application sites vulnerable to SDCs within a practical workflow, and a resiliency solution that uses this information to develop low-cost detection and recovery techniques to mitigate the impact of SDCs. It builds on Relyzer, a recent resiliency analysis tool developed by the Principal Investigator's group. The key insight is that instead of trying to determine the outcome of each fault site, Relyzer can seek to determine which application sites will produce equivalent outcomes. This enables pruning a large number of sites and focusing fault injections on just one site per equivalence class, resulting in a significant reduction in resiliency evaluation time. In addition to providing a list of SDC-vulnerable instructions, Relyzer also provides a wealth of information on why they are vulnerable. This motivates the use of inexpensive application-specific detectors that exploit this information.
However, Relyzer has several limitations in speed, accuracy, and generality, precluding its use in a practical workflow. This research will first develop new techniques to address these limitations and implement them in a tool. Second, this research will explore systematic techniques to develop practical resiliency solutions that exploit the wealth of fault-propagation information exposed by Relyzer. It will develop systematic low-cost detection and recovery techniques, with quantifiable tradeoffs between resiliency and performance overheads, that can be incorporated in a practical workflow for real applications. If successful, this work will address a key challenge in meeting the expectations of Moore's law performance for a wide variety of societal advances. Besides the research benefits, it will provide a concrete tool for practical full-application resiliency analysis and will also train graduate students.
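The equivalence-class pruning idea described in the abstract can be illustrated with a small sketch. This is not Relyzer's implementation: the abstract does not say how equivalence between fault sites is decided, so the fault-site encoding, the `toy_class_key` grouping rule, and the `toy_inject` outcome model below are all hypothetical stand-ins chosen only to show the shape of the technique (group sites, inject at one pilot per class, reuse the outcome for the rest of the class).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# Assumed encoding of a fault site for this example only:
# (static instruction address, dynamic instance number, bit position).
FaultSite = Tuple[int, int, int]


def prune_and_inject(
    sites: Iterable[FaultSite],
    class_key: Callable[[FaultSite], Hashable],
    inject: Callable[[FaultSite], str],
) -> Tuple[Dict[FaultSite, str], int]:
    """Group sites into classes, inject at one pilot per class,
    and assign the pilot's outcome to every member of the class."""
    classes: Dict[Hashable, List[FaultSite]] = defaultdict(list)
    for site in sites:
        classes[class_key(site)].append(site)

    outcomes: Dict[FaultSite, str] = {}
    injections = 0
    for members in classes.values():
        pilot = members[0]          # one representative site per class
        outcome = inject(pilot)     # the only expensive step
        injections += 1
        for site in members:
            outcomes[site] = outcome
    return outcomes, injections


def toy_class_key(site: FaultSite) -> Hashable:
    # Hypothetical rule: all dynamic instances of the same instruction
    # and bit are treated as equivalent.
    pc, _instance, bit = site
    return (pc, bit)


def toy_inject(site: FaultSite) -> str:
    # Hypothetical outcome model standing in for a real fault injection run.
    pc, _instance, bit = site
    if pc == 0x20 and bit == 0:
        return "SDC"
    if pc == 0x10:
        return "Detected"
    return "Masked"


# 3 instructions x 100 dynamic instances x 2 bits = 600 fault sites.
sites = [(pc, i, bit) for pc in (0x10, 0x20, 0x30)
         for i in range(100) for bit in (0, 7)]
outcomes, injections = prune_and_inject(sites, toy_class_key, toy_inject)
sdc_sites = [s for s, o in outcomes.items() if o == "SDC"]
```

In this toy setup the 600 sites collapse into 6 classes, so only 6 injections are run. The accuracy of such a scheme depends entirely on how well the grouping rule predicts equivalent outcomes, which is the kind of limitation the abstract says the project aims to address.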

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Sarita Adve

Under-canopy dataset for advancing simultaneous localization and mapping in agricultural robotics
  • DOI:
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    José Cuarán;Andres Eduardo Baquero Velasquez;Mateus Valverde Gasparino;N. Uppalapati;A. N. Sivakumar;Justin Wasserman;Muhammad Huzaifa;Sarita Adve;Girish Chowdhary
  • Corresponding author:
    Girish Chowdhary
Performance of image and video processing with general-purpose processors and media ISA extensions
  • DOI:
    10.1145/307338.300990
  • Publication date:
    1999-05-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Parthasarathy Ranganathan;Sarita Adve;N. Jouppi
  • Corresponding author:
    N. Jouppi
FastFlip: Compositional Error Injection Analysis
  • DOI:
    10.48550/arxiv.2403.13989
  • Publication date:
    2024-03-20
  • Journal:
  • Impact factor:
    0
  • Authors:
    Keyur Joshi;Rahul Singh;Tommaso Bassetto;Sarita Adve;Darko Marinov;Sasa Misailovic
  • Corresponding author:
    Sasa Misailovic

Other grants by Sarita Adve

Collaborative Research: PPoSS: LARGE: Scalable Specialization in Distributed Edge-Cloud Systems – The Extended Reality Case
  • Award number:
    2217144
  • Fiscal year:
    2022
  • Funding amount:
    $450,000
  • Project type:
    Continuing Grant
CCRI: New: An Open End-to-End Extended Reality System Infrastructure: Enabling Domain-Specific Edge Systems Research
  • Award number:
    2120464
  • Fiscal year:
    2021
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
SHF: Medium: Software Engineering for Hardware Errors
  • Award number:
    1956374
  • Fiscal year:
    2020
  • Funding amount:
    $450,000
  • Project type:
    Continuing Grant
SHF: Small: Hardware-Software Co-Designed Coherence: A Complete Coherence Solution for Performance-, Energy-, and Complexity-Efficiency
  • Award number:
    1619245
  • Fiscal year:
    2016
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
SHF: Small: DeNovo: Rethinking Hardware for Disciplined Parallelism
  • Award number:
    1018796
  • Fiscal year:
    2010
  • Funding amount:
    $450,000
  • Project type:
    Continuing Grant
CPA-CSA-T: Low Cost and Comprehensive Hardware Reliability
  • Award number:
    0811693
  • Fiscal year:
    2008
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
Lifetime Reliability Aware Microprocessors
  • Award number:
    0541383
  • Fiscal year:
    2006
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
CISE Research Resources: Programming Environments and Applications for Clusters and Grids
  • Award number:
    0224453
  • Fiscal year:
    2002
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
ITR: Collaborative Hardware-Software Adaptation for Multimedia Applications
  • Award number:
    0205638
  • Fiscal year:
    2002
  • Funding amount:
    $450,000
  • Project type:
    Continuing Grant
Using Simultaneous Multithreaded Processors for Soft Real-Time Applications
  • Award number:
    0209198
  • Fiscal year:
    2002
  • Funding amount:
    $450,000
  • Project type:
    Continuing Grant

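Similar NSFC (National Natural Science Foundation of China) grants

Regulatory role and mechanism of ALKBH5-mediated SOCS3 m6A demethylation in the inflammatory activation of microglia after traumatic brain injury
  • Award number:
    82301557
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Functional study of miPEP, a small peptide encoded by miRNA precursors, in grapevine resistance to low-temperature stress
  • Award number:
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project type:
Mechanism by which PKM2 SUMOylation regulates the drug-resistance niche mediated by non-small cell lung cancer initiating cells
  • Award number:
    82372852
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program
A translatomics-based study of the mechanism by which the lncRNA H19-encoded peptide PELRM promotes microglial activation to mediate the relief of postoperative knee pain by contralateral (Juci) electroacupuncture
  • Award number:
    82305399
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Role and mechanism of the CLDN6-high tumor cell subpopulation in the development of resistance to ICB therapy in non-small cell lung cancer
  • Award number:
    82373364
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program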

Similar overseas grants

SHF: Small: Hardware-Software Co-design for Privacy Protection on Deep Learning-based Recommendation Systems
  • Award number:
    2334628
  • Fiscal year:
    2024
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
SHF: Small: Taming Huge Page Problems for Memory Bulk Operations Using a Hardware/Software Co-Design Approach
  • Award number:
    2400014
  • Fiscal year:
    2024
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
SHF: Small: Improving Efficiency of Vision Transformers via Software-Hardware Co-Design and Acceleration
  • Award number:
    2233893
  • Fiscal year:
    2023
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
Collaborative Research: SHF: Small: RUI: Keystone: Modular Concurrent Software Verification
  • Award number:
    2243636
  • Fiscal year:
    2023
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant
Collaborative Research: SHF: Small: RUI: Keystone: Modular Concurrent Software Verification
  • Award number:
    2243637
  • Fiscal year:
    2023
  • Funding amount:
    $450,000
  • Project type:
    Standard Grant