CPA-CSA-T: Low Cost and Comprehensive Hardware Reliability
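Basic information
- Award number: 0811693
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2008
- Funding country: United States
- Period: 2008-09-01 to 2013-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract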
Hardware reliability is a growing concern in the late CMOS era. Components in shipped chips will fail for many reasons, requiring mechanisms to detect, diagnose, recover from, and repair/reconfigure around these failed components so that the system can provide reliable operation. The pervasiveness of the problem across a broad market demands low-cost and general reliability solutions that can be deployed in general-purpose, commodity systems running applications with varying reliability requirements. Traditional reliability solutions involving excessive redundancy are too expensive, as are piecemeal solutions that address individual failure modes. This work proposes a full-system solution that aims to provide a common framework for error detection, diagnosis, recovery, and repair/reconfiguration for a variety of hardware failure modes, with a customizable reliability vs. overhead tradeoff.
Two key high-level observations motivate the approach. First, hardware reliability solutions need to handle only the device faults that propagate through higher levels of the system and become observable to software. Second, in spite of the reliability threat, fault-free operation remains the common case and must be optimized, possibly at the cost of increased overhead once a fault is detected. The proposed system therefore detects faults by watching for anomalous software behavior (or symptoms of faults), using novel zero- to low-cost hardware and software monitors. After a fault is detected, it invokes an innovative, but potentially expensive, procedure for diagnosing the fault source to enable reconfiguration/repair (in the case of hard faults). For recovery, it relies on a checkpoint/replay mechanism, including pure hardware and hybrid software-assisted recovery depending on detection latency. Coordinating all of the above is a thin firmware layer that provides flexibility and customizability.
A major component of the work is a much-needed formulation and validation of microarchitecture-level fault models, required to drive high-level reliability solutions.
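The detect, recover, and diagnose flow described in the abstract can be illustrated with a minimal software-only sketch. Everything here is an assumption made for illustration: the function `run_with_recovery`, its parameters (`is_symptom`, `interval`, `max_replays`), and the rule that a symptom recurring after replay is handed to diagnosis as a suspected hard fault are hypothetical and are not taken from the project itself, which targets hardware and firmware rather than Python.

```python
import copy


def run_with_recovery(state, steps, is_symptom, interval=2, max_replays=1):
    """Run `steps` on `state` with periodic checkpoints and rollback/replay.

    Hypothetical helper for illustration only; not the project's design.
    Returns (final_state, events).
    """
    events = []
    ckpt_index, ckpt_state = 0, copy.deepcopy(state)
    replays = 0
    i = 0
    while i < len(steps):
        # Take a new checkpoint at each interval boundary reached symptom-free.
        if i % interval == 0 and i > ckpt_index:
            ckpt_index, ckpt_state = i, copy.deepcopy(state)
            replays = 0
        state = steps[i](state)
        if is_symptom(state):
            # Anomalous behavior observed: restore the last checkpoint.
            state = copy.deepcopy(ckpt_state)
            if replays >= max_replays:
                # Symptom recurred after replay: assume a hard fault
                # and hand off to (expensive) diagnosis.
                events.append(("diagnose", i))
                break
            events.append(("rollback", ckpt_index))
            i = ckpt_index
            replays += 1
            continue
        i += 1
    return state, events


def increment(x):
    return x + 1


def make_transient_fault():
    """Return a step that corrupts its result once, then behaves normally."""
    fired = {"done": False}

    def step(x):
        if not fired["done"]:
            fired["done"] = True
            return -1  # simulated corrupted value
        return x + 1

    return step


def negative(x):
    # Assumed symptom: a value that should never be negative.
    return x < 0


transient_result = run_with_recovery(
    0, [increment, increment, make_transient_fault(), increment], negative
)
hard_result = run_with_recovery(0, [lambda x: -1], negative)
```

In this sketch the fault-free path pays only for the periodic checkpoint copy, while rollback and diagnosis costs are incurred only after a symptom appears, which mirrors the abstract's point that fault-free operation is the common case to optimize.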
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Sarita Adve
Under-canopy dataset for advancing simultaneous localization and mapping in agricultural robotics
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: José Cuarán; Andres Eduardo Baquero Velasquez; Mateus Valverde Gasparino; N. Uppalapati; A. N. Sivakumar; Justin Wasserman; Muhammad Huzaifa; Sarita Adve; Girish Chowdhary
- Corresponding author: Girish Chowdhary
Performance of image and video processing with general-purpose processors and media ISA extensions
- DOI: 10.1145/307338.300990
- Published: 1999-05-01
- Journal:
- Impact factor: 0
- Authors: Parthasarathy Ranganathan; Sarita Adve; N. Jouppi
- Corresponding author: N. Jouppi
FastFlip: Compositional Error Injection Analysis
- DOI: 10.48550/arxiv.2403.13989
- Published: 2024-03-20
- Journal:
- Impact factor: 0
- Authors: Keyur Joshi; Rahul Singh; Tommaso Bassetto; Sarita Adve; Darko Marinov; Sasa Misailovic
- Corresponding author: Sasa Misailovic
Other grants by Sarita Adve
Collaborative Research: PPoSS: LARGE: Scalable Specialization in Distributed Edge-Cloud Systems – The Extended Reality Case
- Award number: 2217144
- Fiscal year: 2022
- Funding amount: $500,000
- Project type: Continuing Grant
CCRI: New: An Open End-to-End Extended Reality System Infrastructure: Enabling Domain-Specific Edge Systems Research
- Award number: 2120464
- Fiscal year: 2021
- Funding amount: $500,000
- Project type: Standard Grant
SHF: Medium: Software Engineering for Hardware Errors
- Award number: 1956374
- Fiscal year: 2020
- Funding amount: $500,000
- Project type: Continuing Grant
SHF: Small: Hardware-Software Co-Designed Coherence: A Complete Coherence Solution for Performance-, Energy-, and Complexity-Efficiency
- Award number: 1619245
- Fiscal year: 2016
- Funding amount: $500,000
- Project type: Standard Grant
SHF: Small: Software-Driven Hardware Resiliency
- Award number: 1320941
- Fiscal year: 2013
- Funding amount: $500,000
- Project type: Standard Grant
SHF: Small: DeNovo: Rethinking Hardware for Disciplined Parallelism
- Award number: 1018796
- Fiscal year: 2010
- Funding amount: $500,000
- Project type: Continuing Grant
Lifetime Reliability Aware Microprocessors
- Award number: 0541383
- Fiscal year: 2006
- Funding amount: $500,000
- Project type: Standard Grant
CISE Research Resources: Programming Environments and Applications for Clusters and Grids
- Award number: 0224453
- Fiscal year: 2002
- Funding amount: $500,000
- Project type: Standard Grant
ITR: Collaborative Hardware-Software Adaptation for Multimedia Applications
- Award number: 0205638
- Fiscal year: 2002
- Funding amount: $500,000
- Project type: Continuing Grant
Using Simultaneous Multithreaded Processors for Soft Real-Time Applications
- Award number: 0209198
- Fiscal year: 2002
- Funding amount: $500,000
- Project type: Continuing Grant
Similar NSFC (National Natural Science Foundation of China) grants
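Mechanism of Ag/PAAm-CSA hydrogel flexible electrodes for gastric mucosal ablation and wound protection
- Award number:
- Approval year: 2022
- Funding amount: 300,000 CNY
- Project type: Young Scientists Fund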
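Regulation of the thermo-mechanical properties and stability of PC-CSA composite cement systems by CaO@CaCO3 "core-shell" heat-generating materials in severe cold environments
- Award number:
- Approval year: 2022
- Funding amount: 300,000 CNY
- Project type: Young Scientists Fund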
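Molecular mechanism of CSA-regulated photoperiod-sensitive fertility conversion in rice
- Award number: 31970803
- Approval year: 2019
- Funding amount: 570,000 CNY
- Project type: General Program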
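How ABCB1 methylation level regulates CsA concentration in T lymphocytes and causes differences in CsA pharmacodynamics
- Award number: 81803634
- Approval year: 2018
- Funding amount: 210,000 CNY
- Project type: Young Scientists Fund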
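Molecular mechanism by which "metabolism-transport interplay" mediates the regulation of CsA pharmacokinetics by quercetin and its active metabolite Q3GA
- Award number: 81874326
- Approval year: 2018
- Funding amount: 570,000 CNY
- Project type: General Program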
Similar overseas grants
Applying methods & lessons learned from online illicit trade detection to CSA text links
- Award number: 10070472
- Fiscal year: 2023
- Funding amount: $500,000
- Project type: Collaborative R&D
Collaborative Research: CSAwesome: Transitioning Teachers from AP CSP to CSA with Differentiated Professional Development
- Award number: 2031361
- Fiscal year: 2020
- Funding amount: $500,000
- Project type: Standard Grant
Collaborative Research: CSAwesome: Transitioning Teachers from AP CSP to CSA with Differentiated Professional Development
- Award number: 2031362
- Fiscal year: 2020
- Funding amount: $500,000
- Project type: Standard Grant
Functional analysis of Cockayne syndrome proteins in transcription-coupled nucleotide excision repair
- Award number: 20K06543
- Fiscal year: 2020
- Funding amount: $500,000
- Project type: Grant-in-Aid for Scientific Research (C)
Young child sexual abuse (CSA) survivors' perspectives on CSA Prevention Methods
- Award number: 2276792
- Fiscal year: 2019
- Funding amount: $500,000
- Project type: Studentship