CSR-PSCE,SM: A Holistic Design Approach to Reliability Using 3D Stacked
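Basic information
- Award number: 0834798
- Principal investigator:
- Amount: $402,900
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2008
- Funding country: United States
- Period: 2008-09-01 to 2013-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract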
The future of the information technology industry depends on designing computer systems that tolerate errors caused by variations in device characteristics. Traditionally, system reliability is achieved by replicating critical system components. Because variability-induced errors emerge slowly over time, replication solely for the purpose of reliability is prohibitively expensive for low-cost computing platforms. This research explores using 3D stacking to implement redundant components and variability-monitoring circuitry on a 3D stacked die. With 3D stacking, the redundant computation blocks can be built in a variation-resilient process technology that may be slower than the process technology used for the primary processor. The research takes a holistic approach to designing the 3D stacked monitoring layer, spanning from innovative microarchitecture solutions to exploiting applications' inherent error tolerance. On the microarchitecture front, it explores the potential for seamlessly reconfiguring the monitoring layer to operate in one of three modes: as performance assists when variability-induced errors are rare, as guard processors when variability-induced errors begin to appear, or as backup processors when device aging may cause irreparable errors on the primary processing substrate. On the architecture front, it explores a new exception class called Reliability Aware Exceptions, which allows microarchitecture blocks to raise an exception in response to a variability-induced error. These software-visible exceptions can then be exploited by application classes that are inherently error tolerant and can customize their exception-handling mechanisms.
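The three monitoring-layer modes and the Reliability Aware Exceptions described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract gives no thresholds, interfaces, or names, so `GUARD_THRESHOLD`, `select_mode`, `run_with_handler`, and the `ReliabilityAwareException` class below are all hypothetical and do not represent the project's actual design.

```python
from enum import Enum


class Mode(Enum):
    """The three monitoring-layer roles named in the abstract."""
    PERFORMANCE_ASSIST = "performance assist"
    GUARD = "guard processor"
    BACKUP = "backup processor"


class ReliabilityAwareException(Exception):
    """Hypothetical software-visible exception raised by a hardware block."""

    def __init__(self, block, value):
        super().__init__(f"variability-induced error in {block}")
        self.block = block
        self.value = value  # result that may have been corrupted


# Hypothetical threshold (errors per instruction); not taken from the source.
GUARD_THRESHOLD = 1e-9


def select_mode(error_rate, irreparable):
    """Choose a monitoring-layer mode from the observed error behaviour."""
    if irreparable:
        return Mode.BACKUP
    if error_rate > GUARD_THRESHOLD:
        return Mode.GUARD
    return Mode.PERFORMANCE_ASSIST


def run_with_handler(compute, handler):
    """Run compute(); on a reliability exception, defer to the application's handler."""
    try:
        return compute()
    except ReliabilityAwareException as exc:
        return handler(exc)


def faulty_add():
    # Simulate a block that reports a variability-induced error in its result.
    raise ReliabilityAwareException("adder", value=41)


# An error-tolerant application might simply accept the approximate value.
tolerant_result = run_with_handler(faulty_add, lambda exc: exc.value)
```

An application that cannot tolerate errors could instead pass a handler that recomputes the result, which is the kind of per-application customization the abstract alludes to.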
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Murali Annavaram
Differentially Private Next-Token Prediction of Large Language Models
- DOI:
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: James Flemings; Meisam Razaviyayn; Murali Annavaram
- Corresponding author: Murali Annavaram
Other grants by Murali Annavaram
SHF: Small: ML Accelerator Cohort Architecture
- Award number: 2224319
- Fiscal year: 2022
- Amount: $402,900
- Project type: Standard Grant
Student Travel Support for the 2018 International Symposium on Computer Architecture (ISCA)
- Award number: 1812942
- Fiscal year: 2018
- Amount: $402,900
- Project type: Standard Grant
SHF:Small: Accelerating Graph Analytics Through Coordinated Storage, Memory and Computing Advances
- Award number: 1719074
- Fiscal year: 2017
- Amount: $402,900
- Project type: Standard Grant
SHF:Small: Benchmarking of Transient and Intermittent Errors and Their Application to Microarchitecture
- Award number: 1219186
- Fiscal year: 2012
- Amount: $402,900
- Project type: Standard Grant
IEEE International Symposium on Workload Characterization (IISWC) Student Subsidy Proposal
- Award number: 1104542
- Fiscal year: 2011
- Amount: $402,900
- Project type: Standard Grant
CAREER: From Nonstop-Monitoring to Nano-ISA: An Adaptive Multi-Dimensional Framework for Processor Reliability
- Award number: 0954211
- Fiscal year: 2010
- Amount: $402,900
- Project type: Continuing Grant
CSR-PSCE,SM: Trade-offs Between Static Power, Performance and Reliability in Future Chip Multiprocessors
- Award number: 0834799
- Fiscal year: 2008
- Amount: $402,900
- Project type: Standard Grant
CT-ISG: A Game Theoretic Framework for Privacy Preservation in Community-Based Mobile Applications
- Award number: 0831545
- Fiscal year: 2008
- Amount: $402,900
- Project type: Standard Grant
Similar overseas grants
CSR-PSCE, SM: MPI-PPA: Improving Efficiency of Large-Scale Clusters Through Statistical Performance Prediction
- Award number: 0936251
- Fiscal year: 2009
- Amount: $402,900
- Project type: Continuing Grant
Collaborative Research: CSR-PSCE, SM: Adaptive Memory Management in Shared Environments
- Award number: 0834323
- Fiscal year: 2008
- Amount: $402,900
- Project type: Continuing Grant
CSR-PSCE,SM: Trade-offs Between Static Power, Performance and Reliability in Future Chip Multiprocessors
- Award number: 0834799
- Fiscal year: 2008
- Amount: $402,900
- Project type: Standard Grant
CSR-PSCE,SM: Recovery Aware Parallel Computing
- Award number: 0834514
- Fiscal year: 2008
- Amount: $402,900
- Project type: Continuing Grant
CSR-PSCE, SM: Automatic Multithreaded and Transactional Memory Workload Synthesis for Efficient Multi-core Design Space Evaluation
- Award number: 0834288
- Fiscal year: 2008
- Amount: $402,900
- Project type: Standard Grant