CAREER: Towards Gray-Fault Tolerant Cloud through Harnessing and Enhancing System Observability
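Basic information
- Award number: 2317751
- Principal investigator:
- Amount: $609,500
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-03-15 to 2025-08-31
- Status: Ongoing
- Source:
- Keywords:
Project abstract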
Cloud systems are crucial infrastructure for many of today's services. Ensuring that cloud software runs continuously without disruption is both vital and challenging. Decades of research have produced mature techniques to detect and mask faults in distributed systems, but these techniques often rely on a simple model that assumes a system component either works or stops completely. Numerous real-world cloud incidents, however, suggest that production cloud systems frequently experience gray failures: a degraded operational mode in which a system component appears to be working but is in fact severely impaired. Current solutions cannot deal with gray failures effectively. The overall objective of this proposal is to develop a holistic approach to detect, pinpoint, and diagnose gray failures in production cloud systems. To realize this objective, four synergistic research activities are proposed. First, the project conducts a study of real-world gray failure cases in popular distributed systems, and measures and characterizes the observability of existing systems. Second, the project designs a novel hybrid analysis that automatically inserts report-generation hooks across the whole system stack to harness observability for detecting gray failures. Third, to pinpoint the culprit component, the project proposes algorithms to infer causality from the collected observations. Lastly, the project designs a runtime checking framework for increasing observability and diagnosing gray failures online.
Gray failures are a common cause of cloud service outages, resulting in significant financial loss. This project can improve our understanding of gray failures and help detect and debug them, reducing their impact on the ubiquitous cloud infrastructure. Software is becoming more distributed, with increasingly subtle failure modes. Observability, fault detection, and localization are critical skills for this paradigm shift but are rarely covered in the existing curriculum. This project addresses this educational gap through curriculum development and student training. This project also promotes Computer Science education among underrepresented Baltimore high school students by organizing workshops for local high school students, in partnership with a non-profit organization, Code in the Schools, to showcase cloud and system failure concepts.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal papers (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Pushing Performance Isolation Boundaries into Application with pBox
- DOI: 10.1145/3600006.3613159
- Published: 2023-10
- Journal:
- Impact factor: 0
- Authors: Yigong Hu; Gongqi Huang; Peng Huang
- Corresponding authors: Yigong Hu; Gongqi Huang; Peng Huang
Simplifying Cloud Management with Cloudless Computing
- DOI: 10.1145/3626111.3628206
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Qiu, Yiming; Kon, Patrick Tser; Xing, Jiarong; Huang, Yibo; Liu, Hongyi; Wang, Xinyu; Huang, Peng; Chowdhury, Mosharaf; Chen, Ang
- Corresponding author: Chen, Ang
Other publications by Peng Huang
Genome Editing Recreates Hereditary Persistence of Fetal Hemoglobin in Primary Human Erythroblasts
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Elizabeth A. Traxler; Yu Yao; Chunliang Li; Jeremy D. Grevet; Peng Huang; Shaela Wright; G. Blobel; M. Weiss
- Corresponding author: M. Weiss
Design and Hardware Implementation of Neuromorphic Systems With RRAM Synapses and Threshold-Controlled Neurons for Pattern Recognition
- DOI: 10.1109/tcsi.2018.2812419
- Published: 2018-04
- Journal:
- Impact factor: 0
- Authors: Yuning Jiang; Peng Huang; Dongbin Zhu; Zheng Zhou; Runze Han; Lifeng Liu; Xiaoyan Liu; Jinfeng Kang
- Corresponding author: Jinfeng Kang
Tailoring the cationic and anionic sites of LaFeO3-based perovskite generates multiple vacancies for efficient water oxidation
- DOI: 10.1039/d1ta03604a
- Published: 2021-08
- Journal:
- Impact factor: 11.9
- Authors: Paul Blessington Selva; Tuzhi Xiong; Peng Huang; Qirong Tan; Yongchao Huang; Hao Yang; M.-Sadeeq Balogun
- Corresponding author: M.-Sadeeq Balogun
Antitumotelomerase-selective oncolytic adenoviral agents, OBP-301(Telomelysin)in prostate Cancer
- DOI:
- Published: 2007
- Journal:
- Impact factor: 0
- Authors: Yoshida;M;et. al.;Peng Huang;Enokida H. et al.;榎田 英樹;Kaku H;Urakami S. et al.;Peng Hung
- Corresponding author: Peng Hung
Research on the Performance of China’s Mixed Ownership Enterprises Listed on SME Board—Based on the Perspective of Ownership Balance
- DOI: 10.12783/dtssehs/eemt2017/14548
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Desheng Zhu; Peng Huang
- Corresponding author: Peng Huang
Other grants by Peng Huang
CNS Core: Small: Intelligent Fault Injection to Expose and Reproduce Production-Grade Bugs in Cloud Systems
- Award number: 2317698
- Fiscal year: 2023
- Amount: $609,500
- Award type: Standard Grant
FMitF: Track I: Synthesizing Semantic Checkers for Runtime Verification of Production Distributed Systems
- Award number: 2318937
- Fiscal year: 2023
- Amount: $609,500
- Award type: Standard Grant
CNS Core: Small: Intelligent Fault Injection to Expose and Reproduce Production-Grade Bugs in Cloud Systems
- Award number: 2149664
- Fiscal year: 2021
- Amount: $609,500
- Award type: Standard Grant
CAREER: Towards Gray-Fault Tolerant Cloud through Harnessing and Enhancing System Observability
- Award number: 1942794
- Fiscal year: 2020
- Amount: $609,500
- Award type: Continuing Grant
CRII: CSR: Toward Understanding and Automatically Detecting Specious Configuration in Large Systems
- Award number: 1755737
- Fiscal year: 2018
- Amount: $609,500
- Award type: Standard Grant
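Similar grants from the National Natural Science Foundation of China (NSFC)
Mechanism of HIV-1 Vpr protein-induced transdifferentiation of CD4+ T cells into B cells
- Award number: 82302514
- Approval year: 2023
- Amount: 300,000 CNY
- Award type: Young Scientists Fund
Vertical habitat differentiation of ammonia-oxidizing archaea in deep lakes and its evolutionary mechanisms
- Award number: 42372353
- Approval year: 2023
- Amount: 530,000 CNY
- Award type: General Program
Molecular mechanism by which cAMP/PKA-Cav1.2-DCT regulates odontogenic differentiation of dental pulp stem cells
- Award number: 82301056
- Approval year: 2023
- Amount: 300,000 CNY
- Award type: Young Scientists Fund
Numerical simulation of core-edge compatibility mechanisms in EAST high poloidal beta operating scenarios
- Award number: 12375228
- Approval year: 2023
- Amount: 530,000 CNY
- Award type: General Program
CXCR5-dependent delivery of exosomes by marginal zone B cells to follicular dendritic cells as a trigger of heart transplant rejection
- Award number: 82300460
- Approval year: 2023
- Amount: 300,000 CNY
- Award type: Young Scientists Fund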
Similar overseas grants
CAREER: Towards Gray-Fault Tolerant Cloud through Harnessing and Enhancing System Observability
- Award number: 1942794
- Fiscal year: 2020
- Amount: $609,500
- Award type: Continuing Grant
Comparison of tau-PET tracers: Progress towards a universal measure
- Award number: 10169910
- Fiscal year: 2020
- Amount: $609,500
- Award type:
Towards Generating a Multimodal and Multivariate Classification Model from Imaging and Non-Imaging Measures for Accurate Diagnosis and Monitoring of Dementia in Parkinsons disease.
- Award number: 10028103
- Fiscal year: 2020
- Amount: $609,500
- Award type:
Towards Generating a Multimodal and Multivariate Classification Model from Imaging and Non-Imaging Measures for Accurate Diagnosis and Monitoring of Dementia in Parkinsons disease.
- Award number: 10241526
- Fiscal year: 2020
- Amount: $609,500
- Award type:
Visceral fat, systemic inflammation and brain-tissue health: towards early detection and prevention of Alzheimer’s disease.
- Award number: 10202468
- Fiscal year: 2018
- Amount: $609,500
- Award type: