Collaborative Research: CNS Core: Small: A new framework for building fail-slow fault-tolerant distributed systems
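Basic Information
- Award number: 2130590
- Principal investigator:
- Amount: $249,500
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-10-01 to 2024-09-30
- Status: Completed
- Source:
- Keywords:
Abstract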
This project targets a long-standing and increasingly pervasive challenge in distributed system design and implementation: fail-slow fault tolerance. Most existing fault-tolerant distributed systems are developed and tested to tolerate faults in which a node stops completely, but they often perform poorly under “fail-slow” faults, in which a faulty node has not crashed but runs at a degraded speed far below its normal performance. Fail-slow faults can arise from hardware (e.g., an overheated chip), software (e.g., a process that exhausts all available memory), the network (e.g., a loose cable), and human error (e.g., an administrator launching too many processes on the same node). In many current fault-tolerant distributed systems, fail-slow nodes can degrade the performance of the entire system by holding up healthy nodes. For example, a healthy node may keep buffering outbound messages destined for a slow node until it exhausts its memory and crashes. Improving fail-slow fault tolerance is important because fail-slow faults have been reported to be common in large-scale distributed systems deployed in modern data centers, and the performance problems they cause are better hidden and harder to debug than outright crashes. To help improve this situation, this work will develop a set of novel, transformative technologies, including programming support for distributed systems, design patterns, and runtime verification techniques, that will be encapsulated in a unified programming framework and will dramatically improve the performance and fault tolerance of modern distributed systems. This research may have a major impact on industry and society, since distributed systems are the cornerstones of modern computing infrastructure such as cloud computing, cluster and datacenter technologies, and high-performance computing. In particular, this work will be done in collaboration with widely used distributed databases, specifically MongoDB and TiDB.
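The buffering failure mode described above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the project's framework: `PeerOutbox`, `capacity`, `suspected_slow`, and `simulate_two_peers` are hypothetical names, and the sketch assumes one outbound queue per peer with a fixed message budget. Instead of buffering without limit, the sender refuses to grow a peer's queue past its capacity and flags that peer as possibly fail-slow.

```python
from collections import deque


class PeerOutbox:
    """Bounded outbound queue for one peer (hypothetical sketch).

    An unbounded queue lets a slow peer push the sender toward memory
    exhaustion; here the queue stops at `capacity` and the peer is
    flagged as a fail-slow suspect instead.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue: deque = deque()
        self.suspected_slow = False

    def enqueue(self, msg: str) -> bool:
        # Refuse to grow past the budget; report the peer instead of buffering.
        if len(self.queue) >= self.capacity:
            self.suspected_slow = True
            return False
        self.queue.append(msg)
        return True

    def drain(self, n: int) -> list:
        # Model the peer consuming up to n messages in one step.
        sent = [self.queue.popleft() for _ in range(min(n, len(self.queue)))]
        # Clear the flag once the peer has made room again.
        if len(self.queue) < self.capacity:
            self.suspected_slow = False
        return sent


def simulate_two_peers(rounds: int, capacity: int, fast_rate: int, slow_rate: int):
    """Send one message per round to a healthy peer and a fail-slow peer."""
    fast, slow = PeerOutbox(capacity), PeerOutbox(capacity)
    for _ in range(rounds):
        for peer, rate in ((fast, fast_rate), (slow, slow_rate)):
            peer.enqueue("msg")
            peer.drain(rate)
    return fast, slow


if __name__ == "__main__":
    fast, slow = simulate_two_peers(rounds=10, capacity=4, fast_rate=1, slow_rate=0)
    print(len(fast.queue), fast.suspected_slow)
    print(len(slow.queue), slow.suspected_slow)
```

With `simulate_two_peers(rounds=10, capacity=4, fast_rate=1, slow_rate=0)`, the healthy peer's queue stays empty while the slow peer's queue stops at four messages and the peer is flagged. What a real system should do with a flagged peer (drop or coalesce messages, stop waiting on it, or alert an operator) is a separate design decision that this sketch leaves open.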
The PIs envision this effort as a catalyst for multidisciplinary research and education on distributed systems technologies at Stony Brook University and the University of Illinois. The PIs intend this work to serve as a core that will eventually draw in other faculty with diverse expertise and interests in cloud computing, distributed systems, and software engineering. Both universities are experiencing an unprecedented surge of students in Computer Science. The PIs are working with their departments to broaden course offerings with multidisciplinary courses in cloud computing, distributed systems, reliable systems, and software engineering, and they will incorporate the topics of this proposal into the courses they teach. The PIs have a long-standing commitment to undergraduate education and research and to broadening the participation of under-represented minorities, and they will use this work to involve undergraduates and under-represented students in their research groups. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (4)
Monographs (0)
Awards (0)
Conference papers (0)
Patents (0)
Rolis: a software approach to efficiently replicating multi-core transactions
- DOI: 10.1145/3492321.3519561
- Published: 2022-03
- Journal:
- Impact factor: 0
- Authors: Weihai Shen; Ansh Khanna; Sebastian Angel; S. Sen; Shuai Mu
- Corresponding authors: Weihai Shen; Ansh Khanna; Sebastian Angel; S. Sen; Shuai Mu
DepFast: Orchestrating Code of Quorum Systems
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Luo, Xuhao; Shen, Weihai; Mu, Shuai; Xu, Tianyin
- Corresponding author: Xu, Tianyin
NCC: Natural Concurrency Control for Strictly Serializable Datastores by Avoiding the Timestamp-Inversion Pitfall
- DOI: 10.48550/arxiv.2305.14270
- Published: 2023-05
- Journal:
- Impact factor: 0
- Authors: Haonan Lu; Shuai Mu; S. Sen; Wyatt Lloyd
- Corresponding authors: Haonan Lu; Shuai Mu; S. Sen; Wyatt Lloyd
Waverunner: An Elegant Approach to Hardware Acceleration of State Machine Replication
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Alimadadi, Mohammadreza; Mai, Hieu; Cho, Shenghsun; Ferdman, Michael; Milder, Peter; Mu, Shuai
- Corresponding author: Mu, Shuai
Other publications by Shuai Mu
An Improved, Scalable and Impurity-Free Process for Lixivaptan
- DOI: 10.1002/jhet.2176
- Published: 2015
- Journal:
- Impact factor: 2.4
- Authors: Shuai Mu; Duan Niu; Y. Liu; Zhang Dashuai; Dengke Liu; Chang
- Corresponding author: Chang
DPh-BTBT/P2V2TT共結晶の合成・構造および光物性
Synthesis, Structure, and Optical Properties of DPh-BTBT/P2V2TT Co-crystals
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Shuai Mu; 高石慎也; 山下正廣
- Corresponding author: 山下正廣
Aggregation induced emission and balanced ambipolar carrier transport of a distyrylthieno[3,2-b]thiophene based highly efficient red luminescent organic single crystal
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Shuai Mu; Shinya Takaishi; Masahiro Yamashita; Kazuaki Oniwa; Tiena Jin; Naoki Asao
- Corresponding author: Naoki Asao
住民との協働における地方自治体(職員)が持つべき戦略的視点-ブラジル・クリチバ市における開発的実践の分析から-
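The Strategic Perspective Local Governments (and Their Staff) Should Hold When Collaborating with Residents: An Analysis of Development Practice in Curitiba, Brazil
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Shuai Mu; Shinya Takaishi; Masahiro Yamashita; 南 友二郎
- Corresponding author: 南 友二郎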
CDW-MH Phase Transition in Quasi-One-Dimensional Halogen-Bridged Metal Complexes & Recent Progresses in Halogen-Bridged Metal Complexes (Toward Electronic Devices)
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Shuai Mu; Shinya Takaishi; Masahiro Yamashita; 高石慎也
- Corresponding author: 高石慎也
Other grants by Shuai Mu
Collaborative Research: CISE: Large: Systems Support for Run-Anywhere Serverless
- Award number: 2321725
- Fiscal year: 2023
- Amount: $249,500
- Award type: Continuing Grant
CAREER: Rethinking Replication in Highly Available and Reliable Data Stores
- Award number: 2238768
- Fiscal year: 2023
- Amount: $249,500
- Award type: Continuing Grant
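Similar NSFC (National Natural Science Foundation of China) grants
Role and Mechanism of IL-17A in Suppressing Regulatory T Cell Function via STAT5-Mediated Methylation of the CNS2 Region in the Pathogenesis of Psoriasis
- Award number: 82304006
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund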
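miR-20a Promotes the Onset of CNS Inflammatory Demyelinating Diseases by Regulating CD4+ T Cell Pyroptosis: Mechanism Study
- Award number: 82201491
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund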
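Oligomeric Phosphorylated α-Synuclein in Plasma CNS-Derived Exosomes as an Indicator of Parkinson's Disease Progression
- Award number: 82101506
- Approval year: 2021
- Amount: CNY 300,000
- Award type: Young Scientists Fund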
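Role and Mechanism of Pathogenicity Island 4 in Listeria monocytogenes CNS Inflammation, Based on a Brain Microvascular Endothelial Cell Model
- Award number: 32160834
- Approval year: 2021
- Amount: CNY 350,000
- Award type: Regional Science Fund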
Similar overseas grants
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
- Award number: 2230945
- Fiscal year: 2023
- Amount: $249,500
- Award type: Standard Grant
Collaborative Research: CNS Core: Medium: Movement of Computation and Data in Splitkernel-disaggregated, Data-intensive Systems
- Award number: 2406598
- Fiscal year: 2023
- Amount: $249,500
- Award type: Continuing Grant
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
- Award number: 2418188
- Fiscal year: 2023
- Amount: $249,500
- Award type: Standard Grant
Collaborative Research: CNS Core: Medium: Reconfigurable Kernel Datapaths with Adaptive Optimizations
- Award number: 2345339
- Fiscal year: 2023
- Amount: $249,500
- Award type: Standard Grant
Collaborative Research: NSF-AoF: CNS Core: Small: Towards Scalable and AI-based Solutions for Beyond-5G Radio Access Networks
- Award number: 2225578
- Fiscal year: 2023
- Amount: $249,500
- Award type: Standard Grant