CSR: Small: Lightning in Clouds: Detection and Characterization of Very Short Bottlenecks
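Basic information
- Award number: 1421561
- Principal investigator:
- Amount: $450,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Period: 2014-10-01 to 2017-09-30
- Project status: Completed
- Source:
- Keywords:
Project abstract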
A plausible explanation for the persistent low utilization of data centers (around 18%, according to Gartner reports) is the managerial need to maintain quality of service against the well-known latency long tail problem, in which some apparently random requests that normally return within milliseconds suddenly take multiple seconds. The latency long tail problem arises at moderate utilization levels (e.g., 50%) with all resources far from saturation. Despite efforts to remedy the problem in various ways, its causes have remained elusive: in most cases, the very requests that took several seconds return within milliseconds when executed by themselves. Studying and solving the latency long tail problem will contribute to better utilization while maintaining quality of service, leading to lower costs for cloud users, higher return on investment for cloud providers, and lower power consumption for the environment. The main goal of this project is to investigate the class of very short bottlenecks, in which the CPU becomes saturated for only a small fraction of a second, as a significant cause of latency long tail problems. Despite their short lifespan, very short bottlenecks can lead to significant response time increases (several seconds) by propagating queuing effects up and down the request chain in an n-tier application system, because of strong dependencies among the tiers during request processing. This project runs large-scale experiments in clouds and simulators to generate extensive fine-grained monitoring data for the investigation of very short bottlenecks, which are virtually invisible to typical performance monitoring tools with sampling periods of seconds or minutes. To match the time scale of very short bottlenecks, special instrumentation software tools are being refined to sample intra-server resource utilization at millisecond resolution and to timestamp inter-server messages at microsecond resolution.
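The effect of the sampling period can be illustrated with a minimal Python sketch. It is not the project's instrumentation: the trace is synthetic, and the function names, the 50 ms window, and the 0.95 saturation threshold are assumptions chosen only for illustration. A 200 ms burst of full CPU saturation is averaged away by a single 1-second sample but stands out in 50 ms windows.

```python
def window_utilization(busy_ms, window_ms):
    """Average utilization per window, given one busy flag (0 or 1) per millisecond."""
    result = []
    for i in range(0, len(busy_ms), window_ms):
        chunk = busy_ms[i:i + window_ms]
        result.append(sum(chunk) / len(chunk))
    return result


def saturated_windows(busy_ms, window_ms, threshold=0.95):
    """Indices of windows whose utilization meets or exceeds the (assumed) threshold."""
    return [
        idx
        for idx, util in enumerate(window_utilization(busy_ms, window_ms))
        if util >= threshold
    ]


# Synthetic 1-second trace: 40% background load (busy 2 ms out of every 5 ms).
trace = [1 if t % 5 < 2 else 0 for t in range(1000)]
# Hypothetical very short bottleneck: CPU fully busy from 400 ms to 600 ms.
for t in range(400, 600):
    trace[t] = 1

coarse = window_utilization(trace, 1000)   # a single 1-second sample
fine_hits = saturated_windows(trace, 50)   # 50 ms windows

print(coarse)      # the 1-second average stays far from saturation
print(fine_hits)   # the 50 ms windows that cover the burst
```

With the coarse sample the server looks moderately loaded, while the fine-grained windows pinpoint when the saturation occurred.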
Preliminary studies of n-tier application benchmarks with naturally bursty workloads have found very short bottlenecks that cause the latency long tail in several system layers: systems software (JVM garbage collection), processor architecture (dynamic voltage and frequency scaling), and consolidation of applications in virtualized cloud environments. These findings show the potential for many other sources of very short bottlenecks, e.g., kernel daemon processes that use 100% of the CPU for several milliseconds. Through careful distributed event analysis of the experimental data, new kinds of very short bottlenecks can be discovered, verified, reproduced, and studied in detail. Concrete solutions for specific very short bottlenecks have been developed, e.g., an improved Java garbage collector. However, other very short bottlenecks have no specific bug fixes, e.g., those created by statistically overlapping bursts among consolidated workloads. As an alternative to bug fixes, more general solutions that disrupt queuing propagation are being explored. As a concrete example, instead of a classic request/response approach, in which waiting threads participate in the queuing propagation, asynchronous requests with notification of responses are being investigated as a way to reduce overall queuing and thereby eliminate or reduce the impact of several kinds of very short bottlenecks.
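The queuing amplification described above can be sketched with a small simulation. This is a deliberately simplified, single-tier model and not the project's experimental setup: the function name `simulate_fifo`, the 2 ms arrival interval, the 1 ms service time, and the 200 ms stall (standing in for, say, a garbage collection pause) are all assumptions. The server runs at 50% utilization, yet requests keep being delayed long after the stall has ended.

```python
def simulate_fifo(arrivals_ms, service_ms, stall_start_ms, stall_end_ms):
    """Response time of each request at one FIFO server that starts no work during a stall.

    Simplification: a request already in service when the stall begins is
    assumed to finish normally.
    """
    free_at = 0
    response_times = []
    for arrival in arrivals_ms:
        start = max(arrival, free_at)
        # Work cannot begin inside the stall; defer it to the end of the stall.
        if stall_start_ms <= start < stall_end_ms:
            start = stall_end_ms
        free_at = start + service_ms
        response_times.append(free_at - arrival)
    return response_times


# One request every 2 ms with 1 ms of service time: 50% utilization.
arrivals = list(range(0, 2000, 2))
rt = simulate_fifo(arrivals, service_ms=1, stall_start_ms=1000, stall_end_ms=1200)

slow = [r for r in rt if r > 1]
print(max(rt))     # worst-case response time in ms
print(len(slow))   # number of requests slower than the normal 1 ms
```

In this model only 100 requests arrive during the stall, but twice as many are delayed, because the backlog drains at just 1 ms of spare capacity per 2 ms interval. In an n-tier system such backlogs would additionally hold threads in upstream tiers, which this sketch does not model.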
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Calton Pu
Approaches for service deployment
- DOI: 10.1002/marc.201500587
- Publication date: 2024-09-13
- Journal:
- Impact factor: 3.2
- Authors: Qinyi Wu; Calton Pu; Wenchang Yan; Gueyoung Jung; Georgia Tech; Munindar P Singh
- Corresponding author: Munindar P Singh
Collaborative Computing: Networking, Applications and Worksharing
- DOI: 10.1007/978-3-642-03354-4
- Publication date: 2024-09-13
- Journal:
- Impact factor: 0
- Authors: James Joshi; Elisa Bertino; Calton Pu; H. Ramampiaro
- Corresponding author: H. Ramampiaro
JTangCSB: A Cloud Service Bus for Cloud and Enterprise Application Integration
- DOI: 10.1109/mic.2014.62
- Publication date: 2015
- Journal:
- Impact factor: 0
- Authors: Xingjian Lu; Calton Pu; Zhaohui Wu; Hanwei Chen
- Corresponding author: Hanwei Chen
Buffer overflows: attacks and defenses for the vulnerability of the decade
- DOI: 10.1109/discex.2000.821514
- Publication date: 2000-01-25
- Journal:
- Impact factor: 0
- Authors: Crispin Cowan; Perry Wagle; Calton Pu; Steve Beattie; Jonathan Walpole
- Corresponding author: Jonathan Walpole
Other grants by Calton Pu
HNDS-I: Collaborative Research: Developing a Data Platform for Analysis of Nonprofit Organizations
- Award number: 2024320
- Fiscal year: 2020
- Amount: $450,000
- Project type: Standard Grant
EAGER: Live Reality: Sustainable and Up-to-Date Information Quality in Live Social Media through Continuous Evidence-Based Knowledge Acquisition
- Award number: 2039653
- Fiscal year: 2020
- Amount: $450,000
- Project type: Standard Grant
RAPID: Tracking and Evaluation of the Coronavirus (COVID-19) Epidemic Propagation by Finding and Maintaining Live Knowledge in Social Media
- Award number: 2026945
- Fiscal year: 2020
- Amount: $450,000
- Project type: Standard Grant
1st US-Japan Workshop Enabling Global Collaborations in Big Data Research; June, 2017, Atlanta, GA
- Award number: 1741034
- Fiscal year: 2017
- Amount: $450,000
- Project type: Standard Grant
RCN: SAVI: Adaptive Management and Use of Resilient Infrastructures in Smart Cities: Support for Global Collaborative Research on Real-Time Analytics of Heterogeneous Big Data
- Award number: 1550379
- Fiscal year: 2015
- Amount: $450,000
- Project type: Standard Grant
EAGER: An Exploratory Study of Multi-Hazard Management through Multi-Source Integration of Physical and Social Sensors
- Award number: 1402266
- Fiscal year: 2014
- Amount: $450,000
- Project type: Standard Grant
SAVI: EAGER: for Global Research on Applying Information Technology to Support Effective Disaster Management (GRAIT-DM)
- Award number: 1250260
- Fiscal year: 2012
- Amount: $450,000
- Project type: Standard Grant
RAPID: Automating Emergency Data and Metadata Management to Support Effective Short Term and Long Term Disaster Recovery Efforts
- Award number: 1138666
- Fiscal year: 2011
- Amount: $450,000
- Project type: Standard Grant
CSR: Small: Multi-Bottlenecks: What They Are and How to Find Them
- Award number: 1116451
- Fiscal year: 2011
- Amount: $450,000
- Project type: Standard Grant
II-NEW: Collaborative Research: Spam Processing, Archiving, and Monitoring Community Facility (SPAM Commons)
- Award number: 0855180
- Fiscal year: 2009
- Amount: $450,000
- Project type: Standard Grant
Similar grants from the National Natural Science Foundation of China
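Mechanism by which the small-molecule metabolite catechin interacts with TRPV1 to activate peripheral sensory neurons and mediate uremic pruritus
- Award number: 82371229
- Approval year: 2023
- Amount: 490,000 yuan
- Project type: General Program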
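Mechanism by which DHEA alleviates POCD by inhibiting Fis1 lactylation in microglia
- Award number: 82301369
- Approval year: 2023
- Amount: 300,000 yuan
- Project type: Young Scientists Fund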
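Immunological mechanism by which abnormally activated microglia promote type 1 diabetic retinopathy by upregulating CTSS to suppress expression of the microvascular-specific factor MFSD2A
- Award number: 82370827
- Approval year: 2023
- Amount: 490,000 yuan
- Project type: General Program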
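Role of SETDB1 in regulating microglial function and in the pathogenesis of Alzheimer's disease
- Award number: 82371419
- Approval year: 2023
- Amount: 490,000 yuan
- Project type: General Program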
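Mechanism by which a PTBP1-driven positive feedback loop between the H4K12la/BRD4/HIF1α complex and PKM2 promotes glucose metabolic reprogramming in non-small cell lung cancer, and exploration of therapeutic strategies
- Award number: 82303616
- Approval year: 2023
- Amount: 300,000 yuan
- Project type: Young Scientists Fund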
Similar overseas grants
Metabolism, Aging, Pathogenesis, Stress and Small RNAs Meeting
- Award number: 9990946
- Fiscal year: 2021
- Amount: $450,000
- Project type:
Studies on electron acceleration and multiplication in lightning by a ground-based array of small dosimeters
- Award number: 20K22354
- Fiscal year: 2020
- Amount: $450,000
- Project type: Grant-in-Aid for Research Activity Start-up
Collaborative Research: Characterizing Small-scale Lightning Discharges Associated with Explosive Volcanic Activity at Sakurajima Volcano
- Award number: 1445703
- Fiscal year: 2015
- Amount: $450,000
- Project type: Standard Grant
Collaborative Research: Characterizing Small-scale Lightning Discharges Associated with Explosive Volcanic Activity at Sakurajima Volcano
- Award number: 1445704
- Fiscal year: 2015
- Amount: $450,000
- Project type: Standard Grant