CAREER: Principled yet practical observability for a microservices-based cloud

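Basic information

  • Award number:
    2340128
  • Principal investigator:
  • Amount:
    $609,500
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Period:
    2024-07-01 to 2029-06-30
  • Status:
    Ongoing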

Project abstract

Society relies on cloud-based software services built using the microservices architecture in almost every aspect of everyday life, e.g., to shop, watch movies, and work. Though the microservices architecture has many advantages, it has one critical drawback: observing how user requests (e.g., to buy a book) are processed is extremely challenging because each request involves myriad interactions among many simpler (micro)services. This lack of observability complicates important management tasks, such as problem diagnosis and resource management. Previous research has demonstrated the strong potential of distributed tracing, which captures graphs of how microservices interact to process requests, to provide microservice observability. But results in real-world settings have been disappointing. This gap between potential and reality occurs because research efforts assume principled trace graphs that capture a variety of behaviors and have no data loss. In practice, services are never well-instrumented and data loss is common. The overarching goal of this proposal is to create a new tracing platform that automatically infers the data needed to make traces principled. Doing so will actualize distributed tracing's vast potential for microservice observability, improve the utility of existing tracing-based management tools, and enable transformative new tools. These outcomes will improve the resiliency and efficiency of the software services society depends on. Insights from this project will inform age-appropriate course material and projects in a college course on debugging cloud systems, a high-school research program, and a middle-school outreach program.

This project proposes a novel tracing platform that automatically enriches span-based traces with two sets of primitives: 1) the happens-before concurrency/wait primitives and 2) the holes and holes-covering primitives. The former allow requests' critical paths to be identified, enabling slack analyses, targeted performance debugging, and precise resource allocation decisions. The latter allow areas of data loss to be expressed in traces along with predictions of what work might execute in them. Since scheduling decisions may obfuscate causal structure, the project will investigate active probing methods to tease out concurrent and waiting relationships. To account for uncertainty, it will explore probabilistic data models to represent the primitives. The project will demonstrate the value of the primitives by modifying an existing auto-scaling solution and performance-debugging tool to use them. It will also demonstrate a new trace subgraph sampling approach made possible by the holes and holes-covering primitives. The proposed platform, improved management tools, and inference methods will be publicly available to benefit the computer science and microservice observability communities.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
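
To make the role of the happens-before primitives concrete, the following is a minimal sketch of how a critical path and per-span slack could be derived once such edges are known. It is not the project's platform or data model; the function name `critical_path`, the span names, the durations, and the edge list are all hypothetical and chosen only for illustration. It assumes each span has a fixed duration and that an edge (a, b) means span a must finish before span b starts.

```python
from collections import defaultdict, deque


def critical_path(durations, happens_before):
    """Return (total_latency, critical_spans, slack) for one request.

    durations: {span_name: duration} (hypothetical units, e.g. ms)
    happens_before: list of (a, b) pairs, meaning a finishes before b starts
    """
    succs, preds = defaultdict(list), defaultdict(list)
    indegree = {span: 0 for span in durations}
    for a, b in happens_before:
        succs[a].append(b)
        preds[b].append(a)
        indegree[b] += 1

    # Topological order (Kahn's algorithm); a cycle would mean the
    # happens-before edges are inconsistent.
    order = []
    ready = deque(span for span in durations if indegree[span] == 0)
    while ready:
        span = ready.popleft()
        order.append(span)
        for nxt in succs[span]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(durations):
        raise ValueError("happens-before edges contain a cycle")

    # Forward pass: earliest time each span can finish.
    earliest_finish = {}
    for span in order:
        start = max((earliest_finish[p] for p in preds[span]), default=0)
        earliest_finish[span] = start + durations[span]
    total = max(earliest_finish.values())

    # Backward pass: latest finish time that does not delay the request.
    latest_finish = {}
    for span in reversed(order):
        latest_finish[span] = min(
            (latest_finish[n] - durations[n] for n in succs[span]),
            default=total,
        )

    slack = {s: latest_finish[s] - earliest_finish[s] for s in order}
    # Spans with zero slack lie on a critical path (if several paths tie,
    # this list contains the spans of all of them).
    critical = [s for s in order if slack[s] == 0]
    return total, critical, slack


# Hypothetical "buy a book" request: auth and inventory run concurrently.
durations = {"frontend": 5, "auth": 10, "inventory": 30, "payment": 20}
edges = [("frontend", "auth"), ("frontend", "inventory"),
         ("auth", "payment"), ("inventory", "payment")]
total, critical, slack = critical_path(durations, edges)
print(total, critical, slack["auth"])
```

In this toy request, `auth` has 20 units of slack, so speeding it up would not shorten the request, whereas any saving on `inventory` would. Without the concurrency/wait edges, a span-only trace cannot distinguish these two cases, which is the kind of gap the abstract says the inferred primitives are meant to close.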

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Raja Sambasivan

CSR: Small: A Just-in-Time, Cross-Layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications
  • Award number:
    2016178
  • Fiscal year:
    2019
  • Amount:
    $609,500
  • Award type:
    Standard Grant
CSR: Small: A Just-in-Time, Cross-Layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications
  • Award number:
    1815323
  • Fiscal year:
    2018
  • Amount:
    $609,500
  • Award type:
    Standard Grant

Similar overseas grants

CAREER: Principled Unsupervised Learning via Minimum Volume Polytopic Embedding
  • Award number:
    2237640
  • Fiscal year:
    2023
  • Amount:
    $609,500
  • Award type:
    Continuing Grant
Principled phylogenomic analysis without gene tree estimation
  • Award number:
    2308495
  • Fiscal year:
    2023
  • Amount:
    $609,500
  • Award type:
    Standard Grant
Principled Reasoning about Dynamical Systems
  • Award number:
    RGPIN-2020-05031
  • Fiscal year:
    2022
  • Amount:
    $609,500
  • Award type:
    Discovery Grants Program - Individual
CRCNS Research Proposal: Collaborative Research: US-German Collaboration toward a biophysically principled network model of transcranial magnetic stimulation (TMS)
  • Award number:
    10708986
  • Fiscal year:
    2022
  • Amount:
    $609,500
  • Award type:
CRCNS Research Proposal: Collaborative Research: US-German Collaboration toward a biophysically principled network model of transcranial magnetic stimulation (TMS)
  • Award number:
    10610594
  • Fiscal year:
    2022
  • Amount:
    $609,500
  • Award type: