Collaborative Research: OAC Core: Simulation-driven runtime resource management for distributed workflow applications
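Basic information
- Award number: 2106059
- Principal investigator:
- Amount: $280,000
- Institution:
- Institution country: United States
- Project type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-10-01 to 2024-09-30
- Project status: Completed
- Source:
- Keywords:
Project abstract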
Many scientific breakthroughs in domains such as health, climate modeling, particle physics, and seismology can only be achieved by performing complex processing of vast amounts of data. This processing is automated by software systems that use the compute, storage, and network hardware provided by the cyberinfrastructure. In addition to automation, a key objective of these systems is the efficient use of resources, as measured by cost and energy usage, while making the processing as fast as possible or as fast as needed. To this end, these systems must decide which resources should be used to do what, and when. Many such systems are used in production today and make such decisions. Yet making good, let alone optimal, decisions is still an open research challenge. Theoretical research has proposed solutions that are difficult to put into practice, and practical solutions are known not to make good decisions, or at least not consistently. However, both theory and practice follow the same basic philosophy: make decisions by reasoning about known information on what needs to be computed and on what hardware resources are available. This philosophy has shown its limits, so this project adopts a radically different approach. The key idea is to repeatedly execute fast, computationally inexpensive simulations of the application execution in order to evaluate large sets of potential resource management decisions and automatically select the most desirable ones. The benefits of this approach will be demonstrated for several software systems used to support scientific applications that are critical for the development and sustainability of society.
Software systems are used to run scientific applications on advanced cyberinfrastructure. These systems automate application execution and make resource management decisions along several axes, including selecting and provisioning (virtualized) hardware, picking application configuration options, and scheduling application activities in time and space. Their objective is to optimize both application performance and a set of resource usage efficiency metrics that include monetary and energy costs. Consequently, the resource management decision space is enormous, and making good decisions is a steep challenge that has been the subject of countless efforts by both theoreticians and practitioners. However, the challenge is far from solved: theoreticians produce solutions that are rarely used by practitioners, and conversely practitioners implement solutions that may be highly sub-optimal because they are not informed by theory. This project resolves this disconnect by obviating the need for developing effective resource management strategies. The key idea is to use online simulations to search the resource management decision space rapidly at runtime. Large numbers of fast simulations of the application's execution are run throughout that very execution, so as to evaluate many potential resource management options and automatically select desirable ones. This approach thus shifts the overall problem from the design of complex resource management algorithms to the enumeration of many resource management decisions. This transformation of resource management practice in cyberinfrastructure systems not only renders the resource management problem tractable but also unlocks previously out-of-reach resource management decisions. The benefits of this transformation will be demonstrated for a critical class of production systems and applications, specifically Workflow Management Systems and the scientific applications they support.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
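The key idea in the abstract, running many cheap simulations to compare candidate resource management decisions and then picking the best one, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy model (independent tasks, hosts with fixed speeds), the three candidate ordering policies, and the names `simulate_makespan`, `PORTFOLIO`, and `select_policy` are hypothetical and are not taken from the project or its software.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: a task is its amount of work (Gflop),
# a host is its speed (Gflop/s). Tasks are independent.
Policy = Callable[[List[float]], List[float]]


def simulate_makespan(tasks: List[float], speeds: List[float], policy: Policy) -> float:
    """Cheap simulation of one candidate decision.

    Tasks are ordered by `policy`, then each task goes to the host that
    would finish it earliest. Returns the simulated makespan in seconds.
    """
    ready_at = [0.0] * len(speeds)  # time at which each host becomes free
    for work in policy(tasks):
        finish = [ready_at[h] + work / speeds[h] for h in range(len(speeds))]
        best_host = min(range(len(speeds)), key=finish.__getitem__)
        ready_at[best_host] = finish[best_host]
    return max(ready_at)


# Hypothetical portfolio of candidate decisions (task-ordering policies).
PORTFOLIO: Dict[str, Policy] = {
    "fifo": lambda ts: list(ts),
    "shortest_first": lambda ts: sorted(ts),
    "longest_first": lambda ts: sorted(ts, reverse=True),
}


def select_policy(
    tasks: List[float],
    speeds: List[float],
    portfolio: Dict[str, Policy] = PORTFOLIO,
) -> Tuple[str, Dict[str, float]]:
    """Simulate every candidate and return the one with the smallest makespan."""
    results = {name: simulate_makespan(tasks, speeds, p) for name, p in portfolio.items()}
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = select_policy([1.0, 1.0, 4.0], [1.0, 1.0])
    print(best, results)  # longest_first wins with a makespan of 4.0 s (others: 5.0 s)
```

In a runtime system this selection step would presumably be repeated as the execution progresses, with the simulation re-seeded from the current application and platform state; the sketch only shows a single selection round.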
Project outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
WfCommons: Data Collection and Runtime Experiments using Multiple Workflow Systems
- DOI:
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Casanova, H.; Berney, K.; Chastel, S.; Ferreira da Silva, R.
- Corresponding author: Ferreira da Silva, R.
On the Feasibility of Simulation-driven Portfolio Scheduling for Cyberinfrastructure Runtime Systems
- DOI:
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Casanova, H.; Wong, Y. C.; Pottier, L.; Ferreira da Silva, R.
- Corresponding author: Ferreira da Silva, R.
Other publications by Henri Casanova
High-Bandwidth Low-Latency Approximate Interconnection Networks
- DOI:
- Year: 2017
- Journal:
- Impact factor: 0
- Authors: Daichi Fujiki; Kiyo Ishii; Ikki Fujiwara; Hiroki Matsutani; Hideharu Amano; Henri Casanova; Michihiro Koibuchi
- Corresponding author: Michihiro Koibuchi
一般化ガンマクラスタリングについて
On generalized gamma clustering
- DOI:
- Year: 2017
- Journal:
- Impact factor: 0
- Authors: Ikki Fujiwara; Michihiro Koibuchi. Tomoya Ozaki; Hiroki Matsutani; Henri Casanova; 稲垣貴大・結縁祥治; 野津昭文, 大前勝弘, 江口真透
- Corresponding author: 野津昭文, 大前勝弘, 江口真透
Discussion on Approximate Interconnection Networks
- DOI:
- Year: 2017
- Journal:
- Impact factor: 0
- Authors: Nguyen T. Truong; Henri Casanova; 鯉渕 道紘
- Corresponding author: 鯉渕 道紘
FPGAアクセラレータと高位合成系を用いた瞳検出手法の実装
Implementation of a pupil detection method using an FPGA accelerator and a high-level synthesis system
- DOI:
- Year: 2013
- Journal:
- Impact factor: 0
- Authors: 鯉渕道紘; 松谷宏紀; 天野英晴; D. Frank Hsu; Henri Casanova; 土肥慶亮, 柴田裕一郎, 小栗清
- Corresponding author: 土肥慶亮, 柴田裕一郎, 小栗清
Characterizing fault tolerance in genetic programming
- DOI: 10.1145/1555284.1555286
- Year: 2009
- Journal:
- Impact factor: 0
- Authors: D. L. González; Francisco Fernández de Vega; Henri Casanova
- Corresponding author: Henri Casanova
Other grants by Henri Casanova
Collaborative Research: Elements: Simulation-driven Evaluation of Cyberinfrastructure Systems
- Award number: 2103489
- Fiscal year: 2021
- Funding amount: $280,000
- Project type: Standard Grant
CCRI: Planning: Collaborative Research: Infrastructure for Enabling Systematic Development and Research of Scientific Workflow Management Systems
- Award number: 2016610
- Fiscal year: 2020
- Funding amount: $280,000
- Project type: Standard Grant
Collaborative Research: CyberTraining: Implementation: Small: Integrating core CI literacy and skills into university curricula via simulation-driven activities
- Award number: 1923621
- Fiscal year: 2019
- Funding amount: $280,000
- Project type: Standard Grant
Collaborative Research: SI2-SSE: WRENCH: A Simulation Workbench for Scientific Worflow Users, Developers, and Researchers
- Award number: 1642369
- Fiscal year: 2017
- Funding amount: $280,000
- Project type: Standard Grant
Collaborative Research: II-New: Distributed Research Testbed (DiRT)
- Award number: 0855245
- Fiscal year: 2009
- Funding amount: $280,000
- Project type: Standard Grant
Collaborative Research: CSR-PDOS: Designing Large-Scale Distributed Systems for Realistic Failure Models
- Award number: 0546688
- Fiscal year: 2005
- Funding amount: $280,000
- Project type: Standard Grant
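Similar NSFC grants
Research on highly integrated microwave/millimeter-wave antennas supporting two-dimensional millimeter-wave beam scanning
- Award number: 62371263
- Approval year: 2023
- Funding amount: CNY 520,000
- Project type: General Program
Study of Heck/denitrogenative rearrangement tandem reactions of hydrazones
- Award number: 22301211
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Study of synergistic performance regulation and dendrite suppression mechanisms in aqueous zinc-ion batteries
- Award number: 52364038
- Approval year: 2023
- Funding amount: CNY 330,000
- Project type: Regional Science Fund
Pathogenic role and mechanism of TSPYL1 mutations in sudden infant death syndrome, studied using a human serotonergic neuron reporter system
- Award number: 82371176
- Approval year: 2023
- Funding amount: CNY 490,000
- Project type: General Program
Mechanism of FOXO3 m6A methylation-induced trophoblast senescence in the treatment of spontaneous abortion by the kidney-tonifying method
- Award number: 82305286
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund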
Similar overseas grants
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
- Award number: 2414474
- Fiscal year: 2024
- Funding amount: $280,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Funding amount: $280,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
- Award number: 2414185
- Fiscal year: 2024
- Funding amount: $280,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402947
- Fiscal year: 2024
- Funding amount: $280,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Funding amount: $280,000
- Project type: Standard Grant