Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
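Basic information
- Award number: 2403398
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2024
- Funding country: United States
- Period: 2024-07-01 to 2027-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract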
Supercomputers, or high-performance computing (HPC) clusters, are instrumental in propelling scientific and engineering research by offering vast computational resources. These systems are increasingly crucial as artificial intelligence (AI) techniques become pervasive across various fields, including climate modeling, drug discovery, and physics simulations, significantly expanding the need for computational power and data management. However, the existing HPC infrastructures face challenges with extended job wait times and suboptimal resource use, primarily due to the escalating complexity of computations and the burgeoning demands for resources. Unlike traditional HPC tasks, AI algorithms and models exhibit distinct resource requirements, often resulting in either excess or insufficient resource allocation for AI tasks. This project aims to bridge the gap between HPC resource provisioning and AI application demands through an in-depth analysis of resource allocation and utilization within HPC environments running AI workloads. The goal is to identify strategies for minimizing resource waste and reducing the length of job queues by efficiently reallocating idle resources to accommodate large-scale AI tasks. By creating and disseminating datasets, models, algorithms, and system source code, this initiative will contribute valuable tools and insights to the research community. The findings will be broadly shared through research papers, technical reports, book chapters, course materials, and tutorials, enhancing the knowledge base in both HPC and AI fields and supporting the broader objectives of promoting scientific progress, improving national health, prosperity, and welfare, and contributing to national defense. This project centers on advancing the efficiency and productivity of HPC systems by innovatively leveraging idle resources to expedite AI job processing and diminish waiting periods. 
The research is structured around three interconnected themes, each addressing critical aspects of resource utilization and AI performance enhancement within HPC environments. The initial theme undertakes a comprehensive analysis of idle resources in HPC systems, aiming to identify patterns and opportunities for resource optimization. Building on the insights gained, the second theme explores methodologies for the safe and timely harvesting of idle resources across various categories, ensuring that these resources can be reallocated without compromising system stability or performance. The third theme is dedicated to developing strategies that utilize these harvested resources to boost AI application outcomes significantly and, by extension, enhance the overall productivity of HPC operations. The project will implement a tangible HPC testbed equipped with real-world benchmarks and workloads alongside these thematic investigations. This testbed will serve as a platform for empirically validating developed algorithms and systems, facilitating a rigorous assessment of their effectiveness in improving HPC resource allocation and utilization. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Hao Wang
Tetragon-based carbon allotropes T-C8 and its derivatives: A theoretical investigation
- DOI: 10.1016/j.commatsci.2017.12.028
- Published: 2018-03
- Journal:
- Impact factor: 3.3
- Authors: Yanan Lv; Hao Wang; Yuqing Guo; Bo Jiang; Yingxiang Cai
- Corresponding author: Yingxiang Cai
A phosphaphenanthrene-benzimidazole derivative for enhancing fire safety of epoxy resins
- DOI: 10.1016/j.reactfunctpolym.2022.105390
- Published: 2022-11
- Journal:
- Impact factor: 5.1
- Authors: Yixiang Xu; Junjie Wang; Wenbin Zhang; Siqi Huo; Zhengping Fang; Pingan Song; Dong Wang; Hao Wang
- Corresponding author: Hao Wang
Global existence and decay of solutions for hard potentials to the Fokker-Planck-Boltzmann equation without cut-off
- DOI: 10.3934/cpaa.2020135
- Published: 2020
- Journal:
- Impact factor: 1
- Authors: Lvqiao Liu; Hao Wang
- Corresponding author: Hao Wang
Global existence and decay of solutions for soft potentials to the Fokker–Planck–Boltzmann equation without cut-off
- DOI: 10.1016/j.jmaa.2020.123947
- Published: 2020
- Journal:
- Impact factor: 1.3
- Authors: Hao Wang
- Corresponding author: Hao Wang
Visualizing Plant Cells in A Brand New Way
- DOI: 10.1016/j.molp.2016.02.006
- Published:
- Journal:
- Impact factor: 27.5
- Authors: Hao Wang
- Corresponding author: Hao Wang
Other grants by Hao Wang
RII Track-4:NSF: Federated Analytics Systems with Fine-grained Knowledge Comprehension: Achieving Accuracy with Privacy
- Award number: 2327480
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: SaTC: CORE: Small: Critical Learning Periods Augmented Robust Federated Learning
- Award number: 2315612
- Fiscal year: 2023
- Amount: $300,000
- Project type: Standard Grant
CRII: OAC: High-Efficiency Serverless Computing Systems for Deep Learning: A Hybrid CPU/GPU Architecture
- Award number: 2153502
- Fiscal year: 2022
- Amount: $300,000
- Project type: Standard Grant
RI: Small: Enabling Interpretable AI via Bayesian Deep Learning
- Award number: 2127918
- Fiscal year: 2021
- Amount: $300,000
- Project type: Continuing Grant
US-China planning visit: Development of High Performance and Multifunctional Infrastructure Material
- Award number: 1338297
- Fiscal year: 2013
- Amount: $300,000
- Project type: Standard Grant
SBIR Phase II: SAFE: Behavior-based Malware Detection and Prevention
- Award number: 0750299
- Fiscal year: 2008
- Amount: $300,000
- Project type: Standard Grant
SBIR Phase I: SpiderWeb - Self-Healing Networks for Spyware Detection
- Award number: 0638170
- Fiscal year: 2007
- Amount: $300,000
- Project type: Standard Grant
Constructibility and Large Cardinal Numbers
- Award number: 7902941
- Fiscal year: 1979
- Amount: $300,000
- Project type: Standard Grant
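Similar NSFC (National Natural Science Foundation of China) grants
Electrodeposited magnesium-hydrogen coatings on titanium-based bone implant surfaces and their osteogenesis-promoting properties
- Award number: 52371195
- Year approved: 2023
- Amount: CNY 500,000
- Project type: General Program
Pathogenic mechanism of the CLMP-mediated Connexin45-β-catenin complex in congenital short bowel syndrome
- Award number: 82370525
- Year approved: 2023
- Amount: CNY 490,000
- Project type: General Program
Key technologies for highly sensitive sensing with spoof localized surface plasmons and miniaturization of the sensing system
- Award number: 62371132
- Year approved: 2023
- Amount: CNY 490,000
- Project type: General Program
Mechanisms by which preferential flow affects the hydrothermal stability of permafrost along the China-Russia crude oil pipeline
- Award number: 42301138
- Year approved: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Bidirectional regulation of the interfacial layer and electrolyte for stabilizing zinc anodes
- Award number: 52302289
- Year approved: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund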
Similar overseas grants
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
- Award number: 2414474
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402947
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
- Award number: 2414185
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant