Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
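Basic information
- Award number: 2401244
- Principal investigator:
- Amount: $333,100
- Institution:
- Institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-10-01 to 2026-09-30
- Project status: Ongoing (not yet concluded)
- Source:
- Keywords:
Project abstract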
Large computing clusters, including data centers and supercomputers, are used for a variety of applications including scientific computations and machine learning. Modern compute clusters typically use specialized accelerator hardware to speed up computations. Operators of accelerator-rich clusters aim to have high resource utilization across all users of the cluster. However, these systems are often under-utilized due to performance variability across accelerators; that is, application performance varies across accelerators even when the same application is run on the same type of accelerator. This proposal will develop Fortuna, a set of tools that can be used by cluster operators and researchers to characterize and harness variability across accelerators. First, Fortuna will use new methodologies to characterize how much performance variability exists across a wide range of accelerator hardware. Second, Fortuna will identify which applications are more likely to suffer from performance variability. Finally, Fortuna will include new scheduling mechanisms that can use variability measurements and knowledge about applications to improve utilization.

Broader impacts of the proposed research include open-source implementations of algorithms and tools, which will be applicable to many large-scale clusters and lay the groundwork for wider industry adoption. The project will also create course modules on system design principles with heterogeneous hardware and software, based on the tools developed as a part of the proposal. This will teach the next generation of students how to design hardware and software to improve utilization of future systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Zhao Zhang
The genomic history of the Iberian Peninsula over the past 8000 years
- DOI: 10.4236/jbbs.2019.96018
- Published: 2024-09-14
- Journal:
- Impact factor: 0
- Authors: I. Olalde; Swapan Mallick; Nick Patterson; N. Rohl; Mouco; Marina Silva; Katharina Dulias; C. Edwards; Francesca G; ini; ini; Maria; Pala; Pedro; Soares; Manuel; Ferr; o; o; Nicole; Adamski; Broom; khoshbacht; khoshbacht; O. Cheronet; B. Culleton; Daniel Fern; es; es; Marie Lawson; Matthew Mah; Jonas Oppenheimer; Kristin Stewardson; Zhao Zhang; Juan Manuel Jiménez Arenas; Isidro Jorge Toro Moyano; Domingo C. Salazar; P. Castanyer; Marta Santos; J. Tremoleda; Marina Lozano; Pablo García; Borja; J. Fernández; J. A. Mujika; Cecilio Barroso; J. Bermúdez; E. Mínguez; Josep Burch; Neus Coromina; David Vivó; A. Cebrià; Josep Maria Fullola; Oreto García‐Puchol; J. I. Morales; F. Xavier; 12; Oms; Tona; Majó; Josep; Vergés; Antònia; Díaz; Imma; 13; Castanyer; F. J. López; A. M. Silva; C. Alonso; Germán; Delibes; de; Castro; Javier; Jiménez; Echevarría; Adolfo; Moreno; Guillermo Pascual Berlanga; Pablo Ramos; José Ramos Muñoz; E. Vij; e; e; 16; Vila; Gustau Aguilella Arzo; Ángel Esparza Arroyo; K. Lillios; Jennifer Mack; J. Velasco; A. Waterman; Luis Benítez de Lugo Enrich; María Benito; 18; Sánchez; B. Agustí; F. Codina; Gabriel de Prado; A. Estalrrich; Álvaro; Fernández; Flores; Clive; Finlayson; Geraldine; Stewart; 20; Francisco Giles; Antonio Rosas; V. González; Gabriel García Atiénzar; M. S. H. Pérez; Arm; o Llanos; o; Carrión Marco; Isabel Beneyto; David López; Mar Tormo; A. C. Valera; C. Blasco; Corina Liesau; Patricia Ríos; Joan Daura; Jesús de Pedro Michó; Agustín A Diez Castillo; R. F. Fernández; R. Garrido; V. S. Gonçalves; E. Guerra; Ana Mercedes; 26; Herrero; Joaquim Juan; Dani López; S. McClure; Merino Pérez; Arturo Oliver Foix; Montse Borràs; A. Sousa; Manuel Vidal Encinas; D. Kennett; Martin B. Richards; K. Alt; W. Haak; R. Pinhasi; C. Lalueza; David Reich
- Corresponding author: David Reich
Hawkeye: Change-targeted Testing for Android Apps based on Deep Reinforcement Learning
- DOI: 10.1145/3639477.3639749
- Published: 2023-09-04
- Journal:
- Impact factor: 0
- Authors: Chao Peng; Zhengwei Lv; Jiarong Fu; Jiayuan Liang; Zhao Zhang; Ajitha Rajan; Ping Yang
- Corresponding author: Ping Yang
Identification of microenvironment‐related genes with prognostic value in clear cell renal cell carcinoma
- DOI: 10.1002/jcb.29654
- Published: 2020-01-21
- Journal:
- Impact factor: 4
- Authors: Zhao Zhang; Zeyan Li; Zhao Liu; Xiang Zhang; Nengwang Yu; Zhonghua Xu
- Corresponding author: Zhonghua Xu
A performance comparison of DRAM memory system optimizations for SMT processors
- DOI: 10.1109/hpca.2005.2
- Published: 2005-02-12
- Journal:
- Impact factor: 0
- Authors: Zhichun Zhu; Zhao Zhang
- Corresponding author: Zhao Zhang
Association Between Sex and Immune-Related Adverse Events During Immune Checkpoint Inhibitor Therapy.
- DOI: 10.1093/jnci/djab035
- Published: 2021-03-10
- Journal:
- Impact factor: 0
- Authors: Ying Jing; Yongchang Zhang; Jing Wang; Kunyan Li; Xue Chen; Jianfu Heng; Qian Gao; Youqiong Ye; Zhao Zhang; Yaoming Liu; Y. Lou; Steven H. Lin; L. Diao; Hong Liu; Xiang Chen; G. Mills; Leng Han
- Corresponding author: Leng Han
Other grants by Zhao Zhang
Collaborative Research: Frameworks: hpcGPT: Enhancing Computing Center User Support with HPC-enriched Generative AI
- Award number: 2411294
- Fiscal year: 2024
- Amount: $333,100
- Award type: Standard Grant
CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
- Award number: 2340011
- Fiscal year: 2024
- Amount: $333,100
- Award type: Standard Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
- Award number: 2401245
- Fiscal year: 2023
- Amount: $333,100
- Award type: Standard Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
- Award number: 2311766
- Fiscal year: 2023
- Amount: $333,100
- Award type: Standard Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
- Award number: 2312689
- Fiscal year: 2023
- Amount: $333,100
- Award type: Continuing Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
- Award number: 2401246
- Fiscal year: 2023
- Amount: $333,100
- Award type: Standard Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
- Award number: 2106661
- Fiscal year: 2021
- Amount: $333,100
- Award type: Standard Grant
Collaborative Research: OAC Core: Small: Efficient and Policy-driven Burst Buffer Sharing
- Award number: 2008388
- Fiscal year: 2020
- Amount: $333,100
- Award type: Standard Grant
SHF: Medium: Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
- Award number: 1643271
- Fiscal year: 2016
- Amount: $333,100
- Award type: Continuing Grant
SHF: Medium: Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
- Award number: 1514229
- Fiscal year: 2015
- Amount: $333,100
- Award type: Continuing Grant
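Similar NSFC (National Natural Science Foundation of China) grants
Research on reverse decoupling strategies for corporate social responsibility from a signaling theory perspective
- Award number:
- Approval year: 2022
- Amount: 300,000 CNY
- Award type: Young Scientists Fund
Research on the mechanism, realization paths, and behavioral evolution of corporate social responsibility's effect on carbon emissions under the "dual carbon" goals
- Award number:
- Approval year: 2022
- Amount: 450,000 CNY
- Award type: General Program
Research on the intrinsic driving mechanisms and capability building of platform enterprises' social responsibility behavior
- Award number:
- Approval year: 2022
- Amount: 300,000 CNY
- Award type: Young Scientists Fund
Research on the realization paths and performance of corporate social responsibility under the common prosperity goal
- Award number: 72272171
- Approval year: 2022
- Amount: 450,000 CNY
- Award type: General Program
Measurement and causal mechanisms of the social responsibility of Chinese-funded overseas tourism enterprises: the case of Malaysia
- Award number:
- Approval year: 2022
- Amount: 300,000 CNY
- Award type: Young Scientists Fund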
Similar overseas grants
Collaborative Research: CSR: Small: Caphammer: A New Security Exploit in Energy Harvesting Systems and its Countermeasures
- Award number: 2314680
- Fiscal year: 2023
- Amount: $333,100
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Scaling Secure Serverless Computing on Heterogeneous Datacenters
- Award number: 2312207
- Fiscal year: 2023
- Amount: $333,100
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: MemDrive: Memory-Driven Full-Stack Collaboration for Autonomous Embedded Systems
- Award number: 2312397
- Fiscal year: 2023
- Amount: $333,100
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Adaptive Environmental Awareness for Collaborative Augmented Reality
- Award number: 2312762
- Fiscal year: 2023
- Amount: $333,100
- Award type: Continuing Grant
Collaborative Research: CSR: Small: Cross-layer learning-based Energy-Efficient and Resilient NoC design for Multicore Systems
- Award number: 2321225
- Fiscal year: 2023
- Amount: $333,100
- Award type: Standard Grant