Collaborative Research: Frameworks: Performance Engineering Scientific Applications with MVAPICH and TAU using Emerging Communication Primitives
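Basic Information
- Award number: 2311830
- Principal investigator:
- Amount: $900,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-01 to 2026-08-31
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract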
Earthquake hazards pose potentially life-threatening risks to communities and cause significant economic damage. Numerical simulations of earthquakes on large-scale supercomputers are emerging as key to guiding infrastructure and policy decisions. Seismic codes, and other codes such as simulations involving the Fast Fourier Transform (FFT), distribute their processing across a large number of compute nodes in a supercomputer. Optimizing the communication between nodes is key to achieving good performance, but it is a daunting task given the scale of execution. The MVAPICH communication library, which implements the Message Passing Interface (MPI), and the TAU Performance System, a profiling tool for observing communication, will be tightly coupled to assess the performance impact of tuning these codes during execution. These libraries will share key performance parameters and optimize the communication in these applications to improve the time to solution. Performance-engineered versions of these codes will help drive the next generation of earthquake forecasting and improve our understanding of seismic events to reduce risks to population centers and the environment. The research will enable undergraduate and graduate curriculum advancements via research in pedagogy for High Performance Computing (HPC), Deep/Machine Learning, and Data Analytics courses. The results will also be disseminated to the collaborating organizations of the investigators to impact their HPC software applications.
Emerging HPC systems, driven by many-core processors and accelerator architectures, require innovations in existing infrastructure to deliver the best performance for science domains. The MPI 4.0 standard has also brought forward new opportunities for co-designing applications. These include partitioned point-to-point and collective operations, and neighborhood collectives. With these advances, there is a critical need to update the commonly used tools and libraries that form the basis for the NSF's HPC cyberinfrastructure. The research undertakes this challenge and pursues new performance engineering avenues in the MVAPICH2 and TAU libraries with scientific applications, exploiting a co-design approach that uses the MPI_T API. The project focuses on two popular HPC applications that span multiple domains and represent various communication patterns: Anelastic Wave Propagation (AWP-ODC) and Highly Efficient FFTs for Exascale (heFFTe). AWP-ODC is a highly scalable parallel finite-difference application with point-to-point operations that enables 3D earthquake calculations. HeFFTe, dominated by collective operations, is a massively parallel application that provides a scalable and efficient implementation of the widely used Fast Fourier Transform (FFT) operations. The research aims to investigate and develop the following innovations by co-designing the MVAPICH2 and TAU libraries to scale driving science domains, including AWP-ODC and heFFTe: 1) Load-aware designs for MPI asynchronous communication, 2) Cross-runtime coordination for MPI+X applications, 3) Partitioned point-to-point primitives, 4) Application-aware neighborhood collective communication, 5) Support for adaptive persistent collective communication, and 6) Coordinating communication kernels on GPUs. Integrated development and evaluation are carried out to ensure proper integration of the proposed designs with the driving applications, and the team works closely with internal and external collaborators to facilitate wide deployment and adoption of the released software. The transformative impact of the proposed effort is to extract the performance and scalability of HPC applications on next-generation HPC architectures through intelligent performance engineering.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

No data available
Data last updated: 2024-06-01
Other grants by Dhabaleswar Panda
CSR: Small: CONCERT: Designing Scalable Communication Runtimes with On-the-fly Compression for HPC and AI Applications on Heterogeneous Architectures
- Award number: 2312927
- Fiscal year: 2023
- Award amount: $900,000
- Award type: Standard Grant
Travel: Student Travel Support for MVAPICH User Group (MUG) 2023 Conference
- Award number: 2331223
- Fiscal year: 2023
- Award amount: $900,000
- Award type: Standard Grant
Travel: Student Travel Support for MVAPICH User group (MUG) 2022 Conference
- Award number: 2231825
- Fiscal year: 2022
- Award amount: $900,000
- Award type: Standard Grant
AI Institute for Intelligent CyberInfrastructure with Computational Learning in the Environment (ICICLE)
- Award number: 2112606
- Fiscal year: 2021
- Award amount: $900,000
- Award type: Cooperative Agreement
MRI: RADiCAL: Reconfigurable Major Research Cyberinfrastructure for Advanced Computational Data Analytics and Machine Learning
- Award number: 2018627
- Fiscal year: 2020
- Award amount: $900,000
- Award type: Standard Grant
OAC Core: Small: Next-Generation Communication and I/O Middleware for HPC and Deep Learning with Smart NICs
- Award number: 2007991
- Fiscal year: 2020
- Award amount: $900,000
- Award type: Standard Grant
Student Travel Support for MVAPICH User Group (MUG) Meeting
- Award number: 1930003
- Fiscal year: 2019
- Award amount: $900,000
- Award type: Standard Grant
Collaborative Research: Frameworks: Designing Next-Generation MPI Libraries for Emerging Dense GPU Systems
- Award number: 1931537
- Fiscal year: 2019
- Award amount: $900,000
- Award type: Standard Grant
Student Travel Support for MVAPICH User Group (MUG) Meeting
- Award number: 1839739
- Fiscal year: 2018
- Award amount: $900,000
- Award type: Standard Grant
SI2-SSI: FAMII: High Performance and Scalable Fabric Analysis, Monitoring and Introspection Infrastructure for HPC and Big Data
- Award number: 1664137
- Fiscal year: 2017
- Award amount: $900,000
- Award type: Standard Grant
Similar grants from the National Natural Science Foundation of China (NSFC)
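Multivalent framework nucleic acid and CRISPR/Cas cooperative sensing platform and its application to postoperative monitoring of triple-negative breast cancer
- Award number:
- Approval year: 2022
- Award amount: CNY 300,000
- Award type: Young Scientists Fund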
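Multivalent framework nucleic acid and CRISPR/Cas cooperative sensing platform and its application to postoperative monitoring of triple-negative breast cancer
- Award number: 22204104
- Approval year: 2022
- Award amount: CNY 300,000
- Award type: Young Scientists Fund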
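Multi-tracker framework model and fusion strategy based on higher-order regularized semi-supervised learning
- Award number: 61571362
- Approval year: 2015
- Award amount: CNY 570,000
- Award type: General Program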
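Techniques for hyperspectral remote sensing image classification under a representation-model framework
- Award number: 61571033
- Approval year: 2015
- Award amount: CNY 570,000
- Award type: General Program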
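Physical-layer security in multi-tier heterogeneous cellular networks under a stochastic geometry framework
- Award number: 61401510
- Approval year: 2014
- Award amount: CNY 240,000
- Award type: Young Scientists Fund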
Similar overseas grants
Collaborative Research: Frameworks: MobilityNet: A Trustworthy CI Emulation Tool for Cross-Domain Mobility Data Generation and Sharing towards Multidisciplinary Innovations
- Award number: 2411152
- Fiscal year: 2024
- Award amount: $900,000
- Award type: Standard Grant
Collaborative Research: Frameworks: hpcGPT: Enhancing Computing Center User Support with HPC-enriched Generative AI
- Award number: 2411297
- Fiscal year: 2024
- Award amount: $900,000
- Award type: Standard Grant
Collaborative Research: Frameworks: hpcGPT: Enhancing Computing Center User Support with HPC-enriched Generative AI
- Award number: 2411298
- Fiscal year: 2024
- Award amount: $900,000
- Award type: Standard Grant
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Award number: 2326714
- Fiscal year: 2024
- Award amount: $900,000
- Award type: Standard Grant
Collaborative Research: AF: Small: Structural Graph Algorithms via General Frameworks
- Award number: 2347322
- Fiscal year: 2024
- Award amount: $900,000
- Award type: Standard Grant