SI2-SSI: FAMII: High Performance and Scalable Fabric Analysis, Monitoring and Introspection Infrastructure for HPC and Big Data


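Basic information

  • Award number:
    1664137
  • Principal investigator:
  • Amount:
    $800,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Period:
    2017-07-01 to 2020-06-30
  • Project status:
    Completed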

Project abstract

As the computing, networking, heterogeneous hardware, and storage technologies continue to evolve in High-End Computing (HEC) platforms, it becomes increasingly essential and challenging to understand the interactions between time-critical High-Performance Computing (HPC) and Big Data applications, the software infrastructures upon which they rely for achieving high-performing portable solutions, the underlying communication fabric these high-performance middlewares depend on and the schedulers that manage HPC clusters. Such understanding will enable all involved parties (application developers/users, system administrators, and middleware developers) to maximize the efficiency and performance of the individual components that comprise a modern HPC system and solve different grand challenge problems. There is a clear need and unfortunate lack of a high-performance and scalable tool that is capable of analyzing and correlating the communication on the fabric with the behavior of HPC/Big Data applications, underlying middleware and the job scheduler on existing large HPC systems. The proposed synergistic and collaborative effort, undertaken by a team of computer and computational scientists from OSU and OSC, aims to create an integrated software infrastructure for high-performance and scalable Fabric Analysis, Monitoring and Introspection for HPC and Big Data. This tool will achieve the following objectives: 1) be portable, easy to use and easy to understand, 2) have high performance and scalable rendering and storage techniques and, 3) be applicable to the different communication fabrics and programming models that are likely to be used on existing large HPC systems and emerging exascale systems.
The transformative impact of the proposed research and development effort is to design a comprehensive analysis and performance monitoring tool for applications of current and next generation multi-petascale/exascale systems to harness the maximum performance and scalability. The proposed research and the associated infrastructure will have a significant impact on enabling optimizations of HPC and Big Data applications that have previously been difficult to provide. These potential outcomes will be demonstrated by using the proposed framework to validate a variety of HPC and Big Data benchmarks and applications under multiple scenarios. The integrated middleware and tools will be made publicly available to the community through public repositories and publications in the top forums, enabling other MPI and Big Data stacks to adopt the designs. Research results will also be disseminated to the collaborating organizations of the investigators to impact their HPC software products and applications. The proposed research directions and their solutions will be used in the curriculum of the PIs to train undergraduate and graduate students, including under-represented minorities and female students. The technical challenges addressed by the proposal include: 1) Scalable visualization of large and complex HEC networks so as to provide a near instant rendering to end users, 2) A generalized data gathering scheme which is easily portable to multiple communication fabrics, novel compute architectures and high-performance middleware, 3) Enhanced data storage performance through optimized database schemas and the use of memory-backed key value stores/databases, 4) Support in MPI, PGAS, and Big Data libraries to enable the proposed monitoring, analysis, and introspection framework, and 5) Enabling deeper introspection of particular regions of application. The research will also be driven by a set of HPC and Big Data applications.

Project outcomes

Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems
C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks
Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters
  • DOI:
    10.1109/hipc.2019.00022
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Kousha, Pouya; Ramesh, Bharath; Kandadi Suresh, Kaushik; Chu, Ching-Hsiang; Jain, Arpan; Sarkauskas, Nick; Subramoni, Hari; Panda, Dhabaleswar K.
  • Corresponding author:
    Panda, Dhabaleswar K.
UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems
EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures
  • DOI:
    10.1007/978-3-030-32813-9_18
  • Publication date:
    2018-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Haiyang Shi; Xiaoyi Lu; D. Panda
  • Corresponding author:
    Haiyang Shi; Xiaoyi Lu; D. Panda

Other grants by Dhabaleswar Panda

CSR: Small: CONCERT: Designing Scalable Communication Runtimes with On-the-fly Compression for HPC and AI Applications on Heterogeneous Architectures
  • Award number:
    2312927
  • Fiscal year:
    2023
  • Amount:
    $800,000
  • Project type:
    Standard Grant
Travel: Student Travel Support for MVAPICH User Group (MUG) 2023 Conference
  • Award number:
    2331223
  • Fiscal year:
    2023
  • Amount:
    $800,000
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: Performance Engineering Scientific Applications with MVAPICH and TAU using Emerging Communication Primitives
  • Award number:
    2311830
  • Fiscal year:
    2023
  • Amount:
    $800,000
  • Project type:
    Standard Grant
Travel: Student Travel Support for MVAPICH User group (MUG) 2022 Conference
  • Award number:
    2231825
  • Fiscal year:
    2022
  • Amount:
    $800,000
  • Project type:
    Standard Grant
AI Institute for Intelligent CyberInfrastructure with Computational Learning in the Environment (ICICLE)
  • Award number:
    2112606
  • Fiscal year:
    2021
  • Amount:
    $800,000
  • Project type:
    Cooperative Agreement
MRI: RADiCAL: Reconfigurable Major Research Cyberinfrastructure for Advanced Computational Data Analytics and Machine Learning
  • Award number:
    2018627
  • Fiscal year:
    2020
  • Amount:
    $800,000
  • Project type:
    Standard Grant
OAC Core: Small: Next-Generation Communication and I/O Middleware for HPC and Deep Learning with Smart NICs
  • Award number:
    2007991
  • Fiscal year:
    2020
  • Amount:
    $800,000
  • Project type:
    Standard Grant
Student Travel Support for MVAPICH User Group (MUG) Meeting
  • Award number:
    1930003
  • Fiscal year:
    2019
  • Amount:
    $800,000
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: Designing Next-Generation MPI Libraries for Emerging Dense GPU Systems
  • Award number:
    1931537
  • Fiscal year:
    2019
  • Amount:
    $800,000
  • Project type:
    Standard Grant
Student Travel Support for MVAPICH User Group (MUG) Meeting
  • Award number:
    1839739
  • Fiscal year:
    2018
  • Amount:
    $800,000
  • Project type:
    Standard Grant

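Similar National Natural Science Foundation of China (NSFC) grants

Seismic performance of jacket-type offshore platforms considering SSI effects
  • Award number:
    52208510
  • Approval year:
    2022
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Seismic performance of jacket-type offshore platforms considering SSI effects
  • Award number:
  • Approval year:
    2022
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Response of inter-story isolated high-rise building structures under three-dimensional earthquakes considering SSI
  • Award number:
    52168072
  • Approval year:
    2021
  • Amount:
    CNY 350,000
  • Project type:
    Regional Science Fund Project
Dynamic characteristics of large storage tanks considering SSI effects, and sloshing reduction with baffles
  • Award number:
  • Approval year:
    2019
  • Amount:
    CNY 610,000
  • Project type:
    General Program
Seismic mechanism and performance evaluation methods for rocking wall-frame structures considering SSI effects
  • Award number:
  • Approval year:
    2019
  • Amount:
    CNY 600,000
  • Project type:
    General Program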

Similar overseas grants

Fostering consensus-building skills in elementary school science: focusing on content involving SSI
  • Award number:
    24H02435
  • Fiscal year:
    2024
  • Amount:
    $800,000
  • Project type:
    Grant-in-Aid for Encouragement of Scientists
Perioperative Oral Care in Head and Neck Cancer Patients for Prevention of Infection
  • Award number:
    22H03389
  • Fiscal year:
    2022
  • Amount:
    $800,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)
The DECREASE SSI Trial (Decolonization to Reduce After-Surgery Events of Surgical Site Infection)
  • Award number:
    10670860
  • Fiscal year:
    2022
  • Amount:
    $800,000
  • Project type:
Design of science teaching strategies aimed at implementing 'integration' in the Society 5.0 era
  • Award number:
    22K03006
  • Fiscal year:
    2022
  • Amount:
    $800,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
The DECREASE SSI Trial (Decolonization to Reduce After-Surgery Events of Surgical Site Infection)
  • Award number:
    10501944
  • Fiscal year:
    2022
  • Amount:
    $800,000
  • Project type: