Collaborative Research: OAC Core: Enabling Extremely Fine-grained Parallelism on Modern Many-core Architectures
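Basic information
- Award number: 2107283
- Principal investigator:
- Amount: $166,300
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-07-01 to 2024-06-30
- Status: Completed
- Source:
- Keywords:
Project abstract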
Computer systems are becoming increasingly complex: multi-socket systems with many-core processors and general-purpose graphics processors have the potential to address the needs of demanding applications at the node level. Programmability and efficiency are hard to achieve together because hardware parallelism has grown by several orders of magnitude, to thousands of computing units on a chip. Task parallelism is an important type of parallelism in which computation is broken down into a set of inter-dependent tasks that can be executed concurrently on various computing units. To achieve strong scaling and high levels of effective parallelism, today's parallel languages increasingly need to support over-decomposition (many more tasks than cores) in order to improve performance, hide latency caused by blocking operations, and otherwise achieve maximum speedup. By enabling efficient support of fine-grained parallelism across the growing range of scales seen in modern and future hardware, the project expects to enhance the productivity of parallel programmers. Trends suggest that most of the Top500 high-performance computing systems will likely employ hardware that this work directly targets. The project aims to conduct a high-impact education program in distributed parallel programming with broad reach, encouraging student internships grounded in real-world challenges, and paving the way for technology transfer from research to open-source projects. Special emphasis is placed on engaging women and underrepresented minorities. This education facet will create a new and more accessible foundation for fluency in parallel computing for scientists and engineers. This work explores novel data structures and algorithms that allow for scalable runtime and execution models for fine-grained parallelism at sub-microsecond timescales. Preliminary work by the PIs at the language and runtime levels suggests a path to achieving this. 
The project objectives are: 1) a unifying runtime that enables task granularities measured in cycles, through the design, analysis, and implementation of building blocks for efficient fine-grained computing on diverse node hardware; 2) evaluating the performance of these building blocks in the context of real parallel systems and application kernels on a range of computer architectures; 3) measuring the performance and scalability impact of the runtime on benchmark kernels and real applications; and 4) integrating this research with education programs from undergraduate to graduate levels through new course material on parallel computing. This high-risk/high-reward research is geared towards yielding transformative improvements in the ease and efficiency of programming parallel machines at every scale. The contributions lie in the realization of productive, implicitly parallel high-level languages optimized for single-node deployments with many-core architectures to support fine-grained parallelism measured in cycles, enabling an entirely new class of many-task computing applications. The dataflow architecture makes implicit parallelism tractable with a programming model whose impact could rival that of MATLAB, R, and Python, with the added benefit that the same code could also run in a distributed system or on large-scale HPC systems. Thus, a scientist would be able to write a program once, run it at any suitable scale, and have it seamlessly use the most appropriate granularity for each component of the hardware. This work's innovations in dataflow architecture will be broadly applicable to a number of existing parallel programming systems such as OpenMP, Swift/Parsl, and CUDA/OpenCL, in terms of both efficiency in executing fine-grained parallelism and adding support for implicit parallelism where possible. 
Target hardware includes Intel/AMD x86, ThunderX/2 ARM, IBM Power9, and NVIDIA/AMD GPUs. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Enabling Extremely Fine-grained Parallelism via Scalable Concurrent Queues on Modern Many-core Architectures
- DOI: 10.1109/mascots53633.2021.9614292
- Published: 2021-11
- Journal:
- Impact factor: 0
- Authors: Nookala, Poornima; Dinda, Peter; Hale, Kyle C.; Chard, Kyle; Raicu, Ioan
- Corresponding author: Raicu, Ioan
Other publications by Kyle Chard
A Distributed Economic Meta-scheduler for the Grid
- DOI: 10.1109/ccgrid.2008.48
- Published: 2008-05-19
- Journal:
- Impact factor: 0
- Authors: Kyle Chard; K. Bubendorfer
- Corresponding author: K. Bubendorfer
QoS-aware edge AI placement and scheduling with multiple implementations in FaaS-based edge computing
- DOI: 10.1016/j.future.2024.03.035
- Published: 2024-03-01
- Journal:
- Impact factor: 0
- Authors: Nathaniel Hudson; Hana Khamfroush; Matt Baughman; D. Lucani; Kyle Chard; Ian T. Foster
- Corresponding author: Ian T. Foster
SECRE: Surrogate-Based Error-Controlled Lossy Compression Ratio Estimation Framework
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Arham Khan; S. Di; Kai Zhao; Jinyang Liu; Kyle Chard; Ian T. Foster; Franck Cappello
- Corresponding author: Franck Cappello
Regulating Traffic in a Crowded Cache: Overcoming the Container Explosion Problem
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Kevin Gao; Tim Shaffer; Kyle Chard
- Corresponding author: Kyle Chard
Using Facebook as a Cloud Platform for Solving Numerical Optimization Problem
- DOI: 10.5120/8739-3197
- Published: 2012-10-20
- Journal:
- Impact factor: 0
- Authors: M. R. Islam; S. Mahi; Abu Sina; Mohammad Raju Chowdhury; Kyle Chard; Simon Caton; Omer Rana; K. Bubendorfer; O. Mengshoel; David E. Goldberg
- Corresponding author: David E. Goldberg
Other grants held by Kyle Chard
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
- Award number: 2311769
- Fiscal year: 2023
- Amount: $166,300
- Award type: Standard Grant
Collaborative Research: Sustainability: A Community-Centered Approach for Supporting and Sustaining Parsl
- Award number: 2209919
- Fiscal year: 2022
- Amount: $166,300
- Award type: Standard Grant
Collaborative Research: REU Site: BigDataX: From theory to practice in Big Data computing at eXtreme scales
- Award number: 2150501
- Fiscal year: 2022
- Amount: $166,300
- Award type: Standard Grant
Frameworks: Collaborative Research: ChronoLog: A High-Performance Storage Infrastructure for Activity and Log Workloads
- Award number: 2104008
- Fiscal year: 2021
- Amount: $166,300
- Award type: Standard Grant
CCRI: Planning: Collaborative Research: Infrastructure for Enabling Systematic Development and Research of Scientific Workflow Management Systems
- Award number: 2016682
- Fiscal year: 2020
- Amount: $166,300
- Award type: Standard Grant
CSR: Small: Cost-Aware Cloud Profiling, Prediction, and Provisioning as a Service
- Award number: 1816611
- Fiscal year: 2018
- Amount: $166,300
- Award type: Standard Grant
REU Site: Collaborative Research: BigDataX: From theory to practice in Big Data computing at eXtreme scales
- Award number: 1757970
- Fiscal year: 2018
- Amount: $166,300
- Award type: Standard Grant
Collaborative Research: SI2-SSI: Swift/E: Integrating Parallel Scripted Workflow into the Scientific Software Ecosystem
- Award number: 1550588
- Fiscal year: 2016
- Amount: $166,300
- Award type: Standard Grant
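Similar NSFC (National Natural Science Foundation of China) grants
Identification of biomarkers for targeted-drug sensitivity from tumor pathology images and research on statistical algorithms
- Award number: 82304250
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Mechanism by which butyrate, a metabolite of intestinal Faecalibacterium prausnitzii, inhibits ventricular myocardial ferroptosis and improves age-related cardiac dysfunction
- Award number: 82300430
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
The impact of social network ties on corporate cash-holding decisions: a study of the mechanism of jointly resisting risk
- Award number: 72302067
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Novel weakly supervised learning methods for object detection in images
- Award number: 62371157
- Approval year: 2023
- Amount: CNY 500,000
- Award type: General Program
Accuracy of information acquisition in open-domain dialogue systems
- Award number: 62376067
- Approval year: 2023
- Amount: CNY 510,000
- Award type: General Program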
Similar overseas grants
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
- Award number: 2403088
- Fiscal year: 2024
- Amount: $166,300
- Award type: Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
- Award number: 2403090
- Fiscal year: 2024
- Amount: $166,300
- Award type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Amount: $166,300
- Award type: Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
- Award number: 2414185
- Fiscal year: 2024
- Amount: $166,300
- Award type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402946
- Fiscal year: 2024
- Amount: $166,300
- Award type: Standard Grant