Collaborative Research: OAC Core: Enabling Extremely Fine-grained Parallelism on Modern Many-core Architectures
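Basic information
- Award number: 2107548
- Principal investigator:
- Amount: $333,700
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-07-01 to 2024-06-30
- Project status: Completed
- Source:
- Keywords:
Project abstract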
Computer systems are becoming increasingly complex: multisocket systems with many-core processors and general-purpose graphics processors have the potential to address the needs of demanding applications at the node level. Programmability and efficiency are hard to achieve together because hardware parallelism has grown by several orders of magnitude, to thousands of computing units on a chip. Task parallelism is an important type of parallelism in which computation is broken down into a set of inter-dependent tasks that can be executed concurrently on various computing units. To achieve strong scaling and high levels of effective parallelism, today's parallel languages increasingly need to support over-decomposition (many more tasks than cores) in order to improve performance, hide latency caused by blocking operations, and otherwise achieve maximum speedup. Efficient support for fine-grained parallelism across the growing range of scales seen in modern and future hardware is expected to enhance the productivity of parallel programmers. Trends suggest that most of the Top500 high-performance computing systems will likely employ hardware that this work directly targets. The project aims to conduct a high-impact education program in distributed parallel programming with broad reach, encouraging student internships grounded in real-world challenges and paving the way for technology transfer from research to open-source projects. Special emphasis is placed on engaging women and underrepresented minorities. This education facet will create a new and more accessible foundation for fluency in parallel computing for scientists and engineers.
This work explores novel data structures and algorithms that allow for scalable runtime and execution models for fine-grained parallelism at sub-microsecond timescales. Preliminary work by the PIs at the language and runtime levels suggests a path to achieving this. The project objectives are: 1) a unifying runtime enabling task granularities measured in cycles: design, analysis, and implementation of building blocks for efficient fine-grained computing on diverse node hardware; 2) evaluating the performance of these building blocks in the context of real parallel systems and application kernels on a range of computer architectures; 3) measuring the performance and scalability impact of the runtime on benchmark kernels and real applications; and 4) integrating this research with education programs from undergraduate to graduate levels through new course material on parallel computing. This high-risk/high-reward research is geared towards yielding transformative improvements in the ease and efficiency of programming parallel machines at every scale. The contributions lie in the realization of productive, implicitly parallel high-level languages optimized for single-node deployments with many-core architectures to support fine-grained parallelism measured in cycles, enabling an entirely new class of many-task computing applications. The dataflow architecture makes implicit parallelism tractable with a programming model whose impact could rival that of MATLAB, R, and Python, with the added benefit that the same code could also run on a distributed system or a large-scale HPC system. Thus, the scientist would be able to write a program once, run it at any suitable scale, and have it seamlessly use the most appropriate granularity for each component of the hardware. This work’s innovations in dataflow architecture will be broadly applicable to a number of existing parallel programming systems such as OpenMP, Swift/Parsl, and CUDA/OpenCL, in terms of both efficiency in executing fine-grained parallelism and adding support for implicit parallelism where possible. Target hardware includes Intel/AMD x86, ThunderX/2 ARM, IBM Power9, and NVIDIA/AMD GPUs.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Ioan Raicu
Collaborative Research: REU Site: BigDataX: From theory to practice in Big Data computing at eXtreme scales
- Award number: 2150500
- Fiscal year: 2022
- Award amount: $333,700
- Project type: Standard Grant
REU Site: Collaborative Research: BigDataX: From theory to practice in Big Data computing at eXtreme scales
- Award number: 1757964
- Fiscal year: 2018
- Award amount: $333,700
- Project type: Standard Grant
CRI: II-NEW: MYSTIC: Programmable Systems Research Testbed to Explore a Stack-WIde Adaptive System fabriC
- Award number: 1730689
- Fiscal year: 2017
- Award amount: $333,700
- Project type: Standard Grant
REU Site: BigDataX: From Theory to Practice in Big Data Computing at Extreme Scales
- Award number: 1461260
- Fiscal year: 2015
- Award amount: $333,700
- Project type: Standard Grant
Student Travel Support for ACM HPDC 2011
- Award number: 1114379
- Fiscal year: 2011
- Award amount: $333,700
- Project type: Standard Grant
CAREER: Avoiding Achilles' Heel in Exascale Computing with Distributed File Systems
- Award number: 1054974
- Fiscal year: 2011
- Award amount: $333,700
- Project type: Continuing Grant
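Similar NSFC (National Natural Science Foundation of China) grants
Seepage-stress-chemical coupling mechanism and leaching-mining optimization for ion-adsorption rare earths
- Award number: 52364012
- Approval year: 2023
- Award amount: 320,000 CNY
- Project type: Regional Science Fund Project
Molecular mechanisms by which cyclophilins regulate crop-aphid interactions
- Award number: 32301770
- Approval year: 2023
- Award amount: 300,000 CNY
- Project type: Young Scientists Fund Project
Interface regulation and electromagnetic response mechanisms of multiphase microwave absorbers derived from metal-polyphenol networks
- Award number: 52302362
- Approval year: 2023
- Award amount: 300,000 CNY
- Project type: Young Scientists Fund Project
Outcomes and feedback effects of workplace cyberloafing: an integrated study from actor and observer perspectives
- Award number: 72302108
- Approval year: 2023
- Award amount: 300,000 CNY
- Project type: Young Scientists Fund Project
Molecular mechanism by which EIF6 negatively regulates Dicer activity to promote EV71 replication
- Award number: 32300133
- Approval year: 2023
- Award amount: 300,000 CNY
- Project type: Young Scientists Fund Project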
Similar overseas grants
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
- Award number: 2414474
- Fiscal year: 2024
- Award amount: $333,700
- Project type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Award amount: $333,700
- Project type: Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
- Award number: 2414185
- Fiscal year: 2024
- Award amount: $333,700
- Project type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402947
- Fiscal year: 2024
- Award amount: $333,700
- Project type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Award amount: $333,700
- Project type: Standard Grant