SHF: Small: A Scalable Architecture for Ubiquitous Parallelism

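Basic Information

Award number:
1814969
Principal investigator:
Daniel Sanchez Martin
Amount:
$450,000
Host institution:
Massachusetts Institute of Technology
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2018
Funding country:
United States
Start and end dates:
2018-10-01 to 2022-09-30
Project status:
Completed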

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1814969&HistoricalAwards=false
Keywords:
SHF Small Scalable Architecture Ubiquitous

Project Abstract

With cost-performance gains predicted by Moore's Law slowing down, future computer systems will need to harness increasing amounts of parallelism to improve performance. Achieving this goal requires new techniques to make massive parallelism practical, as current multicore systems fall short of this goal: they squander most of the parallelism available in applications and are exceedingly hard to program. To address these challenges, this project is investigating a novel parallel architecture that efficiently scales to thousands of cores and is almost as easy to program as sequential systems. It achieves these benefits by exploiting ordered parallelism, which is general and abundant but is hard to mine in current systems. The technologies being investigated will make future parallel systems more versatile, scalable, and easier to program. These techniques will especially benefit hard-to-parallelize irregular applications that are key in emerging domains, such as graph analytics, machine learning, and in-memory databases. The prototyping efforts will bring the benefits of ordered parallelism to existing systems. Finally, the infrastructure developed as part of this project will be released publicly, enabling others to build on the results of this work.

Towards the goal of efficiently parallelizing the vast majority of applications while retaining the programming simplicity of sequential systems, this project is investigating and developing the following techniques: (1) distributed data-centric execution, which scales fine-grained ordered parallelism and speculative execution to rack-scale systems with tens of thousands of cores; (2) an expressive execution model that supports seamless combinations of speculative and non-speculative tasks, improving efficiency and parallelism; (3) adaptive speculation and resource management techniques that avoid performance pathologies, reduce wasted work, and make more efficient use of this novel architecture; and (4) an FPGA-based prototype of this architecture that leverages these techniques to exploit ordered parallelism and accelerate important applications. In this architecture, programs consist of tiny tasks with order constraints. The system executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead to uncover ordered parallelism. Tasks are distributed to run close to their data, reducing data movement and allowing the system to scale across multiple chips and boards. An early 256-core design demonstrates near-linear scalability on programs that are often deemed sequential, outperforming state-of-the-art algorithms by one to two orders of magnitude.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
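
To make the phrase "tiny tasks with order constraints" more concrete, the following is a minimal sketch of what such a program might look like. It is not the project's API: the names `run_ordered`, `enqueue`, `sssp`, and `visit`, the use of integer timestamps as the order constraint, and the example graph are all assumptions made for illustration. The sketch runs tasks one at a time in timestamp order, which only defines the result a correct execution must produce; the architecture described above would instead run many such tasks speculatively and out of order while preserving that result.

```python
import heapq
from itertools import count


def run_ordered(initial_tasks):
    """Sequential reference executor (hypothetical): runs tasks in timestamp order.

    Each task is (timestamp, function, args). A running task may enqueue
    child tasks; in this sketch children use timestamps >= the parent's.
    """
    tie_breaker = count()  # keeps the heap from ever comparing functions
    heap = []

    def enqueue(timestamp, fn, *args):
        heapq.heappush(heap, (timestamp, next(tie_breaker), fn, args))

    for timestamp, fn, args in initial_tasks:
        enqueue(timestamp, fn, *args)

    while heap:
        timestamp, _, fn, args = heapq.heappop(heap)
        fn(timestamp, enqueue, *args)


def sssp(graph, source):
    """Single-source shortest paths written as tiny timestamped tasks.

    Assumes non-negative edge weights, so the first visit to a node
    (in timestamp order) carries its shortest distance.
    """
    dist = {}

    def visit(timestamp, enqueue, node):
        if node in dist:
            return  # a task with an earlier timestamp already settled this node
        dist[node] = timestamp
        for neighbor, weight in graph.get(node, []):
            enqueue(timestamp + weight, visit, neighbor)

    run_ordered([(0, visit, (source,))])
    return dist


# Illustrative graph (made up for this example): node -> [(neighbor, weight)]
graph = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 7)],
    "b": [("d", 1)],
}
distances = sssp(graph, "a")
print(distances)
```

Each `visit` task touches a single node and takes only a few operations, which is the kind of fine-grained, irregular work the abstract says is hard to parallelize on conventional multicores. Tasks for different nodes rarely conflict, so an executor that speculates past the timestamp order could run most of them in parallel.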

Project Outcomes

Journal papers (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Datamime: Generating Representative Benchmarks by Automatically Synthesizing Datasets

DOI:
10.1109/micro56248.2022.00082
Publication date:
2022-10-01
Journal:
2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Impact factor:
0
Authors:
Hyun Ryong Lee; Daniel Sánchez
Corresponding author:
Daniel Sánchez

Chronos: Efficient Speculative Parallelism for Accelerators

DOI:
10.1145/3373376.3378454
Publication date:
2020-03
Journal:
Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-25)
Impact factor:
0
Authors:
Abeydeera, Maleen; Sanchez, Daniel
Corresponding author:
Sanchez, Daniel

Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism

DOI:
10.1109/micro.2018.00026
Publication date:
2018-10
Journal:
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-51)
Impact factor:
0
Authors:
Jeffrey, Mark C.; Ying, Victor A.; Subramanian, Suvinay; Lee, Hyun Ryong; Emer, Joel; Sanchez, Daniel
Corresponding author:
Sanchez, Daniel