SHF: Small: Addressing Challenges for the Next Decade of Massively Parallel NUMA Accelerators
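Basic Information
- Grant number: 1910924
- Principal investigator:
- Amount: $495,400
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Duration: 2019-10-01 to 2023-09-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract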
The physical and economic principles that enabled Dennard scaling and Moore's law in the semiconductor industry have reached their breaking point. As the number of transistors that can be economically fabricated on a single chip plateaus, the processor industry has pivoted to single-package computing systems composed of multiple sub-components known as chiplets. Chiplets, which communicate via high-bandwidth on-package networks, offer the potential for transparent performance scaling into the next decade. However, chiplets introduce challenging non-uniform memory access characteristics into single-package systems that have traditionally not been subject to these effects. This project develops techniques to overcome the challenges of non-uniform memory accesses on high-performance single- and multi-package systems without programmer intervention. Exploring programmer-transparent scaling mechanisms improves the portability and lifetime of programs, decreasing the cost and complexity of software. Through the creation of course content and undergraduate summer internships, the project fosters an understanding of how to program machines in a post-Moore world and how compute accelerators should be designed to minimize the impact on the end-programmer as system complexity increases.

This project develops coordinated data placement and thread scheduling algorithms that leverage static information from the compiler and dynamic information from the runtime system to guide data placement and hardware-based thread scheduling. It advances the state of the art by developing an open-source Graphics Processing Unit (GPU) simulator with a hierarchical interconnect that can be used to model both chiplet-based GPUs and multi-GPU systems. The researchers are exploring compiler-informed data placement and thread scheduling in GPUs. Initial results demonstrate that a static analysis of the code can predict the data accessed by GPU threadblocks. Analysis shows that it is possible to determine which threads in a grid share memory pages, and the manner of that sharing, by building new static techniques that add a dimension to decades of work on compilers for sequential code. Using static information, in combination with runtime information provided by GPU drivers, the researchers are developing advanced data placement, prefetching, and thread scheduling algorithms. Both future chiplet-based designs and existing multi-GPU systems benefit from the development of these algorithms. Looking beyond the high-bandwidth memory used in GPUs today, the project explores the system-level implications of heterogeneous memory in a chiplet-based system. Data placement and thread scheduling become even more important in future GPU systems that combine high-bandwidth memory, traditional dynamic random-access memory, and non-volatile memory. The problem sizes in such systems are anticipated to be so large that opportunistic data placement and thread scheduling are even more critical than in conventional systems. The project uses sharing patterns based on the inter-kernel producer-consumer nature of machine learning workloads to change the program's code layout, runtime data placement, and threadblock scheduling algorithm to maximize locality in multi-node systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
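As a rough illustration of the compiler-informed placement idea in the abstract, the sketch below models a kernel in which thread `t` of threadblock `b` reads element `b * block_dim + t` of an array. Because that index is known statically, the pages each threadblock touches can be computed before launch, and each page can be placed on the chiplet that runs the threadblocks using it. The page size, element size, contiguous block-to-chiplet schedule, and all function names are assumptions made for this example; it is a toy model, not the project's algorithm.

```python
from collections import Counter, defaultdict

PAGE_BYTES = 4096  # assumed page size (hypothetical)
ELEM_BYTES = 4     # assumed element size, e.g. a 32-bit float


def pages_for_block(block_id, block_dim):
    """Pages touched by one threadblock when thread t reads a[block_id * block_dim + t]."""
    first_byte = block_id * block_dim * ELEM_BYTES
    last_byte = (block_id * block_dim + block_dim) * ELEM_BYTES - 1
    return range(first_byte // PAGE_BYTES, last_byte // PAGE_BYTES + 1)


def schedule_blocks(num_blocks, num_chiplets):
    """Assign contiguous runs of threadblocks to the same chiplet (assumed policy)."""
    per_chiplet = -(-num_blocks // num_chiplets)  # ceiling division
    return {b: b // per_chiplet for b in range(num_blocks)}


def place_pages(num_blocks, block_dim, num_chiplets):
    """Place each page on the chiplet whose threadblocks access it most often."""
    block_to_chiplet = schedule_blocks(num_blocks, num_chiplets)
    votes = defaultdict(Counter)
    for block_id, chiplet in block_to_chiplet.items():
        for page in pages_for_block(block_id, block_dim):
            votes[page][chiplet] += 1
    return {page: counts.most_common(1)[0][0] for page, counts in votes.items()}


if __name__ == "__main__":
    placement = place_pages(num_blocks=16, block_dim=256, num_chiplets=4)
    for page, chiplet in sorted(placement.items()):
        print(f"page {page} -> chiplet {chiplet}")
```

In this toy setting every page is used by threadblocks on a single chiplet, so all accesses are local. Real kernels with strided or indirect indexing share pages across chiplets, which is where the runtime information mentioned in the abstract would be needed.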
Project Outcomes
Journal papers: 7
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Deterministic Atomic Buffering
- DOI: 10.1109/micro50266.2020.00083
- Publication date: 2020-10
- Journal:
- Impact factor: 0
- Authors: Chou, Yuan Hsi; Ng, Christopher; Cattell, Shaylin; Intan, Jeremy; Sinclair, Matthew D.; Devietti, Joseph; Rogers, Timothy G.; Aamodt, Tor M.
- Corresponding author: Aamodt, Tor M.
AccelWattch: A Power Modeling Framework for Modern GPUs
- DOI: 10.1145/3466752.3480063
- Publication date: 2021-10-17
- Journal:
- Impact factor: 0
- Authors: Vijay Kandiah; Scott Peverelle; Mahmoud Khairy; Junrui Pan; Amogh Manjunath; Timothy G. Rogers; Tor M. Aamodt; Nikolaos Hardavellas
- Corresponding author: Nikolaos Hardavellas
Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads
- DOI: 10.1145/3466752.3480100
- Publication date: 2021-10-17
- Journal:
- Impact factor: 0
- Authors: Cesar Avalos Baddouh; Mahmoud Khairy; Roland N. Green; Mathias Payer; Timothy G. Rogers
- Corresponding author: Timothy G. Rogers
Locality-Centric Data and Threadblock Management for Massive GPUs
- DOI: 10.1109/micro50266.2020.00086
- Publication date: 2020-10-01
- Journal:
- Impact factor: 0
- Authors: Mahmoud Khairy; Vadim Nikiforov; D. Nellans; Timothy G. Rogers
- Corresponding author: Timothy G. Rogers
Mitigating GPU Core Partitioning Performance Effects
- DOI: 10.1109/hpca56546.2023.10070957
- Publication date: 2023-02-01
- Journal:
- Impact factor: 0
- Authors: Aaron Barnes; Fangjia Shen; Timothy G. Rogers
- Corresponding author: Timothy G. Rogers
Other publications by Timothy Rogers
Efficiently Learning Relative Similarity Embeddings with Crowdsourcing
- DOI: 10.21105/joss.04517
- Publication date: 2023-04-17
- Journal:
- Impact factor: 0
- Authors: Scott Sievert; R. Nowak; Timothy Rogers
- Corresponding author: Timothy Rogers
Assessing Timely Presentation to Care Among People Diagnosed with HIV During Hospital Admission: A Population-Based Study in Ontario, Canada
- DOI: 10.1007/s10461-018-2063-z
- Publication date: 2018-03-13
- Journal:
- Impact factor: 4.4
- Authors: C. Kendall; Esther S Shoemaker; J. Raboud; A. Mark; A. Bayoumi; A. Burchell; M. Loutfy; S. Rourke; C. Liddy; R. Rosenes; Timothy Rogers; T. Antoniou
- Corresponding author: T. Antoniou
Cause-specific mortality among HIV-infected people in Ontario, 1995-2014: a population-based retrospective cohort study.
- DOI:
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: A. Burchell; J. Raboud; J. Donelle; M. Loutfy; S. Rourke; Timothy Rogers; R. Rosenes; C. Liddy; C. Kendall
- Corresponding author: C. Kendall
The BEST study--a prospective study to compare business class versus economy class air travel as a cause of thrombosis.
- DOI: 10.7196/samj.2256
- Publication date: 2003-07-01
- Journal:
- Impact factor: 0
- Authors: B. Jacobson; M. Münster; Alberto Smith; K. Burnand; Andrew Carter; A. Abdool‐Carrim; E. Marcos; P. Becker; Timothy Rogers; D. le Roux; J. Calvert; M. Nel; Robyn Brackin; M. Veller
- Corresponding author: M. Veller
Analyzing the Communication Gap Between the Instructional Design Consultant and the Faculty Member in the Design and Development Process of a Web-Based Course
- DOI:
- Publication date: 2010-09-21
- Journal:
- Impact factor: 0
- Authors: Timothy Rogers
- Corresponding author: Timothy Rogers
Other grants by Timothy Rogers
Autonomous Modelling Solutions for Operational Structural Dynamic Systems
- Grant number: EP/W002140/1
- Fiscal year: 2022
- Funding amount: $495,400
- Project type: Research Grant
CAREER: Accessible Accelerators: Leveraging Productive Software on Efficient Hardware
- Grant number: 1943379
- Fiscal year: 2020
- Funding amount: $495,400
- Project type: Continuing Grant
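Similar National Natural Science Foundation of China (NSFC) grants
Mechanism by which the small-molecule metabolite catechin interacts with TRPV1 to activate peripheral sensory neurons and mediate uremic pruritus
- Grant number: 82371229
- Year approved: 2023
- Funding amount: CNY 490,000
- Project type: General Program
Mechanism by which DHEA alleviates POCD by inhibiting Fis1 lactylation in microglia
- Grant number: 82301369
- Year approved: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Immunological mechanism by which aberrantly activated microglia promote type 1 diabetic retinopathy by upregulating CTSS to suppress expression of the microvascular-specific factor MFSD2A
- Grant number: 82370827
- Year approved: 2023
- Funding amount: CNY 490,000
- Project type: General Program
SETDB1 regulation of microglial function and its role in the pathogenesis of Alzheimer's disease
- Grant number: 82371419
- Year approved: 2023
- Funding amount: CNY 490,000
- Project type: General Program
Mechanism by which a PTBP1-driven H4K12la/BRD4/HIF1α complex-PKM2 positive feedback loop promotes glucose metabolic reprogramming in non-small cell lung cancer, and exploration of therapeutic strategies
- Grant number: 82303616
- Year approved: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund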
Similar overseas grants
Beat Extreme: An Interactive, Tailored Text Messaging Program Combining Extreme Weather Alerts with Hyper-localized Resources & Actionable Insights for Addressing Climate Change
- Grant number: 10698887
- Fiscal year: 2023
- Funding amount: $495,400
- Project type:
HealthyU-Latinx: A Technology-based Tool for addressing Health Literacy in Latinx Secondary Students and their Families
- Grant number: 10699830
- Fiscal year: 2023
- Funding amount: $495,400
- Project type:
CPS: Small: Infusing Quantum Computing, Decomposition, and Learning for Addressing Cyber-Physical Systems Optimization Challenges
- Grant number: 2312086
- Fiscal year: 2023
- Funding amount: $495,400
- Project type: Standard Grant
Addressing the wireless power problem: A low-power hybrid radio for neuroscience experiments
- Grant number: 10697023
- Fiscal year: 2023
- Funding amount: $495,400
- Project type: