FoMR: DeepFetch: Compact Deep Learning based Prefetcher on Configurable Hardware


Basic Information

  • Award Number:
    1912680
  • Principal Investigator:
  • Amount:
    $200,000
  • Host Institution:
  • Institution Country:
    United States
  • Award Type:
    Standard Grant
  • Fiscal Year:
    2019
  • Funding Country:
    United States
  • Project Period:
    2019-10-01 to 2022-09-30
  • Project Status:
    Closed

Project Summary

Fast computer processors, tensor processing units, hardware accelerators, and heterogeneous architectures have enabled large-scale speed-ups in computational power, but memory speeds have not kept pace. Memory performance has therefore become the bottleneck in many applications that rely on heavy memory access. Several emerging memory technologies, such as 3D-Stacked Dynamic Random Access Memory (3D-DRAM) and non-volatile memory, attempt to address memory bottleneck issues from a hardware perspective, but with a tradeoff among bandwidth, power, latency, and cost. Rather than redesigning existing algorithms to suit a specific memory technology, this project will develop a Machine Learning-based approach that automatically learns access patterns, which can be used to optimally prefetch data. Specifically, highly compact Long Short-Term Memory (LSTM) models will serve as the centerpiece of the prefetcher for predicting memory accesses. Through novel model compression techniques, hierarchical memory modeling, and dedicated hardware, this project will overcome barriers to fully exploiting machine learning and emerging hardware to improve prefetching.
Successful completion of this project will lead to improved memory performance for applications including signal processing, computer vision, and language processing. A practical LSTM-based prefetcher implementation on hardware requires dealing with several challenges that will be addressed in this endeavor: (i) training a small model (to enable fast inference) on large traces that is highly accurate in predicting memory accesses for multiple applications; (ii) model compression to ensure real-time inference; (iii) retraining the model online on demand to learn application-specific models, which requires fast learning from a small amount of data; (iv) making prefetching decisions in real time, based on the prediction and uncertainty of the model, about "what", "when", and "where" to prefetch, which also requires careful modeling of the target memory hierarchy; (v) based on the predictions, deciding in real time whether reordering data (dynamic data layout) can improve latency, making future prefetches more effective; (vi) mapping the prediction and decision-making framework onto the limited available configurable hardware, ensuring low-latency training and high-throughput prefetching while using little area and power. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
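To make the "what to prefetch" and uncertainty-gated decision concrete, here is a minimal toy sketch. A frequency table over recent address deltas stands in for the project's compact LSTM predictor; all function names, the confidence threshold, and the example trace are illustrative assumptions, not taken from the project's actual implementation.

```python
# Toy sketch of delta-based memory access prediction with an uncertainty gate.
# A frequency table over observed address deltas stands in for the compact
# LSTM predictor described above; everything here is illustrative.
from collections import Counter, defaultdict

def train_delta_model(trace):
    """Learn, for each observed delta, a histogram of the delta that follows it."""
    deltas = [b - a for a, b in zip(trace, trace[1:])]
    model = defaultdict(Counter)
    for cur, nxt in zip(deltas, deltas[1:]):
        model[cur][nxt] += 1
    return model

def prefetch_candidate(model, last_addr, last_delta, confidence=0.5):
    """Return an address to prefetch, or None when the model is too uncertain."""
    counts = model.get(last_delta)
    if not counts:
        return None
    best, freq = counts.most_common(1)[0]
    if freq / sum(counts.values()) < confidence:  # skip low-confidence prefetches
        return None
    return last_addr + best

# A mostly strided trace (deltas of 8) with one irregular jump.
trace = [0, 8, 16, 24, 100, 108, 116, 124]
model = train_delta_model(trace)
print(prefetch_candidate(model, last_addr=124, last_delta=8))  # 132
```

The confidence gate mirrors challenge (iv): a prefetch is issued only when the predicted next delta dominates the histogram, so a noisy or unseen pattern yields no prefetch rather than a cache-polluting one.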

Project Outcomes

Number of journal articles (8)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)
RAOP: Recurrent Neural Network Augmented Offset Prefetcher
  • DOI:
    10.1145/3422575.3422807
  • Publication Date:
    2020-09
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Pengmiao Zhang;Ajitesh Srivastava;Benjamin Brooks;R. Kannan;V. Prasanna
  • Corresponding Author:
    Pengmiao Zhang;Ajitesh Srivastava;Benjamin Brooks;R. Kannan;V. Prasanna
SHARP: Software Hint-Assisted Memory Access Prediction for Graph Analytics
ReSemble: reinforced ensemble framework for data prefetching
  • DOI:
  • Publication Date:
    2022
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Zhang, Pengmiao;Kannan, Rajgopal;Srivastava, Ajitesh;Nori, Anant V.;Prasanna, Viktor K.
  • Corresponding Author:
    Prasanna, Viktor K.
MemMAP: Compact and Generalizable Meta-LSTM Models for Memory Access Prediction
TransforMAP: Transformer for Memory Access Prediction

Other Publications by Viktor Prasanna

Accelerating Deep Neural Network guided MCTS using Adaptive Parallelism
  • DOI:
  • Publication Date:
    2023
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Yuan Meng;Qian Wang;Tianxin Zu;Viktor Prasanna
  • Corresponding Author:
    Viktor Prasanna
Accelerating GNN Training on CPU+Multi-FPGA Heterogeneous Platform
PEARL: Enabling Portable, Productive, and High-Performance Deep Reinforcement Learning using Heterogeneous Platforms

Other Grants by Viktor Prasanna

IUCRC Phase I University of Southern California: Center for Intelligent Distributed Embedded Applications and Systems (IDEAS)
  • Award Number:
    2231662
  • Fiscal Year:
    2023
  • Funding Amount:
    $200,000
  • Award Type:
    Continuing Grant
Elements: Portable Library for Homomorphic Encrypted Machine Learning on FPGA Accelerated Cloud Cyberinfrastructure
  • Award Number:
    2311870
  • Fiscal Year:
    2023
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
OAC Core: Scalable Graph ML on Distributed Heterogeneous Systems
  • Award Number:
    2209563
  • Fiscal Year:
    2022
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
SaTC: CORE: Small: Accelerating Privacy Preserving Deep Learning for Real-time Secure Applications
  • Award Number:
    2104264
  • Fiscal Year:
    2021
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
Collaborative Research: PPoSS: Planning: Streamware - A Scalable Framework for Accelerating Streaming Data Science
  • Award Number:
    2119816
  • Fiscal Year:
    2021
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
RAPID: ReCOVER: Accurate Predictions and Resource Allocation for COVID-19 Epidemic Response
  • Award Number:
    2027007
  • Fiscal Year:
    2020
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
CNS Core: Small: AccelRITE: Accelerating ReInforcemenT Learning based AI at the Edge Using FPGAs
  • Award Number:
    2009057
  • Fiscal Year:
    2020
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
OAC Core: Small: Scalable Graph Analytics on Emerging Cloud Infrastructure
  • Award Number:
    1911229
  • Fiscal Year:
    2019
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
CNS: CSR: Small: Exploiting 3D Memory for Energy-Efficient Memory-Driven Computing
  • Award Number:
    1643351
  • Fiscal Year:
    2016
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
EAGER: Safer Connected Communities Through Integrated Data-driven Modeling, Learning, and Optimization
  • Award Number:
    1637372
  • Fiscal Year:
    2016
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant