FoMR: DeepFetch: Compact Deep Learning based Prefetcher on Configurable Hardware


Basic Information

  • Award Number:
    1912680
  • Principal Investigator:
  • Amount:
    $200,000
  • Host Institution:
  • Institution Country:
    United States
  • Award Type:
    Standard Grant
  • Fiscal Year:
    2019
  • Funding Country:
    United States
  • Project Period:
    2019-10-01 to 2022-09-30
  • Project Status:
    Closed

Project Summary

Fast computer processors, tensor processing units, hardware accelerators, and heterogeneous architectures have enabled large-scale speed-ups in computational power, but memory speeds have not kept pace. Memory performance has therefore become the bottleneck in many applications that rely on heavy memory access. Several emerging memory technologies, such as 3D-Stacked Dynamic Random Access Memory (3D-DRAM) and non-volatile memory, attempt to address memory bottleneck issues from a hardware perspective, but with a tradeoff among bandwidth, power, latency, and cost. Rather than redesigning existing algorithms to suit a specific memory technology, this project will develop a Machine Learning-based approach that automatically learns access patterns, which can be used to optimally prefetch data. Specifically, highly compact Long Short-Term Memory (LSTM) models will serve as the centerpiece of the prefetcher for predicting memory accesses. Through novel model compression techniques, hierarchical memory modeling, and dedicated hardware, this project will overcome barriers to fully exploiting machine learning and emerging hardware to improve prefetching.
Successful completion of this project will lead to improved memory performance for applications including signal processing, computer vision, and language processing. A practical LSTM-based prefetcher implementation on hardware requires dealing with several challenges that will be addressed in this endeavor: (i) training a small model (to enable fast inference) on large traces that is highly accurate in predicting memory accesses for multiple applications; (ii) model compression to ensure real-time inference; (iii) retraining the model online on demand to learn application-specific models, which requires fast learning from a small amount of data; (iv) making prefetching decisions in real time, based on the prediction and uncertainty of the model, about "what", "when", and "where" to prefetch, which also requires careful modeling of the target memory hierarchy; (v) based on the predictions, deciding in real time whether reordering data (dynamic data layout) can improve latency, making future prefetches more effective; (vi) mapping the prediction and decision-making framework onto the limited available configurable hardware, ensuring low-latency training and high-throughput prefetching while using little area and power. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
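To make the "what to prefetch" and uncertainty-gated decision concrete, here is a minimal toy sketch. A frequency table over recent address deltas stands in for the project's compact LSTM predictor; all function names, the confidence threshold, and the example trace are illustrative assumptions, not taken from the project's actual implementation.

```python
# Toy sketch of delta-based memory access prediction with an uncertainty gate.
# A frequency table over observed address deltas stands in for the compact
# LSTM predictor described above; everything here is illustrative.
from collections import Counter, defaultdict

def train_delta_model(trace):
    """Learn, for each observed delta, a histogram of the delta that follows it."""
    deltas = [b - a for a, b in zip(trace, trace[1:])]
    model = defaultdict(Counter)
    for cur, nxt in zip(deltas, deltas[1:]):
        model[cur][nxt] += 1
    return model

def prefetch_candidate(model, last_addr, last_delta, confidence=0.5):
    """Return an address to prefetch, or None when the model is too uncertain."""
    counts = model.get(last_delta)
    if not counts:
        return None
    best, freq = counts.most_common(1)[0]
    if freq / sum(counts.values()) < confidence:  # skip low-confidence prefetches
        return None
    return last_addr + best

# A mostly strided trace (deltas of 8) with one irregular jump.
trace = [0, 8, 16, 24, 100, 108, 116, 124]
model = train_delta_model(trace)
print(prefetch_candidate(model, last_addr=124, last_delta=8))  # 132
```

The confidence gate mirrors challenge (iv): a prefetch is issued only when the predicted next delta dominates the histogram, so a noisy or unseen pattern yields no prefetch rather than a cache-polluting one.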

Project Outcomes

Number of journal articles (8)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)
RAOP: Recurrent Neural Network Augmented Offset Prefetcher
  • DOI:
    10.1145/3422575.3422807
  • Publication Date:
    2020-09
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Pengmiao Zhang;Ajitesh Srivastava;Benjamin Brooks;R. Kannan;V. Prasanna
  • Corresponding Author:
    Pengmiao Zhang;Ajitesh Srivastava;Benjamin Brooks;R. Kannan;V. Prasanna
SHARP: Software Hint-Assisted Memory Access Prediction for Graph Analytics
ReSemble: reinforced ensemble framework for data prefetching
  • DOI:
  • Publication Date:
    2022
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Zhang, Pengmiao;Kannan, Rajgopal;Srivastava, Ajitesh;Nori, Anant V.;Prasanna, Viktor K.
  • Corresponding Author:
    Prasanna, Viktor K.
MemMAP: Compact and Generalizable Meta-LSTM Models for Memory Access Prediction
TransforMAP: Transformer for Memory Access Prediction

Other Publications by Viktor Prasanna

Accelerating Deep Neural Network guided MCTS using Adaptive Parallelism
  • DOI:
  • Publication Date:
    2023
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Yuan Meng;Qian Wang;Tianxin Zu;Viktor Prasanna
  • Corresponding Author:
    Viktor Prasanna
Accelerating GNN Training on CPU+Multi-FPGA Heterogeneous Platform
PEARL: Enabling Portable, Productive, and High-Performance Deep Reinforcement Learning using Heterogeneous Platforms

Other Grants by Viktor Prasanna

IUCRC Phase I University of Southern California: Center for Intelligent Distributed Embedded Applications and Systems (IDEAS)
  • Award Number:
    2231662
  • Fiscal Year:
    2023
  • Funding Amount:
    $200,000
  • Award Type:
    Continuing Grant
Elements: Portable Library for Homomorphic Encrypted Machine Learning on FPGA Accelerated Cloud Cyberinfrastructure
  • Award Number:
    2311870
  • Fiscal Year:
    2023
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
OAC Core: Scalable Graph ML on Distributed Heterogeneous Systems
  • Award Number:
    2209563
  • Fiscal Year:
    2022
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
SaTC: CORE: Small: Accelerating Privacy Preserving Deep Learning for Real-time Secure Applications
  • Award Number:
    2104264
  • Fiscal Year:
    2021
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
Collaborative Research: PPoSS: Planning: Streamware - A Scalable Framework for Accelerating Streaming Data Science
  • Award Number:
    2119816
  • Fiscal Year:
    2021
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
RAPID: ReCOVER: Accurate Predictions and Resource Allocation for COVID-19 Epidemic Response
  • Award Number:
    2027007
  • Fiscal Year:
    2020
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
CNS Core: Small: AccelRITE: Accelerating ReInforcemenT Learning based AI at the Edge Using FPGAs
  • Award Number:
    2009057
  • Fiscal Year:
    2020
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
OAC Core: Small: Scalable Graph Analytics on Emerging Cloud Infrastructure
  • Award Number:
    1911229
  • Fiscal Year:
    2019
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
CNS: CSR: Small: Exploiting 3D Memory for Energy-Efficient Memory-Driven Computing
  • Award Number:
    1643351
  • Fiscal Year:
    2016
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant
EAGER: Safer Connected Communities Through Integrated Data-driven Modeling, Learning, and Optimization
  • Award Number:
    1637372
  • Fiscal Year:
    2016
  • Funding Amount:
    $200,000
  • Award Type:
    Standard Grant