SPX: Collaborative Research: FASTLEAP: FPGA based compact Deep Learning Platform

SPX：协作研究：FASTLEAP：基于 FPGA 的紧凑型深度学习平台

基本信息

批准号：
1919117
负责人：
Yanzhi Wang
金额：
$ 35万
依托单位：
Northeastern University
依托单位国家：
美国
项目类别：
Standard Grant
财政年份：
2019
资助国家：
美国
起止时间：
2019-10-01 至 2024-09-30
项目状态：
已结题

来源：
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1919117&HistoricalAwards=false
关键词：
SPX Collaborative Research FASTLEAP FPGA

项目摘要

With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used because of their high accuracy, excellent scalability, and self-adaptiveness properties. Many applications employ DNNs as the core technology, such as face detection, speech recognition, scene parsing. To meet the high accuracy requirement of various applications, DNN models are becoming deeper and larger, and are evolving at a fast pace. They are computation and memory intensive and pose intensive challenges to the conventional Von Neumann architecture used in computing. The key problem addressed by the project is how to accelerate deep learning, not only inference, but also training and model compression, which have not received enough attention in the prior research. This endeavor has the potential to enable the design of fast and energy-efficient deep learning systems, applications of which are found in our daily lives -- ranging from autonomous driving, through mobile devices, to IoT systems, thus benefiting the society at large.The outcome of this project is FASTLEAP - an Field Programmable Gate Array (FPGA)-based platform for accelerating deep learning. The platform takes in a dataset as an input and outputs a model which is trained, pruned, and mapped on FPGA, optimized for fast inferencing. The project will utilize the emerging FPGA technologies that have access to High Bandwidth Memory (HBM) and consist of floating-point DSP units. In a vertical perspective, FASTLEAP integrates innovations from multiple levels of the whole system stack algorithm, architecture and down to efficient FPGA hardware implementation. In a horizontal perspective, it embraces systematic DNN model compression and associated FPGA-based training, as well as FPGA-based inference acceleration of compressed DNN models. The platform will be delivered as a complete solution, with both the software tool chain and hardware implementation to ensure the ease of use. At algorithm level of FASTLEAP, the proposed Alternating Direction Method of Multipliers for Neural Networks (ADMM-NN) framework, will perform unified weight pruning and quantization, given training data, target accuracy, and target FPGA platform characteristics (performance models, inter-accelerator communication). The training procedure in ADMM-NN is performed on a platform with multiple FPGA accelerators, dictated by the architecture-level optimizations on communication and parallelism. Finally, the optimized FPGA inference design is generated based on the trained DNN model with compression, accounting for FPGA performance modeling. The project will address the following SPX research areas: 1) Algorithms: Bridging the gap between deep learning developments in theory and their system implementations cognizant of performance model of the platform. 2) Applications: Scaling of deep learning for domains such as image processing. 3) Architecture and Systems: Automatic generation of deep learning designs on FPGA optimizing area, energy-efficiency, latency, and throughput.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

近年来，随着人工智能的兴起，深度神经网络（DNN）因其高精度、良好的可扩展性和自适应性而得到广泛应用。许多应用都采用 DNN 作为核心技术，例如人脸检测、语音识别、场景解析。为了满足各种应用的高精度要求，DNN模型变得越来越深、越来越大，并且正在快速发展。它们是计算和内存密集型的，并对计算中使用的传统冯诺依曼架构提出了严峻的挑战。该项目解决的关键问题是如何加速深度学习，不仅是推理，还包括训练和模型压缩，这些在之前的研究中没有得到足够的重视。这项努力有可能实现快速、节能的深度学习系统的设计，其应用广泛存在于我们的日常生活中——从自动驾驶、移动设备到物联网系统，从而造福整个社会。该项目的成果是 FASTLEAP——一个基于现场可编程门阵列 (FPGA) 的平台，用于加速深度学习。该平台将数据集作为输入并输出模型，该模型经过训练、修剪并映射到 FPGA 上，并针对快速推理进行了优化。该项目将利用新兴的 FPGA 技术，该技术可以访问高带宽内存 (HBM) 并由浮点 DSP 单元组成。从纵向来看，FASTLEAP融合了从整个系统堆栈算法、架构到高效的FPGA硬件实现等多个层面的创新。从横向来看，它包括系统的DNN模型压缩和相关的基于FPGA的训练，以及压缩DNN模型的基于FPGA的推理加速。该平台将作为完整的解决方案提供，包括软件工具链和硬件实现，以确保易用性。在 FASTLEAP 的算法层面，提出的神经网络乘法器交替方向法 (ADMM-NN) 框架将在给定训练数据、目标精度和目标 FPGA 平台特性（性能模型、加速器间加速器）的情况下执行统一的权重修剪和量化沟通）。 ADMM-NN 中的训练过程在具有多个 FPGA 加速器的平台上执行，由通信和并行性的架构级优化决定。最后，基于经过压缩训练的 DNN 模型生成优化的 FPGA 推理设计，并考虑 FPGA 性能建模。该项目将涉及以下 SPX 研究领域： 1) 算法：弥合深度学习理论发展与平台性能模型的系统实现之间的差距。 2）应用：深度学习在图像处理等领域的扩展。 3) 架构和系统：自动生成关于 FPGA 优化面积、能效、延迟和吞吐量的深度学习设计。该奖项反映了 NSF 的法定使命，并通过使用基金会的智力价值和更广泛的影响审查进行评估，被认为值得支持标准。

项目成果

期刊论文数量（13）

专著数量（0）

科研奖励数量（0）

会议论文数量（0）

专利数量（0）

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning

SPViT：通过延迟感知软令牌修剪启用更快的视觉转换器

DOI：
10.1007/978-3-031-20083-0_37
发表时间：
2021-12-27
期刊：
影响因子：
0
作者：
Zhenglun Kong;Peiyan Dong;Xiaolong Ma;Xin Meng;Wei Niu;Mengshu Sun;Bin Ren;Minghai Qin;H. Tang
通讯作者：
H. Tang

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

剥洋葱：分层减少数据冗余以实现高效的 Vision Transformer 训练

DOI：
10.48550/arxiv.2211.10801
发表时间：
2022-11-19
期刊：
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
影响因子：
0
作者：
Zhenglun Kong;Haoyu Ma;Geng Yuan;Mengshu Sun;Yanyue Xie;Peiyan Dong;Xin Meng;Xuan Shen;Hao Tang;Minghai Qin;Tianlong Chen;Xiaolong Ma;Xiaohui Xie;Zhangyang Wang;Yanzhi Wang
通讯作者：
Yanzhi Wang

Advancing Model Pruning via Bi-level Optimization

通过双层优化推进模型剪枝

DOI：
10.48550/arxiv.2210.04092
发表时间：
2022-10-08
期刊：
ArXiv
影响因子：
0
作者：
Yihua Zhang;Yuguang Yao;Parikshit Ram;Pu Zhao;Tianlong Chen;Min;Yanzhi Wang;Sijia Liu
通讯作者：
Sijia Liu

ESRU: Extremely Low-Bit and Hardware-Efficient Stochastic Rounding Unit Design for Low-Bit DNN Training

ESRU：用于低位 DNN 训练的极低位和硬件高效的随机舍入单元设计

DOI：
10.23919/date56975.2023.10137222
发表时间：
2023-04
期刊：
Automation & Test in Europe Conference & Exhibition (DATE
影响因子：
0
作者：
Chang, Sung;Yuan, Geng;Lu, Alec;Sun, Mengshu;Li, Yanyu;Ma, Xiaolong;Li, Zhengang;Xie, Yanyue;Qin, Minghai;Lin, Xue;et al
通讯作者：
et al

You Already Have It: A Generator-Free Low-Precision DNN Training Framework using Stochastic Rounding

您已经拥有了：使用随机舍入的无生成器低精度 DNN 训练框架

DOI：
发表时间：
2022-10
期刊：
2022.
影响因子：
0
作者：
Yuan, Geng;Chang, Sung;Jin, Qing;Lu, Alec;Li, Yanyu;Wu, Yushu;et al.
通讯作者：
et al.

DOI：
{{ item.doi }}
发表时间：
{{ item.publish_year }}
期刊：
{{ item.journal_name }}
影响因子：
{{ item.factor }}
作者：
{{ item.authors }}
通讯作者：
{{ item.author }}

数据更新时间：{{ journalArticles.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ monograph.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ sciAawards.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ conferencePapers.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ patent.updateTime }}

Yanzhi Wang其他文献

Reduced-Complexity Deep Neural Networks Design Using Multi-Level Compression

使用多级压缩降低复杂度的深度神经网络设计

DOI：
10.1109/tsusc.2017.2710178
发表时间：
2019-04-01
期刊：
IEEE Transactions on Sustainable Computing
影响因子：
3.9
作者：
Siyu Liao;Yi Xie;X. Lin;Yanzhi Wang;M. Zhang;Bo Yuan
通讯作者：
Bo Yuan

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning

ClickTrain：通过细粒度的架构保留剪枝实现高效、准确的端到端深度学习训练

DOI：
10.1145/3447818.3459988
发表时间：
2020-11-20
期刊：
Proceedings of the 35th ACM International Conference on Supercomputing
影响因子：
0
作者：
Chengming Zhang;Geng Yuan;Wei Niu;Jiannan Tian;Sian Jin;Donglin Zhuang;Zhe Jiang;Yanzhi Wang;Bin Ren;S. Song;Dingwen Tao
通讯作者：
Dingwen Tao