SHF: Small: Sparsity-Aware Hardware Accelerators for Natural Language Processing with Transformers

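Basic Information

Award number:
2007362
Principal investigator:
Peter Milder
Amount:
$500,000
Host institution:
SUNY at Stony Brook
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Start and end dates:
2020-10-01 to 2024-09-30
Project status:
Completed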

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2007362&HistoricalAwards=false
Keywords:
SHF Small Sparsity Aware Hardware

Project Abstract

Natural Language Processing (NLP) enables people to interact with machines in the same manner as with each other. More importantly, it provides machines with the ability to access the information and knowledge that are readily available in books, articles, and various unstructured documents. Because the quality and usability of NLP-powered services depend primarily on the quantity of text the system is able to process, the computational demands of advanced NLP applications far exceed the capabilities of general-purpose computers and continue to grow. This project aims to greatly improve the performance of NLP applications based on transformers, a class of neural networks used in most state-of-the-art NLP technology. This project will significantly improve performance and efficiency for NLP applications, enabling their widespread deployment in emerging datacenters and thus enhancing the quality of human interactions with machines and with each other.

This project advances the state of the art of accelerators (hardware and compilers) for natural language processing, focusing primarily on sparsity-aware inference in large multi-layered self-attention-based models, which have so far received limited attention from the architecture community. The project also advances NLP knowledge of sparse attention functions, studies design techniques that allow pre-trained models to be repurposed to run faster, and improves their effectiveness in applications that diverge from their training setting. The investigation focuses on the key observation that the massive growth in computational complexity can be mitigated by dynamically identifying inherent sparsity and ineffectual computation in models, and by refitting models to induce sparsity, with the goal of either approximating or entirely avoiding the parts of the computation that have limited impact on the model results.

This investigation will demonstrate the performance improvement obtained by these techniques, leveraging sparsity and dynamic predictions within a novel sparsity-aware hardware acceleration framework implemented on a field-programmable gate array (FPGA).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
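The key observation in the abstract, that some attention computation has limited impact on the result and can be skipped, can be illustrated with a minimal sketch. It assumes a simple threshold rule that is not taken from this award: for a single query, attention probabilities below a hypothetical `threshold` are treated as ineffectual, and the corresponding value rows are never multiplied and accumulated. The names `softmax` and `sparse_attention_row` are illustrative only; this is neither the project's algorithm nor its FPGA design.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention_row(
    query: Vector, keys: List[Vector], values: List[Vector], threshold: float
) -> Tuple[Vector, int]:
    """Scaled dot-product attention for one query, skipping value rows whose
    attention probability is below `threshold` (hypothetical pruning rule).

    Returns the output vector and how many value rows were actually used.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    probs = softmax(scores)

    out = [0.0] * len(values[0])
    used = 0
    for p, v in zip(probs, values):
        if p < threshold:
            continue  # treated as ineffectual: no multiply-accumulate issued
        used += 1
        for j, vj in enumerate(v):
            out[j] += p * vj
    return out, used


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[4.0, 0.0], [0.0, 1.0], [-4.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

    dense, n_dense = sparse_attention_row(query, keys, values, threshold=0.0)
    sparse, n_sparse = sparse_attention_row(query, keys, values, threshold=0.01)
    print(n_dense, n_sparse)  # 3 2
    print(max(abs(a - b) for a, b in zip(dense, sparse)))
```

Because the skipped probabilities are not renormalized, the sparse output slightly underestimates the dense one; whether to renormalize is a design choice this sketch leaves open. In hardware, any benefit would come from the multiply-accumulate work avoided for pruned rows, which `used` counts here.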

Project Outcomes

Journal articles (2)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

IrEne: Interpretable Energy Prediction for Transformers

DOI:
10.18653/v1/2021.acl-long.167
Published:
2021-06
Journal:
ArXiv
Impact factor:
0
Authors:
Qingqing Cao;Yash Kumar Lal;H. Trivedi;A. Balasubramanian;Niranjan Balasubramanian
Corresponding author:
Qingqing Cao;Yash Kumar Lal;H. Trivedi;A. Balasubramanian;Niranjan Balasubramanian

On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers

DOI:
10.18653/v1/2021.findings-acl.363
Published:
2021-06
Journal:
ArXiv
Impact factor:
0
Authors:
Tianchu Ji;Shraddhan Jain;M. Ferdman;Peter Milder;H. A. Schwartz;Niranjan Balasubramanian
Corresponding author:
Tianchu Ji;Shraddhan Jain;M. Ferdman;Peter Milder;H. A. Schwartz;Niranjan Balasubramanian

Other publications by Peter Milder

"Smart" design space sampling to predict Pareto-optimal solutions

DOI:
10.1145/2248418.2248436
Published:
2012
Journal:
Impact factor:
0
Authors:
M. Zuluaga;Andreas Krause;Peter Milder;Markus Püschel
Corresponding author:
Markus Püschel

Domain-specific library generation for parallel software and hardware platforms

DOI:
Published:
2008
Journal:
2008 IEEE International Symposium on Parallel and Distributed Processing
Impact factor:
0
Authors:
F. Franchetti;Y. Voronenko;Peter Milder;S. Chellappa;Marek R. Telgarsky;Hao Shen;P. D'Alberto;Frédéric de Mesmay;J. Hoe;José M. F. Moura;Markus Püschel
Corresponding author:
Markus Püschel

Wireless Multicast Rate Control Adaptive to Application Goodput and Loss Requirements

DOI:
Published:
2024
Journal:
International Conference on Internet-of-Things Design and Implementation
Impact factor:
0
Authors:
Mohammed Elbadry;Fan Ye;Peter Milder
Corresponding author:
Peter Milder

Generation and transmission of 85.4 Gb/s real-time 16QAM coherent optical OFDM signals over 400 km SSMF with preamble-less reception.

DOI:
Published:
2012
Journal:
Optics Express
Impact factor:
3.8
Authors:
R. Bouziane;R. Schmogrow;D. Hillerkuss;Peter Milder;C. Koos;W. Freude;J. Leuthold;P. Bayvel;R. Killey
Corresponding author:
R. Killey