Automated Design Techniques for Deep Learning Processors in Cyber-Physical Applications

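Basic Information

Approval number:
61876173
Project category:
General Program
Funding amount:
CNY 620,000
Principal investigator:
Huawei Li (李华伟)
Host institution:
Institute of Computing Technology, Chinese Academy of Sciences
Discipline classification:
F0608. Intelligent Systems and Artificial Intelligence Security
Completion year:
2022
Approval year:
2018
Project status:
Completed
Duration:
2019-01-01 to 2022-12-31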

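Project participants:
Wang Tiancheng (王天成); Chao Huina (晁慧娜); Xu Haobo (许浩博); Tang Yibin (唐忆滨); Zou Kaiwei (邹凯伟); Qiao Jing (乔晶); Wang Yongchen (汪雍琛)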
Keywords:
intelligent chip design; low-power intelligent chips; design automation; neural network processors

Project Abstract

Deep learning accelerators are increasingly critical to achieving intelligence in cyber-physical applications. General-purpose deep learning processors fail to meet the stringent energy-efficiency requirements of pervasive cyber-physical applications, while the design automation tools that bridge deep learning software models and processor architectures are still in their infancy. The main research topics in this proposal are cross-layer automatic mapping and design optimization across the neural network model layer, the architecture layer, the microarchitecture layer, and the circuit layer, and automatic co-optimization of the interaction between the model layer and the system software layer. The expected research achievements include: 1) dataflow-driven automatic mapping from neural network models to hyper-parallel architectures; 2) machine-learning-driven automatic optimization of microarchitecture parameters; 3) mapping and optimization of circuit-layer approximate computing units; 4) a prototype toolset compatible with existing deep learning development environments, applied to a demonstrative cyber-physical system. This research will innovate the design methodology, provide a suite of key techniques for design automation of cyber-physical deep learning processors, produce a series of intellectual properties, and fuel research and applications in emerging computing devices and chip designs oriented toward artificial intelligence.

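Deep learning processors are increasingly important for intelligent cyber-physical applications. Existing general-purpose deep learning processors struggle to meet the high energy-efficiency demands of pervasive cyber-physical applications, and no commercial design automation tool yet spans the path from deep learning software models to processor hardware architectures. This project proposes to study an automatic customization framework for deep learning processors targeting cyber-physical applications, including automatic mapping and design optimization methods from the neural network model layer to the architecture, microarchitecture, and circuit layers, and automatic optimization methods for the model layer and system software layer based on hardware neural networks. Expected outcomes: (1) a dataflow-driven automatic mapping method from the neural network model layer to the hyper-parallel architecture layer; (2) a machine-learning-driven automatic optimization method for microarchitecture-layer parameters under application constraints; (3) an automatic mapping and optimization method for approximate computing units at the circuit layer of neural network processors; (4) a prototype automatic customization tool and system compatible with existing deep learning development software. This research will provide innovative methods and key techniques for the automatic customization of cyber-physical deep learning processors, establish core intellectual property, and advance research on and applications of new computing devices and chip designs for artificial intelligence.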

Final Summary

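Deep learning processors are increasingly important for intelligent cyber-physical applications. General-purpose AI chips are easy to program but not energy-efficient, while dedicated deep learning chips are energy-efficient but costly and difficult to custom-design. Balancing "general-purpose" against "dedicated" is a dilemma that must be resolved to bring computational intelligence to cyber-physical devices. Design automation tools spanning deep learning software models to processor hardware architectures are urgently needed to meet the rapid-customization demands that fragmented cyber-physical applications place on intelligent processors. Targeting an automatic customization framework for deep learning processors in cyber-physical applications, this project proposed a series of automatic mapping and design optimization methods spanning the neural network model layer, the architecture layer, the microarchitecture layer, and the circuit layer. It produced 40 published papers, 5 invention patent applications, and 2 software copyrights. Notable innovations include: 1) For automatic mapping at the architecture layer of neural network processors, the project proposed VStore, an automatic design method for vector retrieval architectures based on computational storage; a neural network search processor for efficiently designing neural network accelerators; and an automatic generation framework for FPGA-oriented graph neural network hardware accelerators. Together these offer effective solutions for the automatic design and optimization of neural network processor architectures. 2) For automatic optimization of microarchitecture parameters under application constraints, the project proposed a graph-neural-network-based algorithm for estimating the power, performance, and area of networks-on-chip; Alchemist, a neural network acceleration architecture for processing compressed video streams; and a resistive-memory-based convolutional neural network accelerator with flexible precision adjustment. These improve system performance, energy efficiency, and reliability while meeting application requirements. 3) For circuit-layer automatic mapping and optimization of neural networks customized for cyber-physical devices, the project proposed an automatic optimization and design method for real-time neural networks on mobile platforms, a cloud-edge collaborative method for mapping deep learning tasks, and a state-aware ReRAM neural network computing method, yielding a series of low-power neural network accelerator circuits designed for resource-constrained cyber-physical devices.

The research extends the conventional scope of high-level synthesis and represents an innovative application of high-level synthesis to deep learning processors, which gives it significant scientific value. The project considers both the real-time targets and the energy constraints of the target application, and advances design automation techniques for domain-specific deep learning processor chips in cyber-physical settings. This addresses the pervasive and diverse demand for AI applications on cyber-physical devices and can effectively support making cyber-physical computing intelligent.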

Project Outcomes

Journal papers (15)

Monographs (0)

Research awards (1)

Conference papers (25)

Patents (5)

CAP: Communication-aware Automated Parallelization for Deep Learning Inference on CMP Architectures

DOI:
10.1109/tc.2021.3099688
Published:
2022
Journal:
IEEE Transactions on Computers
Impact factor:
3.7
Authors:
Kaiwei Zou; Ying Wang; Long Cheng; Songyun Qu; Huawei Li; Xiaowei Li
Corresponding author:
Xiaowei Li

R2F: A Remote Retraining Framework for AIoT Processors With Computing Errors

DOI:
10.1109/tvlsi.2021.3089224
Published:
2021
Journal:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Impact factor:
2.8
Authors:
Xu Dawen; He Meng; Liu Cheng; Wang Ying; Cheng Long; Li Huawei; Li Xiaowei; Cheng Kwang-Ting
Corresponding author:
Cheng Kwang-Ting

An Edge 3D CNN Accelerator for Low Power Activity Recognition

DOI:
10.1109/tcad.2020.3011042
Published:
2021
Journal:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Impact factor:
2.9
Authors:
Ying Wang; Yongchen Wang; Cong Shi; Long Cheng; Huawei Li; Xiaowei Li
Corresponding author:
Xiaowei Li

To cloud or not to cloud: an on-line scheduler for dynamic privacy-protection of deep learning workload on edge devices

DOI:
10.1007/s42514-020-00052-7
Published:
2021
Journal:
CCF Transactions on High Performance Computing
Impact factor:
--
Authors:
Tang Yibin; Wang Ying; Li Huawei; Li Xiaowei
Corresponding author:
Li Xiaowei

An Efficient Deep Learning Accelerator Architecture for Compressed Video Analysis

DOI:
10.1109/tcad.2021.3120076
Published:
2022
Journal:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Impact factor:
2.9
Authors:
Yongchen Wang; Ying Wang; Huawei Li; Xiaowei Li
Corresponding author:
Xiaowei Li
