Research on Key Technologies of Video Coding Based on Visual Perception Models

Project Introduction

Basic Information

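Grant number:
61573037
Project category:
General Program
Funding amount:
CNY 660,000
Principal investigator:
徐迈 (Mai Xu)
Host institution:
Beihang University (北京航空航天大学)
Discipline classification:
F0609. Artificial intelligence inspired by cognitive science and neuroscience
Year of completion:
2019
Year of approval:
2015
Project status:
Completed
Project period:
2016-01-01 to 2019-12-31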

Project participants:
李景文; 丁志国; 孙兵; 任杰; 刘景贤; 刘哲; 邓欣; 李胜曦; 蒋铼
Keywords:
sparse representation; selective attention model; video coding

Project Abstract

In recent years, the rapid development of smart terminals and the popularity of internet video services (e.g., YouTube) have caused the volume of video data delivered over networks to grow enormously, creating a bottleneck between the supply of and demand for network bandwidth. Video coding theory offers one way to break through this bottleneck for future visual communication. However, conventional coding theory has evolved along the lines of digital signal processing and is approaching diminishing marginal returns on innovation. Meanwhile, the CPU performance of computing terminals has increased dramatically, which creates an opportunity to address the bandwidth bottleneck with machine learning. An alternative approach to video coding therefore starts from human cognition: machine learning tools are used to establish a model of human visual perception, and video coding methods inspired by that model are then studied, so that bandwidth is saved in exchange for intelligent computing. Research on this topic is still in its infancy.

In this project, building on our previous research, we focus on fundamental research into perception-inspired video coding, an interdisciplinary topic spanning cognitive science, computing, and communications, with the goal of significantly improving video coding efficiency. Specifically, the project covers three key techniques: (1) image sparse representation based on online dictionary learning; (2) a deep-learning-based visual perception model for video under varying distortion; and (3) rate-complexity-subjective distortion optimization with respect to the visual perception model. The project aims to provide a new theoretical and technical foundation for future video coding technologies.

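In recent years, with the development of smart terminals and the spread of new services such as online video, the volume of video data transmitted over networks has grown explosively, and the gap between bandwidth supply and demand has become increasingly acute. Video coding theory is one of the important ways to break through the network bandwidth bottleneck. Traditional coding theory has evolved along the lines of digital signal processing and struggles to overcome the "marginal effect". At the same time, the rapid growth in terminal computing power offers an opportunity to resolve the bandwidth supply-demand conflict. A new line of research therefore starts from human visual cognition: machine learning is used as a computational tool to build a human visual perception model, and key video coding technologies based on that model are studied, trading intelligent computing for bandwidth. This work is still at an early stage both in China and internationally. Building on its existing research foundation and aiming to improve video compression efficiency, this project concentrates on theory at the intersection of cognition, computing, and signal processing. The research covers three key technologies: (1) image sparse representation based on online learning of texture dictionaries; (2) deep-learning-based visual attention models for video under different distortions; (3) rate-complexity-perceptual distortion optimization under the visual perception model. The project will provide a new theoretical basis and technical support for video coding theory.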
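The third technique, optimization of rate against perceptual distortion, can be illustrated with a generic Lagrangian cost of the form J = D_p + λR. The sketch below is not the project's formulation: the saliency-weighted distortion, the function names `perceptual_distortion` and `choose_option`, the value of λ, and all numbers are hypothetical and chosen only to show how weighting distortion by visual attention can change which coding option wins at equal bit rate.

```python
def perceptual_distortion(block_mse, saliency):
    """Saliency-weighted mean of per-block MSE values (an assumed metric)."""
    if len(block_mse) != len(saliency):
        raise ValueError("block_mse and saliency must have the same length")
    total = sum(saliency)
    if total <= 0:
        raise ValueError("saliency weights must have a positive sum")
    return sum(w * d for w, d in zip(saliency, block_mse)) / total


def choose_option(options, saliency, lam):
    """Return the candidate with the smallest cost J = D_p + lam * R."""
    def cost(opt):
        return perceptual_distortion(opt["block_mse"], saliency) + lam * opt["bits"]
    return min(options, key=cost)


# Three blocks; the first one attracts most of the viewer's attention.
saliency = [0.7, 0.2, 0.1]

# Two hypothetical encodings that spend the same number of bits.
options = [
    {"name": "uniform", "block_mse": [4.0, 4.0, 4.0], "bits": 1000},
    {"name": "roi", "block_mse": [1.0, 8.0, 12.0], "bits": 1000},
]

best = choose_option(options, saliency, lam=0.01)
print(best["name"])
```

With these made-up numbers the "roi" option has the higher plain average MSE but the lower saliency-weighted distortion, so it is the one selected.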

Final Report Abstract

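To break through the bottleneck of limited network bandwidth in video communication, this project started from visual perception, used machine learning as a computational tool to build visual perception models, and studied key video coding technologies based on visual perception, trading intelligent computing for video transmission bandwidth. It achieved theoretical and technical breakthroughs in video perception and compression, solved the problem that traditional rate-distortion theory cannot optimize user experience, improved video compression efficiency severalfold, and met the expected research goals. The main innovations are as follows.

1. To address the low efficiency of video representation, the project proposed an image sparse representation model based on online learning of texture dictionaries and applied it to image and video representation, significantly improving representation efficiency and reducing the amount of representation data. Experimental results show that the model improves both image reconstruction quality and recognition accuracy, going beyond the generalization ability of existing structured image representation methods across multiple processing tasks.

2. To simulate the human visual attention mechanism, the project built a large-scale video fixation database, proposed a series of image/video saliency prediction methods, and constructed data-driven visual attention models that predict where people look when viewing images and video. Experimental results show that, compared with existing work, the project's methods substantially improve video saliency detection accuracy on multiple test databases, with an average improvement in the CC (linear correlation coefficient) metric of up to 63%.

3. Building on the sparse representation model and the visual attention model, the project studied rate-complexity-perceptual distortion optimization based on the perception model. It established a perceptual distortion metric and designed a rate-perceptual distortion optimization formulation, solving the problem that traditional methods cannot achieve optimal bit-rate allocation, so that perceptual distortion after compression is minimized while the target bit rate is met. It also built a deep learning model for quadtree partitioning, achieving control of video encoding and decoding complexity under perceptual distortion optimization. Experimental results show that, at the same user experience, the HEVC bit rate can be reduced by about 50% and computational complexity by about 70%.

The project published 34 papers in SCI-indexed journals including IEEE TPAMI, TIP, JSAC, and TCSVT, and 23 EI-indexed conference papers at venues including ICCV, CVPR, ECCV, DCC, and AAAI. It received one IEEE conference best paper award and one best paper nomination, was granted 5 invention patents and 2 software copyrights, and had 2 technical proposals adopted into international standards. It won the First Prize for Technological Invention of the Chinese Association for Artificial Intelligence (2017) and the First Prize for Science and Technology Progress of the Ministry of Education (2018). The principal investigator received funding from the National Natural Science Foundation of China Excellent Young Scientists Fund (2019), was selected as a Young Changjiang Scholar of the Ministry of Education (2018), and was named an Outstanding Science and Technology Worker by the Chinese Institute of Electronics (2017). Students trained under the project won the Chinese Institute of Electronics Outstanding Master's Thesis Award in two consecutive years, 2018 and 2019, and received a nomination for the same award in 2017.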
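The CC metric cited in the saliency results above is commonly computed as the Pearson linear correlation coefficient between a predicted saliency map and a ground-truth fixation density map. The following is a minimal sketch of that computation using only the Python standard library; the function name `cc` and the small 2x2 maps are illustrative assumptions, not data or code from the project.

```python
import math


def cc(pred, gt):
    """Pearson linear correlation between two equally sized 2-D maps."""
    p = [v for row in pred for v in row]
    g = [v for row in gt for v in row]
    if not p or len(p) != len(g):
        raise ValueError("maps must be non-empty and have the same size")
    mean_p = sum(p) / len(p)
    mean_g = sum(g) / len(g)
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    norm_g = math.sqrt(sum((b - mean_g) ** 2 for b in g))
    if norm_p == 0 or norm_g == 0:
        return 0.0  # a constant map carries no correlation information
    return cov / (norm_p * norm_g)


# Hypothetical predicted saliency map and ground-truth fixation map.
pred = [[0.1, 0.9], [0.2, 0.8]]
gt = [[0.0, 1.0], [0.0, 1.0]]
score = cc(pred, gt)
print(round(score, 3))
```

A score of 1 means the predicted map is a positive linear function of the ground truth, 0 means no linear relationship, and -1 means the prediction is inverted.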

Project Outcomes

Journal papers (31)

Monographs (0)

Research awards (3)

Conference papers (23)

Patents (3)

Subjective-Driven Complexity Control Approach for HEVC

DOI:
10.1109/tcsvt.2015.2474075
Published:
2016
Journal:
IEEE Transactions on Circuits and Systems for Video Technology
Impact factor:
8.4
Authors:
Xin Deng; Mai Xu; Lai Jiang; Xiaoyan Sun; Zulin Wang
Corresponding author:
Zulin Wang

Fast H.264 to HEVC Transcoding: A Deep Learning Method

DOI:
10.1109/TMM.2018.2885921
Published:
2019
Journal:
IEEE Transactions on Multimedia
Impact factor:
7.3
Authors:
Jingyao Xu; Mai Xu; Yanan Wei; Zulin Wang; Zhenyu Guan
Corresponding author:
Zhenyu Guan

Learning-Based Saliency Detection of Face Images

DOI:
10.1109/access.2017.2689776
Published:
2017
Journal:
IEEE Access
Impact factor:
3.9
Authors:
Yun Ren; Mai Xu; Zulin Wang
Corresponding author:
Zulin Wang

Find Who to Look at: Turning From Action to Saliency

DOI:
10.1109/tip.2018.2837106
Published:
2018-05
Journal:
IEEE Transactions on Image Processing
Impact factor:
10.6
Authors:
Mai Xu; Yufan Liu; Roland Hu; Feng He
Corresponding author:
Feng He

Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach

DOI:
10.1109/tpami.2018.28587
Published:
2019
Journal:
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (ESI Highly Cited Paper)
Impact factor:
--
Authors:
Mai Xu; Yuhang Song; Jianyi Wang; Minglang Qiao; Liangyu Huo; Zulin Wang
Corresponding author:
Zulin Wang