Model Selection for Large-Scale Kernel Methods Based on Spectral Analysis of Integral Operators

Project Introduction

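Basic Information

Grant number:
61703396
Project category:
Young Scientists Fund
Funding amount:
240,000 CNY
Principal investigator:
Yong Liu (刘勇)
Host institution:
Renmin University of China
Discipline classification:
F0603. Machine Learning
Completion year:
2020
Approval year:
2017
Project status:
Completed
Project period:
2018-01-01 to 2020-12-31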

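Project participants:
Zhang Hua (张华); Lin Hailun (林海伦); Shi Jing (史敬); Han Laiming (韩来明); Yang Wei (杨威); Cao Yuchen (曹雨晨); Gao Yiwen (高宜文); Li Jian (李健); Li Mingming (李明明)
Keywords:
kernel methods; large-margin methods; regularization; efficient optimization methods; support vector machines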

Project Abstract

Whether the inductive bias of a machine learning algorithm matches the problem at hand directly determines the algorithm's learning performance. Model selection for large-scale kernel methods is the process of determining that inductive bias, and it is both the bottleneck and the key to the theoretical study and practical application of large-scale kernel methods. Existing approaches have three shortcomings. They do not explicitly describe the feature space induced by the kernel function, so model selection lacks interpretability. Their criteria and algorithms rely on exact computation over the full data set, so they scale poorly. And most existing large-scale kernel methods choose a model empirically, from prior knowledge or under computational resource limits, without a solid theoretical basis. To address these issues, we propose a model selection approach for large-scale kernel methods based on the spectral analysis of the integral operator. First, we explicitly construct the integral operator space and give an explicit description of the feature space structure, which improves the interpretability of the model. Then, we study the generalization error from the viewpoints of the spectral stability and complexity of the integral operator, and build a spectral generalization theory of model selection for large-scale kernel methods. Finally, we design spectral model selection criteria and algorithms that combine interpretability, theoretical generalization guarantees, and computational efficiency. The proposed approach offers an effective route to model selection for large-scale kernel methods and advances their theory and methods.
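As a rough illustration of what a spectral view of the integral operator can look like in practice, the eigenvalues of the normalized kernel matrix K/n are a standard empirical estimate of the eigenvalues of the kernel's integral operator, and a random subsample gives a cheaper estimate of the leading ones. The sketch below is not the project's criterion or algorithm; the Gaussian kernel, the bandwidth `sigma`, the synthetic data, and the function names are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def empirical_spectrum(X, sigma=1.0):
    """Eigenvalues of K/n in descending order.

    These are the usual empirical estimates of the integral operator's
    eigenvalues for the chosen kernel and data distribution.
    """
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    eigvals = np.linalg.eigvalsh(K / n)  # returned in ascending order
    return np.clip(eigvals[::-1], 0.0, None)


# Hypothetical data: 1000 points drawn from a 2-D standard normal.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))

# Spectrum from the full data set (cubic cost in n).
full = empirical_spectrum(X)

# Spectrum from a uniform subsample of 200 points (much cheaper).
idx = rng.choice(X.shape[0], size=200, replace=False)
sub = empirical_spectrum(X[idx])

print("top-3 eigenvalues, full data:", full[:3])
print("top-3 eigenvalues, subsample:", sub[:3])
```

Comparing such spectra across candidate kernels or bandwidths is one way a spectrum-based selection criterion could be evaluated without repeatedly decomposing the full kernel matrix; which spectral quantity to compare is left open here.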

Final Report Abstract

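Whether the inductive bias of a machine learning algorithm matches the problem it must handle directly determines the algorithm's learning performance, and is one of the core questions of machine learning research. Large-scale kernel methods are among the machine learning algorithms commonly used in big data analysis and mining, with a solid theoretical foundation and a complete learning framework. Model selection for large-scale kernel methods is the process of determining the inductive bias of the kernel method; it is the key to ensuring learning performance and an important problem in both theoretical research and practical application. Existing kernel model selection methods operate in the implicit feature space induced by the kernel, so they lack intuitive interpretability; most of them also rely on exact computation over the full data set, so they lack scalability and are unsuited to large-scale problems. Existing large-scale kernel methods mostly set model parameters empirically from prior knowledge or available computational resources, which lacks a solid theoretical basis and makes it hard to guarantee the learning performance of the selected model. To overcome these problems, this project studied model selection for large-scale kernel methods at the levels of theory, criteria, and algorithms, and obtained a range of results. Theory: a kernel selection theory was preliminarily established, providing theoretical support for the field and advancing the learning theory of kernel methods. Criteria: kernel selection criteria with both theoretical support and efficiency were proposed; while maintaining accuracy, they reduce running time by more than an order of magnitude compared with existing methods, offering an effective route to large-scale kernel selection. Algorithms: a kernel selection algorithm whose generalization error attains an O(1/n) convergence rate was proposed for the first time, with significantly higher accuracy than existing methods. In total, 10 papers were published in CCF-A venues or SCI Q1 (一区) journals, including the top artificial intelligence journals TPAMI, TNNLS, and TCYB and the top conferences NeurIPS, AAAI, and IJCAI, and 2 patents were drafted.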

Project Outputs

Journal papers (4)

Monographs (0)

Research awards (2)

Conference papers (6)

Patents (2)

Approximate Kernel Selection via Matrix Approximation

DOI:
10.1109/tnnls.2019.2958922
Publication date:
2020-01
Journal:
IEEE Transactions on Neural Networks and Learning Systems
Impact factor:
10.4
Authors:
Lizhong Ding; Shizhong Liao; Yong Liu; Li Liu; Fan Zhu; Yazhou Yao; Ling Shao; Xin Gao
Corresponding author:
Xin Gao

Sketch Kernel Ridge Regression Using Circulant Matrix: Algorithm and Theory