Theory and Methods of Bayesian Reinforcement Learning Based on Partially Observable Models

Basic Information

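Grant No.: 61772355
Project Category: General Program
Funding: RMB 650,000
Principal Investigator: Liu Quan (刘全)
Host Institution: Soochow University (苏州大学)
Discipline: F06. Artificial Intelligence
Completion Year: 2021
Approval Year: 2017
Project Status: Completed
Duration: 2018-01-01 to 2021-12-31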

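Participants: Zhu Fei (朱斐); Fu Qiming (傅启明); Zhong Shan (钟珊); Wang Hao (王浩); Qian Weisheng (钱伟晟); Zhai Jianwei (翟建伟); Zhang Peng (章鹏); Xu Jin (徐进); Liang Bin (梁斌)
Keywords: model learning; partially observable model; value learning; Bayesian reinforcement learning; policy learning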

Project Abstract

Based on fast model learning, this project proposes a Bayesian reinforcement learning method for partially observable Markov decision processes (POMDPs). The method addresses settings in which the environment is only partially observable and the environment model is unknown. The main research contents are as follows:
i. In discrete state spaces, we intend to propose a Bayesian dynamic programming method based on intelligent model learning. It targets the noise that partially observable models introduce into value-function computation, which degrades both convergence speed and accuracy.
ii. In partially observable models, unknown states are difficult to predict, so the learned policy can be suboptimal rather than optimal. To address this, we intend to construct a Bayesian model of a dynamic decision network over a discrete state space.
iii. Computing the optimal value function relies on a model of the environment, but this model is only partially known at the outset. To address this, we intend to present a method that optimizes the environment model using cross entropy.
iv. We intend to propose an adaptive Bayesian planning method based on Gaussian processes to address the 'curse of dimensionality' and the 'curse of history' in continuous state spaces with partially observable models.
v. Extending POMDPs from discrete to continuous state spaces raises problems such as high computational complexity and poor convergence. We intend to propose a method that operates directly in the continuous state space without discretization.
vi. We intend to design a system that implements the above theory and optimized algorithms, and to apply it to problems such as robot navigation.
Research on Bayesian reinforcement learning based on partially observable models therefore has both theoretical value and broad application prospects.
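Every item above involves reasoning over a belief, a probability distribution over hidden states, rather than over a directly observed state. The abstract does not give the project's update rules, so the following is only a minimal sketch of the textbook discrete POMDP belief update (a Bayes filter), assuming the transition model T and observation model O are known. The two-state corridor, the action `stay`, and the observations `see_door`/`see_wall` are hypothetical. In Bayesian reinforcement learning the unknown model parameters would additionally be maintained as a posterior, which this sketch omits.

```python
from typing import Dict

# Nested dicts (assumed layout): T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a).
Model = Dict[str, Dict[str, Dict[str, float]]]


def belief_update(belief: Dict[str, float], action: str, observation: str,
                  T: Model, O: Model) -> Dict[str, float]:
    """Bayes filter: b'(s2) is proportional to O(o | s2, a) * sum_s T(s2 | s, a) * b(s)."""
    unnormalized = {}
    for s2 in belief:
        # Prediction step: probability of reaching s2 after taking `action`.
        predicted = sum(T[action][s][s2] * belief[s] for s in belief)
        # Correction step: weight by the likelihood of the received observation.
        unnormalized[s2] = O[action][s2][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state corridor: the robot is either at a door or in the hall.
T = {"stay": {"door": {"door": 1.0, "hall": 0.0},
              "hall": {"door": 0.0, "hall": 1.0}}}
O = {"stay": {"door": {"see_door": 0.8, "see_wall": 0.2},
              "hall": {"see_door": 0.3, "see_wall": 0.7}}}

b1 = belief_update({"door": 0.5, "hall": 0.5}, "stay", "see_door", T, O)
print(b1)  # door = 0.4 / 0.55 ≈ 0.727, hall = 0.15 / 0.55 ≈ 0.273
```

Seeing a door raises the belief that the robot is at the door from 0.5 to about 0.73; calling the function again with each new action and observation propagates the belief over time.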

Completion Abstract

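For settings in which the environment is partially observable and the environment model is unknown, this project proposes Bayesian reinforcement learning methods based on fast model learning. The main contents are:
(1) To address the noise that partial observability of the model introduces into value-function computation, a Bayesian dynamic programming method based on intelligent model learning is proposed.
(2) Because unknown states are difficult to predict in partially observable models, which perturbs the solution of the optimal policy, a Bayesian model that constructs a dynamic decision network over a discrete state space is proposed.
(3) Because computing the optimal value function depends on the environment model, a method that optimizes the environment model using cross entropy is proposed.
(4) To address the 'curse of dimensionality' and the 'curse of history' that arise in continuous-state reinforcement learning under partially observable models, an adaptive Bayesian planning method based on Gaussian processes is proposed.
(5) To address the computational complexity that arises when partially observable Markov problems with discrete states are extended to continuous state spaces, a method that solves them directly in the continuous state space without discretization is proposed.
(6) The theory is applied to problems such as intelligent robot navigation.
Research on Bayesian reinforcement learning based on partially observable models therefore has both theoretical value and broad application prospects.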
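Item (3) names cross entropy as the tool for optimizing the environment model but does not state the objective. As a hedged illustration only, the sketch below applies the generic cross-entropy method (sample candidates from a Gaussian, keep the highest-scoring elite set, refit the Gaussian to it) to fit a single hypothetical model parameter, a robot's slip probability, by maximizing the likelihood of made-up transition counts. The function `cross_entropy_maximize`, its parameters, and the data are assumptions, not the project's method. Because the maximum-likelihood answer is known in closed form here (12 / 50 = 0.24), the result is easy to check.

```python
import math
import random
from typing import Callable, Tuple


def cross_entropy_maximize(
    score: Callable[[float], float],
    mu: float,
    sigma: float,
    n_samples: int = 100,
    n_elite: int = 10,
    n_iters: int = 30,
    seed: int = 0,
    bounds: Tuple[float, float] = (1e-6, 1 - 1e-6),
) -> float:
    """Generic cross-entropy method for one scalar parameter (hypothetical helper)."""
    rng = random.Random(seed)
    lo, hi = bounds
    for _ in range(n_iters):
        # Sample candidates from the current Gaussian, clipped to the valid range.
        samples = [min(hi, max(lo, rng.gauss(mu, sigma))) for _ in range(n_samples)]
        # Keep the n_elite highest-scoring candidates.
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the sampling distribution to the elite set.
        mu = sum(elite) / n_elite
        var = sum((x - mu) ** 2 for x in elite) / n_elite
        sigma = max(math.sqrt(var), 1e-4)  # floor keeps the Gaussian from degenerating
    return mu


# Hypothetical data: in 50 observed moves the robot slipped 12 times.
n_moves, n_slips = 50, 12


def log_likelihood(p: float) -> float:
    """Bernoulli log-likelihood of the observed slips for slip probability p."""
    return n_slips * math.log(p) + (n_moves - n_slips) * math.log(1 - p)


p_hat = cross_entropy_maximize(log_likelihood, mu=0.5, sigma=0.3)
print(f"{p_hat:.3f}")  # expected to land near the maximum-likelihood estimate 0.24
```

The elite size (here 10 of 100 samples) trades exploration against convergence speed, and the floor on `sigma` only prevents the sampling distribution from collapsing to a point. The same loop works for any scalar score and extends to vector-valued model parameters by refitting a mean and a per-dimension standard deviation.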