RI: Small: Stochastic Planning and Probabilistic Inference for Factored State and Action Spaces

Basic information

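Award number:
2002393
Principal investigator:
Roni Khardon
Amount:
$177,700
Institution:
Indiana University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Start and end dates:
2019-11-01 to 2022-05-31
Project status:
Completed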

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2002393&HistoricalAwards=false
Keywords:
RI Small Stochastic Planning Probabilistic

Project abstract

Many important problems require control of multiple actuators, or agents, in parallel, to achieve a common coordinated goal in a stochastic environment. Examples of such problems include scheduling in a building with multiple elevators, managing a team for fire and rescue operations, managing the inventory of a large company, controlling a robotic soccer team, and controlling a robotic team to manage shelving and orders in a warehouse environment. These problems fit naturally into a formulation as discrete-time central-control problems, where we design an algorithm that decides what action each agent takes at each time step in order to optimize the common objective. The corresponding computational problem, known as stochastic planning, is challenging due to its sheer size. In particular, the number of possible states (for example, possible positions of robots, shelves and merchandise in a warehouse) and the number of possible joint actions (combinations of actions of individual robots) are huge in any problem instance of interest. State-of-the-art approaches typically fail because they require too much time to properly search for a good policy or too much memory to store intermediate values. By viewing stochastic planning through the lens of probabilistic inference, this project proposes several novel domain-independent algorithmic approaches that take advantage of problem structure to calculate approximate solutions effectively under time constraints. The project funds are largely devoted to supporting the training and research of PhD students and therefore directly support human development in an important, high-impact area for the nation.
More concretely, we propose three competing approaches to solving such problems, all drawing on a formulation of the finite-horizon control problem as probabilistic inference in a corresponding graphical model, also known as a dynamic Bayesian network. The first approach uses the idea of Monte Carlo search, but adds a strong symbolic component by introducing aggregate trajectories. Aggregate trajectories are obtained by simulating a compositional symbolic model under independence assumptions over the random variables. Each aggregate trajectory provides a value estimate that is approximate but can replace numerous individual trajectories. In this way we get fast approximation of values and effective control under time constraints. The second approach uses problem structure to translate the inference problem into an integer linear program, where the objective and quality of the solution can be traded off for speed through problem decomposition. A novel construction shows how to sidestep the exponential complexity of the problem and obtain a sequence of integer programs that are both small and decomposable so as to yield effective control under time constraints. The third approach, or more accurately framework, builds on the tight connection between stochastic planning and probabilistic inference in the corresponding dynamic Bayesian network. We show that variants of the first two approaches can be viewed in this light, and through this we propose new inference algorithms for solving the stochastic planning problem. In addition, based on this analysis, we propose new algorithms for probabilistic inference, and new generalized inference questions that go beyond current research on marginal MAP in graphical models.
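
The aggregate-trajectory idea described in the abstract can be illustrated with a small sketch. This is not the project's algorithm; it is a minimal Python illustration under stated assumptions. The toy domain (three machines that are either running or down), the transition probabilities, the reward (number of running machines), and every function name are hypothetical. The one idea taken from the abstract is that the transition model is written as an algebraic expression, so that a single pass propagating marginal probabilities, with the state variables treated as independent, can stand in for many sampled trajectories.

```python
import random

# Hypothetical toy domain (an assumption for illustration): n machines,
# each either running (1) or down (0). An action marks machines to reboot.

def next_prob(i, state, action):
    """Probability that machine i is running at the next step.

    Entries of `state` and `action` may be Booleans (0/1) or marginal
    probabilities in [0, 1]; the same algebraic expression serves both.
    With marginals, the products treat the variables as independent.
    """
    left = state[(i - 1) % len(state)]
    # A running machine stays up w.p. 0.95 if its neighbour runs, else 0.6.
    stay_up = state[i] * (0.95 * left + 0.6 * (1 - left))
    # A rebooted machine comes up w.p. 0.9 regardless of its state.
    return action[i] * 0.9 + (1 - action[i]) * stay_up

def aggregate_value(state, policy, horizon):
    """One aggregate trajectory: propagate marginals instead of sampling."""
    marg = [float(s) for s in state]
    total = 0.0
    for t in range(horizon):
        action = policy(t)
        marg = [next_prob(i, marg, action) for i in range(len(marg))]
        total += sum(marg)  # expected number of running machines
    return total

def sampled_value(state, policy, horizon, n_samples, rng):
    """Plain Monte Carlo estimate of the same quantity, for comparison."""
    total = 0.0
    for _ in range(n_samples):
        s = [int(x) for x in state]
        for t in range(horizon):
            action = policy(t)
            s = [int(rng.random() < next_prob(i, s, action))
                 for i in range(len(s))]
            total += sum(s)
    return total / n_samples

def round_robin(n):
    """Open-loop policy that reboots machine t mod n at step t."""
    def policy(t):
        return [1 if i == t % n else 0 for i in range(n)]
    return policy

if __name__ == "__main__":
    start = [1, 0, 1]
    pol = round_robin(len(start))
    print("aggregate:", aggregate_value(start, pol, horizon=5))
    print("sampled:  ", sampled_value(start, pol, 5, 5000, random.Random(0)))
```

The aggregate estimate costs one pass over the horizon, while the sampled estimate costs one pass per sample. The price is that the aggregate value is only approximate once the state variables become correlated, which matches the abstract's description of the estimate as approximate.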
