RI: Small: Learning Strategic Behavior in Sequential Decision Tasks

Basic Information

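Award number: 0915038
Principal investigator: Risto Miikkulainen
Amount: $455,000
Awardee institution: University of Texas at Austin
Institution country: United States
Award type: Standard Grant
Fiscal year: 2009
Funding country: United States
Duration: 2009-09-01 to 2014-08-31
Status: Completed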

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=0915038&HistoricalAwards=false
Keywords: RI Small Learning Strategic Behavior

Project Abstract

Many routine, real-world tasks can be seen as sequential decision tasks. For instance, navigating a robot through a complex environment, driving a car in congested traffic, and routing packets in a computer network require making a sequence of decisions that together minimize the time and resources used. It would be desirable to automate these tasks, yet doing so is difficult because the optimal decisions are generally not known. Many existing learning methods lead to reactive behaviors that perform well in the short term but do not amount to intelligent high-level behavior in the long term. This project is developing methods for learning strategic high-level behavior. Strategic methods need to (1) retain information from past states, (2) learn multimodal behavior, (3) choose between the different behaviors based on crucial details, and (4) implement a sequential high-level strategy based on those behaviors. The neuroevolution methods developed in prior work solve the first problem by evolving recurrent neural networks (through genetic algorithms) to represent the behavior. To solve the remaining problems, the proposed work is extending these methods with multi-objective optimization, local nodes with cascaded structure, and evolution of modules and their combinations. Preliminary results indicate that this approach is indeed feasible. In the long term, the technology developed will make it possible to build robust sequential decision systems for real-world tasks. It will lead to safer and more efficient vehicle, traffic, and robot control, improved process and manufacturing optimization, and more efficient computer and communication systems. It will also make the next generation of video games possible, with characters that exhibit realistic, strategic behaviors; such technology should lead to more effective educational and training games in the future. The OpenNERO open-source software platform developed in this work will be made available to the research community.
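The abstract states that prior neuroevolution work handles the first requirement, retaining information from past states, by evolving recurrent neural networks with genetic algorithms. The Python sketch below illustrates only that basic idea under stated assumptions. It is not the project's method and does not include the multi-objective, cascaded-node, or modular extensions described above. The fixed-topology Elman-style network, the elitist genetic algorithm with uniform crossover and Gaussian mutation, the cue-recall task, and all names (`N_HIDDEN`, `make_cases`, `run_rnn`, `fitness`, `evolve`) are hypothetical choices made for illustration.

```python
import math
import random

N_HIDDEN = 4  # hypothetical hidden-layer size, chosen for illustration only


def n_weights(n_hidden=N_HIDDEN):
    # input->hidden, hidden->hidden (recurrent), hidden biases,
    # hidden->output, and output bias, for a single scalar input
    return n_hidden + n_hidden * n_hidden + n_hidden + n_hidden + 1


def run_rnn(weights, inputs, n_hidden=N_HIDDEN):
    """Feed a sequence of scalar inputs through an Elman-style RNN; return the final output."""
    i = 0
    w_in = weights[i:i + n_hidden]; i += n_hidden
    w_rec = [weights[i + r * n_hidden:i + (r + 1) * n_hidden] for r in range(n_hidden)]
    i += n_hidden * n_hidden
    b_h = weights[i:i + n_hidden]; i += n_hidden
    w_out = weights[i:i + n_hidden]; i += n_hidden
    b_out = weights[i]
    h = [0.0] * n_hidden  # hidden state carries information across time steps
    for x in inputs:
        h = [math.tanh(w_in[r] * x + sum(w_rec[r][c] * h[c] for c in range(n_hidden)) + b_h[r])
             for r in range(n_hidden)]
    return math.tanh(sum(w_out[r] * h[r] for r in range(n_hidden)) + b_out)


def make_cases(n_cases=20, delay=4, seed=1):
    """Hypothetical memory task: a +/-1 cue, then distractors; the target is the cue."""
    rng = random.Random(seed)
    cases = []
    for k in range(n_cases):
        cue = 1.0 if k % 2 == 0 else -1.0
        distractors = [rng.uniform(-0.5, 0.5) for _ in range(delay)]
        cases.append(([cue] + distractors, cue))
    return cases


def fitness(weights, cases):
    # Mean agreement between the cue and the final output, in [-1, 1].
    return sum(target * run_rnn(weights, seq) for seq, target in cases) / len(cases)


def evolve(cases, pop_size=30, generations=40, sigma=0.3, seed=0):
    """Plain elitist GA over the network's weight vector (a sketch, not NEAT or ESP)."""
    rng = random.Random(seed)
    n = n_weights()
    pop = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        scored = sorted(pop, key=lambda w: fitness(w, cases), reverse=True)
        history.append(fitness(scored[0], cases))
        elite = scored[:pop_size // 5]  # top 20% survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # uniform crossover of two elite parents, then Gaussian mutation
            children.append([(wa if rng.random() < 0.5 else wb) + rng.gauss(0.0, sigma)
                             for wa, wb in zip(a, b)])
        pop = elite + children
    best = max(pop, key=lambda w: fitness(w, cases))
    return best, history


cases = make_cases()
best, history = evolve(cases)
print(f"best fitness per generation: first {history[0]:.3f}, last {history[-1]:.3f}")
```

Because the elite individuals are copied unchanged into the next generation, the best fitness recorded per generation never decreases. If the recurrent weights were all zero, the final output would depend only on the last distractor input, so in this sketch the cue can be recalled only through the hidden state carried across time steps.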

许多常规的现实任务可以看作是顺序决策任务。例如，在复杂的环境中导航机器人，在交通拥堵的流量中驾驶汽车以及计算机网络中的路由数据包需要做出一系列决策，以最大程度地减少所使用的时间和资源。希望自动执行这些任务是可取的，但是很困难，因为最佳决策通常不知道。许多现有的学习方法会导致反应性行为在短期内表现良好，但从长远来看并不等于智能的高级行为。该项目正在开发学习战略高级行为的方法。战略方法需要（1）保留过去状态的信息，（2）学习多模式行为，（3）基于关键细节的不同行为，以及（4）基于这些行为实施顺序的高级策略。在先前工作中开发的神经进化方法通过（通过遗传算法）复发性神经网络来代表行为来解决第一个问题。为了解决剩余的问题，这些方法在拟议的工作中扩展了多目标优化，具有级联结构的局部节点以及模块及其组合的演变。初步结果表明这种方法确实是可行的。从长远来看，开发的技术将使为实际任务构建强大的顺序决策系统成为可能。它导致更安全，更高效的车辆，交通和机器人控制，改进的过程和制造优化以及更有效的计算机和通信系统。这也将使下一代视频游戏成为可能，具有表现出现实的战略行为的角色：这种技术应在未来带来更有效的教育和培训游戏。这项工作中开发的开放式开源软件平台将提供给研究社区。