CIF: Small: Theory and Algorithms for Efficient and Large-Scale Monte Carlo Tree Search


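Basic information

Award number:
2327013
Principal investigator:
Kwang-Sung Jun
Amount:
$599,200
Awardee institution:
University of Arizona
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Start and end dates:
2023-12-01 to 2026-11-30
Project status:
Not yet concluded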

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2327013&HistoricalAwards=false
Keywords:
CIF Small Theory Algorithms Efficient

Abstract

Monte Carlo tree search (MCTS) is a versatile online planning methodology for sequential decision-making problems such as reinforcement learning. It has recently shown empirical success in real-world problems including games, chemical synthesis, materials and drug discovery, and numerical algorithms. However, a large gap remains between MCTS theory and practice because (i) the de facto standard MCTS algorithm, upper confidence bound for trees (UCT), is provably suboptimal, (ii) existing theories are limited to asymptotic or worst-case analyses, and (iii) the optimal performance rates of MCTS algorithms are not known. State-of-the-art MCTS methods may therefore still be far from realizing their full potential, and further development is required to prepare for the next generation of much larger and more complex decision-making problems. This project aims to bridge the gap between theory and practice in MCTS by developing novel MCTS algorithms with strong mathematical performance guarantees, establishing the optimal performance rates, and evaluating the algorithms in real-world applications. The project integrates education into research by developing a course module and building interdisciplinary teams of undergraduates who will work closely with materials scientists to evaluate the developed algorithms on materials discovery tasks. The project consists of three main directions: the foundations of MCTS, large-scale MCTS, and the design of experiments for MCTS.
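For readers unfamiliar with UCT, the selection rule it applies at each tree node can be sketched as below. This is a minimal illustration of the textbook UCB1-style score (empirical mean plus an exploration bonus), not code from the project; the function names, the `(total_reward, visits)` representation, and the default constant `c = sqrt(2)` are assumptions made for the example.

```python
import math

def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score: empirical mean plus an exploration bonus.

    Unvisited children get +inf so that each child is tried at least once.
    The exploration constant c is a tuning parameter (assumed default).
    """
    if visits == 0:
        return math.inf
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest score.

    `children` is a hypothetical list of (total_reward, visits) pairs;
    the parent visit count is taken to be the sum of child visits.
    """
    parent_visits = sum(v for _, v in children)
    scores = [uct_score(r, v, parent_visits, c) for r, v in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With `children = [(6.0, 10), (0.5, 1)]`, the rarely visited second child receives the larger bonus and is selected despite its lower empirical mean; setting `c=0` reduces the rule to picking the highest mean.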
Each direction contains several main objectives: (i) for the foundations of MCTS, the focus is to improve the maximum mean estimator and leverage tools from a related problem called pure exploration to develop algorithms with strong guarantees and study the information-theoretic limits of MCTS; (ii) for large-scale MCTS, the focus is to analyze and improve existing heuristics for large-scale MCTS problems such as progressive widening, incremental depth expansion, and function approximation; (iii) for the design of experiments for MCTS, the focus is to develop experimental design methods that efficiently train function approximators for MCTS with a small number of samples. In addition to theoretical and algorithmic developments, the project also aims to implement all developed algorithms as open-source software, evaluate them on benchmark datasets, and apply them to materials science tasks via the interdisciplinary teams of undergraduates as part of the educational aim. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
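As an illustration of one heuristic named above, progressive widening is commonly described as capping the number of children a node may have at roughly k * n^alpha, where n is the node's visit count. The sketch below follows that common formulation; it is not taken from the project, and the function names and the default values of `k` and `alpha` are assumptions chosen for the example.

```python
import math

def max_children(visits, k=1.0, alpha=0.5):
    """Cap on the number of children for a node visited `visits` times.

    Uses ceil(k * visits**alpha), with a floor of 1 so that a fresh
    node can always receive its first child. k and alpha are tuning
    parameters (assumed defaults).
    """
    return max(1, math.ceil(k * visits ** alpha))

def should_expand(num_children, visits, k=1.0, alpha=0.5):
    """True if the node may add a new child rather than revisit existing ones."""
    return num_children < max_children(visits, k, alpha)
```

Under these defaults a node with 16 visits may hold at most 4 children, so `should_expand(4, 16)` is false; one more visit raises the cap to 5 and allows a new child. Smaller values of `alpha` make the tree widen more slowly, which matters when the action space is large or continuous.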
