CAREER: Lyapunov Drift Methods for Stochastic Recursions: Applications in Cloud Computing and Reinforcement Learning

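Basic Information

Award Number: 2144316
Principal Investigator: Siva Theja Maguluri
Amount: $500,000
Awardee Institution: Georgia Tech Research Corporation
Institution Country: United States
Award Type: Continuing Grant
Fiscal Year: 2022
Funding Country: United States
Period: 2022-05-01 to 2027-04-30
Status: Active (not yet closed)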

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2144316&HistoricalAwards=false
Keywords: CAREER Lyapunov Drift Methods Stochastic

Project Abstract

Part 1: The ongoing Artificial Intelligence revolution has been made possible by progress in two distinct areas. The first is the development of novel algorithms in machine learning paradigms, such as Reinforcement Learning (RL), that overcome long-standing challenges; the second is a series of breakthroughs in cloud computing infrastructure based on large data centers, which make it possible to collect, store, and process large amounts of data easily and at short notice. Despite tremendous success stories in both areas, the fundamental trade-offs and optimal performance are not understood, and theory lags far behind practice. Although they appear to be very distinct problems, both Reinforcement Learning and cloud computing can be studied using stochastic recursions. The goal of this CAREER project is to take a unified theoretical view of these two seemingly distinct areas, by first developing a general theory of stochastic recursions and then using it to study both Reinforcement Learning and cloud computing. In particular, we will use the theory to develop novel learning algorithms with provably optimal sample complexity across various paradigms, such as off-policy learning and the actor-critic framework. The theory of stochastic recursions, together with the novel learning algorithms, will also be used to develop optimal scheduling algorithms for cloud computing data centers that minimize the tail of the delay experienced by users. The novel algorithms developed during this project will be implemented through collaborations with industry partners as well as on Georgia Tech’s internal cloud. A Jupyter-based open-source RL simulation platform will be developed, and the novel algorithms developed during this project will be included in it.
The platform will be used not only to disseminate the outcomes of this project, but also for undergraduate research projects, for course projects in a new course on Reinforcement Learning, and for STEM outreach activities in K-12 education. In addition to disseminating research results through conference and journal publications, we will develop a new special topics course and publish a monograph on the unified Lyapunov framework for stochastic recursions. Training of graduate and undergraduate students also forms a core part of the project, with special emphasis on mentoring future faculty.

Part 2, Intellectual Merit: The proposed work is organized into three interdependent thrusts. Thrust I builds a Lyapunov theory of stochastic recursions, obtaining finite-time mean-square error bounds and exponential tail bounds, and characterizing the steady-state limiting distribution, for a broad class of stochastic recursions. This thrust forms the foundation for the next two. Thrust II studies the finite-time mean-square bounds, tail probability bounds (also known as PAC bounds), sample complexity, and steady-state behavior of RL algorithms under three paradigms, namely off-policy RL, two-time-scale policy-space algorithms (such as actor-critic), and average-reward RL, and develops novel, fast RL algorithms with near-optimal sample efficiency. Thrust III studies scheduling problems in data center networks, with the goal of minimizing mean delay and delay tails. Using the Lyapunov theory from Thrust I, we will develop novel low-complexity algorithms with provable guarantees on steady-state delay in the heavy-traffic asymptotic regime. Using these as initial policies, we will deploy RL algorithms from Thrust II to learn new scheduling policies that are optimal even in the pre-asymptotic regime, which is of practical interest. All the proposed algorithms will be evaluated on real-world traffic traces through our collaborations with industry partners.
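
To make the object of Thrust I concrete, the Python sketch below uses a deliberately simple contractive linear stochastic recursion, x_{k+1} = x_k + α((x* − x_k) + w_k) with i.i.d. Gaussian noise w_k ~ N(0, σ²I), together with the quadratic Lyapunov function V(x) = ||x − x*||². The toy model, the target x*, and all function names are illustrative assumptions, not the project's algorithms. Because the noise here is independent and zero-mean, the one-step drift is exact, E[V(x_{k+1})] = (1 − α)² E[V(x_k)] + α²σ²d, so iterating it gives the finite-time mean-square error, and its fixed point ασ²d/(2 − α) gives the steady-state value; a Monte Carlo run checks both. RL recursions such as TD learning add Markovian noise and nonlinear operators, which this sketch omits.

```python
import random


def sq_dist(x, y):
    """Squared Euclidean distance: the Lyapunov function V(x) = ||x - x*||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def simulate_mse(x_star, alpha, sigma, num_steps, num_runs, seed=0):
    """Monte Carlo estimate of E[V(x_k)] for the (hypothetical) toy recursion
    x_{k+1} = x_k + alpha * ((x* - x_k) + w_k), with w_k ~ N(0, sigma^2 I) i.i.d."""
    rng = random.Random(seed)
    totals = [0.0] * (num_steps + 1)
    for _ in range(num_runs):
        x = [0.0] * len(x_star)  # every run starts at the origin
        totals[0] += sq_dist(x, x_star)
        for k in range(1, num_steps + 1):
            # step toward x* along the mean field, perturbed by Gaussian noise
            x = [xi + alpha * ((si - xi) + rng.gauss(0.0, sigma))
                 for xi, si in zip(x, x_star)]
            totals[k] += sq_dist(x, x_star)
    return [t / num_runs for t in totals]


def drift_mse(v0, alpha, sigma, dim, num_steps):
    """Iterate the exact one-step drift identity for this toy recursion:
    E[V_{k+1}] = (1 - alpha)^2 E[V_k] + alpha^2 sigma^2 dim."""
    v = [v0]
    for _ in range(num_steps):
        v.append((1 - alpha) ** 2 * v[-1] + alpha ** 2 * sigma ** 2 * dim)
    return v


def stationary_mse(alpha, sigma, dim):
    """Fixed point of the drift identity: the steady-state mean-square error."""
    return alpha * sigma ** 2 * dim / (2 - alpha)


if __name__ == "__main__":
    x_star = [1.0, -2.0]  # hypothetical target; V(x_0) = 5 starting from the origin
    mc = simulate_mse(x_star, alpha=0.1, sigma=1.0, num_steps=100, num_runs=2000)
    exact = drift_mse(5.0, alpha=0.1, sigma=1.0, dim=2, num_steps=100)
    for k in (0, 10, 50, 100):
        print(f"k={k:3d}  simulated={mc[k]:.4f}  drift={exact[k]:.4f}")
    print(f"steady state: {stationary_mse(0.1, 1.0, 2):.4f}")
```

For α in (0, 1) this shows the familiar constant-step-size trade-off: a larger α contracts faster, through the factor (1 − α)², but leaves a larger steady-state error ασ²d/(2 − α).
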
Broader Impacts: The proposed work and the PI’s ongoing industry collaborations have the potential for significant societal impact by making RL and cloud computing more efficient. The proposed Lyapunov theory of stochastic recursions is applicable in many other disciplines, so the PI will disseminate it widely through a special topics course, a monograph, and tutorials, in addition to conference and journal publications. The project integrates research with educational activities at every level. A Jupyter-based RL simulation platform and a library of notebooks that we will build will serve as an extensive pedagogical resource for these activities. The PI will continue his ongoing involvement in undergraduate research through the REU program and the VIP program at Georgia Tech. To meet growing demand, the PI will develop a new interdisciplinary undergraduate RL course that makes extensive use of the RL simulation platform. To promote STEM, the PI will take part in outreach activities at local high schools, working with an academic professional in ISyE, and will mentor high school teachers through the GIFT program. To support Ph.D. students interested in academic careers, the PI runs a future faculty mentorship program. The PI is committed to broadening participation; he currently advises a female Hispanic student and has advised several URM undergraduate students.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
