CIF: Small: Accelerating Stochastic Approximation for Optimization and Reinforcement Learning
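Basic information
- Award number: 2306023
- Principal investigator:
- Amount: $600,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-07-01 to 2026-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract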
This project concerns the design and analysis of recursive algorithms, which have broad applications in engineering and computer science. Recursive algorithms play a crucial role in machine learning systems such as ChatGPT, which rely on large amounts of data for training. Reinforcement learning uses recursive algorithms to train computer programs; among the most famous examples are programs that excel at games such as Go and chess. Training is interpreted as "learning" optimal responses (e.g., the next move) based on observations (e.g., the current configuration of a chessboard). While stochastic approximation is recognized as a mathematical model for recursive algorithms and plays a major role in the mathematical theory of learning, the supporting theory has not kept pace with empirical success. In reinforcement learning, it is often uncertain whether training will be successful or how much training is required. Along with fundamental research to create new foundations for algorithmic learning, the research project also involves graduate student mentoring, dissemination of new and existing research results through online video lectures, and dissemination through the Workshop on Cognition and Control organized by the investigator, which is held annually at the University of Florida and attracts speakers from across the U.S. and abroad.
Techniques will be developed to ensure stability and accelerate convergence of stochastic approximation algorithms in terms of transients and variance. New approaches to algorithm design will include techniques based on ordinary differential equation methods, recent theory of Markov processes, and approaches to learning based on quasi-random exploration. Much of the work in algorithm design reduces to a feedback control problem, initially posed in continuous time to leverage concepts from nonlinear control and stability theory. A remarkable example is the Newton-Raphson flow, which is globally convergent under mild assumptions. A dependable "algorithmic feedback law" in continuous time is then translated into a reliable and efficient algorithm implemented in discrete time. The general theory will be developed within two specific application areas: reinforcement learning and gradient-free optimization. Reinforcement learning presents the greatest challenge because, to date, there is little theory available to establish the stability of these recursive algorithms outside of very special cases. Moreover, in recent work the investigator and his students have shown that Markovian memory can result in very slow convergence, even when the algorithm is optimized; in such cases it is necessary to change the algorithmic goal without negatively impacting the quality of the final solution delivered by the algorithm. In the case of reinforcement learning, the primary objective is to efficiently learn an effective rule for decision making (i.e., a policy). Fortunately, there is great freedom in choosing a criterion of fit for learning the best policy within a given class.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
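As an illustration of the continuous-to-discrete translation described in the abstract, the following is a minimal sketch rather than the project's actual method. It assumes the usual form of the Newton-Raphson flow, d/dt theta = -f(theta)/f'(theta) in the scalar case, along which f(theta_t) = exp(-t) f(theta_0). The function f, the noise model, the step size 1/n, and all names are hypothetical choices made only for this example, and the derivative is assumed to be known exactly.

```python
import math
import random


def f(theta):
    """Hypothetical function whose root is sought; the unique root is theta = 1."""
    return theta**3 + theta - 2.0


def f_prime(theta):
    """Derivative of f; at least 1 everywhere, so the flow is well defined."""
    return 3.0 * theta**2 + 1.0


def newton_raphson_flow(theta0, horizon, dt=1e-3):
    """Euler integration of the flow d/dt theta = -f(theta) / f'(theta)."""
    theta = theta0
    for _ in range(int(round(horizon / dt))):
        theta -= dt * f(theta) / f_prime(theta)
    return theta


def stochastic_newton_raphson(theta0, n_iter, noise_std=0.5, seed=0):
    """Discrete-time recursion using noisy observations of f (assumed noise model)."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_iter + 1):
        noisy_f = f(theta) + rng.gauss(0.0, noise_std)
        # Step size 1/n with the Newton-Raphson gain 1 / f'(theta).
        theta -= (1.0 / n) * noisy_f / f_prime(theta)
    return theta


theta0, horizon = 3.0, 2.0
theta_T = newton_raphson_flow(theta0, horizon)
# Along the flow, f(theta_t) / f(theta_0) should be close to exp(-t).
ratio = f(theta_T) / f(theta0)
expected = math.exp(-horizon)
theta_sa = stochastic_newton_raphson(theta0, n_iter=20000)
print(ratio, expected, theta_sa)
```

The first routine checks the exponential decay of f along the flow; the second shows one way such a feedback law might be carried over to a noisy discrete-time recursion. In practice the derivative is typically unknown and must itself be estimated, which this sketch does not attempt.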
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Sean Meyn
Balancing the Power Grid with Cheap Assets---Tutorial Lecture
- DOI:
- Publication year: 2023
- Journal:
- Impact factor: 0
- Authors: Sean Meyn; Fan Lu; Joel Mathias
- Corresponding author: Joel Mathias
Convex Q-Learning in Continuous Time with Application to Dispatch of Distributed Energy Resources
- DOI:
- Publication year: 2023
- Journal:
- Impact factor: 0
- Authors: Fan Lu; Joel Mathias; Sean Meyn; Karan Kalsi
- Corresponding author: Karan Kalsi
Revisiting Step-Size Assumptions in Stochastic Approximation
- DOI:
- Publication year: 2024
- Journal:
- Impact factor: 0
- Authors: Caio Kalil Lauand; Sean Meyn
- Corresponding author: Sean Meyn
Other grants held by Sean Meyn
Characterizing capacity of controllable DERs to provide energy storage service to the power grid
- Award number: 2122313
- Fiscal year: 2021
- Amount: $600,000
- Project type: Standard Grant
Reinforcement Learning and Kullback-Leibler Stochastic Optimal Control for Complex Networks
- Award number: 1935389
- Fiscal year: 2019
- Amount: $600,000
- Project type: Standard Grant
Distributed Control for Demand Dispatch: The Creation of Virtual Energy Storage from Flexible Loads
- Award number: 1609131
- Fiscal year: 2016
- Amount: $600,000
- Project type: Standard Grant
CPS:Medium:Collaborative Research: Smart Power Systems of the Future: Foundations for Understanding Volatility and Improving Operational Reliability
- Award number: 1259040
- Fiscal year: 2012
- Amount: $600,000
- Project type: Standard Grant
CPS:Medium:Collaborative Research: Smart Power Systems of the Future: Foundations for Understanding Volatility and Improving Operational Reliability
- Award number: 1135598
- Fiscal year: 2011
- Amount: $600,000
- Project type: Standard Grant
Robust Inference and Communication: Theory, Algorithms and Performance Analysis
- Award number: 0729031
- Fiscal year: 2007
- Amount: $600,000
- Project type: Standard Grant
Visualization & Optimization Techniques For Analysis and Design of Complex Systems
- Award number: 0217836
- Fiscal year: 2002
- Amount: $600,000
- Project type: Standard Grant
US-India Workshop: Learning, Adaptation, and Optimization, Kerala, India, December 2000
- Award number: 0079744
- Fiscal year: 2000
- Amount: $600,000
- Project type: Standard Grant
Optimization and Performance Evaluation of Network Models
- Award number: 9972957
- Fiscal year: 1999
- Amount: $600,000
- Project type: Standard Grant
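Similar grants from the National Natural Science Foundation of China
Role and molecular mechanism of microglial NOX2 in the acceleration of Alzheimer's disease progression by β-amyloid, mediated by the hypothalamic wake-promoting neuropeptide orexin
- Award number:
- Approval year: 2019
- Amount: 550,000 CNY
- Project type: General Program
Theoretical study of the sequence selectivity of transmembrane transport of small peptides and its physical mechanism
- Award number: 11804151
- Approval year: 2018
- Amount: 270,000 CNY
- Project type: Young Scientists Fund
Research on deep-learning-based small object detection and its heterogeneous computing techniques
- Award number: 61872200
- Approval year: 2018
- Amount: 640,000 CNY
- Project type: General Program
Role of tissue factor regulation by the endothelial anti-aging protein SIRT1 in the acceleration of cerebral small-vessel thrombosis by latent cytomegalovirus infection combined with hyperlipidemia
- Award number: 81801384
- Approval year: 2018
- Amount: 210,000 CNY
- Project type: Young Scientists Fund
Observational study of small-scale magnetic structures in the magnetosphere and the near-Earth solar wind
- Award number: 41774153
- Approval year: 2017
- Amount: 700,000 CNY
- Project type: General Program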
Similar overseas grants
CC* INTEGRATION-SMALL: ADIABATIC MICROSERVICE LEVEL LOAD BALANCED FORWARDING ON PISA SWITCH FOR ACCELERATING URGENT PROCESSES IN SCIENCE DATA CENTER NETWORKS
- Award number: 2346729
- Fiscal year: 2024
- Amount: $600,000
- Project type: Standard Grant
CSR: Small: Accelerating Data Intensive Scientific Workflows with Consistency Contracts
- Award number: 2317556
- Fiscal year: 2023
- Amount: $600,000
- Project type: Standard Grant
Tele-FootX: Virtually Supervised Tele-Exercise Platform for Accelerating Plantar Wound Healing
- Award number: 10701324
- Fiscal year: 2023
- Amount: $600,000
- Project type:
New Technologies for Accelerating the Discovery and Characterization of Neuroactives that Address Substance Use Disorders
- Award number: 10680754
- Fiscal year: 2023
- Amount: $600,000
- Project type:
Accelerating Functional Maturation of Human iPSC-Derived Astrocytes
- Award number: 10699505
- Fiscal year: 2023
- Amount: $600,000
- Project type: