Novel Algorithms to Approximate the Future Consequence of Sequential Decisions

Basic Information

Grant number:
RGPIN-2017-04877
Principal investigator:
SabouriBaghAbbas, Alireza
Amount:
$14,600
Host institution:
University of Calgary
Host institution country:
Canada
Program category:
Discovery Grants Program - Individual
Fiscal year:
2019
Funding country:
Canada
Start and end dates:
2019-01-01 to 2020-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=677241
Keywords:
Novel Algorithms Approximate Future Consequence

Project Abstract

Many complex problems arising in business, health care, and transportation can be modelled as sequential decision-making problems under uncertainty, meaning that a decision maker has to make decisions periodically while random events unfold over time. For instance, an airline dynamically changes the fares for different flights over a network of cities without knowing the actual future demand, trying to maximize its revenue while managing the risk of unsold seats. These problems can be conveniently modelled as dynamic programs, a method that finds the best decision by maximizing the sum of the immediate reward and the expected future reward. Unfortunately, for many practical problems, the number of future scenarios that must be considered to calculate the expected future reward function is exponentially large, making exact calculation of this function intractable. To overcome this issue, approximate dynamic programming (ADP) methods have been developed to find approximately optimal solutions.

A cornerstone of many ADP algorithms is defining a set of basis functions (or an approximation architecture) for approximating the future consequence of present decisions (the expected future reward function). Currently, the choice of basis functions requires prior expert knowledge about the problem and is usually considered more of an art than a science. My research program aims to develop, study, and apply novel algorithms that automate the generation of basis functions by efficiently selecting a subset of functions from a large pool of potential basis functions and updating this set as more information becomes known about the problem. The benefit of such algorithms is twofold: first, they reduce the burden of coming up with a well-informed set of basis functions, which requires significant prior knowledge about the problem; second, since many potential candidates are considered for basis functions, the quality of the approximation is expected to improve. My short-term objective includes evaluating the performance of the proposed algorithms in a variety of application areas, such as perishable inventory management, patient scheduling, and revenue management.

ADP is a general method that is commonly used to solve many different problems in a variety of applications. Because the quality of the policies generated by these algorithms depends on the quality of the basis functions chosen, it would be of great interest, both theoretically and practically, if the process of generating and selecting basis functions could be automated. Therefore, even a small improvement achieved by the findings of my research would have significant practical implications in multiple application areas.
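
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the algorithm proposed in this project. It first solves a toy single-flight pricing problem exactly by backward induction (immediate reward plus expected future reward), then greedily picks a few basis functions from a small pool to approximate the resulting value function. The greedy step uses plain matching pursuit as a stand-in selection rule. All names, fares, probabilities, and pool entries are hypothetical and chosen only for illustration.

```python
import math


def solve_pricing_dp(seats, periods, fares, buy_prob):
    """Backward induction for a toy single-flight pricing problem.

    V[t][x] is the maximum expected revenue from period t onward with x
    unsold seats. In each period at most one customer arrives and buys
    with probability buy_prob[fare] (an assumed demand model).
    """
    V = [[0.0] * (seats + 1) for _ in range(periods + 1)]
    for t in range(periods - 1, -1, -1):
        for x in range(1, seats + 1):
            # Best fare maximizes immediate reward + expected future reward.
            V[t][x] = max(
                buy_prob[f] * (f + V[t + 1][x - 1])
                + (1.0 - buy_prob[f]) * V[t + 1][x]
                for f in fares
            )
    return V


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(target, pool, n_select):
    """Greedily select basis functions from a pool to approximate target.

    pool maps a name to that basis function's values at the sample states.
    Each step picks the function that most reduces the squared residual.
    """
    residual = list(target)
    chosen = []
    for _ in range(n_select):
        name = max(
            pool,
            key=lambda k: dot(residual, pool[k]) ** 2 / dot(pool[k], pool[k]),
        )
        phi = pool[name]
        coef = dot(residual, phi) / dot(phi, phi)
        residual = [r - coef * p for r, p in zip(residual, phi)]
        chosen.append((name, coef))
    return chosen, residual


# Hypothetical instance: 5 seats, 10 selling periods, two fare levels.
seats, periods = 5, 10
fares = [100.0, 200.0]
buy_prob = {100.0: 0.8, 200.0: 0.4}
V = solve_pricing_dp(seats, periods, fares, buy_prob)

# Hypothetical pool of candidate basis functions of the seat count x.
states = list(range(seats + 1))
pool = {
    "constant": [1.0 for x in states],
    "linear": [float(x) for x in states],
    "quadratic": [float(x * x) for x in states],
    "sqrt": [math.sqrt(x) for x in states],
    "log1p": [math.log1p(x) for x in states],
}
chosen, residual = matching_pursuit(V[0], pool, n_select=3)
print("selected basis functions:", chosen)
print("remaining squared error:", dot(residual, residual))
```

In this toy instance the exact value function is cheap to compute, so it can serve directly as the fitting target. In the large problems the abstract describes, the target would instead have to come from sampled or simulated states, which is where automated basis selection becomes useful.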
