The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.