In this paper, online convex optimization (OCO) problems with time-varying objective and constraint functions are studied from the perspective of an agent who takes actions in real time. Information about the current objective and constraint functions is revealed only after the corresponding action is already chosen. Inspired by a fast converging algorithm for time-invariant optimization in the very recent work [1], we develop a novel online algorithm based on virtual queues for constrained OCO. Optimal points of the dynamic optimization problems with full knowledge of the current objective and constraint functions are used as a dynamic benchmark sequence. Upper bounds on the regrets with respect to the dynamic benchmark and the constraint violations are derived for the presented algorithm in terms of the temporal variations of the underlying dynamic optimization problems. It is observed that the proposed algorithm possesses sublinear regret and sublinear constraint violations, as long as the temporal variations of the optimization problems are sublinear, i.e., the objective and constraint functions do not vary too drastically across time. The performance bounds of the proposed algorithm are superior to those of the state-of-the-art OCO method in most scenarios. Besides, different from the saddle point methods widely used in constrained OCO, the stepsize of the proposed algorithm does not rely on the total time horizon, which may be unknown in practice. Finally, the algorithm is applied to a dynamic resource allocation problem in data center networks. Numerical experiments are conducted to corroborate the merit of the developed algorithm and its advantage over the state-of-the-art.
在本文中,从实时采取行动的智能体的角度研究了具有时变目标函数和约束函数的在线凸优化(OCO)问题。关于当前目标函数和约束函数的信息只有在相应的行动已经被选择之后才会被揭示。受近期文献[1]中一种用于时不变优化的快速收敛算法的启发,我们为有约束的OCO开发了一种基于虚拟队列的新型在线算法。将完全了解当前目标函数和约束函数的动态优化问题的最优点用作动态基准序列。根据潜在动态优化问题的时间变化,推导出所提出算法相对于动态基准的遗憾值以及约束违反情况的上界。可以观察到,只要优化问题的时间变化是次线性的,即目标函数和约束函数在时间上不会变化过于剧烈,所提出的算法就具有次线性遗憾值和次线性约束违反情况。在大多数情况下,所提出算法的性能界限优于现有的OCO方法。此外,与在有约束的OCO中广泛使用的鞍点方法不同,所提出算法的步长不依赖于总时间范围,而总时间范围在实际中可能是未知的。最后,将该算法应用于数据中心网络中的动态资源分配问题。进行了数值实验以证实所开发算法的优点及其相对于现有技术的优势。