Understanding complex dynamics of two-sided online matching markets, where the demand-side agents compete to match with the supply-side (arms), has recently received substantial interest. To that end, in this paper, we introduce the framework of decentralized two-sided matching market under non stationary (dynamic) environments. We adhere to the serial dictatorship setting, where the demand-side agents have unknown and different preferences over the supply-side (arms), but the arms have fixed and known preference over the agents. We propose and analyze an asynchronous and decentralized learning algorithm, namely Non-Stationary Competing Bandits (NSCB), where the agents play (restrictive) successive elimination type learning algorithms to learn their preference over the arms. The complexity in understanding such a system stems from the fact that the competing bandits choose their actions in an asynchronous fashion, and the lower ranked agents only get to learn from a set of arms, not dominated by the higher ranked agents, which leads to forced exploration. With carefully defined complexity parameters, we characterize this forced exploration and obtain sub-linear (logarithmic) regret of NSCB. Furthermore, we validate our theoretical findings via experiments.
理解双边在线匹配市场的复杂动态(在该市场中,需求方主体竞争与供应方(资源)进行匹配)近来受到了广泛关注。为此,在本文中,我们引入了非平稳(动态)环境下的去中心化双边匹配市场框架。我们遵循序列独裁设置,即需求方主体对供应方(资源)具有未知且不同的偏好,而资源对主体具有固定且已知的偏好。我们提出并分析了一种异步去中心化学习算法,即非平稳竞争老虎机(NSCB),其中主体采用(受限的)连续淘汰式学习算法来了解他们对资源的偏好。理解此类系统的复杂性源于竞争的老虎机以异步方式选择其行动,且排名较低的主体只能从一组资源中学习,这些资源未被排名较高的主体所主导,这导致了强制探索。通过精心定义的复杂性参数,我们描述了这种强制探索,并获得了NSCB的次线性(对数)遗憾值。此外,我们通过实验验证了我们的理论发现。