Recently, a large body of work has been devoted to the study of Markov chain stochastic gradient methods (MC-SGMs), mainly focusing on their convergence analysis for solving minimization problems. In this paper, we provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory. For empirical risk minimization (ERM) problems, we establish optimal excess population risk bounds for both smooth and non-smooth cases by introducing on-average argument stability. For minimax problems, we develop a quantitative connection between on-average argument stability and generalization error, which extends the existing results for uniform stability \cite{lei2021stability}. We further establish the first nearly optimal convergence rates for convex-concave problems, both in expectation and with high probability, which, combined with our stability results, show that optimal generalization bounds can be attained in both the smooth and non-smooth cases. To the best of our knowledge, this is the first generalization analysis of SGMs whose gradients are sampled from a Markov process.
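To make the setting concrete, the following is a minimal sketch of an MC-SGM iteration: instead of drawing each sample index i.i.d., the index at each step is produced by an ergodic Markov chain over the data. The objective (least squares), the lazy-random-walk transition kernel, and the step-size schedule here are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Transition kernel: a lazy random walk on the index set {0, ..., n-1},
# an ergodic chain whose stationary distribution is uniform.
def next_index(i):
    return i if rng.random() < 0.5 else (i + rng.choice([-1, 1])) % n

w = np.zeros(d)
i = 0
for t in range(1, 5001):
    i = next_index(i)                 # index sampled from the Markov chain
    grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i^T w - y_i)^2
    w -= (0.1 / np.sqrt(t)) * grad    # decaying step size (illustrative)

print(np.mean((X @ w - y) ** 2))      # empirical risk of the final iterate
```

The only change relative to the classical SGM is the sampling scheme: consecutive gradients are correlated through the chain, which is precisely what complicates both the convergence and the stability analysis.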