Machine learning models trained by different optimization algorithms under different data distributions can exhibit distinct generalization behaviors. In this paper, we analyze the generalization of models trained by noisy iterative algorithms. We derive distribution-dependent generalization bounds by connecting noisy iterative algorithms to additive noise channels from communication and information theory. Our generalization bounds shed light on several applications, including differentially private stochastic gradient descent (DP-SGD), federated learning, and stochastic gradient Langevin dynamics (SGLD). We demonstrate our bounds through numerical experiments, showing that they help explain recent empirical observations of generalization phenomena in neural networks.