Second-order Hessian-free methods for statistical learning and stochastic optimization

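Basic information

Grant number:
RGPIN-2022-04400
Principal investigator:
Bastin, Fabian
Amount:
$31,300
Host institution:
Université de Montréal
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed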

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=757409
Keywords:
Second order Hessian free methods

Project abstract

The success of machine learning over the last decade has had a deep impact on the mathematical optimization community and has renewed interest in methods such as stochastic gradient descent. Such methods offer cheap iterations, which allow fast progress early in the optimization, and they avoid storing dense matrices, which is prohibitive when the number of parameters is very large. However, they have difficulty converging close to the solution and rely on vanishing step sizes to guarantee theoretical convergence. Depending on the starting point, the algorithm can also struggle to reach a neighborhood of a solution. We investigate second-order Hessian-free strategies that capitalize on existing nonlinear programming theory while scaling with the number of data points and decision variables. The methods rely on adaptive sample average approximations (SAA): within a trust-region framework combined with standard variance reduction techniques, the sample size is controlled by comparing the estimated reduction achieved in the objective function with the statistical noise. At each iteration, quasi-Newton candidate iterates can be obtained without explicit matrix storage, and we explore how to exploit the structure of typical estimation problems to improve the approach. A second objective of the proposed research is to capitalize on the statistical information obtained on the model to develop better early stopping strategies. These are especially important because large samples are required close to the solution, which leads to costly iterations. Another benefit is the possibility of providing the modeler with information about the residual uncertainty at the solution found.
We also explore the effect of observations that are not independent and identically distributed, as they could lead to biased solutions and possibly have a negative impact on some social communities when the model is used to design policies that affect individuals, for instance in transportation or energy. Similarly, model misspecification is important to analyze, both in terms of algorithm convergence and in terms of solution robustness. Another important aspect that we consider is the feasible set, as most optimization algorithms used in machine learning are designed for unconstrained problems only. However, many real applications, for instance in energy, include nonlinear constraints whose expressions can depend on the realization of the uncertainty, and the feasible set is not guaranteed to be convex. A standard approach is to turn to methods that aim to find a KKT (Karush-Kuhn-Tucker) solution, but stochastic approximation methods have received much less attention in this context. SAA methods present additional challenges as well: adaptive sampling strategies have more difficulty exploiting the information geometry, and the sample may have to be adjusted when some constraints must be satisfied for all or nearly all scenarios.
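
Two ideas from the abstract can be made concrete with a small sketch: obtaining a quasi-Newton step without storing any matrix, and deciding whether the sample should grow by comparing the estimated objective reduction with its statistical noise. The abstract does not say which quasi-Newton update or which sample-size test the project uses, so everything below is an illustrative assumption: the L-BFGS two-loop recursion stands in for the Hessian-free step, and the names `lbfgs_direction`, `next_sample_size`, `z`, and `growth` are hypothetical.

```python
import numpy as np


def lbfgs_direction(grad, s_list, y_list):
    """Return -H @ grad, where H is the implicit L-BFGS inverse-Hessian
    approximation built from stored pairs (oldest first).

    s_list holds iterate differences, y_list holds gradient differences.
    Only vectors are stored; no matrix is ever formed.
    """
    q = np.array(grad, dtype=float)
    history = []
    # First loop: newest pair to oldest pair.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / np.dot(y, s)
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        history.append((rho, alpha))
    # Initial scaling from the most recent pair (identity if no pairs yet).
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= np.dot(s, y) / np.dot(y, y)
    # Second loop: oldest pair to newest pair.
    for (rho, alpha), s, y in zip(reversed(history), s_list, y_list):
        beta = rho * np.dot(y, q)
        q += (alpha - beta) * s
    return -q


def next_sample_size(losses_current, losses_candidate, n_max, z=1.96, growth=2.0):
    """Hypothetical rule: keep the sample size if the estimated reduction
    exceeds z standard errors; otherwise grow the sample (capped at n_max).

    Both arrays contain per-observation losses on the SAME sample, so the
    paired differences estimate the objective reduction and its noise.
    """
    diffs = np.asarray(losses_current) - np.asarray(losses_candidate)
    n = diffs.size
    reduction = diffs.mean()
    stderr = diffs.std(ddof=1) / np.sqrt(n)
    if reduction > z * stderr:
        return n
    return min(n_max, int(np.ceil(growth * n)))
```

In a trust-region loop, the direction would typically be truncated to the current radius before the candidate is evaluated, and the sample-size rule would be applied before accepting or rejecting the step. The threshold `z` and the doubling factor are arbitrary choices made for this sketch.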
