605. Stochastic line-search-based optimization for training overparameterized models: convergence conditions and effective approaches to leverage momentum
Invited abstract in session WB-3: Recent Advances in Line-Search Based Optimization, stream Large scale optimization: methods and algorithms.
Wednesday, 10:30-12:30, Room: B100/4011
Authors (first author is the speaker)
1. Davide Pucci, Department of Information Engineering, University of Florence
2. Matteo Lapucci, Department of Information Engineering, University of Florence
Abstract
In recent years, the adoption of line-search techniques within incremental gradient-based methods for finite-sum problems has garnered considerable interest among researchers. Most recent works in the literature focus on incorporating line-searches into Stochastic Gradient Descent (SGD), since using a descent direction for the mini-batch objective is essential to ensure that the line-search terminates in a finite number of steps.
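For concreteness, the following is a minimal sketch of a stochastic Armijo backtracking step of the kind this line of work builds on; the hyperparameter names (eta0, c, rho) and the plain negative-gradient direction are illustrative assumptions, not the specific scheme discussed in the talk.

```python
import numpy as np

def sgd_armijo_step(x, loss_fn, grad_fn, batch,
                    eta0=1.0, c=0.1, rho=0.5, max_backtracks=50):
    """One SGD step with Armijo backtracking on the sampled mini-batch.

    loss_fn(x, batch) and grad_fn(x, batch) evaluate the mini-batch
    objective and its gradient; since -g is a descent direction for that
    objective, the backtracking loop terminates in finitely many trials.
    """
    g = grad_fn(x, batch)
    f0 = loss_fn(x, batch)
    gg = np.dot(g, g)
    eta = eta0
    for _ in range(max_backtracks):
        x_trial = x - eta * g
        # Armijo sufficient-decrease test on the same mini-batch
        if loss_fn(x_trial, batch) <= f0 - c * eta * gg:
            return x_trial, eta
        eta *= rho  # shrink the trial step size and retry
    return x - eta * g, eta  # fallback after exhausting the budget
```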
In this talk, we analyze how different search directions can be soundly used alongside stochastic line-searches. We define conditions on the sequence of search directions that guarantee finite termination of the backtracking procedure and provide bounds on the number of backtracking steps; an illustrative pair of conditions of this kind is sketched below. Moreover, we shed light on the additional property of the directions that is required to prove fast (linear) convergence of this general class of algorithms when applied to PL functions in the interpolation regime.
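As an illustration of the kind of conditions involved (an expository assumption; the abstract does not state the talk's exact conditions), one can require the direction d_k to be gradient-related on the sampled mini-batch B_k:

```latex
% Illustrative gradient-related conditions on a mini-batch direction d_k;
% not necessarily the exact conditions analyzed in the talk.
\[
  -\nabla f_{B_k}(x_k)^{\top} d_k \ \ge\ c_1 \,\bigl\|\nabla f_{B_k}(x_k)\bigr\|^2,
  \qquad
  \|d_k\| \ \le\ c_2 \,\bigl\|\nabla f_{B_k}(x_k)\bigr\|,
  \qquad c_1,\, c_2 > 0 .
\]
```

The first inequality is the type of requirement that makes the Armijo loop terminate after a bounded number of step-size reductions; a norm bound of the second kind is the sort of additional control under which linear rates are typically proved for PL objectives in the interpolation regime.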
We then focus on the special case of SGD with Polyak's momentum, analyzing the challenges that arise when line-searches are used with this search direction and proposing a solution to overcome them. We present an algorithmic framework that effectively leverages the momentum direction alongside stochastic line-search, using a conjugate-gradient-type rule to define the momentum parameter.
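To make the construction concrete, here is a minimal sketch of a heavy-ball direction with a conjugate-gradient-type momentum parameter; the Polak-Ribiere-plus formula and the sufficient-descent safeguard below are stand-in assumptions, since the abstract does not spell out the authors' rule.

```python
import numpy as np

def momentum_direction(g, g_prev, d_prev, c1=0.5, eps=1e-12):
    """Heavy-ball-style direction d_k = -g_k + beta_k * d_{k-1} with a
    conjugate-gradient-type momentum parameter.

    The Polak-Ribiere-plus formula is a stand-in for the CG-type rule
    proposed in the talk, which the abstract does not specify.
    """
    beta = max(0.0, float(np.dot(g, g - g_prev))
                    / (float(np.dot(g_prev, g_prev)) + eps))
    d = -g + beta * d_prev
    # Safeguard (also an assumption): revert to -g whenever d fails a
    # sufficient-descent test on the current mini-batch gradient, so the
    # Armijo backtracking loop is still guaranteed to terminate.
    if np.dot(g, d) > -c1 * np.dot(g, g):
        d = -g
    return d
```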
Finally, we present a computational comparison, carried out on convex and nonconvex problems, showing the strong empirical performance of our method, which outperforms state-of-the-art approaches.
Keywords
- Large-scale optimization
- First-order optimization
- Optimization for learning and data analysis
Status: accepted