605. Stochastic line-search-based optimization for training overparameterized models: convergence conditions and effective approaches to leverage momentum
Invited abstract in session WB-3: Recent Advances in Line-Search Based Optimization, stream Large scale optimization: methods and algorithms.
Wednesday, 10:30-12:30, Room: B100/4011
Authors (first author is the speaker)
1. Davide Pucci, Department of Information Engineering, University of Florence
2. Matteo Lapucci, Department of Information Engineering, University of Florence
Abstract
In recent years, the adoption of line-search techniques within incremental gradient-based methods for finite-sum problems has garnered considerable interest among researchers. Most recent works in the literature focus on incorporating line-searches into Stochastic Gradient Descent (SGD), since using a descent direction for the mini-batch objective is essential to ensure that the line-search terminates in a finite number of steps.
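For concreteness, the following is a minimal sketch of a stochastic Armijo backtracking step of the kind this line of work builds on; the hyperparameter names (eta0, c, rho) and the plain negative-gradient direction are illustrative assumptions, not the specific scheme discussed in the talk.

```python
import numpy as np

def sgd_armijo_step(x, loss_fn, grad_fn, batch,
                    eta0=1.0, c=0.1, rho=0.5, max_backtracks=50):
    """One SGD step with Armijo backtracking on the sampled mini-batch.

    loss_fn(x, batch) and grad_fn(x, batch) evaluate the mini-batch
    objective and its gradient; since -g is a descent direction for that
    objective, the backtracking loop terminates in finitely many trials.
    """
    g = grad_fn(x, batch)
    f0 = loss_fn(x, batch)
    gg = np.dot(g, g)
    eta = eta0
    for _ in range(max_backtracks):
        x_trial = x - eta * g
        # Armijo sufficient-decrease test on the same mini-batch
        if loss_fn(x_trial, batch) <= f0 - c * eta * gg:
            return x_trial, eta
        eta *= rho  # shrink the trial step size and retry
    return x - eta * g, eta  # fallback after exhausting the budget
```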
In this talk, we analyze how different search directions can be soundly used alongside stochastic line-searches. We define conditions on the sequence of search directions that guarantee finite termination of the backtracking procedure and provide bounds on the number of backtracking steps; an illustrative pair of conditions of this kind is sketched below. Moreover, we shed light on the additional property of the directions that is required to prove fast (linear) convergence of this general class of algorithms when applied to PL functions in the interpolation regime.
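As an illustration of the kind of conditions involved (an expository assumption; the abstract does not state the talk's exact conditions), one can require the direction d_k to be gradient-related on the sampled mini-batch B_k:

```latex
% Illustrative gradient-related conditions on a mini-batch direction d_k;
% not necessarily the exact conditions analyzed in the talk.
\[
  -\nabla f_{B_k}(x_k)^{\top} d_k \ \ge\ c_1 \,\bigl\|\nabla f_{B_k}(x_k)\bigr\|^2,
  \qquad
  \|d_k\| \ \le\ c_2 \,\bigl\|\nabla f_{B_k}(x_k)\bigr\|,
  \qquad c_1,\, c_2 > 0 .
\]
```

The first inequality is the type of requirement that makes the Armijo loop terminate after a bounded number of step-size reductions; a norm bound of the second kind is the sort of additional control under which linear rates are typically proved for PL objectives in the interpolation regime.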
We then focus on the special case of SGD with Polyak's momentum, analyzing the challenges that arise when line-searches are used with this search direction and proposing a solution to overcome them. We present an algorithmic framework that effectively leverages the momentum direction alongside stochastic line-search, using a conjugate-gradient-type rule to define the momentum parameter.
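To make the construction concrete, here is a minimal sketch of a heavy-ball direction with a conjugate-gradient-type momentum parameter; the Polak-Ribiere-plus formula and the sufficient-descent safeguard below are stand-in assumptions, since the abstract does not spell out the authors' rule.

```python
import numpy as np

def momentum_direction(g, g_prev, d_prev, c1=0.5, eps=1e-12):
    """Heavy-ball-style direction d_k = -g_k + beta_k * d_{k-1} with a
    conjugate-gradient-type momentum parameter.

    The Polak-Ribiere-plus formula is a stand-in for the CG-type rule
    proposed in the talk, which the abstract does not specify.
    """
    beta = max(0.0, float(np.dot(g, g - g_prev))
                    / (float(np.dot(g_prev, g_prev)) + eps))
    d = -g + beta * d_prev
    # Safeguard (also an assumption): revert to -g whenever d fails a
    # sufficient-descent test on the current mini-batch gradient, so the
    # Armijo backtracking loop is still guaranteed to terminate.
    if np.dot(g, d) > -c1 * np.dot(g, g):
        d = -g
    return d
```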
Finally, we present a computational comparison, carried out on convex and nonconvex problems, showing the strong empirical performance of our method, which outperforms state-of-the-art approaches.
Keywords
- Large-scale optimization
- First-order optimization
- Optimization for learning and data analysis
Status: accepted