603. Line Search Methods are Sharpness-Aware and Operate at the Edge of Stability
Invited abstract in session WB-3: Recent Advances in Line-Search Based Optimization, stream Large scale optimization: methods and algorithms.
Wednesday, 10:30-12:30, Room: B100/4011
Authors (first author is the speaker)
1. Leonardo Galli, Mathematics, LMU Munich
Abstract
The traditional convergence analysis of Gradient Descent (GD) assumes the step size to be bounded from above by twice the reciprocal of the Lipschitz smoothness constant. However, recent numerical observations on neural networks have shown that GD also converges with larger step sizes. In this case, GD may enter the so-called edge-of-stability phase, in which the objective function decreases faster than with smaller steps, but nonmonotonically. Interestingly, this same behavior was already observed roughly 40 years ago, when the first nonmonotone line search was proposed. In this paper, we show that nonmonotone line searches are able to operate in the edge-of-stability regime right from the start of training. The numerical observations are supported by a theoretical analysis showing that line search methods yield step sizes that indeed depend on the Lipschitz smoothness constant and, consequently, on the sharpness (i.e., the largest eigenvalue of the Hessian of the loss). We then compare nonmonotone line search methods with SAM (Sharpness-Aware Minimization), showing that the former outperform the latter in terms of speed of convergence, flatness of the solutions, and per-iteration cost.
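For readers unfamiliar with the kind of method the abstract refers to, the following is a minimal sketch (not the authors' implementation) of a nonmonotone Armijo backtracking line search of the Grippo-Lampariello-Lucidi max type, applied to plain gradient descent: a trial step is accepted if the objective falls below the maximum of the last few recorded values rather than the current one, which is what permits the temporary, nonmonotone increases mentioned above. The function names, memory length, and test problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nonmonotone_gd(f, grad_f, x0, memory=10, eta0=1.0, c=1e-4, beta=0.5,
                   max_iter=500, tol=1e-8):
    """Gradient descent with a nonmonotone (max-type) Armijo line search:
    the sufficient-decrease test is taken against the maximum of the last
    `memory` objective values, so the loss may increase between iterations."""
    x = np.asarray(x0, dtype=float)
    history = [f(x)]                        # recent objective values
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        f_ref = max(history[-memory:])      # nonmonotone reference value
        eta = eta0                          # start from a large trial step
        # Backtrack until the relaxed Armijo condition holds
        # (a lower bound on eta guards against a floating-point stall).
        while f(x - eta * g) > f_ref - c * eta * np.dot(g, g) and eta > 1e-12:
            eta *= beta
        x = x - eta * g
        history.append(f(x))
    return x

# Example usage on the Rosenbrock test function (illustrative only).
if __name__ == "__main__":
    f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
    grad_f = lambda x: np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])
    print(nonmonotone_gd(f, grad_f, np.array([-1.2, 1.0])))
```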
Keywords
- Optimization for learning and data analysis
- Large-scale optimization
- Computational mathematical optimization
Status: accepted