603. Line Search Methods are Sharpness-Aware and Operate at the Edge of Stability
Invited abstract in session WB-3: Recent Advances in Line-Search Based Optimization, stream Large scale optimization: methods and algorithms.
Wednesday, 10:30-12:30, Room: B100/4011
Authors (first author is the speaker)
1. Leonardo Galli, Mathematics, LMU Munich
Abstract
The traditional convergence analysis of Gradient Descent (GD) assumes the step size to be bounded from above by twice the reciprocal of the Lipschitz smoothness constant. However, recent numerical observations on neural networks have shown that GD also converges with larger step sizes. In this case, GD may enter the so-called edge-of-stability phase, in which the objective function decreases faster than with smaller steps, but nonmonotonically. Interestingly, this same behavior was already observed roughly 40 years ago, when the first nonmonotone line search was proposed. In this paper, we show that nonmonotone line searches are able to operate in the edge-of-stability regime right from the start of training. The numerical observations are supported by a theoretical analysis showing that line search methods yield step sizes that indeed depend on the Lipschitz smoothness constant and, consequently, on the sharpness (i.e., the largest eigenvalue of the Hessian of the loss). We then compare nonmonotone line search methods with SAM (Sharpness-Aware Minimization), showing that the former outperform the latter in terms of speed of convergence, flatness of the solutions, and per-iteration cost.
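For readers unfamiliar with the kind of method the abstract refers to, the following is a minimal sketch (not the authors' implementation) of a nonmonotone Armijo backtracking line search of the Grippo-Lampariello-Lucidi max type, applied to plain gradient descent: a trial step is accepted if the objective falls below the maximum of the last few recorded values rather than the current one, which is what permits the temporary, nonmonotone increases mentioned above. The function names, memory length, and test problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nonmonotone_gd(f, grad_f, x0, memory=10, eta0=1.0, c=1e-4, beta=0.5,
                   max_iter=500, tol=1e-8):
    """Gradient descent with a nonmonotone (max-type) Armijo line search:
    the sufficient-decrease test is taken against the maximum of the last
    `memory` objective values, so the loss may increase between iterations."""
    x = np.asarray(x0, dtype=float)
    history = [f(x)]                        # recent objective values
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        f_ref = max(history[-memory:])      # nonmonotone reference value
        eta = eta0                          # start from a large trial step
        # Backtrack until the relaxed Armijo condition holds
        # (a lower bound on eta guards against a floating-point stall).
        while f(x - eta * g) > f_ref - c * eta * np.dot(g, g) and eta > 1e-12:
            eta *= beta
        x = x - eta * g
        history.append(f(x))
    return x

# Example usage on the Rosenbrock test function (illustrative only).
if __name__ == "__main__":
    f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
    grad_f = lambda x: np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])
    print(nonmonotone_gd(f, grad_f, np.array([-1.2, 1.0])))
```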
Keywords
- Optimization for learning and data analysis
- Large-scale optimization
- Computational mathematical optimization
Status: accepted