399. Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods
Invited abstract in session MD-5: Relaxed Smoothness and Convexity Assumptions in Optimization for Machine Learning, stream Optimization for machine learning.
Monday, 16:30-18:30, Room: B100/4013
Authors (first author is the speaker)
1. Anton Rodomanov, CISPA Helmholtz Center for Information Security
2. Daniil Vankov, Arizona State University
3. Angelia Nedich, ECEE, Arizona State University
4. Lalitha Sankar, Arizona State University
5. Sebastian Stich, CISPA Helmholtz Center for Information Security
Abstract
We study gradient methods for optimizing $(L_0, L_1)$-smooth functions, a class that generalizes Lipschitz-smooth functions and has gained attention for its relevance in machine learning. We provide new insights into the structure of this function class and develop a principled framework for analyzing optimization methods in this setting. While our convergence rate estimates recover existing results for minimizing the gradient norm in nonconvex problems, our approach significantly improves the best-known complexity bounds for convex objectives. Moreover, we show that the gradient method with Polyak stepsizes and the normalized gradient method achieve nearly the same complexity guarantees as methods that rely on explicit knowledge of $(L_0, L_1)$. Finally, we demonstrate that a carefully designed accelerated gradient method can be applied to $(L_0, L_1)$-smooth functions, further improving all previous results.
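For context, a function $f$ is commonly called $(L_0, L_1)$-smooth if $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$ for all $x$, so the local curvature may grow with the gradient norm. The sketch below is only a rough illustration of the contrast drawn in the abstract: a gradient step whose stepsize $1/(L_0 + L_1\|\nabla f(x)\|)$ uses the constants explicitly (a commonly analyzed choice for this class), versus a Polyak step that needs only the optimal value $f^\star$. The test function $\cosh(x_1) + \tfrac{1}{2}x_2^2$, the constants, and all names are assumptions for illustration; these are not the exact methods or analysis presented in the talk.

```python
# Minimal illustrative sketch (assumed test problem and constants; not the exact
# methods analyzed in the talk): gradient descent on an (L0, L1)-smooth objective
# with (i) a stepsize that uses (L0, L1) explicitly and (ii) the Polyak stepsize.
import numpy as np

# Assumed convex test objective f(x) = cosh(x1) + 0.5 * x2^2.
# Its Hessian norm is cosh(x1) <= 1 + |sinh(x1)| <= 1 + ||grad f(x)||,
# so f is (L0, L1)-smooth with L0 = L1 = 1; the minimum value is f* = 1 at (0, 0).
def f(x):
    return np.cosh(x[0]) + 0.5 * x[1] ** 2

def grad_f(x):
    return np.array([np.sinh(x[0]), x[1]])

L0, L1 = 1.0, 1.0   # smoothness constants of the test objective (assumed known)
f_star = 1.0        # optimal value, used only by the Polyak stepsize

def smoothness_aware_step(x):
    # Stepsize 1 / (L0 + L1 * ||grad||): shrinks automatically where the gradient
    # (and hence the allowed curvature) is large.
    g = grad_f(x)
    return x - g / (L0 + L1 * np.linalg.norm(g))

def polyak_step(x):
    # Polyak stepsize (f(x) - f*) / ||grad||^2: needs f*, but no smoothness constants.
    g = grad_f(x)
    return x - g * (f(x) - f_star) / max(g @ g, 1e-16)

for step, name in [(smoothness_aware_step, "(L0, L1)-aware GD"), (polyak_step, "Polyak GD")]:
    x = np.array([3.0, 2.0])
    for _ in range(300):
        x = step(x)
    print(f"{name:18s} f(x) - f* = {f(x) - f_star:.3e}")
```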
Keywords
- First-order optimization
Status: accepted