428. Stochastic Gradient Descent without Variance Assumption: A Tight Lyapunov Analysis
Invited abstract in session TB-5: Randomized Optimization algorithms I, stream Optimization for machine learning.
Tuesday, 10:30-12:30, Room B100/4013
Authors (first author is the speaker)
1. Lucas Ketels (Ile-de-France, Université Paris Cité)
2. Daniel Cortild (Faculty of Science and Engineering, University of Groningen)
3. Guillaume Garrigos (Université Paris Cité)
4. Juan Peypouquet (Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen)
Abstract
We derive new upper bounds on the performance of the stochastic gradient descent (SGD) algorithm in the convex and smooth setting, under minimal variance assumptions and for a wide range of step-sizes. Analyses of SGD usually assume that the variance of the gradient noise is uniformly bounded. Removing this assumption typically degrades the results in two ways: it restricts the range of admissible step-sizes and worsens the complexity by a polynomial factor. Our analysis, in contrast, relies only on the weak assumption that the variance of the gradient noise is bounded at a solution, and it remains valid for a wide range of step-sizes while improving on state-of-the-art results. The proof is based on a Lyapunov analysis obtained by solving a suitable semidefinite program derived through the performance estimation problem framework, as in Taylor and Bach (2021). We also present numerical and analytical evidence of the tightness of our analysis.
Keywords
- First-order optimization
- Computer-aided algorithm analysis
Status: accepted