500. Exploring Step Size Adaptation in Large-Scale Deep Learning Optimization
Invited abstract in session MD-2: Optimization in Machine Learning, stream Nonsmooth and Nonconvex Optimization.
Monday, 16:30-18:30, Room B100/7011
Authors (first author is the speaker)
1. Lorenzo Ciarpaglini, Department of Computer, Control, and Management Engineering A. Ruberti, Sapienza University of Rome
2. Laura Palagi, Department of Computer, Control, and Management Engineering A. Ruberti, Sapienza University of Rome
3. Diego Scuppa, Department of Computer, Control and Management Engineering, Sapienza University of Rome
4. Marco Sciandrone, DIAG, Sapienza Università di Roma
Abstract
Scaling deep learning optimization remains a fundamental challenge, especially in relation to the choice of step size within first-order methods. In this work, we explore adaptive strategies for learning rate selection designed to improve both convergence behavior and robustness across different architectures, tasks, and datasets. Rather than relying on fixed schedules or heuristic tuning, our approach aims to incorporate meaningful information from the optimization landscape to guide parameter updates. The resulting methods are flexible and can be integrated into standard training procedures with minimal overhead. An empirical study is conducted to assess the performance of different adaptive step size strategies, with a focus on their stability, efficiency, and integration within standard training pipelines.
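For context only, the sketch below illustrates the general family of adaptive step size rules the abstract alludes to, using the classical Polyak step size on a deterministic least-squares problem. It is an assumed, illustrative example and not the method presented in the talk; the problem data, the choice of the Polyak rule, and the known optimal value f_star = 0 are all assumptions made for the sake of a self-contained demonstration.

```python
import numpy as np

# Minimal sketch (assumed example, not the authors' method): gradient descent
# with a Polyak-type adaptive step size, alpha_k = (f(x_k) - f_star) / ||g_k||^2.
# Because the linear system below is consistent, f_star = 0 is a valid optimal value.

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
x_true = rng.standard_normal(50)
b = A @ x_true                          # consistent system, so min f = 0

def f(x):
    r = A @ x - b
    return 0.5 * r @ r

def grad(x):
    return A.T @ (A @ x - b)

x = np.zeros(50)
f_star = 0.0                            # assumed known lower bound for this toy problem
for k in range(500):
    g = grad(x)
    gnorm2 = g @ g
    if gnorm2 < 1e-16:                  # (near-)stationary point reached
        break
    alpha = (f(x) - f_star) / gnorm2    # step size adapts to the local landscape,
    x -= alpha * g                      # no hand-tuned schedule is required

print(f"final objective: {f(x):.3e}")
```

The point of the sketch is only that the step size is computed from quantities already available during the iteration (objective value and gradient norm), so it can be dropped into a standard training loop with negligible overhead.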
Keywords
- First-order optimization
- Large-scale optimization
- Linear and nonlinear optimization
Status: accepted