1048. On the Stochastic Polyak Step Size for Machine Learning: Proximal and Momentum Versions
Invited abstract in session WA-32: Adaptive and Polyak step-size methods, stream Advances in large scale nonlinear optimization.
Wednesday, 8:30-10:00, Room: 41 (building: 303A)
Authors (first author is the speaker)
1. Fabian Schaipp, Mathematics, Inria Paris
Abstract
In this talk, we show how to combine the stochastic Polyak step size with established practical techniques such as regularization and momentum.
In particular, we derive a proximal version of the Polyak step size, and a momentum version which we call MoMo.
MoMo can be seen as an adaptive learning rate for SGD with momentum; in fact, we can derive a MoMo version of any momentum method, most notably MoMo-Adam.
These derivations are possible through the connection between the Polyak step size and model-based stochastic optimization, where the model is truncated at a known lower bound.
In machine learning, such lower bounds are typically known. By construction, our new adaptive learning rates reduce the amount of learning-rate tuning, as demonstrated through deep learning experiments on the CIFAR, Imagenet, Criteo, and IWSLT14 datasets.
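For orientation, the sketch below illustrates the plain stochastic Polyak step size (SPS) that the talk builds on: for a sampled loss f_i with known lower bound lb, the step size is (f_i(x) - lb) / ||grad f_i(x)||^2, optionally capped. This is not the authors' MoMo or proximal variant; the toy least-squares problem, the cap gamma_max, and the choice lb = 0 are illustrative assumptions only.

```python
# Minimal sketch of SGD with the stochastic Polyak step size (SPS).
# Assumptions (not from the abstract): a least-squares toy problem,
# per-sample lower bound lb = 0, and a cap gamma_max on the step size.
# The MoMo / MoMo-Adam variants described in the talk add momentum on
# top of this rule and are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true  # interpolation holds, so lb = 0 is attainable

def loss_and_grad(x, i):
    """Per-sample loss f_i(x) = 0.5 * (a_i^T x - b_i)^2 and its gradient."""
    r = A[i] @ x - b[i]
    return 0.5 * r**2, r * A[i]

x = np.zeros(d)
gamma_max = 10.0   # cap guards against blow-up when the gradient is tiny
lb = 0.0           # known lower bound on each f_i (nonnegative loss)

for t in range(2000):
    i = rng.integers(n)
    f_i, g_i = loss_and_grad(x, i)
    # Polyak step: gap between current loss and its lower bound,
    # divided by the squared gradient norm.
    gamma = min(gamma_max, (f_i - lb) / (g_i @ g_i + 1e-12))
    x = x - gamma * g_i

print("final distance to solution:", np.linalg.norm(x - x_true))
```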
Keywords
- Machine Learning
- Stochastic Optimization
- Non-smooth Optimization
Status: accepted