EURO 2024 Copenhagen

1048. On the Stochastic Polyak Step Size for Machine Learning: Proximal and Momentum Versions

Invited abstract in session WA-32: Adaptive and Polyak step-size methods, stream Advances in large scale nonlinear optimization.

Wednesday, 8:30-10:00
Room: 41 (building: 303A)

Authors (first author is the speaker)

1. Fabian Schaipp
Mathematics, Inria Paris

Abstract

In this talk, we show how to combine the stochastic Polyak step size with established practical techniques such as regularization and momentum.
In particular, we derive a proximal version of the Polyak step size, as well as a momentum version, which we call MoMo.
MoMo can be seen as an adaptive learning rate for SGD with momentum; in fact, we can derive a MoMo version of any momentum method, most importantly MoMo-Adam.
These derivations are possible through the connection between the Polyak step size and model-based stochastic optimization, where the model is truncated at a lower bound of the loss.
In machine learning, such lower bounds are typically known. By construction, our new adaptive learning rates reduce the amount of learning-rate tuning, which we demonstrate through deep learning experiments on the CIFAR, ImageNet, Criteo, and IWSLT14 datasets.
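
To make the step-size rule concrete, the following is a minimal NumPy sketch of SGD with the stochastic Polyak step size on a least-squares problem; the per-sample lower bound f_star, the step-size cap max_lr, and all helper names are illustrative assumptions, not the exact formulation from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares data: per-sample loss f_i(w) = 0.5 * (a_i @ w - b_i)^2.
n, d = 200, 10
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
b = A @ w_true  # interpolation holds, so each per-sample loss has lower bound 0

def loss_and_grad(w, i):
    """Return the per-sample loss f_i(w) and its gradient."""
    r = A[i] @ w - b[i]
    return 0.5 * r ** 2, r * A[i]

w = np.zeros(d)
f_star = 0.0   # known lower bound of each f_i (0 for non-negative losses)
max_lr = 1.0   # cap on the step size (assumed safeguard)

for t in range(2000):
    i = rng.integers(n)
    f_i, g_i = loss_and_grad(w, i)
    # Stochastic Polyak step size: gamma_t = (f_i(w) - f_star) / ||g_i||^2, capped at max_lr.
    gamma = min(max_lr, (f_i - f_star) / (g_i @ g_i + 1e-12))
    w -= gamma * g_i

print("final mean loss:", np.mean(0.5 * (A @ w - b) ** 2))
```

Roughly speaking, MoMo applies the same kind of rule with momentum-style averages of past gradients and loss values in place of the single-sample quantities, which is what makes it an adaptive learning rate for SGD with momentum.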

Status: accepted

