1048. On the Stochastic Polyak Step Size for Machine Learning: Proximal and Momentum Versions
Invited abstract in session WA-32: Adaptive and Polyak step-size methods, stream Advances in large scale nonlinear optimization.
Wednesday, 8:30-10:00, Room: 41 (building: 303A)
Authors (first author is the speaker)
1. Fabian Schaipp, Mathematics, Inria Paris
Abstract
In this talk, we show how to combine the stochastic Polyak step size with established practical techniques such as regularization and momentum.
In particular, we derive a proximal version of the Polyak step size, and a momentum version which we call MoMo.
MoMo can be seen as an adaptive learning rate for SGD with momentum; in fact, we can derive a MoMo version of any momentum method, most notably MoMo-Adam.
These derivations are possible through the connection between the Polyak step size and model-based stochastic optimization, where the model is truncated at a known lower bound.
In machine learning, such lower bounds are typically known. By construction, our new adaptive learning rates reduce the amount of learning-rate tuning, as demonstrated through deep learning experiments on the CIFAR, Imagenet, Criteo, and IWSLT14 datasets.
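For orientation, the sketch below illustrates the plain stochastic Polyak step size (SPS) that the talk builds on: for a sampled loss f_i with known lower bound lb, the step size is (f_i(x) - lb) / ||grad f_i(x)||^2, optionally capped. This is not the authors' MoMo or proximal variant; the toy least-squares problem, the cap gamma_max, and the choice lb = 0 are illustrative assumptions only.

```python
# Minimal sketch of SGD with the stochastic Polyak step size (SPS).
# Assumptions (not from the abstract): a least-squares toy problem,
# per-sample lower bound lb = 0, and a cap gamma_max on the step size.
# The MoMo / MoMo-Adam variants described in the talk add momentum on
# top of this rule and are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true  # interpolation holds, so lb = 0 is attainable

def loss_and_grad(x, i):
    """Per-sample loss f_i(x) = 0.5 * (a_i^T x - b_i)^2 and its gradient."""
    r = A[i] @ x - b[i]
    return 0.5 * r**2, r * A[i]

x = np.zeros(d)
gamma_max = 10.0   # cap guards against blow-up when the gradient is tiny
lb = 0.0           # known lower bound on each f_i (nonnegative loss)

for t in range(2000):
    i = rng.integers(n)
    f_i, g_i = loss_and_grad(x, i)
    # Polyak step: gap between current loss and its lower bound,
    # divided by the squared gradient norm.
    gamma = min(gamma_max, (f_i - lb) / (g_i @ g_i + 1e-12))
    x = x - gamma * g_i

print("final distance to solution:", np.linalg.norm(x - x_true))
```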
Keywords
- Machine Learning
- Stochastic Optimization
- Non-smooth Optimization
Status: accepted