1204. Simba: A Scalable Bilevel Preconditioned Gradient Method for Fast Evasion of Flat Areas and Saddle Points
Invited abstract in session MD-34: Preconditioning for Large Scale Nonlinear Optimization, stream Advances in large scale nonlinear optimization.
Monday, 14:30-16:00, Room: 43 (building: 303A)
Authors (first author is the speaker)
1. Nick Tsipinakis, Faculty of Mathematics and Computer Science, UniDistance Suisse
2. Panos Parpas, Computing, Imperial College London
Abstract
First-order methods slow down when applied to high-dimensional non-convex functions due to the presence of saddle points. If, additionally, the saddles are surrounded by large plateaus, it is highly likely that first-order methods will converge to a sub-optimal solution. A natural way to tackle these limitations is to employ second-order information from the Hessian. However, methods that incorporate the Hessian do not scale to large models. To address these issues, we propose Simba, a scalable preconditioned gradient method. The method is very simple to implement: it maintains a single preconditioning matrix that is constructed as the outer product of the moving average of the gradients. To significantly reduce the computational cost of forming and inverting the preconditioner, we draw links with multilevel optimization methods and construct randomized preconditioners. Our numerical experiments and comparisons against other state-of-the-art methods verify the scalability of Simba as well as its efficacy near saddles and flat areas. We also analyze Simba and show its linear convergence rate for strongly convex functions.
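The abstract does not give implementation details, so the following is only a minimal sketch of a rank-one preconditioned gradient step in the spirit described above: the preconditioner is built from the outer product of a moving average of gradients, while the damping term lam, the averaging coefficient beta, and the Sherman-Morrison solve are illustrative assumptions, not the authors' method.

```python
import numpy as np

def preconditioned_step(x, grad_fn, m, beta=0.9, lam=1e-3, lr=1.0):
    """One illustrative step of a rank-one preconditioned gradient method.

    x       : current iterate (1-D numpy array)
    grad_fn : function returning the gradient at x
    m       : moving average of past gradients
    beta    : averaging coefficient (assumed hyper-parameter)
    lam     : damping so the preconditioner is positive definite (assumed)
    lr      : step size (assumed)
    """
    g = grad_fn(x)
    m = beta * m + (1.0 - beta) * g  # moving average of gradients

    # Preconditioner P = m m^T + lam * I (outer product of the gradient average).
    # Apply P^{-1} to g cheaply via the Sherman-Morrison formula:
    # (lam I + m m^T)^{-1} g = g/lam - m (m^T g) / (lam (lam + m^T m))
    Pg = g / lam - m * (m @ g) / (lam * (lam + m @ m))

    x = x - lr * Pg
    return x, m

# Toy usage on a quadratic with a flat direction (not from the paper):
if __name__ == "__main__":
    A = np.diag([1.0, 0.01])          # ill-conditioned curvature
    grad = lambda x: A @ x
    x, m = np.array([5.0, 5.0]), np.zeros(2)
    for _ in range(50):
        x, m = preconditioned_step(x, grad, m, lr=0.5)
    print(x)
```

The rank-one-plus-damping structure is what keeps the cost per step linear in the dimension here; the actual Simba method additionally uses multilevel/randomized constructions to reduce the cost of the preconditioner, which this sketch does not attempt to reproduce.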
Keywords
- Stochastic Optimization
- Machine Learning
Status: accepted