1111. Conservation laws for gradient flows
Invited abstract in session TD-32: Algorithms for machine learning and inverse problems: optimisation for neural networks, stream Advances in large scale nonlinear optimization.
Tuesday, 14:30-16:00, Room: 41 (building: 303A)
Authors (first author is the speaker)
1. Sibylle Marcotte, DMA, ENS
Abstract
Understanding the geometric properties of gradient descent dynamics is a key ingredient in deciphering the recent success of very large machine learning models. A striking observation is that trained over-parameterized models retain some properties of their initialization. This “implicit bias” is believed to be responsible for some favorable properties of the trained models and could explain their good generalization. In this talk, I will first rigorously present the definition and basic properties of “conservation laws”, which define quantities conserved during gradient flows of a given model (e.g. of a ReLU network with a given architecture) for any training data and any loss. Then I will explain how to find the exact number of independent conservation laws by performing finite-dimensional algebraic manipulations. In the specific case of linear and ReLU networks, this procedure recovers the conservation laws known in the literature and shows that there are no other laws.
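As a concrete illustration, a minimal NumPy sketch of one conservation law known in the literature for two-layer linear networks: under gradient flow on f(x) = W2 W1 x, the “balancedness” matrix W1 W1^T - W2^T W2 stays constant. The toy setup, dimensions, and variable names below are assumptions for illustration, not taken from the talk; gradient flow is approximated by gradient descent with a very small step.

```python
import numpy as np

# Numerical check of a known conservation law for two-layer linear
# networks f(x) = W2 @ W1 @ x trained by gradient flow on a squared loss:
# the matrix W1 @ W1.T - W2.T @ W2 is conserved along the flow.

rng = np.random.default_rng(0)
d, h, k, n = 5, 4, 3, 50           # input dim, hidden dim, output dim, samples
X = rng.standard_normal((d, n))
Y = rng.standard_normal((k, n))
W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal((k, h))

def conserved(W1, W2):
    return W1 @ W1.T - W2.T @ W2   # the "balancedness" matrix

C0 = conserved(W1, W2)
lr = 1e-4                          # small step approximates continuous time
for _ in range(20000):
    R = W2 @ W1 @ X - Y            # residual of the squared loss
    G = R @ X.T / n                # gradient w.r.t. the product W2 @ W1
    g1, g2 = W2.T @ G, G @ W1.T    # chain rule: dL/dW1, dL/dW2
    W1 -= lr * g1
    W2 -= lr * g2

drift = np.max(np.abs(conserved(W1, W2) - C0))
print(f"max drift of conserved matrix: {drift:.2e}")  # ~0 up to O(lr)
```

The drift vanishes as the step size goes to zero, consistent with exact conservation in the continuous-time gradient flow.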
Keywords
- Machine Learning
- Control Theory
- Dynamical Systems
Status: accepted