EURO 2024 Copenhagen
Abstract Submission


1111. Conservation laws for gradient flows

Invited abstract in session TD-32: Algorithms for machine learning and inverse problems: optimisation for neural networks, stream Advances in large scale nonlinear optimization.

Tuesday, 14:30-16:00
Room: 41 (building: 303A)

Authors (first author is the speaker)

1. Sibylle Marcotte
DMA, ENS

Abstract

Understanding the geometric properties of gradient descent dynamics is a key ingredient in deciphering the recent success of very large machine learning models. A striking observation is that trained over-parameterized models retain some properties of their optimization initialization. This "implicit bias" is believed to be responsible for some favorable properties of the trained models and could explain their good generalization. In this talk, I will first rigorously present the definition and basic properties of "conservation laws", which define quantities conserved during the gradient flow of a given model (e.g. of a ReLU network with a given architecture) for any training data and any loss. Then I will explain how to find the exact number of independent conservation laws by performing finite-dimensional algebraic manipulations. In the specific case of linear and ReLU networks, this procedure recovers the conservation laws known in the literature and shows that there are no other laws.
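To make the notion of a conservation law concrete, the sketch below numerically checks one law that is known in the literature for two-layer ReLU networks: the per-neuron "balancedness" q_j = ||W1[j,:]||^2 - ||W2[:,j]||^2, which is constant along the gradient flow of the loss for any data and any loss built on the network output. This is an illustrative example only, not the talk's algebraic procedure; the network shape, step size, and variable names are choices made for the sketch, and the exact flow is approximated by gradient descent with a small step, so the quantity drifts only at the order of the step size.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 5, 8, 3, 20
W1 = rng.normal(size=(d_hidden, d_in))   # first-layer weights
W2 = rng.normal(size=(d_out, d_hidden))  # second-layer weights
X = rng.normal(size=(d_in, n))           # arbitrary training inputs
Y = rng.normal(size=(d_out, n))          # arbitrary targets

def balancedness(W1, W2):
    # q_j = ||j-th row of W1||^2 - ||j-th column of W2||^2,
    # one conserved quantity per hidden neuron
    return (W1 ** 2).sum(axis=1) - (W2 ** 2).sum(axis=0)

q0 = balancedness(W1, W2)
lr = 1e-4  # small step: Euler discretization of the gradient flow
for _ in range(10_000):
    pre = W1 @ X                   # pre-activations
    h = np.maximum(pre, 0.0)       # ReLU
    err = W2 @ h - Y               # residual of the squared loss
    dW2 = err @ h.T / n            # gradient w.r.t. W2
    dW1 = ((W2.T @ err) * (pre > 0)) @ X.T / n  # gradient w.r.t. W1
    W2 -= lr * dW2
    W1 -= lr * dW1

# The drift should be tiny (of order lr) even though the weights
# themselves have moved substantially during training.
print("max drift in conserved quantities:",
      np.abs(balancedness(W1, W2) - q0).max())
```

The conservation here comes from the positive homogeneity of the ReLU: rescaling a hidden neuron's incoming weights by c and its outgoing weights by 1/c leaves the network function unchanged, and the gradient flow is orthogonal to that symmetry direction.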

Keywords

Status: accepted

