264. Optimal sampling for stochastic and natural gradient descent
Invited abstract in session WF-6: Stochastic Gradient Methods: Bridging Theory and Practice, stream Challenges in nonlinear programming.
Wednesday, 16:20 - 18:00, Room: M:H
Authors (first author is the speaker)
| 1. | Robert Gruhlke | FU Berlin |
| 2. | Philipp Trunschke | Centrale Nantes & Nantes Université |
| 3. | Anthony Nouy | Centrale Nantes & Nantes Université |
Abstract
We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we only have access to realisations of the gradient of the loss.
This is a classical task in statistics, machine learning and physics-informed machine learning.
A straightforward solution is to replace the exact objective with a Monte Carlo estimate before employing standard first-order methods like gradient descent, which yields the classical stochastic gradient descent method.
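For orientation, a minimal sketch (not taken from the paper) of this baseline: SGD on a Monte Carlo estimate of an expected squared loss, with a hypothetical sine target and a simple polynomial model; all names, targets and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)                        # hypothetical target function
features = lambda x: np.stack([x**k for k in range(5)], axis=-1)

theta, step, batch = np.zeros(5), 0.1, 32              # model: u(x; theta) = features(x) @ theta
for it in range(500):
    x = rng.uniform(-1.0, 1.0, batch)                  # fresh realisations of the random input
    residual = features(x) @ theta - f(x)
    grad = 2.0 * features(x).T @ residual / batch      # Monte Carlo estimate of the gradient
    theta -= step * grad                               # standard first-order (SGD) update
```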
However, replacing the true objective with an estimate incurs a "generalisation error".
Rigorous bounds for this error typically require strong compactness and Lipschitz continuity assumptions, yet only guarantee a slow decay of the error with the sample size.
We propose a different optimisation strategy, relying on natural gradient descent, in which the true gradient is approximated in local linearisations of the model class via (quasi-)projections based on optimal sampling methods.
Under classical assumptions on the loss and the nonlinear model class, we prove that this scheme converges monotonically and almost surely to a stationary point of the true objective, and we provide convergence rates.
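The following sketch illustrates, under simplifying assumptions and without reproducing the authors' method, one step of a scheme of this flavour: the function-space gradient of a least-squares loss is quasi-projected onto the local linearisation (tangent space) of a toy nonlinear model, with samples drawn from a Christoffel-type optimal density and re-weighted accordingly. The model, target, sampling pool and step size are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)                        # hypothetical target function

def model(x, theta):                                   # toy nonlinear model class
    return theta[2] * np.tanh(theta[0] * x + theta[1])

def tangent_basis(x, theta, eps=1e-6):                 # columns: d model / d theta_j at x
    cols = []
    for j in range(theta.size):
        e = np.zeros_like(theta); e[j] = eps
        cols.append((model(x, theta + e) - model(x, theta - e)) / (2 * eps))
    return np.stack(cols, axis=-1)                     # shape (len(x), p)

theta, step, n = np.array([1.0, 0.1, 0.5]), 0.25, 40
pool = rng.uniform(-1.0, 1.0, 4000)                    # discrete surrogate for the input law mu

for it in range(200):
    B = tangent_basis(pool, theta)                     # local linearisation of the model class
    L = np.linalg.cholesky(B.T @ B / pool.size)        # Gram factor of the tangent basis w.r.t. mu
    Q = B @ np.linalg.inv(L).T                         # L2(mu)-orthonormal tangent basis
    k = np.sum(Q**2, axis=1) / Q.shape[1]              # Christoffel-type function of the tangent space
    prob = k / k.sum()                                 # optimal sampling density on the pool
    idx = rng.choice(pool.size, size=n, p=prob)        # draw samples from the optimal density
    w = 1.0 / k[idx]                                   # importance weights dmu/dnu
    g = 2.0 * (model(pool[idx], theta) - f(pool[idx])) # pointwise gradient of the L2 loss
    c = Q[idx].T @ (w * g) / n                         # quasi-projection onto the tangent space
    d = np.linalg.solve(L.T, c)                        # projected gradient in parameter coordinates
    theta -= step * d                                  # natural-gradient-type update
```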
Keywords
- Linear and nonlinear optimization
- Complexity and efficiency of optimization algorithms
- Optimization for learning and data analysis
Status: accepted