331. A variable metric proximal stochastic gradient method with dynamical variance reduction
Invited abstract in session WE-5: Randomized optimization algorithms part 1/2, stream Randomized optimization algorithms.
Wednesday, 14:10 - 15:50, Room: M:N
Authors (first author is the speaker)
1. Andrea Sebastiani, Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia
2. Pasquale Cascarano, University of Bologna
3. Giorgia Franchini, University of Modena and Reggio Emilia
4. Erich Kobler, University of Bonn
5. Federica Porta, Università di Modena e Reggio Emilia
Abstract
Training deep learning models typically involves minimizing the empirical risk over large datasets, possibly in the presence of a non-differentiable regularization term. In this work, we present a stochastic gradient method tailored for classification problems, which are ubiquitous in scientific applications. The variance of the stochastic gradients of the objective is controlled through an automatic sample size selection, combined with a variable metric that preconditions the stochastic gradient directions. Additionally, a non-monotone line search is employed to select the step size. Convergence of this first-order algorithm can be established for both convex and non-convex objective functions. Numerical experiments suggest that the proposed approach performs comparably to state-of-the-art methods in training not only standard statistical models for binary classification but also artificial neural networks for multi-class image classification.
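To make the ingredients named above concrete, the sketch below illustrates, under our own simplifying assumptions, how a variable metric proximal stochastic gradient step with dynamic sample size growth and a non-monotone line search could look for L1-regularized binary logistic regression. This is not the authors' implementation: the diagonal metric, the norm-test-style variance check, and parameters such as `grow_factor`, `theta`, and `memory` are illustrative choices.

```python
# Illustrative sketch (not the paper's algorithm) of one training loop combining:
#  - a diagonal variable metric preconditioning the stochastic gradient,
#  - automatic sample size growth when the gradient variance estimate is large,
#  - a non-monotone line search on the scaled proximal step.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def logistic_loss_grad(w, X, y):
    """Average logistic loss and gradient on a (mini)batch; labels y in {0, 1}."""
    p = sigmoid(X @ w)
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def prox_l1_scaled(v, step, lam, d):
    """Prox of lam*||.||_1 in the metric diag(d): componentwise soft-thresholding
    with thresholds step*lam/d_i (closed form because the metric is diagonal)."""
    t = step * lam / d
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def train(X, y, lam=1e-3, epochs=20, batch0=32, grow_factor=1.5,
          theta=0.9, memory=5, c=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    batch = batch0
    g2 = np.ones(p)      # running squared gradients defining the diagonal metric
    f_hist = []          # recent objective values for the non-monotone test

    for _ in range(epochs):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        Xb, yb = X[idx], y[idx]
        loss, grad = logistic_loss_grad(w, Xb, yb)
        obj = loss + lam * np.abs(w).sum()

        # Dynamic sample size: grow the batch when the sample variance of the
        # per-example gradients dominates the squared gradient norm.
        per_sample = Xb * (sigmoid(Xb @ w) - yb)[:, None]
        var_est = np.mean(np.sum((per_sample - grad) ** 2, axis=1)) / len(yb)
        if var_est > theta * np.sum(grad ** 2):
            batch = int(min(n, grow_factor * batch))

        # Variable (diagonal) metric from an exponential average of squared gradients.
        g2 = 0.9 * g2 + 0.1 * grad ** 2
        d = np.sqrt(g2) + 1e-8

        # Non-monotone line search: compare against the max of recent objectives.
        f_hist.append(obj)
        f_ref = max(f_hist[-memory:])
        alpha = 1.0
        for _ in range(20):
            w_trial = prox_l1_scaled(w - alpha * grad / d, alpha, lam, d)
            loss_t, _ = logistic_loss_grad(w_trial, Xb, yb)
            obj_t = loss_t + lam * np.abs(w_trial).sum()
            decrease = np.sum(d * (w_trial - w) ** 2) / (2 * alpha)
            if obj_t <= f_ref - c * decrease:
                break
            alpha *= 0.5
        w = w_trial
    return w
```

Keeping the metric diagonal is what makes the scaled proximal operator available in closed form; a full preconditioner would require an inner solver for the prox step.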
Keywords
- Artificial intelligence based optimization methods and applications
- Optimization for learning and data analysis
- Convex and non-smooth optimization
Status: accepted