49. A Simple Stochastic Trust-Region Method for Training Neural Network Classification Models
Invited abstract in session WC-4: Large scale optimization and applications 1, stream Large scale optimization and applications.
Wednesday, 10:00 - 11:30, Room: C105
Authors (first author is the speaker)
1. Mahsa Yousefi, Department of Industrial Engineering, University of Florence
2. Stefania Bellavia, Dipartimento di Ingegneria Industriale, Università di Firenze
3. Benedetta Morini, Dipartimento di Ingegneria Industriale, Università di Firenze
Abstract
Deep learning (DL), employing deep neural networks, is widely used in tasks such as image recognition. Training these networks requires solving finite-sum minimization problems. Because computing exact gradients or Hessians is impractical for large-scale DL problems, stochastic methods based on subsampling are typically employed. Stochastic gradient descent (SGD) and its variants are popular in deep learning for their simplicity and low per-iteration cost. Second-order methods are also explored for their ability to exploit curvature information, which can help navigate complex loss landscapes more efficiently; however, using second-order information entails the computational cost of forming Hessians or their approximations. To combine the advantages of first- and second-order methods, we introduce a stochastic approach based on a simple trust-region model that exploits approximate partial second-order curvature information efficiently. We provide convergence guarantees and an empirical assessment of the method on image classification tasks.
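As a rough illustration of the kind of iteration the abstract describes, the sketch below implements one generic stochastic trust-region step with a cheap diagonal curvature proxy. This is not the authors' algorithm: the function names (`stochastic_tr_step`, `grad_fn`, `loss_fn`), the diagonal Hessian stand-in, and all thresholds are illustrative assumptions.

```python
import numpy as np

def stochastic_tr_step(w, batch, grad_fn, loss_fn, curv_diag, radius,
                       eta=0.1, gamma=2.0):
    """One illustrative stochastic trust-region iteration (not the paper's method).

    w         : current parameter vector (1-D numpy array)
    batch     : subsampled data used for this iteration
    grad_fn   : grad_fn(w, batch) -> subsampled gradient (1-D array)
    loss_fn   : loss_fn(w, batch) -> subsampled loss (float)
    curv_diag : diagonal curvature estimate (1-D array), a cheap
                stand-in for approximate partial second-order information
    radius    : current trust-region radius
    """
    g = grad_fn(w, batch)
    B = np.maximum(curv_diag, 1e-8)          # keep the quadratic model convex
    s = -g / B                               # minimizer of the diagonal model
    ns = np.linalg.norm(s)
    if ns > radius:                          # project the step onto the trust region
        s *= radius / ns
    pred = -(g @ s + 0.5 * (s * B) @ s)      # predicted reduction of the model
    actual = loss_fn(w, batch) - loss_fn(w + s, batch)
    rho = actual / max(pred, 1e-12)          # actual-vs-predicted agreement ratio
    if rho >= eta:                           # successful step: accept, grow radius
        return w + s, gamma * radius
    return w, radius / gamma                 # unsuccessful: reject, shrink radius
```

In such a scheme, the curvature diagonal could come, for example, from subsampled Hessian diagonals or running averages of squared gradients; the accept/reject ratio test is what distinguishes a trust-region update from a plain SGD step.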
Keywords
- Optimization for learning and data analysis
- Artificial intelligence based optimization methods and applications
- SS - Advances in Nonlinear Optimization and Applications
Status: accepted