VOCAL 2024
Abstract Submission

49. A Simple Stochastic Trust-Region Method for Training Neural Network Classification Models

Invited abstract in session WC-4: Large scale optimization and applications 1, stream Large scale optimization and applications.

Wednesday, 10:00 - 11:30
Room: C105

Authors (first author is the speaker)

1. Mahsa Yousefi
Department of Industrial Engineering, University of Florence
2. Stefania Bellavia
Department of Industrial Engineering, University of Florence
3. Benedetta Morini
Department of Industrial Engineering, University of Florence

Abstract

Deep learning (DL), built on deep neural networks, is widely used in tasks such as image recognition. Training these networks amounts to solving a finite-sum minimization problem. Because computing exact gradients or Hessians is impractical for large-scale DL problems, stochastic methods based on subsampling are typically employed. Stochastic gradient descent (SGD) and its variants are popular in deep learning for their simplicity and low per-iteration cost. Second-order methods are also explored for their ability to exploit curvature information and thus navigate complex loss landscapes more efficiently; however, using second-order information entails the computational cost of forming Hessians or their approximations. To combine the advantages of first- and second-order methods, we introduce a stochastic approach based on a simple trust-region model that exploits approximate partial second-order curvature information at low cost. We provide convergence guarantees and an empirical assessment of the method on image classification tasks.
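To make the ingredients mentioned above concrete, the following is a minimal, illustrative sketch of a stochastic trust-region iteration on a toy least-squares finite-sum problem. The diagonal curvature estimate, batch size, and acceptance thresholds are assumptions chosen for illustration only; they do not reproduce the authors' specific model or the neural network setting.

```python
import numpy as np

# Toy finite-sum problem: subsampled least squares (illustrative only;
# the abstract's method targets neural network classification models).
rng = np.random.default_rng(0)
N, d = 500, 10
A = rng.normal(size=(N, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=N)

def loss(x, idx):
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r**2)

def grad(x, idx):
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)

def tr_step(g, B_diag, delta):
    # Minimize the simple model g^T s + 0.5 s^T diag(B) s over ||s|| <= delta.
    # With a diagonal model the unconstrained minimizer is -g/B; here we just
    # rescale it onto the trust region when it is too long (a cheap heuristic,
    # not the exact subproblem solution).
    s = -g / B_diag
    ns = np.linalg.norm(s)
    if ns > delta:
        s *= delta / ns
    return s

x = np.zeros(d)
delta = 1.0        # trust-region radius
batch = 64
for k in range(200):
    idx = rng.choice(N, batch, replace=False)
    g = grad(x, idx)
    # Crude diagonal curvature estimate from the subsample (an assumption for
    # illustration of "approximate partial second-order information").
    B_diag = np.mean(A[idx]**2, axis=0) + 1e-8
    s = tr_step(g, B_diag, delta)
    pred = -(g @ s + 0.5 * (B_diag * s) @ s)   # model (predicted) decrease
    actual = loss(x, idx) - loss(x + s, idx)   # actual decrease on the batch
    rho = actual / max(pred, 1e-12)
    if rho >= 0.1:            # accept the step
        x = x + s
        if rho >= 0.75:       # good agreement: enlarge the radius
            delta = min(2.0 * delta, 10.0)
    else:                     # poor agreement: reject and shrink
        delta *= 0.5

print(float(np.linalg.norm(x - x_true)))  # distance to the generating weights
```

The ratio test on `rho` is the standard trust-region acceptance mechanism; here both the model and the actual decrease are evaluated on the same subsample, which keeps the per-iteration cost at the level of a stochastic gradient step.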

Status: accepted
