49. A Simple Stochastic Trust-Region Method for Training Neural Network Classification Models
Invited abstract in session WC-4: Large scale optimization and applications 1, stream Large scale optimization and applications.
Wednesday, 10:00 - 11:30, Room: C105
Authors (first author is the speaker)
1. Mahsa Yousefi, Department of Industrial Engineering, University of Florence
2. Stefania Bellavia, Dipartimento di Ingegneria Industriale, Università di Firenze
3. Benedetta Morini, Dipartimento di Ingegneria Industriale, Università di Firenze
Abstract
Deep learning (DL), employing deep neural networks, is widely used in tasks such as image recognition. Training these networks requires solving finite-sum minimization problems. Because computing exact gradients or Hessians is impractical for large-scale DL problems, stochastic methods based on subsampling are typically employed. Stochastic gradient descent (SGD) and its variants are popular in deep learning for their simplicity and low per-iteration cost. Second-order methods are also explored for their ability to exploit curvature information, which can help navigate complex loss landscapes more efficiently; however, using second-order information entails the computational cost of forming Hessians or their approximations. To combine the advantages of first- and second-order methods, we introduce a stochastic approach based on a simple trust-region model that exploits approximate partial second-order curvature information efficiently. We provide convergence guarantees and an empirical assessment of the method on image classification tasks.
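As a rough illustration of the kind of iteration the abstract describes, the sketch below implements one generic stochastic trust-region step with a cheap diagonal curvature proxy. This is not the authors' algorithm: the function names (`stochastic_tr_step`, `grad_fn`, `loss_fn`), the diagonal Hessian stand-in, and all thresholds are illustrative assumptions.

```python
import numpy as np

def stochastic_tr_step(w, batch, grad_fn, loss_fn, curv_diag, radius,
                       eta=0.1, gamma=2.0):
    """One illustrative stochastic trust-region iteration (not the paper's method).

    w         : current parameter vector (1-D numpy array)
    batch     : subsampled data used for this iteration
    grad_fn   : grad_fn(w, batch) -> subsampled gradient (1-D array)
    loss_fn   : loss_fn(w, batch) -> subsampled loss (float)
    curv_diag : diagonal curvature estimate (1-D array), a cheap
                stand-in for approximate partial second-order information
    radius    : current trust-region radius
    """
    g = grad_fn(w, batch)
    B = np.maximum(curv_diag, 1e-8)          # keep the quadratic model convex
    s = -g / B                               # minimizer of the diagonal model
    ns = np.linalg.norm(s)
    if ns > radius:                          # project the step onto the trust region
        s *= radius / ns
    pred = -(g @ s + 0.5 * (s * B) @ s)      # predicted reduction of the model
    actual = loss_fn(w, batch) - loss_fn(w + s, batch)
    rho = actual / max(pred, 1e-12)          # actual-vs-predicted agreement ratio
    if rho >= eta:                           # successful step: accept, grow radius
        return w + s, gamma * radius
    return w, radius / gamma                 # unsuccessful: reject, shrink radius
```

In such a scheme, the curvature diagonal could come, for example, from subsampled Hessian diagonals or running averages of squared gradients; the accept/reject ratio test is what distinguishes a trust-region update from a plain SGD step.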
Keywords
- Optimization for learning and data analysis
- Artificial intelligence based optimization methods and applications
- SS - Advances in Nonlinear Optimization and Applications
Status: accepted