1335. Enhancing the convergence speed of line search methods: Applications in Neural Network training
Invited abstract in session TA-34: New Algorithms for Nonlinear Optimization, stream Advances in large scale nonlinear optimization.
Tuesday, 8:30-10:00, Room: 43 (building: 303A)
Authors (first author is the speaker)
1. José Ángel Martín-Baos, Mathematics, University of Castilla-La Mancha
2. Ricardo Garcia-Rodenas, Escuela Superior de Informatica, Universidad de Castilla La Mancha
3. Luis Rodriguez-Benitez, Technologies and Information Systems, University of Castilla-La Mancha
4. Maria Luz Lopez, Matemáticas, Universidad de Castilla La Mancha
Abstract
The training of machine learning models, such as neural networks, relies on optimisation techniques that must handle large volumes of data. The algorithms that have demonstrated satisfactory performance on this task frequently perform line searches on subsets of the data. This ensures that, despite the potentially low quality of the search direction, the overall computational cost remains low, making the strategy globally efficient. In these methods, strategies employing a constant learning rate have proven particularly effective. This paper introduces a novel scheme designed to significantly expedite the convergence of line search-based methods. Our approach incorporates additional high-quality line searches derived from the convergence process of these methods: using an Armijo rule, it dynamically adjusts the step size through successive reductions or expansions based on the evaluated quality of the descent direction. This adjustment enables more substantial progress along the search direction, potentially reducing the number of iterations needed to reach an optimal solution. We apply the proposed scheme to accelerate widely used algorithms such as Gradient Descent (GD), Momentum GD, and Adaptive Moment Estimation (Adam). To illustrate the practical implications and effectiveness of our approach, we present a comprehensive case study on the training of Deep Neural Networks and Kernel Logistic Regression.
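To make the step-size mechanism concrete, the following is a minimal sketch of an Armijo-style rule that both shrinks and expands the step depending on whether sufficient decrease is achieved. It is an illustrative assumption of how such a rule can be coded, not the authors' implementation; all names (armijo_step, shrink, expand, c) are placeholders.

```python
import numpy as np

def armijo_step(f, grad_f, x, d, alpha0=1.0, c=1e-4, shrink=0.5, expand=2.0, max_iter=20):
    """Return a step size along descent direction d satisfying the Armijo condition."""
    fx = f(x)
    slope = c * (grad_f(x) @ d)          # expected decrease per unit step (negative for descent d)
    alpha = alpha0
    if f(x + alpha * d) <= fx + alpha * slope:
        # Initial step already yields sufficient decrease: try expanding it.
        for _ in range(max_iter):
            trial = alpha * expand
            if f(x + trial * d) <= fx + trial * slope:
                alpha = trial
            else:
                break
    else:
        # Insufficient decrease: shrink until the Armijo condition holds.
        for _ in range(max_iter):
            alpha *= shrink
            if f(x + alpha * d) <= fx + alpha * slope:
                break
    return alpha

# Usage example: one gradient-descent step on a simple quadratic.
f = lambda x: 0.5 * np.sum(x ** 2)
grad_f = lambda x: x
x = np.array([3.0, -4.0])
d = -grad_f(x)                            # steepest-descent direction
x_next = x + armijo_step(f, grad_f, x, d) * d
```

In the setting described above, the same rule would be applied to the search directions produced by GD, Momentum GD, or Adam, with f and grad_f evaluated on subsets of the training data.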
Keywords
- Machine Learning
- Large Scale Optimization
- Expert Systems and Neural Networks
Status: accepted