3489. AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size

Invited abstract in session WA-32: Adaptive and Polyak step-size methods, stream Advances in large scale nonlinear optimization.

Wednesday, 8:30-10:00
Room: 41 (building: 303A)

Authors (first author is the speaker)

1.	Petr Ostroukhov
	Machine Learning, Mohamed bin Zayed University of Artificial Intelligence

Abstract

This work presents AdaBatchGrad, a novel adaptation of stochastic gradient descent (SGD) that integrates an adaptive step size with an adaptive batch size. Increasing the batch size and decreasing the step size are well-known techniques for "tightening" the region of convergence of SGD and reducing its variance. A range of studies by R. Byrd and J. Nocedal introduced various tests that assess the quality of mini-batch gradient approximations and choose an appropriate batch size at every step. Methods that used exact tests were observed to converge sublinearly, whereas inexact test implementations sometimes resulted in non-convergence and erratic performance. To address these challenges, AdaBatchGrad adapts both the batch size and the step size, which makes it markedly more robust, stable, and computationally efficient than prevailing methods. To substantiate the efficacy of the method, we show experimentally how the introduction of an adaptive step size and an adaptive batch size progressively improves the performance of regular SGD. The results indicate that AdaBatchGrad outperforms alternative methods, especially when inexact tests are used.
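The combination described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes an AdaGrad-Norm-style step size and a sample-variance ("inexact") norm test for growing the batch, and it uses a hypothetical one-dimensional least-squares problem. The names `adaptive_batch_sgd`, `demo`, `eta`, `theta`, and `b0` are illustrative choices, not taken from the abstract.

```python
import math
import random


def adaptive_batch_sgd(xs, ys, w0=0.0, eta=1.0, theta=0.5, b0=2, steps=300, seed=0):
    """Minimize mean((w * x - y)^2) over a scalar w.

    Assumed ingredients (illustrative, not the authors' exact method):
      * step size eta / sqrt(sum of squared mini-batch gradients) (AdaGrad-Norm style)
      * batch growth when the sample-variance norm test fails
    """
    rng = random.Random(seed)
    n = len(xs)
    w = w0
    batch = max(2, min(b0, n))  # at least 2 samples so a sample variance exists
    grad_sq_sum = 1e-12         # small positive start avoids division by zero
    batch_history = []

    for _ in range(steps):
        batch_history.append(batch)
        idx = rng.sample(range(n), batch)
        # Per-example gradients of (w * x - y)^2 with respect to w.
        grads = [2.0 * (w * xs[i] - ys[i]) * xs[i] for i in idx]
        g = sum(grads) / batch

        # Adaptive step size: accumulate squared gradient norms.
        grad_sq_sum += g * g
        w -= eta / math.sqrt(grad_sq_sum) * g

        # Inexact norm test: require var / batch <= theta^2 * g^2.
        var = sum((gi - g) ** 2 for gi in grads) / (batch - 1)
        bound = theta ** 2 * g * g
        if var / batch > bound:
            # Smallest batch that would have passed the test, capped at n.
            needed = n if var >= n * bound else math.ceil(var / bound)
            batch = min(n, max(batch + 1, needed))

    return w, batch_history


def demo():
    # Hypothetical data: y = 3 * x plus small Gaussian noise.
    rng = random.Random(1)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    ys = [3.0 * x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
    return adaptive_batch_sgd(xs, ys)


if __name__ == "__main__":
    w, batches = demo()
    print("estimated slope:", w)
    print("first and last batch size:", batches[0], batches[-1])
```

In this sketch the batch size never shrinks, and the step size decays as gradient norms accumulate, so both mechanisms reduce the variance of the iterates as the method approaches a minimizer.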

Keywords

Artificial Intelligence
Convex Optimization
Machine Learning

Status: accepted
