229. Online Learning and Information Exponents: The Importance of Batch size & Time/Complexity Tradeoffs
Invited abstract in session TB-5: Optimization for learning III, stream Optimization for learning.
Thursday, 10:05 - 11:20, Room: M:N
Authors (first author is the speaker)
| 1. | Stephan Ludovic | IDEPHICS, EPFL |
Abstract
We study the impact of the batch size on the iteration time of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as measured by its information exponents. We show that performing gradient updates with large batches minimizes the training time without changing the total sample complexity. However, batch sizes beyond this optimum yield no further improvement in the time complexity of SGD. We provably overcome this fundamental limitation via a different training protocol, Correlation loss SGD, which suppresses the auto-correlation terms in the loss function. We show that the training progress can be tracked by a system of low-dimensional ordinary differential equations (ODEs). Finally, we validate our theoretical results with numerical experiments.
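To illustrate the distinction between standard squared-loss SGD and the correlation loss variant mentioned above, here is a minimal, hypothetical NumPy sketch for a single-neuron student on a single-index target (the dimension, activation, learning rate, and target function are illustrative choices and simplifications, not taken from the paper):

```python
import numpy as np

# Minimal sketch (not the authors' code): one-pass SGD on a single-index target
# y = g(<w*, x>) with isotropic Gaussian covariates, comparing the usual squared
# loss with a correlation loss that drops the f(x)^2 auto-correlation term.
d = 256                        # input dimension (illustrative)
batch_size = 64                # mini-batch size n_b
lr = 0.5 / batch_size          # step size (illustrative scaling)
rng = np.random.default_rng(0)

w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
g = lambda z: z**3 - 3 * z     # third Hermite polynomial: information exponent 3
sigma = np.tanh                # student activation

w = rng.standard_normal(d); w /= np.linalg.norm(w)

def step(w, correlation_loss=True):
    X = rng.standard_normal((batch_size, d))   # fresh samples (one-pass SGD)
    y = g(X @ w_star)
    pre = X @ w
    out = sigma(pre)
    # Per-sample gradient with respect to w:
    #   squared loss:      (out - y) * sigma'(pre) * x
    #   correlation loss:  -y        * sigma'(pre) * x   (auto-correlation term dropped)
    err = -y if correlation_loss else (out - y)
    grad = (err * (1 - out**2)) @ X / batch_size        # sigma' = 1 - tanh^2
    w = w - lr * grad
    return w / np.linalg.norm(w)                        # keep w on the unit sphere

for t in range(20000):
    w = step(w)

# Overlap with the teacher direction: the kind of low-dimensional order
# parameter whose evolution the ODE description tracks.
overlap = abs(w @ w_star)
print(f"overlap with teacher after training: {overlap:.3f}")
```

Passing `correlation_loss=False` recovers ordinary squared-loss SGD on the same stream of samples, so the two protocols can be compared at matched batch size and iteration count.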
Keywords
- Analysis and engineering of optimization algorithms
- Optimization for learning and data analysis
Status: accepted