151. On the spectral bias of two-layer linear networks
Invited abstract in session WD-3: Optimization in neural architectures II, stream Optimization in neural architectures: convergence and solution characterization.
Wednesday, 11:25 - 12:40, Room: M:J
Authors (first author is the speaker)
1. Aditya Varre, EDIC, EPFL
Abstract
In this talk, we study the behaviour of two-layer fully connected networks with linear activations trained with gradient flow on the square loss. We show how the optimization process carries an implicit bias on the parameters that depends on the scale of their initialization. The main result of the paper is a variational characterization of the loss minimizers retrieved by the gradient flow for a specific initialization shape.
This characterization reveals that, in the small-scale initialization regime, the linear neural network's hidden layer is biased toward having a low-rank structure. To complement our results, we exhibit a hidden mirror flow that tracks the dynamics of the singular values of the weight matrices and describe their time evolution. We support our findings with numerical experiments illustrating the phenomena.
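To illustrate the low-rank bias described above, the following minimal numerical sketch (not the authors' code) trains a two-layer linear network with gradient descent as a discretization of gradient flow on the square loss, starting from a small-scale random initialization. The dimensions, step size, number of iterations, and the rank-1 teacher map are illustrative assumptions chosen only to make the effect visible in the singular values of the hidden layer.

```python
import numpy as np

# Two-layer linear network f(x) = W2 @ W1 @ x trained on the square loss.
# Gradient descent with a small step is used as a proxy for gradient flow.
rng = np.random.default_rng(0)
d, h, k, n = 20, 20, 5, 200        # input dim, hidden width, output dim, samples
alpha = 1e-3                        # small initialization scale (illustrative)

X = rng.standard_normal((n, d))
u = rng.standard_normal(k); u /= np.linalg.norm(u)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
A_star = np.outer(u, v)             # rank-1 teacher map with unit spectral norm
Y = X @ A_star.T

W1 = alpha * rng.standard_normal((h, d))
W2 = alpha * rng.standard_normal((k, h))

lr = 0.1
for step in range(5000):
    R = X @ W1.T @ W2.T - Y              # residuals of the end-to-end map W2 W1
    g1 = (2.0 / n) * W2.T @ R.T @ X      # gradient of the loss w.r.t. W1
    g2 = (2.0 / n) * R.T @ X @ W1.T      # gradient of the loss w.r.t. W2
    W1 -= lr * g1
    W2 -= lr * g2

print("final loss:", np.mean((X @ W1.T @ W2.T - Y) ** 2))
print("singular values of the hidden layer W1:",
      np.round(np.linalg.svd(W1, compute_uv=False), 4))
# With a small alpha, essentially one singular value of W1 grows, matching the
# rank of the teacher; the hidden layer is effectively low-rank.
```

Rerunning the sketch with a larger initialization scale (e.g. alpha = 1.0) spreads the energy across many singular values of W1, which is consistent with the claim that the bias toward low rank is specific to the small-scale initialization regime.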
Keywords
- Artificial intelligence based optimization methods and applications
- Analysis and engineering of optimization algorithms
- Large- and Huge-scale optimization
Status: accepted