164. A phase transition between positional and semantic learning in a solvable model of dot-product attention
Invited abstract in session WD-3: Optimization in neural architectures II, stream Optimization in neural architectures: convergence and solution characterization.
Wednesday, 11:25 - 12:40, Room: M:J
Authors (first author is the speaker)
1. Hugo Cui, EPFL
2. Freya Behrens, EPFL
3. Florent Krzakala, EPFL
4. Lenka Zdeborová, EPFL
Abstract
We investigate how a dot-product attention layer learns a positional attention matrix (with tokens attending to each other based on their respective positions) and a semantic attention matrix (with tokens attending to each other based on their meaning). For an algorithmic task, we experimentally show how the same simple architecture can learn to implement a solution using either the positional or semantic mechanism. On the theoretical side, we study the learning of a non-linear self-attention layer with trainable tied and low-rank query and key matrices. In the asymptotic limit of high-dimensional data and a comparably large number of training samples, we provide a closed-form characterization of the global minimum of the non-convex empirical loss landscape. We show that this minimum corresponds to either a positional or a semantic mechanism and evidence an emergent phase transition from the former to the latter with increasing sample complexity.
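As an illustrative aside, the layer described above can be sketched as a single non-linear (softmax) self-attention layer whose query and key matrices are tied and equal to one trainable low-rank matrix Q of rank r much smaller than the embedding dimension d. In this minimal sketch, the use of the raw embeddings as values, the inverse temperature beta, the 1/sqrt(d) scaling, and the random initialization are assumptions made for illustration, not details stated in the abstract; positional information would enter through the rows of X (e.g., via additive positional encodings).

```python
import numpy as np

def tied_lowrank_attention(X, Q, beta=1.0):
    """Sketch of a dot-product attention layer with tied, low-rank query/key matrix.

    X : (L, d) array of token embeddings (rows are tokens).
    Q : (d, r) trainable matrix shared between queries and keys, with r << d.
    Returns the (L, d) output softmax(beta * X Q Q^T X^T / sqrt(d)) X,
    where the softmax is taken row-wise (values are the raw embeddings here).
    """
    XQ = X @ Q                                            # (L, r) projected tokens
    scores = beta * (XQ @ XQ.T) / np.sqrt(X.shape[1])     # (L, L) tied dot-product scores
    scores -= scores.max(axis=1, keepdims=True)           # subtract row max for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)                     # row-wise softmax attention matrix
    return A @ X

# Illustrative usage with arbitrary sizes (not taken from the paper):
rng = np.random.default_rng(0)
L, d, r = 8, 64, 2
X = rng.standard_normal((L, d)) / np.sqrt(d)
Q = rng.standard_normal((d, r)) / np.sqrt(d)
out = tied_lowrank_attention(X, Q)                        # shape (8, 64)
```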
Keywords
- Optimization for learning and data analysis
Status: accepted