135. Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
Invited abstract in session WD-5: Optimization for learning II, stream Optimization for learning.
Wednesday, 11:25 - 12:40, Room: M:N
Authors (first author is the speaker)
1. Hanmin Li, Artificial Intelligence Initiative, King Abdullah University of Science and Technology
2. Avetik Karagulyan, Artificial Intelligence Initiative, King Abdullah University of Science and Technology
3. Peter Richtarik, KAUST
Abstract
This paper introduces a new method for minimizing matrix-smooth non-convex objectives through novel Compressed Gradient Descent (CGD) algorithms enhanced with a matrix-valued stepsize. The proposed algorithms are theoretically analyzed first in the single-node setting and subsequently in the distributed setting. Our theoretical results reveal that the matrix stepsize in CGD can capture the objective's structure and lead to faster convergence compared to a scalar stepsize. As a byproduct of our general results, we emphasize the importance of selecting the compression mechanism and the matrix stepsize in a layer-wise manner, taking advantage of model structure. Moreover, we provide theoretical guarantees for free compression by designing specific layer-wise compressors for non-convex matrix-smooth objectives. Our findings are supported by empirical evidence.
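To make the kind of update discussed above concrete, the Python sketch below illustrates a single-node compressed gradient descent step of the form x_{t+1} = x_t - D C(grad f(x_t)), where D is a symmetric positive definite matrix stepsize and C is an unbiased sparsifying compressor. The Rand-k compressor, the diagonal choice of D, and all names here are illustrative assumptions for exposition, not the authors' exact algorithm, compressors, or stepsize rule.

```python
import numpy as np

def rand_k_compressor(g, k, rng):
    """Unbiased Rand-k sparsification (illustrative choice of compressor):
    keep k random coordinates and rescale by d/k so the compressor is unbiased."""
    d = g.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(g)
    out[idx] = g[idx] * (d / k)
    return out

def compressed_gd_matrix_stepsize(grad_f, x0, D, k, num_iters=1000, seed=0):
    """Single-node CGD-style iteration x_{t+1} = x_t - D @ C(grad f(x_t)),
    where D is a symmetric positive definite matrix stepsize and C is an
    unbiased compressor (here Rand-k). A sketch, not the paper's method."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(num_iters):
        g = rand_k_compressor(grad_f(x), k, rng)
        x = x - D @ g
    return x

# Example: quadratic f(x) = 0.5 x^T A x with a diagonal matrix stepsize
# adapted to per-coordinate smoothness (a hypothetical, simple choice).
A = np.diag([100.0, 1.0, 0.01])
grad_f = lambda x: A @ x
D = np.diag(0.5 / np.diag(A))
x_final = compressed_gd_matrix_stepsize(grad_f, np.ones(3), D, k=2)
```

In this toy quadratic, the matrix stepsize scales each coordinate by its own curvature, which is the intuition behind a matrix stepsize capturing the objective's structure; a single scalar stepsize would be limited by the largest curvature direction.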
Keywords
- Optimization for learning and data analysis
- Artificial intelligence based optimization methods and applications
- Analysis and engineering of optimization algorithms
Status: accepted