EUROPT 2024
Abstract Submission

135. Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization

Invited abstract in session WD-5: Optimization for learning II, stream Optimization for learning.

Wednesday, 11:25 - 12:40
Room: M:N

Authors (first author is the speaker)

1. Hanmin Li
Artificial Intelligence Initiative, King Abdullah University of Science and Technology
2. Avetik Karagulyan
Artificial Intelligence Initiative, King Abdullah University of Science and Technology
3. Peter Richtarik
King Abdullah University of Science and Technology (KAUST)

Abstract

This paper introduces a new method for minimizing matrix-smooth non-convex objectives: novel Compressed Gradient Descent (CGD) algorithms equipped with a matrix-valued stepsize. The proposed algorithms are analyzed theoretically, first in the single-node setting and then in the distributed setting. Our theoretical results reveal that the matrix stepsize in CGD can capture the objective's structure and lead to faster convergence than a scalar stepsize. As a byproduct of our general results, we emphasize the importance of selecting the compression mechanism and the matrix stepsize in a layer-wise manner that exploits the model's structure. Moreover, we provide theoretical guarantees for compression "for free" by designing specific layer-wise compressors for non-convex matrix-smooth objectives. Our findings are supported by empirical evidence.
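The update rule the abstract describes can be written as x_{t+1} = x_t - D C(∇f(x_t)), where C is a (possibly layer-wise) compressor and D is a matrix stepsize. Below is a minimal Python sketch of that rule under illustrative assumptions: a diagonal D, a rand-k sparsifying compressor, and a simple quadratic objective. None of these specific choices are taken from the paper itself.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 8, 2  # dimension and number of coordinates kept by the compressor

    # Illustrative smooth objective f(x) = 0.5 * x^T A x with uneven curvature,
    # standing in for the paper's matrix-smooth (possibly non-convex) objectives.
    A = np.diag(np.linspace(0.1, 10.0, d))

    def grad(x):
        return A @ x

    def rand_k(g):
        """Unbiased rand-k sparsification: keep k random coordinates, rescaled by d/k."""
        idx = rng.choice(d, size=k, replace=False)
        out = np.zeros_like(g)
        out[idx] = g[idx] * (d / k)
        return out

    # Matrix stepsize D (assumed diagonal here): adapted to per-coordinate
    # curvature and scaled by k/d to absorb the compressor's variance. The paper
    # derives D from its matrix-smoothness analysis; this is only a stand-in.
    D = np.diag(0.5 * (k / d) / np.diag(A))

    x = rng.standard_normal(d)
    for _ in range(300):
        x = x - D @ rand_k(grad(x))  # CGD step: x <- x - D * C(grad f(x))

    print("final gradient norm:", np.linalg.norm(grad(x)))

Because D is diagonal, each coordinate (or, in the paper's setting, each layer) gets its own stepsize matched to its curvature, which is the structural advantage over a single scalar stepsize.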

Keywords

Status: accepted
