EUROPT 2024
Abstract Submission

176. MAST: Model-Agnostic Sparsified Training

Invited abstract in session TB-5: Optimization for learning III, stream Optimization for learning.

Thursday, 10:05 - 11:20
Room: M:N

Authors (first author is the speaker)

1. Egor Shulgin
KAUST AI Initiative, King Abdullah University of Science and Technology (KAUST)
2. Peter Richtarik
Computer Science, KAUST

Abstract

We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function. Unlike traditional formulations, the proposed approach explicitly incorporates an initially pre-trained model and random sketch operators, allowing for sparsification of both the model and the gradient during training. We establish insightful properties of the proposed objective function and highlight its connections to the standard formulation. Furthermore, we present several variants of the Stochastic Gradient Descent (SGD) method adapted to the new problem formulation, including SGD with general sampling, a distributed version, and SGD with variance reduction techniques. We achieve tighter convergence rates under relaxed assumptions, bridging the gap between theoretical principles and practical applications and covering several important techniques such as Dropout and sparse training. This work presents promising opportunities to enhance the theoretical understanding of model training through a sparsification-aware optimization approach.
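
The abstract does not spell out the formulation, but one common way to write such a sparsification-aware objective is min_x E_S[ f(S x) ], where S is a random sketch operator (for instance, a rescaled Bernoulli coordinate mask, which recovers Dropout-style sparsification). The snippet below is a minimal sketch, under that assumption, of one SGD step on this objective; the loss, the mask distribution, and all function names (bernoulli_sketch, sgd_step_sparsified) are illustrative choices, not taken from the paper.

```python
import numpy as np

def quadratic_loss_grad(x, A, b):
    """Gradient of the illustrative loss f(x) = 0.5 * ||A x - b||^2."""
    return A.T @ (A @ x - b)

def bernoulli_sketch(d, p, rng):
    """Diagonal Bernoulli sketch: keep each coordinate with probability p,
    rescaled by 1/p so the sketch equals the identity in expectation (Dropout-style)."""
    return (rng.random(d) < p).astype(float) / p

def sgd_step_sparsified(x, grad_f, lr, p, rng):
    """One SGD step on the sparsified objective E_S[ f(S x) ]:
    sample a sketch S, then step along grad_x f(S x) = S^T grad_f(S x),
    which for a diagonal S sparsifies both the model and the gradient."""
    s = bernoulli_sketch(x.size, p, rng)
    g = s * grad_f(s * x)
    return x - lr * g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10
    A = rng.standard_normal((20, d))
    b = rng.standard_normal(20)
    x = np.zeros(d)
    for _ in range(500):
        x = sgd_step_sparsified(x, lambda z: quadratic_loss_grad(z, A, b),
                                lr=0.002, p=0.5, rng=rng)
    print("loss after sparsified training:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

Note that the minimizer of the sparsified objective E_S[ f(S x) ] generally differs from that of the original loss f; characterizing this relationship, and the convergence of SGD-type methods toward it, is the kind of question the sparsification-aware formulation is meant to address.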

Keywords

Status: accepted

