EUROPT 2024
Abstract Submission

176. MAST: Model-Agnostic Sparsified Training

Invited abstract in session TB-5: Optimization for learning III, stream Optimization for learning.

Thursday, 10:05 - 11:20
Room: M:N

Authors (first author is the speaker)

1. Egor Shulgin
KAUST AI Initiative, King Abdullah University of Science and Technology (KAUST)
2. Peter Richtarik
Computer Science, KAUST

Abstract

We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function. Unlike traditional formulations, the proposed approach explicitly incorporates an initially pre-trained model and random sketch operators, allowing for sparsification of both the model and the gradient during training. We establish insightful properties of the proposed objective function and highlight its connections to the standard formulation. Furthermore, we present several variants of the Stochastic Gradient Descent (SGD) method adapted to the new problem formulation, including SGD with general sampling, a distributed version, and SGD with variance reduction techniques. We achieve tighter convergence rates under relaxed assumptions, bridging the gap between theoretical principles and practical applications and covering several important techniques such as Dropout and sparse training. This work presents promising opportunities to enhance the theoretical understanding of model training through a sparsification-aware optimization approach.
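
The abstract does not spell out the formulation, but one common way to write such a sparsification-aware objective is min_x E_S[ f(S x) ], where S is a random sketch operator (for instance, a rescaled Bernoulli coordinate mask, which recovers Dropout-style sparsification). The snippet below is a minimal sketch, under that assumption, of one SGD step on this objective; the loss, the mask distribution, and all function names (bernoulli_sketch, sgd_step_sparsified) are illustrative choices, not taken from the paper.

```python
import numpy as np

def quadratic_loss_grad(x, A, b):
    """Gradient of the illustrative loss f(x) = 0.5 * ||A x - b||^2."""
    return A.T @ (A @ x - b)

def bernoulli_sketch(d, p, rng):
    """Diagonal Bernoulli sketch: keep each coordinate with probability p,
    rescaled by 1/p so the sketch equals the identity in expectation (Dropout-style)."""
    return (rng.random(d) < p).astype(float) / p

def sgd_step_sparsified(x, grad_f, lr, p, rng):
    """One SGD step on the sparsified objective E_S[ f(S x) ]:
    sample a sketch S, then step along grad_x f(S x) = S^T grad_f(S x),
    which for a diagonal S sparsifies both the model and the gradient."""
    s = bernoulli_sketch(x.size, p, rng)
    g = s * grad_f(s * x)
    return x - lr * g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10
    A = rng.standard_normal((20, d))
    b = rng.standard_normal(20)
    x = np.zeros(d)
    for _ in range(500):
        x = sgd_step_sparsified(x, lambda z: quadratic_loss_grad(z, A, b),
                                lr=0.002, p=0.5, rng=rng)
    print("loss after sparsified training:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

Note that the minimizer of the sparsified objective E_S[ f(S x) ] generally differs from that of the original loss f; characterizing this relationship, and the convergence of SGD-type methods toward it, is the kind of question the sparsification-aware formulation is meant to address.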

Keywords

Status: accepted

