224. Riemannian gradient descent improves parameter-efficient fine-tuning
Invited abstract in session WB-1: Advances in stochastic and non-Euclidean first-order methods, stream Zeroth and first-order optimization methods.
Wednesday, 10:30-12:30, Room: B100/1001
Authors (first author is the speaker)
1. Bingcong Li, Computer Science, ETH Zurich
Abstract
Low-rank adapters (LoRA) leverage the classical Burer-Monteiro (BM) factorization to enable parameter-efficient fine-tuning of large language models (LLMs). However, recent studies have shown theoretically that gradient descent (GD) can suffer from exponential slowdowns in LoRA training when an improper rank is chosen. To address this, we propose semi-orthogonal low-rank adapters (SoLA), inspired by SVD-based factorization. We prove that Riemannian gradient descent (RGD) with semi-orthogonal constraints (i.e., on Stiefel manifolds) overcomes slow convergence in such unfavorable cases. Guided by our theoretical insights, we apply SoLA to fine-tune LLMs and demonstrate its efficiency on large-scale tasks.
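To make the Riemannian ingredient concrete, here is a minimal NumPy sketch of Riemannian gradient descent on the Stiefel manifold for a toy low-rank fitting problem: the semi-orthogonal factor is updated by projecting the Euclidean gradient onto the tangent space and retracting via QR, while the unconstrained factor takes a plain gradient step. This is only an illustration under stated assumptions, not the authors' SoLA algorithm; the helper names, toy objective, and step-size choices are all hypothetical.

```python
import numpy as np

def qr_retraction(X):
    """Map an n x r matrix back onto the Stiefel manifold (semi-orthogonal
    columns) via thin QR, with a sign fix to make the factor unique."""
    Q, R = np.linalg.qr(X)
    signs = np.sign(np.sign(np.diag(R)) + 0.5)  # +1 where R_ii >= 0, else -1
    return Q * signs

def stiefel_tangent_projection(U, G):
    """Project a Euclidean gradient G onto the tangent space of the
    Stiefel manifold at U (where U^T U = I)."""
    sym = 0.5 * (U.T @ G + G.T @ U)
    return G - U @ sym

# Toy target: a rank-r matrix W, fitted as W ~= U @ B with U semi-orthogonal
# (SVD-style factor) and B unconstrained, rather than a plain BM product.
rng = np.random.default_rng(0)
n, m, r = 64, 32, 4
W = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

U = qr_retraction(rng.standard_normal((n, r)))  # random point on the manifold
B = np.zeros((r, m))

for step in range(500):
    resid = U @ B - W                      # f(U, B) = 0.5 * ||U B - W||_F^2
    grad_U, grad_B = resid @ B.T, U.T @ resid
    lr_U = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-8)  # 1/L step for the U-block
    # Riemannian step: project onto the tangent space, then retract to the manifold.
    U = qr_retraction(U - lr_U * stiefel_tangent_projection(U, grad_U))
    B = B - 0.5 * grad_B                   # plain GD step (U^T U = I, so L = 1)

print("relative error:", np.linalg.norm(U @ B - W) / np.linalg.norm(W))
```

The relative error should shrink toward zero as the column space of U aligns with that of W; the projection-plus-retraction pattern shown here is the generic recipe for first-order optimization over semi-orthogonal matrices.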
Keywords
- Large-scale optimization
Status: accepted