224. Riemannian gradient descent improves parameter-efficient fine-tuning
Invited abstract in session WB-1: Advances in stochastic and non-Euclidean first-order methods, stream Zeroth and first-order optimization methods.
Wednesday, 10:30-12:30, Room: B100/1001
Authors (first author is the speaker)
1. Bingcong Li, Computer Science, ETH Zurich
Abstract
Low-rank adapters (LoRA) leverage the classical Burer-Monteiro (BM) factorization to enable parameter-efficient fine-tuning of large language models (LLMs). However, recent studies have shown theoretically that gradient descent (GD) can suffer from exponential slowdowns in LoRA training when an improper rank is chosen. To address this, we propose semi-orthogonal low-rank adapters (SoLA), inspired by SVD-based factorization. We prove that Riemannian gradient descent (RGD) with semi-orthogonal constraints (i.e., on Stiefel manifolds) overcomes slow convergence in such unfavorable cases. Guided by our theoretical insights, we apply SoLA to fine-tune LLMs and demonstrate its efficiency on large-scale tasks.
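To make the Riemannian ingredient concrete, here is a minimal NumPy sketch of Riemannian gradient descent on the Stiefel manifold for a toy low-rank fitting problem: the semi-orthogonal factor is updated by projecting the Euclidean gradient onto the tangent space and retracting via QR, while the unconstrained factor takes a plain gradient step. This is only an illustration under stated assumptions, not the authors' SoLA algorithm; the helper names, toy objective, and step-size choices are all hypothetical.

```python
import numpy as np

def qr_retraction(X):
    """Map an n x r matrix back onto the Stiefel manifold (semi-orthogonal
    columns) via thin QR, with a sign fix to make the factor unique."""
    Q, R = np.linalg.qr(X)
    signs = np.sign(np.sign(np.diag(R)) + 0.5)  # +1 where R_ii >= 0, else -1
    return Q * signs

def stiefel_tangent_projection(U, G):
    """Project a Euclidean gradient G onto the tangent space of the
    Stiefel manifold at U (where U^T U = I)."""
    sym = 0.5 * (U.T @ G + G.T @ U)
    return G - U @ sym

# Toy target: a rank-r matrix W, fitted as W ~= U @ B with U semi-orthogonal
# (SVD-style factor) and B unconstrained, rather than a plain BM product.
rng = np.random.default_rng(0)
n, m, r = 64, 32, 4
W = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

U = qr_retraction(rng.standard_normal((n, r)))  # random point on the manifold
B = np.zeros((r, m))

for step in range(500):
    resid = U @ B - W                      # f(U, B) = 0.5 * ||U B - W||_F^2
    grad_U, grad_B = resid @ B.T, U.T @ resid
    lr_U = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-8)  # 1/L step for the U-block
    # Riemannian step: project onto the tangent space, then retract to the manifold.
    U = qr_retraction(U - lr_U * stiefel_tangent_projection(U, grad_U))
    B = B - 0.5 * grad_B                   # plain GD step (U^T U = I, so L = 1)

print("relative error:", np.linalg.norm(U @ B - W) / np.linalg.norm(W))
```

The relative error should shrink toward zero as the column space of U aligns with that of W; the projection-plus-retraction pattern shown here is the generic recipe for first-order optimization over semi-orthogonal matrices.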
Keywords
- Large-scale optimization
Status: accepted