345. Alternate Through the Epochs Stochastic Gradient for Multi-Task Neural Networks
Invited abstract in session MC-3: First-order methods in modern optimization (Part II), stream Large scale optimization: methods and algorithms.
Monday, 14:00-16:00, Room: B100/4011
Authors (first author is the speaker)
1. Stefania Bellavia, Dipartimento di Ingegneria Industriale, Università di Firenze
2. Francesco Della Santa, Dipartimento di Scienze Matematiche, Politecnico di Torino
3. Alessandra Papini, Dipartimento di Ingegneria Industriale, Università di Firenze
Abstract
We focus on the training phase of Neural Networks for Multi-Task Learning. We consider hard-parameter sharing Multi-Task Neural Networks (MTNNs) and discuss alternate stochastic gradient updates. Traditional MTNN training faces challenges in managing conflicting loss gradients, often yielding sub-optimal performance. The proposed alternate training method updates shared and task-specific weights alternately through the epochs, exploiting the multi-head architecture of the model. This approach reduces computational costs per epoch and memory requirements. Convergence properties similar to those of the classical stochastic gradient method are established. Empirical experiments demonstrate enhanced training regularization and reduced computational demands.
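To make the alternation scheme concrete, the following is a minimal PyTorch sketch of epoch-wise alternate stochastic gradient for a hard-parameter-sharing multi-task network; it is not the authors' implementation, and the architecture sizes, the toy data, and the even/odd alternation schedule are illustrative assumptions.

```python
# Minimal sketch: epoch-wise alternation between shared and task-specific updates.
# Sizes, toy data, and the even/odd schedule are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_tasks, d_in, d_hidden = 2, 10, 32

# Hard-parameter sharing: one shared trunk, one head per task (multi-head model).
trunk = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
heads = nn.ModuleList(nn.Linear(d_hidden, 1) for _ in range(n_tasks))

shared_params = list(trunk.parameters())
task_params = list(heads.parameters())
opt_shared = torch.optim.SGD(shared_params, lr=1e-2)
opt_tasks = torch.optim.SGD(task_params, lr=1e-2)
loss_fn = nn.MSELoss()

# Toy regression data: one target per task.
X = torch.randn(256, d_in)
Y = [torch.randn(256, 1) for _ in range(n_tasks)]

def set_requires_grad(params, flag):
    for p in params:
        p.requires_grad_(flag)

batch_size = 32
for epoch in range(20):
    # Alternate through the epochs: update either the shared block or the heads.
    update_shared = (epoch % 2 == 0)
    set_requires_grad(shared_params, update_shared)
    set_requires_grad(task_params, not update_shared)
    opt = opt_shared if update_shared else opt_tasks

    perm = torch.randperm(X.size(0))
    for start in range(0, X.size(0), batch_size):
        idx = perm[start:start + batch_size]
        opt.zero_grad()
        z = trunk(X[idx])
        # Sum of per-task losses; only the active block of weights is updated.
        loss = sum(loss_fn(head(z), Y[t][idx]) for t, head in enumerate(heads))
        loss.backward()
        opt.step()
```

In this sketch, freezing the inactive block via `requires_grad_` means its gradients are not computed during the corresponding epochs, which is one way the per-epoch computational and memory savings mentioned in the abstract can arise.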
Keywords
- Optimization for learning and data analysis
- First-order optimization
- Multi-objective optimization
Status: accepted