EUROPT 2025
Abstract Submission

611. Beyond One‑Hot Labels: KL‑Divergence Training with Empirical Distributions for Faster Optimization

Invited abstract in session MC-12: Robust optimisation and its applications, stream Applications: AI, uncertainty management and sustainability.

Monday, 14:00-16:00
Room: B100/8009

Authors (first author is the speaker)

1. Arman Bolatov
Machine Learning, MBZUAI

Abstract

We introduce an alternative training approach for deep-learning classification that enhances standard pipelines by minimizing KL divergence against approximations of the true underlying label distributions rather than cross-entropy on one-hot labels. Our work demonstrates two effective strategies: first, training models with KL divergence against distributions that reflect label relationships and potential ambiguities; second, leveraging locality-sensitive hashing to build empirical distributions from semantically similar examples. Experiments on image classification tasks show these approaches lead to faster and more stable optimization. We further establish the effectiveness of teacher-student knowledge transfer, in which teacher models trained with KL divergence on the approximated true distributions successfully guide new networks, outperforming traditional training, particularly when labels contain noise or ambiguity.
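To make the objective concrete, below is a minimal PyTorch sketch of a KL-divergence loss against a soft target distribution, together with one plausible way to form an empirical distribution from the labels of retrieved (e.g. LSH) neighbours. The function names and the smoothing scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def kl_loss_soft_targets(logits, target_dist):
    """KL(target || model) from model logits and a soft target distribution.

    `target_dist` is a (batch, num_classes) tensor of probabilities that
    encodes label relationships/ambiguity instead of a one-hot vector.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_probs, target_dist, reduction="batchmean")

def neighbor_empirical_dist(neighbor_labels, num_classes, smoothing=1.0):
    """Empirical class distribution from the labels of retrieved neighbours.

    `neighbor_labels` is a (batch, k) tensor of integer labels; additive
    smoothing keeps every class probability strictly positive.
    """
    counts = F.one_hot(neighbor_labels, num_classes).float().sum(dim=1)
    counts = counts + smoothing / num_classes
    return counts / counts.sum(dim=-1, keepdim=True)

# Illustrative usage in a training step (hypothetical names):
#   target = neighbor_empirical_dist(lsh_neighbor_labels, num_classes=10)
#   loss = kl_loss_soft_targets(model(x), target)
```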

Keywords

Status: accepted
