EUROPT 2025
Abstract Submission

303. Bi-level optimization in machine learning: instances, acceleration and implicit bias

Invited abstract in session MC-7: Bilevel Optimization in Data Science, stream Bilevel and multilevel optimization.

Monday, 14:00-16:00
Room: B100/5015

Authors (first author is the speaker)

1. Zhanxing Zhu
ECS, University of Southampton

Abstract

In this talk, I will review several important instances of bi-level optimization in machine learning, including neural architecture search, hyperparameter optimization, adversarial training, data condensation, and meta-learning; all of these share the generic structure sketched below.
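For concreteness, these applications share the standard bi-level structure (a generic textbook formulation, not notation taken from the talk):

\min_{\theta} F\bigl(\theta, w^{*}(\theta)\bigr) \quad \text{s.t.} \quad w^{*}(\theta) \in \arg\min_{w} G(\theta, w)

In hyperparameter optimization, for instance, \theta collects the hyperparameters, w the model weights, G the training loss, and F the validation loss; the central difficulty is that the meta-gradient dF/d\theta requires differentiating through the inner arg min.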

I will also introduce several recent works from my group that advance this area.

1) Accelerating large-scale bi-level optimization (NeurIPS’24). We proposed a novel method, Forward Gradient Unrolling with Forward Gradient, abbreviated as (FG)2U, which achieves an unbiased stochastic approximation of the meta-gradient for bi-level optimization. (FG)2U circumvents the memory and approximation issues associated with classical bi-level optimization approaches and delivers significantly more accurate gradient estimates than existing methods (a schematic sketch of the idea follows below).

2) Implicit bias of bi-level optimization (ICLR’22 and TPAMI’25). Adversarial training, an instance of bi-level optimization, has been empirically demonstrated to be an effective strategy for improving the robustness of deep neural networks (DNNs) against adversarial examples. However, the underlying reason for its effectiveness remains unclear. We conducted an extensive theoretical analysis of the training dynamics of homogeneous DNNs and showed that adversarial training implicitly learns a generalized margin to improve adversarial robustness, resolving a long-standing conjecture in this area.
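To make the forward-gradient idea behind (FG)2U concrete, here is a minimal JAX sketch under simplifying assumptions; the toy ridge-regression inner problem and the names inner_loss, unrolled_inner, and forward_gradient are hypothetical illustrations, not the authors' implementation. A random tangent direction is pushed through the unrolled inner loop in a single forward-mode (JVP) pass, yielding an unbiased meta-gradient estimate without storing the trajectory for reverse mode:

import jax
import jax.numpy as jnp

def inner_loss(w, theta, x, y):
    # Toy inner objective: ridge regression whose regularization
    # strength is the outer variable theta (hypothetical example).
    return jnp.mean((x @ w - y) ** 2) + theta * jnp.sum(w ** 2)

def unrolled_inner(theta, w0, x, y, steps=20, lr=0.1):
    # Unroll `steps` iterations of inner gradient descent, so the
    # returned weights depend differentiably on theta.
    w = w0
    for _ in range(steps):
        w = w - lr * jax.grad(inner_loss)(w, theta, x, y)
    return w

def outer_loss(theta, w0, xtr, ytr, xval, yval):
    # Outer (validation) objective evaluated at the unrolled solution.
    w_star = unrolled_inner(theta, w0, xtr, ytr)
    return jnp.mean((xval @ w_star - yval) ** 2)

def forward_gradient(key, theta, *args):
    # Forward-gradient estimator: sample a tangent v, compute the
    # directional derivative dF.v in one forward-mode (JVP) pass
    # through the whole unrolled loop, and return (dF.v) v, an
    # unbiased estimate of the meta-gradient dF/dtheta.
    v = jax.random.normal(key, jnp.shape(theta))
    _, dF_v = jax.jvp(lambda t: outer_loss(t, *args), (theta,), (v,))
    return dF_v * v

# Usage on random toy data (hypothetical shapes).
k1, k2, k3, k4, k5 = jax.random.split(jax.random.PRNGKey(0), 5)
xtr, ytr = jax.random.normal(k1, (32, 5)), jax.random.normal(k2, (32,))
xval, yval = jax.random.normal(k3, (16, 5)), jax.random.normal(k4, (16,))
theta = jnp.asarray(0.1, dtype=jnp.float32)
g = forward_gradient(k5, theta, jnp.zeros(5), xtr, ytr, xval, yval)

Because only a Jacobian-vector product is carried alongside the forward computation, memory cost does not grow with the number of unrolled steps; averaging the estimator over several sampled tangents reduces its variance. The adversarial training instance in 2) is the familiar min-max special case \min_{\theta} \mathbb{E}_{(x,y)} \max_{\|\delta\| \le \epsilon} \ell(f_{\theta}(x+\delta), y), with the inner maximization playing the role of the lower-level problem.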

Status: accepted

