36. Convergence Analysis of Nonlinear Parabolic PDE Models with Neural Network Terms Trained with Gradient Descent
Invited abstract in session TB-10: First order methods: new perspectives for machine learning, stream Large scale optimization: methods and algorithms.
Tuesday, 10:30-12:30, Room B100/8011
Authors (first author is the speaker)
1. Konstantin Riedl, Mathematical Institute, University of Oxford
2. Justin Sirignano, Mathematical Institute, University of Oxford
3. Konstantinos Spiliopoulos, Department of Mathematics and Statistics, Boston University
Abstract
Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks (NNs). The resulting PDE model, a function of the NN parameters, can be calibrated to available data by gradient descent over the PDE, where the gradient is evaluated by solving an adjoint PDE. In this talk, we discuss the convergence of this adjoint optimization method for training NN-PDE models in the limit where both the number of hidden units and the number of training steps tend to infinity. Specifically, for a general class of nonlinear parabolic PDEs, we prove convergence of the NN-PDE solution to the target data (i.e., to a global minimizer). The global convergence proof must address several technical challenges, since the PDE system is both nonlinear and non-local. Although the adjoint PDE is linear, the NN training dynamics involve a non-local kernel operator in the infinite-width hidden-layer limit, and this kernel lacks a spectral gap in its eigenvalues. This poses a unique mathematical challenge that is not encountered in finite-dimensional NN convergence analysis. We establish convergence by proving that an appropriate quadratic functional of the adjoint is globally Lipschitz and then applying a cycle-of-stopping-times argument to show that the adjoint solution converges weakly to zero. By the definition of the adjoint PDE, this in turn yields global convergence of the original NN-PDE model.
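To fix notation for the adjoint optimization method described above, the following is a minimal illustrative sketch, assuming a scalar parabolic model with constant diffusion coefficient $\nu$, an NN closure term $f(\cdot;\theta)$, target data $u_d$, and a quadratic calibration objective $J$; these specific choices are assumptions for illustration and are not taken from the talk:
\[
\begin{aligned}
&\text{forward PDE:} && \partial_t u = \nu\,\Delta u + f(u;\theta), \qquad u(0,\cdot)=u_0,\\
&\text{objective:} && J(\theta) = \tfrac12 \int_0^T\!\!\int_\Omega \bigl(u(t,x;\theta)-u_d(t,x)\bigr)^2 \,dx\,dt,\\
&\text{adjoint PDE:} && -\partial_t p = \nu\,\Delta p + \partial_u f(u;\theta)\,p - (u-u_d), \qquad p(T,\cdot)=0,\\
&\text{gradient:} && \nabla_\theta J(\theta) = -\int_0^T\!\!\int_\Omega \nabla_\theta f(u;\theta)\,p \,dx\,dt,\\
&\text{descent step:} && \theta^{k+1} = \theta^{k} - \eta\,\nabla_\theta J(\theta^{k}).
\end{aligned}
\]
Note that the adjoint equation is linear in $p$ and is solved backward in time from the terminal condition $p(T,\cdot)=0$, so each gradient-descent step requires one forward PDE solve followed by one adjoint solve.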
Keywords
- Optimization in industry, business and finance
- First-order optimization
Status: accepted