Learning to Control PDEs with Differentiable Predictive Control and Time-Integrated Neural Operators

Dibakar Roy Sarkar; Ján Drgoňa; Somdatta Goswami

arxiv: 2511.08992 · v2 · submitted 2025-11-12 · 💻 cs.CE


Pith reviewed 2026-05-17 23:07 UTC · model grok-4.3


keywords: PDE control · Deep Operator Networks · Differentiable Predictive Control · neural control policies · model predictive control · operator learning · data-driven control · surrogate models


The pith

Neural policies learned via time-integrated Deep Operator Networks inside Differentiable Predictive Control track targets and satisfy constraints for PDEs like heat and Burgers equations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework that places Time-Integrated Deep Operator Networks as differentiable surrogate models inside the Differentiable Predictive Control loop. Automatic differentiation through the surrogate computes gradients of an optimal control loss, allowing offline self-supervised training of neural policies. The resulting policies handle target tracking, constraint satisfaction, and curvature minimization on the heat, nonlinear Burgers, and reaction-diffusion equations. They generalize to new initial conditions and parameter values without online re-optimization. Inference runs four orders of magnitude faster than nonlinear model predictive control, removing the need for repeated online solves or supervisory controllers.

Core claim

Integrating TI-DeepONets, which learn temporal derivatives and pair them with numerical integrators, into the DPC algorithm lets neural policies be trained by backpropagating expectations of the control loss through the learned PDE surrogate. This produces policies that achieve target tracking, constraint satisfaction, and curvature minimization objectives while generalizing across distributions of initial conditions and parameters, with four orders of magnitude acceleration at inference compared to nonlinear model predictive control benchmarks.
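The "learn the time derivative, then integrate numerically" pattern described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `heat_rhs` is a hypothetical finite-difference stand-in for the trained derivative network, classical RK4 is an assumed choice of integrator (the source says only "numerical integrators"), and the grid size, diffusivity, and step size are made-up values.

```python
import numpy as np

def heat_rhs(u, f, alpha, dx):
    """Hypothetical stand-in for the learned derivative model: du/dt = alpha * u_xx + f."""
    u_xx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2  # periodic second difference
    return alpha * u_xx + f

def rk4_step(rhs, u, f, dt):
    """One classical Runge-Kutta step, holding the forcing f constant over the step."""
    k1 = rhs(u, f)
    k2 = rhs(u + 0.5 * dt * k1, f)
    k3 = rhs(u + 0.5 * dt * k2, f)
    k4 = rhs(u + dt * k3, f)
    return u + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def rollout(rhs, u0, forcings, dt):
    """Integrate the derivative model over len(forcings) steps; returns (n_steps + 1, n_x)."""
    traj = [u0]
    for f in forcings:
        traj.append(rk4_step(rhs, traj[-1], f, dt))
    return np.stack(traj)

# Assumed demo settings: 64 grid points on [0, 1), zero forcing, 200 steps.
n_x, alpha, dt, n_steps = 64, 0.01, 1e-3, 200
x = np.linspace(0.0, 1.0, n_x, endpoint=False)
dx = x[1] - x[0]
u0 = np.sin(2.0 * np.pi * x)
rhs = lambda u, f: heat_rhs(u, f, alpha, dx)
traj = rollout(rhs, u0, np.zeros((n_steps, n_x)), dt)
```

In the paper's setting the derivative model would be a trained network and every operation in the rollout would need to be differentiable, so that gradients of a control loss can flow back through all time steps.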

What carries the argument

Time-Integrated Deep Operator Network (TI-DeepONet) surrogate that supplies differentiable PDE dynamics to the Differentiable Predictive Control (DPC) optimizer for end-to-end policy gradient computation.
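A toy version of "differentiate the control loss through the surrogate rollout" is sketched below, under loud assumptions: the surrogate is a hypothetical linear model `A x + B u` stepped with forward Euler, the policy is a linear gain `K` rather than a neural network, and the reverse pass is written by hand instead of using an automatic-differentiation library. The expectation over initial conditions is approximated by averaging over a sampled batch. None of these names or values come from the paper.

```python
import numpy as np

def rollout_loss_and_grad(K, x0, r, A, B, dt, n_steps, lam):
    """Tracking loss of one closed-loop rollout and its exact gradient w.r.t. the gain K."""
    xs, us = [x0], []
    for _ in range(n_steps):                       # forward pass: policy, then surrogate step
        u = K @ (r - xs[-1])                       # linear policy (stand-in for a neural policy)
        us.append(u)
        xs.append(xs[-1] + dt * (A @ xs[-1] + B @ u))  # Euler step of the assumed surrogate
    loss = sum(np.sum((x - r) ** 2) for x in xs[1:]) + lam * sum(np.sum(u ** 2) for u in us)

    grad_K = np.zeros_like(K)
    gx = np.zeros_like(x0)                         # adjoint of the state, built backwards in time
    for k in reversed(range(n_steps)):
        gx = gx + 2.0 * (xs[k + 1] - r)            # direct tracking term on x_{k+1}
        gu = dt * (B.T @ gx) + 2.0 * lam * us[k]   # adjoint of the control u_k
        grad_K += np.outer(gu, r - xs[k])          # u_k = K (r - x_k)
        gx = gx + dt * (A.T @ gx) - K.T @ gu       # propagate to x_k through step and policy
    return loss, grad_K

def batch_loss_and_grad(K, x0_batch, *args):
    """Monte Carlo estimate of the expected loss and gradient over sampled initial states."""
    results = [rollout_loss_and_grad(K, x0, *args) for x0 in x0_batch]
    return np.mean([l for l, _ in results]), np.mean([g for _, g in results], axis=0)

# Assumed toy problem: 2 states, 2 inputs, horizon of 20 steps.
rng = np.random.default_rng(0)
A = np.array([[-0.5, 0.2], [0.0, -0.3]])
B = np.eye(2)
r = np.array([1.0, -0.5])
args = (r, A, B, 0.1, 20, 1e-3)
x0_batch = rng.normal(size=(16, 2))

K = np.zeros((2, 2))
initial_loss, _ = batch_loss_and_grad(K, x0_batch, *args)
for _ in range(100):                               # offline policy optimization by gradient descent
    _, grad = batch_loss_and_grad(K, x0_batch, *args)
    K -= 1e-3 * grad
final_loss, _ = batch_loss_and_grad(K, x0_batch, *args)
```

The hand-written backward loop plays the role that automatic differentiation plays in the framework described here; with a neural surrogate and a neural policy the same chain rule would be applied by the autodiff library.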

If this is right

Policies achieve target tracking, constraint satisfaction, and curvature minimization on heat, Burgers, and reaction-diffusion equations.
Policies generalize to unseen initial conditions and parameter distributions.
Inference accelerates by four orders of magnitude relative to nonlinear model predictive control.
No online optimization or supervisory controller is required after training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same surrogate-plus-differentiable-control structure could be tried on higher-dimensional or more strongly nonlinear PDEs where repeated online solves become prohibitive.
Engineering domains such as thermal regulation or fluid transport might adopt the offline training plus fast deployment pattern once surrogate fidelity is verified in closed loop.
Extensions that add uncertainty quantification to the operator network could test robustness of the learned policies under model mismatch.

Load-bearing premise

The TI-DeepONet surrogate must remain accurate and stable enough inside the closed-loop optimization that gradients computed through it produce policies that work on the true PDE.

What would settle it

Deploy the trained neural policy on the original high-fidelity PDE simulator and measure whether target tracking and constraint satisfaction hold without large errors or instability over time.
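A check of this kind could look like the sketch below. Everything here is a hypothetical stand-in: `heat_step` is a simple explicit finite-difference solver playing the role of the high-fidelity simulator, `clipped_p_policy` is a clipped proportional controller playing the role of the trained neural policy, and the bounds, gain, and horizon are made-up values.

```python
import numpy as np

def evaluate_closed_loop(true_step, policy, u0, target, n_steps, f_min, f_max):
    """Run a policy against a reference simulator; report tracking error and bound violations."""
    u = u0.copy()
    max_violation = 0.0
    rel_errors = []
    for _ in range(n_steps):
        f = policy(u, target)
        # Largest amount by which any control value leaves [f_min, f_max] at this step.
        violation = max(np.max(f - f_max), np.max(f_min - f), 0.0)
        max_violation = max(max_violation, float(violation))
        u = true_step(u, f)
        rel_errors.append(np.linalg.norm(u - target) / np.linalg.norm(target))
    return np.array(rel_errors), max_violation

# Assumed reference simulator: explicit heat-equation step on a periodic grid.
n_x, alpha, dt = 64, 0.01, 1e-3
x = np.linspace(0.0, 1.0, n_x, endpoint=False)
dx = x[1] - x[0]

def heat_step(u, f):
    u_xx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return u + dt * (alpha * u_xx + f)

# Assumed stand-in for the learned policy: proportional feedback clipped to the bounds.
def clipped_p_policy(u, target, gain=5.0, f_min=-2.0, f_max=2.0):
    return np.clip(gain * (target - u), f_min, f_max)

u0 = np.sin(2.0 * np.pi * x)
target = 0.5 * np.ones(n_x)
rel_errors, max_violation = evaluate_closed_loop(
    heat_step, clipped_p_policy, u0, target, n_steps=2000, f_min=-2.0, f_max=2.0
)
```

Plotting `rel_errors` over time would show whether tracking degrades or drifts over the rollout, and a nonzero `max_violation` would flag constraint breaches on the reference dynamics.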

Figures

Figures reproduced from arXiv: 2511.08992 by Dibakar Roy Sarkar, Ján Drgoňa, Somdatta Goswami.

**Figure 1.** Schematic of the proposed Differentiable Predictive Control with Neural Operators. Forward propagation (green dashed arrows) computes control actions via a neural policy and evolves the system dynamics through a time-integrated neural operator. Backward propagation (dashed red arrows) computes gradients by differentiating through the closed-loop system, enabling end-to-end learning of constrained control…

**Figure 2.** HE control performance. Each scenario shows: (left) uncontrolled evolution from initial state (blue) to final state (red) versus target (black dotted); (middle) controlled trajectory achieving target; (right) applied control signals $f_i(t)$.

5.3. Burgers' Equation: Shock Mitigation. System dynamics: consider the inviscid Burgers' equation (BE) with periodic boundary conditions: $\partial u/\partial t + u\,\partial u/\partial x = f(x, t)$, $x \in [0,$ …

**Figure 3.** BE shock control. Each row: (left) uncontrolled shock development; (middle) controlled smooth evolution; (right) control signals $f_i(t)$; (far right) curvature loss reduction.

5.4. Fisher-KPP Equation: Population Density Control. System dynamics: consider the Fisher-KPP reaction-diffusion equation (RDE) with Neumann (no-flux) boundaries: $\partial u/\partial t = \alpha\,\partial^2 u/\partial x^2 + r u (1 - u) - f(x, t)$, $x \in [0, 1]$, $t \in [0, T]$, $\partial u(0,$ …

**Figure 4.** RDE density control. Each row shows: (left) uncontrolled evolution from initial state (blue) to final state (red) versus target (black dotted); (middle) controlled trajectory achieving target; (right) applied control signals $f_i(t)$.

Original abstract

We present a data-driven control framework for partial differential equations (PDEs). Our approach integrates Time-Integrated Deep Operator Networks (TI-DeepONets) as differentiable PDE surrogate models within Differentiable Predictive Control (DPC), a self-supervised learning framework for constrained neural control policies. The TI-DeepONet architecture learns temporal derivatives and couples them with numerical integrators, while the DPC algorithm uses automatic differentiation to compute policy gradients by backpropagating the expectations of the optimal control loss through the learned TI-DeepONet. This approach enables efficient offline optimization of neural policies without the need for online optimization or supervisory controllers. We empirically demonstrate the proposed method across diverse PDE systems, including the heat, the nonlinear Burgers', and the reaction-diffusion equations. The learned policies achieve target tracking, constraint satisfaction, and curvature minimization objectives, while generalizing across distributions of initial conditions and parameters. Moreover, we demonstrate four orders of magnitude acceleration at inference time compared to nonlinear model predictive control benchmarks. These results highlight the promise of operator learning for scalable model-based control of PDEs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper plugs time-integrated DeepONets into the DPC loop to learn offline neural policies for a few PDEs and reports large inference speedups, but the transfer from surrogate to true dynamics needs tighter checks.


The core contribution is a practical pipeline that trains a TI-DeepONet to approximate temporal derivatives of the PDE, integrates those derivatives numerically, and then uses automatic differentiation through the surrogate to optimize a neural policy under the DPC loss. This lets the authors avoid online MPC solves at inference time. They show the approach on the heat equation, Burgers' equation, and a reaction-diffusion system, with policies that track targets, respect constraints, and reduce curvature while generalizing over initial conditions and parameters. The reported four-order-of-magnitude speedup versus nonlinear MPC is the clearest practical payoff.

The coupling itself is a straightforward extension of two existing pieces of work, but the concrete results across multiple distributed-parameter examples give it some engineering weight. Credit is due for shipping an end-to-end demonstration rather than just a theoretical sketch.

The main soft spot is the limited visibility into surrogate fidelity once the policy starts generating its own trajectories. The abstract and stress-test note both flag the absence of quantitative error metrics, ablation studies on integration accuracy, or checks for how well the learned policy performs when the true PDE is substituted back in. If small per-step discrepancies compound over the horizon or in nonlinear regimes, the optimizer could be fitting to surrogate artifacts; without those diagnostics it is hard to judge how robust the reported tracking and constraint satisfaction really are. The citation pattern looks standard and the method does not appear to rest on circular fitting.

For readers working on data-driven control of PDEs in process engineering or scientific computing, the paper supplies a usable recipe and concrete speed numbers worth examining. It is coherent on its own terms and shows clear thinking about the offline-learning angle.
I would send it to peer review so the authors can supply the missing error analysis and closed-loop verification; the empirical claims are testable and the setup is simple enough that referees can check them directly.

Referee Report

2 major / 2 minor

Summary. The paper proposes integrating Time-Integrated Deep Operator Networks (TI-DeepONets) as differentiable PDE surrogates into the Differentiable Predictive Control (DPC) framework for offline learning of neural control policies. It demonstrates the method on the heat equation, nonlinear Burgers' equation, and reaction-diffusion equations, claiming that the resulting policies achieve target tracking, constraint satisfaction, and curvature minimization while generalizing across distributions of initial conditions and parameters, and delivering four orders of magnitude faster inference than nonlinear model predictive control.

Significance. If the central claims hold, the work provides a practical route to scalable, offline policy optimization for infinite-dimensional systems by replacing online nonlinear optimization with learned policies that leverage operator-learning surrogates. The reported inference-time acceleration would be a notable engineering advantage for real-time PDE control applications. The approach also illustrates how automatic differentiation through learned temporal integrators can enable self-supervised policy training without supervisory controllers.

major comments (2)

[Abstract and Section 4 (Numerical Experiments)] Abstract and empirical demonstrations: no quantitative surrogate error metrics (e.g., trajectory-wise L2 or relative errors on states or derivatives) are reported for the TI-DeepONet when evaluated on the closed-loop trajectories produced by the learned policy. This is load-bearing for the transfer claim, because the DPC loss back-propagates expectations through the surrogate; without these metrics it remains possible that policies exploit surrogate discrepancies (especially in the nonlinear regimes of Burgers' or reaction-diffusion) rather than true dynamics.
[Section 3 (Method)] Section 3 (DPC formulation): the manuscript does not analyze or bound how per-step surrogate errors accumulate over the prediction horizon when the policy is optimized via gradients through the TI-DeepONet. For the curvature-minimization and constraint-satisfaction objectives this accumulation could systematically bias the learned policy away from true-PDE behavior; an ablation or sensitivity study on horizon length and surrogate accuracy would directly test this risk.
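The trajectory-wise error metric requested in these comments could be computed along the following lines. This is an illustrative sketch with made-up data: the "reference" is an exactly decaying sine mode, and the "surrogate" is the same mode with a hypothetical 2% error in its decay rate, chosen only to show how a small per-step mismatch grows over the horizon.

```python
import numpy as np

def relative_l2_per_step(pred, ref, eps=1e-12):
    """Relative L2 error at each time step for trajectories of shape (n_steps, n_x)."""
    num = np.linalg.norm(pred - ref, axis=1)
    den = np.maximum(np.linalg.norm(ref, axis=1), eps)  # guard against a zero reference state
    return num / den

# Assumed toy trajectories: one spatial mode, 51 snapshots on t in [0, 5].
x = np.linspace(0.0, 1.0, 64, endpoint=False)
t = np.linspace(0.0, 5.0, 51)
mode = np.sin(2.0 * np.pi * x)
ref = np.exp(-1.00 * t)[:, None] * mode    # reference rollout
pred = np.exp(-1.02 * t)[:, None] * mode   # surrogate rollout with a 2% decay-rate error
errors = relative_l2_per_step(pred, ref)
```

In this toy case the error starts at zero and grows monotonically with time, which is the accumulation pattern a horizon-length sensitivity study would aim to quantify for the actual surrogate.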

minor comments (2)

[Figures] Figure captions and axis labels should explicitly state whether performance metrics are averaged over multiple random seeds or initial-condition samples and whether error bars represent standard deviation.
[Section 2.2] The description of the numerical integrator coupled to the TI-DeepONet would benefit from an explicit equation showing the discrete-time update rule used inside the DPC loss.
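For orientation, one form such an update rule could take is written out below. This is a sketch under assumptions, not the paper's equation: forward Euler is assumed as the integrator (the source says only "numerical integrators"), and the symbols $\mathcal{G}_\phi$ (learned derivative operator), $\pi_\theta$ (neural policy), $\xi$ (sampled parameters or targets), $\ell$ (stage loss), and $N$ (horizon) are hypothetical notation. The state $u$ and control $f$ follow the figure captions.

```latex
\begin{aligned}
f_k &= \pi_\theta(u_k, \xi), \\
u_{k+1} &= u_k + \Delta t \, \mathcal{G}_\phi(u_k, f_k), \qquad k = 0, \dots, N-1, \\
\min_\theta \; & \mathbb{E}_{u_0, \xi} \left[ \sum_{k=0}^{N-1} \ell(u_{k+1}, f_k) \right].
\end{aligned}
```

A higher-order scheme such as Runge-Kutta would replace the second line with several evaluations of $\mathcal{G}_\phi$ per step, while the structure of the loss would stay the same.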

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of the work's significance and for the constructive major comments. We address each point below and will incorporate the suggested additions in the revised manuscript to strengthen the validation of the surrogate transfer and error propagation analysis.


Referee: [Abstract and Section 4 (Numerical Experiments)] Abstract and empirical demonstrations: no quantitative surrogate error metrics (e.g., trajectory-wise L2 or relative errors on states or derivatives) are reported for the TI-DeepONet when evaluated on the closed-loop trajectories produced by the learned policy. This is load-bearing for the transfer claim, because the DPC loss back-propagates expectations through the surrogate; without these metrics it remains possible that policies exploit surrogate discrepancies (especially in the nonlinear regimes of Burgers' or reaction-diffusion) rather than true dynamics.

Authors: We agree that quantitative surrogate error metrics evaluated specifically on closed-loop trajectories are necessary to support the transfer claim and rule out exploitation of surrogate discrepancies. In the revised manuscript we will add these metrics, reporting trajectory-wise L2 and relative errors on both states and derivatives for the TI-DeepONet predictions under the learned policies across all three PDE examples (heat, Burgers', and reaction-diffusion).
Revision: yes

Referee: [Section 3 (Method)] Section 3 (DPC formulation): the manuscript does not analyze or bound how per-step surrogate errors accumulate over the prediction horizon when the policy is optimized via gradients through the TI-DeepONet. For the curvature-minimization and constraint-satisfaction objectives this accumulation could systematically bias the learned policy away from true-PDE behavior; an ablation or sensitivity study on horizon length and surrogate accuracy would directly test this risk.

Authors: We acknowledge that a dedicated analysis of per-step error accumulation over the horizon is valuable, particularly for the curvature and constraint objectives. In the revised manuscript we will add a sensitivity study and ablation that varies prediction horizon length and surrogate accuracy levels, reporting the resulting effects on policy performance, constraint satisfaction, and any observed bias relative to true-PDE rollouts.
Revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses independent surrogate training followed by empirical policy validation


The paper trains a TI-DeepONet to approximate PDE dynamics (temporal derivatives plus integrator) from data, then applies standard automatic differentiation through this fixed surrogate inside the DPC loss to optimize a neural policy. Performance claims (target tracking, constraint satisfaction, generalization, and 10^4 speedup) are presented as post-training empirical results on the true PDE simulators, not as quantities that reduce by construction to the surrogate training loss or to any self-cited uniqueness theorem. No equation equates a fitted parameter to a claimed prediction, and no load-bearing step imports an ansatz or uniqueness result solely from the authors' prior work without external verification. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework depends on the learned operator serving as a faithful differentiable proxy for the true PDE dynamics; no new physical entities are introduced.

free parameters (1)

TI-DeepONet weights and biases
Trained on simulation data to approximate temporal derivatives of the PDE state.

axioms (1)

Domain assumption: The PDE solution operator can be approximated well enough by a neural operator that gradients through the surrogate remain useful for policy optimization.
Invoked when back-propagating the control loss through the integrated TI-DeepONet.

pith-pipeline@v0.9.0 · 5499 in / 1247 out tokens · 30273 ms · 2026-05-17T23:07:02.362200+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "TI-DeepONet architecture learns temporal derivatives and couples them with numerical integrators... backpropagating the expectations of the optimal control loss through the learned TI-DeepONet"

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The learned policies achieve target tracking, constraint satisfaction, and curvature minimization objectives"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Employing Deep Neural Operators for PDE control by decoupling training and optimization

PMLR, 04–06 Jun 2025. Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature machine intelligence, 3(3):218–229, 2021. Oliver GS Lundqvist and Fabricio Oliveira. Was residual penalty and neural operators all we needed for sol...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

and Dolan, John M

doi: 10.1109/ICRA57147.2024.10610381. Antranik A Siranosian, Miroslav Krstic, Andrey Smyshlyaev, and Matt Bement. Gain scheduling-inspired boundary control for nonlinear partial differential equations. Journal of dynamic systems, measurement, and control, 133(5), 2011. Rafael Vazquez and Miroslav Krstic. Control of 1-d parabolic pdes with volterra nonline...

work page doi:10.1109/icra57147.2024.10610381 2024
