
arxiv: 2606.29343 · v1 · pith:7UUQR3EOnew · submitted 2026-06-28 · 🧮 math.OC

Turnpike and Sparse Optimal Control for Semiautonomous Neural ODEs

Pith reviewed 2026-06-30 02:38 UTC · model grok-4.3

classification 🧮 math.OC
keywords: turnpike property · sparse optimal control · semiautonomous neural ODE · L1 regularization · exponential turnpike · temporal sparsity · Pontryagin maximum principle · dissipativity

The pith

Optimal state-control pairs for semiautonomous neural ODEs with L1-regularized controls remain exponentially close to a stationary optimum for most of any long time horizon and exhibit one-sided temporal sparsity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that for long-time optimal control problems involving control-affine semiautonomous neural ordinary differential equations regularized with an L1 penalty on the control, the optimal trajectories satisfy an exponential turnpike property. This means they approach and stay close to a stationary optimal pair exponentially fast, with the rate independent of the total time horizon T. Additionally, the L1 penalty forces the control to be active at full strength only on an initial segment whose length does not grow with T, and to be zero thereafter. An integral version of the turnpike property bounds the average deviation uniformly in T. These results rely on dissipativity and adjoint bounds that hold uniformly for the semiautonomous structure.
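For orientation, estimates of this kind are usually written in the following shape in the turnpike literature. This is a sketch of the standard form, not the paper's exact statement: the symbols $(x_T,u_T)$ for the optimal pair on $[0,T]$, $(\bar x,\bar u)$ for the stationary optimal pair, and the constants $C$, $\mu$, $K$ are notation assumed here, and the paper's norms, time window, and normalization may differ.

```latex
% Exponential turnpike (standard form), with C, \mu > 0 independent of T:
\|x_T(t)-\bar x\| + \|u_T(t)-\bar u\|
  \le C\left(e^{-\mu t} + e^{-\mu (T-t)}\right),
  \qquad t \in [0,T].

% Integral turnpike (standard form), with K > 0 independent of T:
\int_0^T \left(\|x_T(t)-\bar x\|^2 + \|u_T(t)-\bar u\|^2\right)\,dt \le K.
```

In this form the two exponentials account for a short entry arc near $t=0$ and a short exit arc near $t=T$, with the trajectory close to $(\bar x,\bar u)$ in between.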

Core claim

For control-affine semiautonomous neural ODEs subject to L1-regularized optimal control over long horizons, the optimal state-control pairs satisfy an exponential turnpike property with T-independent decay, the controls display one-sided temporal sparsity with a T-independent activation interval, and the time-averaged deviation from the stationary optimum remains bounded independently of T. The proofs rest on dissipativity inequalities derived from the Pontryagin system together with a time-rescaling argument tailored to the semiautonomous architecture.

What carries the argument

The Pontryagin optimality system for the semiautonomous neural ODE, combined with uniform dissipativity inequalities and adjoint bounds independent of the horizon length T.
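To see how a Pontryagin system can produce controls that are either at full amplitude or zero, consider a generic control-affine model with a box constraint. Everything in this sketch is an assumed illustration: the vector fields $f_0, f_i$, the bound $M$, the weight $\lambda$, the running state cost $\ell$, and the sign convention for the adjoint $p$ are not taken from the paper, whose SA-NODE system and cost functional may differ.

```latex
% Assumed model: \dot x = f_0(x) + \sum_{i=1}^m u_i f_i(x), \quad |u_i(t)| \le M,
% assumed cost:  \int_0^T \Big( \ell(x(t)) + \lambda \sum_{i=1}^m |u_i(t)| \Big)\,dt.
H(x,p,u) = \ell(x) + \lambda \sum_{i=1}^m |u_i|
         + p^\top \Big( f_0(x) + \sum_{i=1}^m u_i f_i(x) \Big)

% Adjoint equation (minimum-principle convention, free endpoint):
-\dot p = \nabla \ell(x)
        + \Big( Df_0(x) + \sum_{i=1}^m u_i\, Df_i(x) \Big)^{\top} p,
\qquad p(T) = 0.

% Pointwise minimisation of H over |u_i| \le M, with s_i(t) = p(t)^\top f_i(x(t)):
u_i(t) =
\begin{cases}
 -M\,\operatorname{sign}(s_i(t)), & |s_i(t)| > \lambda, \\
 0, & |s_i(t)| < \lambda.
\end{cases}
```

Under these assumptions a control channel switches off wherever its switching function $s_i$ is smaller than $\lambda$ in modulus, so a horizon-independent bound on the adjoint is the kind of ingredient that would keep the active set confined to a bounded time interval.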

If this is right

  • The decay rate to the turnpike is independent of horizon length, allowing uniform estimates for arbitrarily long times.
  • The sparsity time T* does not depend on T for large T, enabling control strategies that switch off after a fixed initial period.
  • Time-averaged costs or deviations stay bounded as T grows, supporting long-horizon planning without degradation.
  • Numerical experiments on a Duffing oscillator and a damped pendulum show the predicted three-phase behavior and a 30-fold parameter reduction over vanilla NODEs.
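The sparsity time in the second bullet can be made concrete with a small sketch. It assumes a scalar control with box bound `M` and L1 weight `lam`, for which the pointwise minimiser of u ↦ s·u + lam·|u| over [-M, M] is full amplitude when |s| > lam and zero otherwise. The names `bang_off_control` and `last_active_time` and the decaying switching function `s` are hypothetical and synthetic; none of them comes from the paper.

```python
import numpy as np

def bang_off_control(s, lam, M):
    """Minimise u -> s*u + lam*|u| over [-M, M], elementwise in s."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) > lam, -M * np.sign(s), 0.0)

def last_active_time(t, u, tol=1e-8):
    """Largest grid time at which |u| exceeds tol (0.0 if never active)."""
    t = np.asarray(t, dtype=float)
    active = np.abs(np.asarray(u, dtype=float)) > tol
    return float(t[active].max()) if active.any() else 0.0

# Synthetic, exponentially decaying switching function (illustration only).
t = np.linspace(0.0, 20.0, 2001)
s = 3.0 * np.exp(-0.5 * t)

u = bang_off_control(s, lam=1.0, M=2.0)
t_star = last_active_time(t, u)   # close to 2*log(3), where s crosses lam
print(f"estimated T* = {t_star:.2f}")
```

Because the synthetic `s` does not depend on the horizon, extending the time grid leaves `t_star` unchanged, which mirrors the T-independence claimed for T*.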

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These turnpike and sparsity features may extend to other neural ODE architectures if similar dissipativity can be established.
  • The one-sided sparsity could reduce computational cost in real-time control applications by limiting active control phases.
  • The uniform bounds suggest that training such controlled systems on finite horizons approximates infinite-horizon behavior reliably.

Load-bearing premise

The semiautonomous neural ODE architecture must satisfy dissipativity inequalities and uniform adjoint bounds from the Pontryagin system that do not depend on the horizon length T.

What would settle it

A numerical counterexample where, for sufficiently large T, the optimal control remains active beyond some fixed T* that grows with T, or where the exponential closeness rate deteriorates as T increases.
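One way such a check could be organised is sketched here: fit an exponential rate to the state deviation on the entry arc for several horizons and see whether the rate drifts with T. The function `synthetic_deviation` is a placeholder that stands in for the output of an actual optimal-control solver, and the constants `C = 2.0` and `mu = 0.7` are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

def fit_decay_rate(t, dev):
    """Least-squares slope of log(dev) against t, returned as a positive rate."""
    slope, _ = np.polyfit(t, np.log(dev), 1)
    return -slope

def synthetic_deviation(t, T, C=2.0, mu=0.7):
    """Placeholder for ||x_T(t) - x_bar||; replace with real solver output."""
    return C * (np.exp(-mu * t) + np.exp(-mu * (T - t)))

rates = {}
for T in (20.0, 40.0, 80.0):
    t = np.linspace(0.0, T, 4001)
    dev = synthetic_deviation(t, T)
    entry = t <= T / 4            # fit on the entry arc only
    rates[T] = fit_decay_rate(t[entry], dev[entry])

# A fitted rate that shrinks as T grows would contradict a T-independent turnpike.
spread = max(rates.values()) - min(rates.values())
print(rates, spread)
```

With real solver output, a `spread` that stays small across horizons is consistent with the claim, while a systematic decrease of the fitted rate in T would be evidence against it.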

Original abstract

We study long-time optimal control of control-affine semiautonomous neural ordinary differential equations (SA-NODEs) with $\ell^1$-regularized controls. Three results are established. First, optimal state-control pairs satisfy an \emph{exponential turnpike property}: they remain exponentially close to a stationary optimal pair for most of the time horizon, with decay rate and prefactor independent of the horizon length $T$. Second, $\ell^1$ penalisation induces \emph{one-sided temporal sparsity}: optimal controls are active at full amplitude on an initial arc $[0,T^*]$ and vanish identically on $(T^*,T)$, where $T^*$ is independent of $T$ for $T$ large. Third, an integral turnpike estimate shows the time-averaged deviation from the stationary pair is bounded uniformly in $T$. The proofs combine dissipativity inequalities, uniform adjoint bounds via the Pontryagin optimality system, and a time-rescaling argument adapted to the semiautonomous architecture. Numerical experiments on a Duffing oscillator and a damped pendulum confirm the three-phase turnpike profile and the one-sided sparsity structure, and demonstrate a $30\times$ parameter reduction over vanilla NODEs with no loss of stabilization performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript claims three main results for optimal control of semiautonomous neural ODEs (SA-NODEs) with L1-regularized controls: (1) an exponential turnpike property, in which optimal pairs stay close to a stationary optimum for most of the horizon with T-independent rate and prefactor; (2) one-sided temporal sparsity, in which controls act at full amplitude on [0,T*] and vanish afterwards, with T* independent of T for large T; and (3) an integral turnpike estimate that bounds the time-averaged deviation uniformly in T. The proofs rely on dissipativity inequalities, adjoint bounds from the Pontryagin system, and a time-rescaling argument; numerical experiments on a Duffing oscillator and a damped pendulum confirm the predicted behavior and show a 30x parameter reduction over vanilla NODEs.

Significance. If the dissipativity and uniform bounds hold for the neural architecture, the results provide a theoretical basis for efficient long-horizon sparse control of neural dynamical systems, extending turnpike theory to this setting and highlighting practical benefits in model reduction for stabilization. The numerical demonstration of parameter reduction without loss of performance is a concrete strength.

major comments (1)
  1. [Abstract] Abstract: the claim that the SA-NODE architecture 'admits' dissipativity inequalities and T-independent adjoint bounds extracted from the Pontryagin optimality system is load-bearing for the exponential turnpike, one-sided sparsity, and integral turnpike results, yet the text supplies no explicit structural hypothesis (e.g., bound on the spectral radius or Lipschitz constant of the network Jacobian appearing in the adjoint equation) that would guarantee uniformity in T for arbitrary weights.
minor comments (1)
  1. The abstract announces three results without indicating their theorem numbers or section locations in the manuscript.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The single major comment identifies a genuine gap in the explicitness of the structural hypotheses. We address it directly below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the SA-NODE architecture 'admits' dissipativity inequalities and T-independent adjoint bounds extracted from the Pontryagin optimality system is load-bearing for the exponential turnpike, one-sided sparsity, and integral turnpike results, yet the text supplies no explicit structural hypothesis (e.g., bound on the spectral radius or Lipschitz constant of the network Jacobian appearing in the adjoint equation) that would guarantee uniformity in T for arbitrary weights.

    Authors: We agree that the manuscript does not currently state an explicit structural hypothesis on the neural network weights that guarantees the dissipativity inequalities and T-independent adjoint bounds for arbitrary weights. The proofs rely on such uniformity, which holds only under suitable conditions on the network (e.g., a bound on the Lipschitz constant of the vector field or the spectral radius of the Jacobian in the adjoint equation). In the revised version we will add a precise Assumption (e.g., Assumption 2.3) quantifying these bounds, update the abstract to reference the assumption, and restate the main theorems under this hypothesis. The numerical examples already satisfy the condition, so the reported results remain valid.

    Revision: yes
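A hypothesis of the kind discussed in this exchange can be checked numerically for a shallow network. The sketch assumes a vector field of the form f(x) = W tanh(A x + b), which is a hypothetical stand-in and not the paper's SA-NODE. It uses the elementary bound Lip(f) ≤ ‖W‖₂ ‖A‖₂, valid because tanh is 1-Lipschitz, and compares it with sampled spectral norms of the Jacobian. The shapes, the random weights, and the function names are illustrative assumptions.

```python
import numpy as np

def lipschitz_upper_bound(W, A):
    """Bound Lip(f) <= ||W||_2 * ||A||_2 for f(x) = W tanh(A x + b)."""
    return np.linalg.norm(W, 2) * np.linalg.norm(A, 2)

def jacobian(W, A, b, x):
    """Df(x) = W diag(1 - tanh(A x + b)^2) A."""
    z = A @ x + b
    return W @ ((1.0 - np.tanh(z) ** 2)[:, None] * A)

rng = np.random.default_rng(0)
d, h = 2, 8                       # state dimension and hidden width (hypothetical)
W = rng.normal(size=(d, h)) / h
A = rng.normal(size=(h, d))
b = rng.normal(size=h)

bound = lipschitz_upper_bound(W, A)
worst = max(np.linalg.norm(jacobian(W, A, b, x), 2)
            for x in rng.normal(size=(200, d)))
print(f"sampled max ||Df(x)||_2 = {worst:.3f} <= bound = {bound:.3f}")
```

The bound is sufficient but generally not sharp. Whether a condition of this type is what the uniform adjoint estimates require is a question for the revised manuscript.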

Circularity Check

0 steps flagged

No circularity: results derived from standard Pontryagin-based assumptions without reduction to inputs

Full rationale

The paper establishes exponential turnpike, one-sided sparsity, and integral turnpike via dissipativity inequalities, T-independent adjoint bounds from the Pontryagin optimality system, and time-rescaling adapted to the semiautonomous architecture. No quoted step equates a claimed prediction or result to a fitted parameter, self-defined quantity, or self-citation chain by construction. The architecture is stated to admit the required inequalities, and proofs are presented as following from these plus standard optimal-control tools; numerical experiments on Duffing and pendulum confirm rather than define the claims. This matches the default case of a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the work rests on standard optimal-control assumptions rather than new free parameters or invented entities.

axioms (2)
  • Domain assumption: the Pontryagin optimality system yields uniform adjoint bounds independent of T.
    Invoked to obtain the exponential decay rates and sparsity structure.
  • Domain assumption: the semiautonomous neural ODE satisfies dissipativity inequalities.
    Used to establish the turnpike property.

pith-pipeline@v0.9.1-grok · 5749 in / 1345 out tokens · 44782 ms · 2026-06-30T02:38:24.338035+00:00 · methodology

