A trust-region method for optimal control of ODEs with continuous-or-off controls and TV regularization

Gerd Wachsmuth; Markus Friedemann

arxiv: 2508.10692 · v2 · pith:A2MMSDWGnew · submitted 2025-08-14 · 🧮 math.OC

Pith reviewed 2026-05-22 13:09 UTC · model grok-4.3

classification 🧮 math.OC

keywords: optimal control · trust-region method · proximal gradient method · total variation regularization · continuous-or-off controls · ordinary differential equations · Bellman optimality principle · SIR model

The pith

A trust-region proximal gradient method provably converges, with respect to a criticality measure, for optimal control problems with continuous-or-off controls and total variation regularization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an algorithm that combines trust-region and proximal gradient techniques to solve optimal control problems for ordinary differential equations. In these problems the control has a continuous-or-off structure: at each time it is either switched off or takes a value from a continuous range. Control values are priced by a convex function, and a total variation term penalizes switches. The subproblems at each iteration are solved exactly by applying Bellman's optimality principle from dynamic programming. Convergence is proven with respect to a criticality measure. The numerical example, a simple optimal control problem involving an SIR model, points to applications such as epidemic management where controls with few switches are wanted.
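To make the switching penalty concrete, here is a minimal sketch for a piecewise-constant control. It assumes that the total variation of the on/off indicator simply counts the on/off changes, which is one common reading; the paper's exact functional may differ. The names `count_switches` and `regularized_cost`, the weight `beta`, and the quadratic price used in the comments are illustrative assumptions, not taken from the paper.

```python
def count_switches(u, tol=0.0):
    """Count on/off changes in a piecewise-constant control.

    u[i] is the control value on the i-th time interval. A value with
    |u[i]| <= tol is treated as "off", anything else as "on".
    """
    on = [abs(v) > tol for v in u]
    return sum(1 for a, b in zip(on, on[1:]) if a != b)


def regularized_cost(u, price, beta, dt):
    """Hypothetical discretized control cost: priced usage plus switch penalty.

    price: convex function of a single control value (assumption)
    beta:  weight of the switching penalty (assumption)
    dt:    length of one time interval
    """
    usage = dt * sum(price(v) for v in u)
    return usage + beta * count_switches(u)
```

For example, the control `[0, 0, 0.5, 0.8, 0, 0.3]` switches on, off, and on again, so it has three switches regardless of the values it takes while on.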

Core claim

The authors propose a solution algorithm for optimal control problems subject to an ordinary differential equation where controls have a continuous-or-off structure, are priced by a convex function, and are regularized by total variation to penalize switches. The method merges a trust-region approach with a proximal gradient method. Subproblems are solved via Bellman's optimality principle. Convergence with respect to a criticality measure is proven, and the approach is illustrated on a simple optimal control problem involving an SIR model.

What carries the argument

The trust-region proximal gradient iteration, in which each subproblem is solved exactly by Bellman's optimality principle. The exact solve exploits the continuous-or-off control structure and the convex pricing.
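The outer iteration can be pictured with a generic trust-region acceptance loop. This is the textbook pattern, not the paper's algorithm: the subproblem solver is a placeholder callback (in the paper it would be the Bellman-based solve), and the parameters `eta`, `shrink`, and `grow` as well as the toy problem are assumptions made for illustration.

```python
def trust_region_loop(objective, solve_subproblem, u0, radius0=1.0,
                      eta=0.1, shrink=0.5, grow=2.0, max_iter=50, tol=1e-10):
    """Generic trust-region acceptance loop (illustrative, not the paper's method).

    solve_subproblem(u, radius) must return (candidate, predicted_reduction),
    where predicted_reduction >= 0 is the decrease promised by the model.
    """
    u, radius = u0, radius0
    for _ in range(max_iter):
        candidate, predicted = solve_subproblem(u, radius)
        if predicted <= tol:
            # The model promises no further decrease: treat u as critical.
            break
        actual = objective(u) - objective(candidate)
        if actual >= eta * predicted:
            # Sufficient decrease: accept the step and enlarge the radius.
            u, radius = candidate, grow * radius
        else:
            # Model was too optimistic: reject the step and shrink the radius.
            radius = shrink * radius
    return u


# Toy usage: minimize f(x) = (x - 3)^2 with a linear model on |d| <= radius.
def f(x):
    return (x - 3.0) ** 2


def linear_model_step(x, radius):
    g = 2.0 * (x - 3.0)              # derivative of f at x
    if g == 0.0:
        return x, 0.0
    d = -radius if g > 0 else radius  # steepest-descent step to the boundary
    return x + d, abs(g) * radius     # predicted decrease of the linear model


x_star = trust_region_loop(f, linear_model_step, u0=0.0)
```

The predicted reduction doubles as a stopping test here, which mirrors the role a criticality measure plays in a convergence statement.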

If this is right

The algorithm generates iterates whose criticality measure converges to zero.
Subproblems can be solved efficiently due to the special control structure.
The method applies to problems like SIR epidemic control with minimal switching.
Proven convergence supports reliable numerical optimization for switched systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar ideas might apply to problems with more general nonsmooth regularizations if subproblems remain tractable.
Extensions to stochastic or distributed parameter systems could follow if the dynamic programming structure generalizes.
The approach may offer advantages over purely gradient-based methods in handling the combinatorial aspect of switching.

Load-bearing premise

The subproblems arising during the trust-region proximal gradient iterations can be solved to optimality using Bellman's optimality principle due to the continuous-or-off structure and convex pricing.
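A simplified sketch of why dynamic programming fits this structure: once time is discretized, each interval is either off or on, and the best continuous value on an "on" interval can be found separately by a one-dimensional convex minimization. What remains is a shortest-path problem over on/off states with a fee per switch. The code below assumes the per-interval costs `cost_off` and `cost_on` are already computed and leaves out the trust-region constraint entirely, so it is not the paper's subproblem; all names are hypothetical.

```python
def solve_on_off_dp(cost_off, cost_on, beta):
    """Minimize the sum of per-interval costs plus beta per on/off switch.

    cost_off[i]: cost of being off on interval i
    cost_on[i]:  cost of being on on interval i, already minimized over the
                 continuous control value (assumed precomputed)
    Returns (optimal value, list of states with 0 = off and 1 = on).
    """
    n = len(cost_off)
    # value[s]: minimal cost of intervals 0..i when interval i is in state s.
    value = [cost_off[0], cost_on[0]]
    back = []  # back[i-1][s]: best state on interval i-1 given state s on i
    for i in range(1, n):
        step = (cost_off[i], cost_on[i])
        new_value, choice = [], []
        for s in (0, 1):
            stay = value[s]
            switch = value[1 - s] + beta
            if stay <= switch:
                new_value.append(stay + step[s])
                choice.append(s)
            else:
                new_value.append(switch + step[s])
                choice.append(1 - s)
        value = new_value
        back.append(choice)
    # Backtrack from the cheaper final state.
    s = 0 if value[0] <= value[1] else 1
    best = value[s]
    states = [s]
    for choice in reversed(back):
        s = choice[s]
        states.append(s)
    states.reverse()
    return best, states
```

With `cost_off = [0, 5, 1, 5, 0]`, `cost_on = [2, 0, 2, 0, 2]`, and `beta = 1.5`, the recursion prefers staying on through the middle interval over paying for two extra switches. The previous on/off state is the only memory the recursion needs, which is the kind of auxiliary state a switching penalty calls for.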

What would settle it

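An instance inside the stated problem class on which the subproblems are solved exactly, yet the criticality measure stays bounded away from zero, would falsify the convergence claim. A failure on a problem where the subproblems cannot be solved via Bellman's optimality principle would only mark the limit of the method's scope.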

Original abstract

A solution algorithm for a special class of optimal control problems subject to an ordinary differential equation is proposed. The controls possess a continuous-or-off structure and are priced by a convex function. Additionally, a total variation regularization is applied to penalize switches. Our solution method combines a trust-region method and a proximal gradient method. The subproblems are solved via Bellman's optimality principle. Convergence with respect to a criticality measure is proven. As a numerical example, we solve a simple optimal control problem involving an SIR model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a trust-region proximal gradient algorithm for optimal control with continuous-or-off controls and TV regularization, with subproblems solved by Bellman recursion.

The main takeaway is a trust-region method combined with proximal gradient steps for optimal control problems that have continuous-or-off controls priced by a convex function and penalized by total variation. Subproblems are solved exactly via Bellman's principle, and the authors prove convergence with respect to a criticality measure. They close with a basic numerical test on an SIR model. That combination for this control class is the concrete new piece relative to the cited work on trust-region and proximal methods in control. The structure lets them keep subproblems tractable with dynamic programming, which is a practical fit for the on-off restriction and the TV term. The convergence argument appears self-contained without obvious fitting or circularity. The numerical example is simple but shows the method in action on a concrete ODE.

One soft spot is the handling of the TV regularizer inside the Bellman recursion. If the proximal mapping or discretization adds dependence on the full control history, the Markov property needed for standard dynamic programming could break, and the exact subproblem solver would no longer be guaranteed. The abstract is also thin on the exact form of the criticality measure and how the ODE constraint is discretized, so those details matter for the proof.

This is for people working on numerical methods for nonsmooth or switching optimal control. A reader already familiar with trust-region globalization or proximal algorithms in control would pick up usable ideas here. It is solid enough on its own terms to deserve a serious referee who can check the subproblem construction and the convergence details.

Referee Report

2 major / 2 minor

Summary. The paper proposes a trust-region proximal gradient algorithm for optimal control of ODEs where controls have a continuous-or-off structure, are priced by a convex function, and are penalized by total variation regularization to limit switches. Subproblems are solved exactly via Bellman's optimality principle, and convergence of the iterates with respect to a criticality measure is established. The method is illustrated on a simple SIR epidemic control example.

Significance. If the convergence result is valid and the subproblems remain exactly solvable, the approach supplies a theoretically supported numerical scheme for a practically relevant subclass of switched optimal control problems. The combination of trust-region globalization with proximal gradient steps and dynamic-programming subproblem solves is a distinctive technical contribution that could extend to other regularized control settings with discrete structure.

major comments (2)

[§3] Subproblem formulation and solution: The assertion that each trust-region proximal subproblem admits an exact solution by Bellman recursion must be shown to remain valid once the total-variation term is included. The manuscript should explicitly state whether the TV penalty is absorbed into a standard finite-dimensional Markov state or whether an auxiliary state variable (previous control value) is required; if the latter, the dimension of the DP recursion and the claimed exact solvability need to be re-verified.
[Theorem 4.1] Convergence statement: The criticality measure with respect to which convergence is claimed is not defined in the abstract and appears only after the algorithmic description. The proof should clarify whether this measure accounts for both the continuous-or-off constraint and the non-smooth TV term, and whether the proximal mapping of the TV regularizer is shown to be single-valued and computable in closed form under the chosen convex pricing function.

minor comments (2)

[§5] The numerical example in §5 uses a simple SIR model; the manuscript should report the discretization scheme for the ODE, the number of time steps, and the observed number of switches in the computed control to allow reproducibility.
[Algorithm 1] Notation for the proximal operator and the trust-region radius update rule should be introduced consistently before their first use in the algorithm box.
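As an illustration of the kind of discretization detail the first minor comment asks for, the following sketch integrates a controlled SIR model with explicit Euler. Everything specific here is an assumption: the way the control scales the transmission rate, the parameter values, the step size, and the function name `simulate_sir` are hypothetical and not taken from the paper.

```python
def simulate_sir(u, beta=0.3, gamma=0.1, s0=0.99, i0=0.01, dt=1.0):
    """Explicit Euler for a controlled SIR model (all parameters hypothetical).

    u[k] in [0, 1) reduces the transmission rate on step k; u[k] = 0 is "off".
    Returns the trajectories S, I, R as lists of length len(u) + 1.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    S, I, R = [s], [i], [r]
    for uk in u:
        new_infections = (1.0 - uk) * beta * s * i  # controlled incidence
        recoveries = gamma * i
        s, i, r = (s - dt * new_infections,
                   i + dt * (new_infections - recoveries),
                   r + dt * recoveries)
        S.append(s)
        I.append(i)
        R.append(r)
    return S, I, R


# Usage: compare the infection peak without control and with constant control.
n_steps = 100
_, I_free, _ = simulate_sir([0.0] * n_steps)
_, I_ctrl, _ = simulate_sir([0.5] * n_steps)
peak_free, peak_ctrl = max(I_free), max(I_ctrl)
```

Reporting the scheme, the step count (`n_steps` here), and the number of on/off changes in the computed control would be enough for a reader to rerun such an experiment.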

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation of the paper's significance and for the detailed, constructive major comments. We address each point below, indicating where revisions will be made to improve clarity and rigor.

Referee: [§3] Subproblem formulation and solution: The assertion that each trust-region proximal subproblem admits an exact solution by Bellman recursion must be shown to remain valid once the total-variation term is included. The manuscript should explicitly state whether the TV penalty is absorbed into a standard finite-dimensional Markov state or whether an auxiliary state variable (previous control value) is required; if the latter, the dimension of the DP recursion and the claimed exact solvability need to be re-verified.

Authors: We agree that explicit treatment of the total-variation term is necessary for the dynamic-programming argument. The TV penalty depends on consecutive control values and therefore requires an auxiliary state variable that records the control value from the previous time step. This augments the Markov state by one dimension. Because the admissible control set has the continuous-or-off structure and the pricing function is convex, the resulting Bellman recursion on the augmented state remains exactly solvable by dynamic programming; no approximation is introduced. In the revised manuscript we will insert a new paragraph in §3 that (i) introduces the auxiliary state, (ii) states the dimension increase, and (iii) verifies that exact solvability is preserved.
Revision: yes
Referee: [Theorem 4.1] Convergence statement: The criticality measure with respect to which convergence is claimed is not defined in the abstract and appears only after the algorithmic description. The proof should clarify whether this measure accounts for both the continuous-or-off constraint and the non-smooth TV term, and whether the proximal mapping of the TV regularizer is shown to be single-valued and computable in closed form under the chosen convex pricing function.

Authors: We acknowledge that the criticality measure is introduced only after the algorithm is presented. The measure is the norm of the proximal-gradient residual that incorporates both the projection onto the continuous-or-off set and the proximal operator of the composite nonsmooth term (TV plus convex pricing). Theorem 4.1 proves that the sequence converges to a point at which this residual vanishes. Under the convexity of the pricing function the proximal mapping of the TV regularizer is single-valued and admits a closed-form expression obtained by comparing a finite number of candidate switch configurations at each time step. In the revision we will (a) add an early reference to the criticality measure in the introduction and (b) insert a short remark in the proof of Theorem 4.1 that explicitly records the single-valuedness and closed-form character of the proximal mapping.
Revision: partial
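The idea of a proximal-gradient residual obtained by comparing finitely many candidates can be sketched in a much reduced setting. The code below drops the TV term altogether and assumes a pointwise admissible set {0} ∪ [lo, hi] with 0 < lo <= hi and a linear price `alpha * v`; under these assumptions the pointwise proximal step only has to compare the "off" candidate with the best "on" candidate. The function names, the step size `t`, and the tie-breaking rule are illustrative choices, and the residual is not the paper's criticality measure.

```python
def prox_continuous_or_off(x, t, alpha, lo, hi):
    """Pointwise proximal step over {0} ∪ [lo, hi] with linear price alpha * v.

    Minimizes 0.5 * (v - x)**2 + t * alpha * v over the admissible set by
    comparing two candidates. Assumes 0 < lo <= hi (hypothetical bounds).
    """
    def value(v):
        return 0.5 * (v - x) ** 2 + t * alpha * v

    # Best "on" value: unconstrained minimizer clipped to [lo, hi].
    v_on = min(max(x - t * alpha, lo), hi)
    # Ties are broken toward "off" (an arbitrary choice in this sketch).
    return 0.0 if value(0.0) <= value(v_on) else v_on


def prox_gradient_residual(u, grad, t, alpha, lo, hi):
    """Euclidean norm of (u - prox(u - t * grad)) / t, TV term ignored."""
    steps = [prox_continuous_or_off(ui - t * gi, t, alpha, lo, hi)
             for ui, gi in zip(u, grad)]
    return sum((ui - pi) ** 2 for ui, pi in zip(u, steps)) ** 0.5 / t
```

A vanishing residual means that a proximal gradient step leaves the control unchanged. Because the admissible set is nonconvex, the two candidates can tie, so uniqueness of the minimizer depends on the data even in this reduced setting.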

Circularity Check

0 steps flagged

No circularity: convergence proof is independent of fitted quantities or self-citation loops

The paper defines a trust-region proximal gradient algorithm for ODE optimal control with continuous-or-off controls, convex pricing, and TV regularization; subproblems are solved exactly via Bellman's principle exploiting the problem structure, and convergence with respect to a criticality measure is then proven under those assumptions. No equation or step reduces the claimed convergence result to a fitted parameter, a renamed input, or a load-bearing self-citation whose validity depends on the present work. The derivation chain remains self-contained against the stated problem class and does not invoke uniqueness theorems or ansatzes from prior author work that would close a loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract does not introduce new free parameters, invented entities, or non-standard axioms beyond the usual background assumptions of optimal control theory (existence of solutions to the ODE, convexity of the pricing function).

axioms (1)

Domain assumption: The optimal control problem admits solutions and the subproblems are solvable via Bellman's optimality principle under the given control structure.
Invoked implicitly when stating that subproblems are solved via Bellman's principle.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each tag describes the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Paper passage: "The subproblems are solved via Bellman’s optimality principle. Convergence with respect to a criticality measure is proven."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "model function mu,Δ(w) … + TV(sgn(w)) … solved by value-function recursion Φ(l,α,B)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.