pith. machine review for the scientific record.

arxiv: 2605.02653 · v1 · submitted 2026-05-04 · 🧮 math.OC

Recognition: 3 Lean theorem links

Mirror Descent for Deterministic Optimal Control


Pith reviewed 2026-05-08 17:50 UTC · model grok-4.3

classification 🧮 math.OC
keywords mirror descent · optimal control · Pontryagin maximum principle · Bregman divergence · relative smoothness · convergence rates · deterministic control · energy dissipation

The pith

Mirror descent updates using Bregman penalties on the Hamiltonian achieve O(1/n) convergence for deterministic optimal control under smoothness assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an explicit mirror-descent algorithm for finite-horizon deterministic optimal control problems. At each step the state and adjoint equations are solved forward and backward, after which the control is updated by maximizing a first-order approximation to the regularized Hamiltonian penalized by a Bregman divergence. Under global smoothness of the cost and uniform convexity of the mirror map, the authors derive a relative smoothness property for the objective together with an energy-dissipation inequality that holds for small enough steps. Adding concavity of the unregularized Hamiltonian and convexity of the terminal cost yields relative convexity of the regularized problem, which in turn produces the stated convergence rates.
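The forward-backward-update loop described above can be sketched on a discretized one-dimensional linear-quadratic problem. This is an illustrative stand-in, not the paper's algorithm verbatim: the coefficients, step size, and Euclidean mirror map are made-up test choices, and the Hamiltonian convention (maximize H = p·f − ℓ) is assumed.

```python
import numpy as np

# Illustrative 1-D linear-quadratic problem: dx/dt = a*x + b*u,
# running cost 0.5*(q*x^2 + r*u^2), terminal cost 0.5*s*x_T^2.
# All constants are made-up test values, not from the paper.
a, b, q, r, s = -1.0, 1.0, 1.0, 0.1, 1.0
T, N = 1.0, 200
dt = T / N
x0 = 1.0

def cost(u):
    # forward Euler rollout of the dynamics, accumulating the cost
    x, J = x0, 0.0
    for k in range(N):
        J += 0.5 * (q * x**2 + r * u[k]**2) * dt
        x += (a * x + b * u[k]) * dt
    return J + 0.5 * s * x**2

def mirror_descent_step(u, eta):
    # forward sweep: state trajectory
    xs = np.empty(N + 1); xs[0] = x0
    for k in range(N):
        xs[k + 1] = xs[k] + (a * xs[k] + b * u[k]) * dt
    # backward sweep: adjoint with p_T = -grad g(x_T) and dp/dt = -dH/dx,
    # where H(x, p, u) = p*(a*x + b*u) - 0.5*(q*x^2 + r*u^2)
    ps = np.empty(N + 1); ps[N] = -s * xs[N]
    for k in range(N, 0, -1):
        ps[k - 1] = ps[k] + (a * ps[k] - q * xs[k - 1]) * dt
    # Euclidean mirror map: maximizing the linearized Hamiltonian minus
    # (1/eta)*0.5*||u - u_n||^2 is a gradient ascent step on H in u
    grad_H = b * ps[1:] - r * u
    return u + eta * grad_H

u = np.zeros(N)
for n in range(300):
    u = mirror_descent_step(u, eta=0.5)
```

With the Euclidean mirror map the update is exactly the projected-gradient reduction the paper mentions; swapping in a different strongly convex mirror map only changes the last two lines of the step.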

Core claim

Under global smoothness assumptions and uniform convexity of the mirror map, the cost functional satisfies a relative smoothness estimate and the iteration obeys an energy dissipation inequality for sufficiently small step sizes. With the further assumptions that the unregularized Hamiltonian is concave and the terminal cost is convex, the regularized objective is relatively convex. These two properties together deliver an O(1/n) convergence rate in the unregularized convex case and a geometric rate whenever the control regularization parameter is positive.
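Written generically, the two estimates and the resulting rates take the following shape. The constants $L$, $c$, $\mu_\tau$ and the exact displays are assumed here for orientation, not quoted from the paper; $D_\Phi$ denotes the Bregman divergence of the mirror map $\Phi$:

```latex
\begin{aligned}
&\text{relative smoothness:} &&
J(u) \le J(v) + \langle \nabla J(v),\, u - v\rangle + L\, D_\Phi(u, v),\\
&\text{energy dissipation:} &&
J^\tau(u_{n+1}) \le J^\tau(u_n) - \tfrac{c}{\eta}\, D_\Phi(u_{n+1}, u_n)
\quad \text{for $\eta$ sufficiently small},\\
&\text{relative convexity:} &&
J^\tau(u) \ge J^\tau(v) + \langle \nabla J^\tau(v),\, u - v\rangle
  + \mu_\tau\, D_\Phi(u, v).
\end{aligned}
```

In the standard mirror-descent template these combine into $J(u_n) - J(u^*) = O(1/n)$ when $\tau = 0$ and the problem is convex, and into a geometric rate $O\big((1 - \mu_\tau/L)^n\big)$ when $\tau > 0$.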

What carries the argument

The explicit mirror-descent update: after solving the state and adjoint equations, the control maximizes a linearized regularized Hamiltonian penalized by a Bregman divergence.
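In symbols, with $\Phi$ the mirror map and $\eta$ the step size, the update has this form (notation assumed; the paper's exact display is not reproduced here):

```latex
D_\Phi(u, v) = \Phi(u) - \Phi(v) - \langle \nabla\Phi(v),\, u - v\rangle,
\qquad
u^{n+1}_t = \operatorname*{arg\,max}_{u \in U}
\Big\{ \big\langle \partial_u H^\tau(x^n_t, p^n_t, u^n_t),\, u \big\rangle
  - \tfrac{1}{\eta}\, D_\Phi(u, u^n_t) \Big\},
\quad t \in [0, T],
```

where $x^n$ and $p^n$ solve the state and adjoint equations along the current control $u^n$.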

If this is right

  • In the unregularized convex case the method converges at rate O(1/n).
  • When the control regularization parameter is positive the iterates converge geometrically.
  • The same estimates apply directly to linear-quadratic problems and extend to degenerate-convex and nonlinear high-dimensional examples.
  • In the Euclidean case the update reduces to a projected gradient step on the control variable.
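The last bullet can be checked directly: with the Euclidean mirror map $\Phi(u) = \tfrac12\|u\|^2$, the Bregman penalty is $\tfrac12\|u-v\|^2$ and the per-step maximization over a box constraint has a closed form, namely a clipped gradient step. The box bounds and test values below are made up for illustration.

```python
import numpy as np

# With Phi(u) = 0.5*||u||^2, D_Phi(u, v) = 0.5*||u - v||^2, and maximizing
#   <g, u> - (1/eta) * D_Phi(u, v)   over a box U = [lo, hi]^d
# has the closed form clip(v + eta*g, lo, hi): a projected gradient step.
# (g stands in for the Hamiltonian gradient dH/du; values are made up.)

def bregman_euclidean(u, v):
    return 0.5 * np.sum((u - v) ** 2)

def mirror_update_box(v, g, eta, lo, hi):
    # the argmax decouples coordinate-wise; the unconstrained maximizer
    # v + eta*g is then projected (clipped) onto [lo, hi]
    return np.clip(v + eta * g, lo, hi)

v = np.array([0.4, -1.5, 2.0])
g = np.array([1.0, -2.0, 0.5])
u_new = mirror_update_box(v, g, eta=0.1, lo=-1.0, hi=1.0)
```

A non-Euclidean mirror map (e.g. negative entropy on a simplex of control values) would replace the clip with the corresponding Bregman projection while leaving the rest of the iteration unchanged.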

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same relative-convexity framework could be used to incorporate acceleration techniques such as momentum once the basic descent property is established.
  • Different choices of mirror map could adapt the method to control sets with non-Euclidean geometry without changing the underlying Pontryagin update.
  • If analogous smoothness and concavity conditions can be verified, the approach may extend to stochastic or infinite-horizon problems.

Load-bearing premise

Global smoothness of the cost functional and uniform convexity of the mirror map are needed to obtain the relative smoothness and energy dissipation inequality that deliver the convergence rates.

What would settle it

A concrete optimal-control instance satisfying all other assumptions but violating global smoothness, for which the energy-dissipation inequality fails even at arbitrarily small step sizes.

Figures

Figures reproduced from arXiv: 2605.02653 by Jianfeng Lu, Ye Feng.

Figure 1. Objective convergence of Algorithm 1 for the one-dimensional linear-quadratic example with different values of the regularization parameter τ. The vertical axis shows the cost error |J^τ(u_n) − J^τ(u*)| on a logarithmic scale. For τ > 0, the unique optimal control is u*_t ≡ 0, x*_t ≡ 0. For τ = 0, every control with zero terminal state is optimal, including u*_t ≡ 0. The regularized Hamiltonian is …
Figure 2. Objective convergence of Algorithm 1 for the one-dimensional quartic terminal example with different values of the regularization parameter τ. The cost error |J^τ(u_n) − J^τ(u*)| is shown on semilog (left) and log-log (right) scales, highlighting the difference between exponential convergence for τ > 0 and sublinear behavior when τ = 0. Therefore, this example provides a simple control problem in wh…
Figure 3. Objective convergence of Algorithm 1 for the high-dimensional nonlinear example with coupled dynamics, comparing different state dimensions d. The cost error |J^τ(u_n) − J^τ(u_N)|, a surrogate gap relative to the final iterate, is plotted on a semilog scale. All curves exhibit exponential decay, while higher-dimensional problems require more iterations to achieve the same level of accuracy. Appendix A.…
read the original abstract

We study an explicit mirror-descent method for finite-horizon deterministic optimal control problems. The method is motivated by Pontryagin's maximum principle: at each iteration, one solves the state and adjoint equations and updates the control by maximizing a first-order approximation of the regularized Hamiltonian penalized by a Bregman divergence. In the Euclidean case, the update reduces to a projected gradient step in the control variable. Under global smoothness assumptions and uniform convexity of the mirror map, we prove a relative smoothness estimate for the cost functional and derive an energy dissipation inequality for sufficiently small step sizes. Under an additional concavity assumption on the unregularized Hamiltonian and convexity of the terminal cost, we establish relative convexity of the regularized objective. These estimates yield an $O(1/n)$ convergence rate in the unregularized convex case and a geometric rate when the control regularization parameter is positive. Numerical examples illustrate the behavior of the method in linear-quadratic, degenerate convex, and nonlinear high-dimensional settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces an explicit mirror-descent algorithm for finite-horizon deterministic optimal control. At each iteration the state and adjoint equations are solved and the control is updated by maximizing a first-order approximation to the regularized Hamiltonian penalized by a Bregman divergence; in the Euclidean case this reduces to a projected gradient step. Under global smoothness of the cost functional and uniform convexity of the mirror map the authors prove a relative-smoothness estimate and an energy-dissipation inequality for sufficiently small step sizes. Adding concavity of the unregularized Hamiltonian and convexity of the terminal cost yields relative convexity of the regularized objective. These estimates deliver an O(1/n) rate in the unregularized convex case and a geometric rate when the control-regularization parameter is positive. Numerical examples are given for linear-quadratic, degenerate-convex, and nonlinear high-dimensional problems.

Significance. If the stated convergence results hold, the work supplies a new first-order method for optimal control with explicit rates that exploits non-Euclidean geometry via the mirror map. The derivation from Pontryagin's principle, the relative-smoothness/convexity analysis, and the provision of both proofs and numerical illustrations constitute a coherent contribution that could be useful for regularized or structured control problems.

major comments (2)
  1. [§4.2] §4.2 (energy-dissipation inequality): the proof requires the step size to be smaller than an unspecified constant that depends on the global smoothness modulus and the strong-convexity parameter of the mirror map; without an explicit expression or a computable bound the practical applicability of the O(1/n) guarantee is limited.
  2. [§4.3] §4.3 (relative-convexity claim): the additional concavity assumption on the unregularized Hamiltonian is used to obtain relative convexity of the regularized objective, yet the manuscript does not discuss how restrictive this assumption is for typical control Hamiltonians or whether it can be relaxed to local concavity while preserving the geometric rate.
minor comments (3)
  1. [Abstract] Abstract: the convergence statement should read “O(1/n) convergence rate” with consistent mathematical formatting.
  2. [§3.1] §3.1 (algorithm statement): the Bregman divergence term is introduced without an explicit definition or reference to its properties before being used in the update rule.
  3. [Numerical experiments] Numerical section: the high-dimensional nonlinear example would benefit from a direct comparison table against Euclidean gradient descent to quantify the benefit of the chosen mirror map.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and the recommendation for minor revision. Below we address the two major comments point by point.

read point-by-point responses
  1. Referee: [§4.2] §4.2 (energy-dissipation inequality): the proof requires the step size to be smaller than an unspecified constant that depends on the global smoothness modulus and the strong-convexity parameter of the mirror map; without an explicit expression or a computable bound the practical applicability of the O(1/n) guarantee is limited.

    Authors: We agree that an explicit, computable bound on the step size would strengthen the practical applicability of the O(1/n) result. In the revised manuscript we will derive and state such a bound explicitly in terms of the global smoothness modulus L of the cost functional and the strong-convexity parameter μ of the mirror map. The admissible step-size interval will be given as 0 < η ≤ μ/(2L) (or the precise constant arising from the relative-smoothness estimate), and this will be incorporated into the statement of the energy-dissipation inequality in §4.2 together with a short remark on how the constants can be estimated from problem data. revision: yes

  2. Referee: [§4.3] §4.3 (relative-convexity claim): the additional concavity assumption on the unregularized Hamiltonian is used to obtain relative convexity of the regularized objective, yet the manuscript does not discuss how restrictive this assumption is for typical control Hamiltonians or whether it can be relaxed to local concavity while preserving the geometric rate.

    Authors: The concavity assumption on the unregularized Hamiltonian is indeed restrictive; it is satisfied by linear-quadratic problems and by problems with affine dynamics and convex running costs, but fails for many nonlinear systems. In the revision we will add a brief discussion (new paragraph in §4.3 and a remark in the introduction) that clarifies the scope of the assumption and illustrates it on the linear-quadratic and degenerate-convex examples already present in the paper. Relaxing the assumption to mere local concavity would in general only guarantee local geometric convergence; obtaining a global rate under local concavity appears to require a different analysis (e.g., via localization arguments or Lyapunov functions that are not currently developed) and is therefore left for future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper constructs the mirror-descent update directly from Pontryagin's maximum principle by maximizing a first-order approximation of the regularized Hamiltonian plus Bregman penalty. It then derives relative smoothness of the cost functional, an energy-dissipation inequality, and relative convexity from explicit global smoothness, uniform convexity of the mirror map, concavity of the unregularized Hamiltonian, and convexity of the terminal cost. These are independent hypotheses, not fitted parameters or self-referential definitions; the O(1/n) and geometric rates follow from the resulting inequalities without reducing to the inputs by construction. No self-citations, ansatzes smuggled via prior work, or renaming of known results appear as load-bearing steps.

Axiom & Free-Parameter Ledger

0 free parameters · 4 axioms · 0 invented entities

The central claims rest on four domain assumptions about smoothness and convexity that are standard in optimization but must hold globally for the rates to apply; no free parameters or new entities are introduced.

axioms (4)
  • domain assumption Global smoothness assumptions on the cost functional
    Invoked to obtain the relative smoothness estimate for the cost
  • domain assumption Uniform convexity of the mirror map
    Required to derive the energy dissipation inequality for small step sizes
  • domain assumption Concavity assumption on the unregularized Hamiltonian
    Needed to establish relative convexity of the regularized objective
  • domain assumption Convexity of the terminal cost
    Used together with Hamiltonian concavity for relative convexity

pith-pipeline@v0.9.0 · 5458 in / 1657 out tokens · 48001 ms · 2026-05-08T17:50:40.219461+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 1 canonical work page

  1. Optimal control. 2010.
  2. On global convergence of an algorithm for optimal control. IEEE Transactions on Automatic Control, 1980.
  3. On an algorithm for optimal control using Pontryagin's maximum principle. SIAM Journal on Control and Optimization, 1986.
  4. Stochastic controls: Hamiltonian systems and HJB equations. 1999.
  5. The modified MSA, a gradient flow and convergence. The Annals of Applied Probability, 2024.
  6. Mirror descent for stochastic control problems with measure-valued controls. Stochastic Processes and their Applications, 2025.
  7. Mirror descent for constrained stochastic control problems. arXiv preprint arXiv:2506.02564.
  8. Linear convergence of a policy gradient method for some finite horizon continuous time control problems. SIAM Journal on Control and Optimization, 2023.
  9. An iterative Bregman regularization method for optimal control problems with inequality constraints. Optimization, 2016.
  10. Inexact iterative Bregman method for optimal control problems. Numerical Functional Analysis and Optimization, 2018.
  11. Method of successive approximations for solution of optimal control problems. Optimal Control Applications and Methods, 1982.
  12. An algorithm for the method of successive approximations in optimal control problems. USSR Computational Mathematics and Mathematical Physics, 1972.
  13. A modified MSA for stochastic control problems. Applied Mathematics & Optimization, 2021.