arxiv: 2512.10906 · v2 · submitted 2025-12-11 · 🧮 math.OC · cs.LG · cs.SY · eess.SY

Distributionally Robust Regret Optimal Control Under Moment-Based Ambiguity Sets

Pith reviewed 2026-05-16 23:02 UTC · model grok-4.3

classification: 🧮 math.OC · cs.LG · cs.SY · eess.SY
keywords: distributionally robust control · regret optimal control · linear quadratic control · moment ambiguity sets · convex reformulation · stochastic control · robust control

The pith

Worst-case regret minimization in linear-quadratic control with distributional ambiguity reduces to a tractable convex program.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper considers finite-horizon linear-quadratic stochastic control where the noise distribution is unknown but belongs to an ambiguity set defined by norm balls around nominal mean and covariance. It designs causal affine control policies that minimize the worst-case expected regret over this set. The central result is that this minimax problem is equivalent to a convex program that can be solved efficiently and interpreted as a regularized version of the standard nominal control problem. This provides a practical way to obtain controllers robust to uncertainty in the noise statistics.

Core claim

For linear-quadratic control problems with noise distributions in moment-based ambiguity sets, the problem of finding causal affine policies that minimize worst-case expected regret admits an equivalent reformulation as a tractable convex program, interpretable as a regularized nominal linear-quadratic stochastic control problem.

What carries the argument

Causal affine control policies that minimize worst-case expected regret over distributions whose means and covariances lie in norm balls.

Load-bearing premise

Causal affine policies are sufficient to achieve the minimax optimum and the moment-based ambiguity set fully captures the relevant distributional uncertainty.

What would settle it

A non-affine policy or distribution outside the moment balls that produces strictly lower worst-case regret than the value of the computed convex program.

Figures

Figures reproduced from arXiv: 2512.10906 by Eilyan Bitar, Feras Al Taha.

Figure 1: (a) Expected cost as a function of the ambiguity set radius
Figure 2: (a) Relative duality gap (31) (averaged over ten trials) versus the iteration count for the dual projected subgradient method (Algorithm 1) for different control horizons T. (b) Total execution time (averaged over ten trials) as a function of the control horizon T for the SDP interior point method (red line, square markers) and the dual projected subgradient method (blue line, cross markers). The inverse …
Abstract

We consider a class of finite-horizon, linear-quadratic stochastic control problems, where the probability distribution governing the noise process is unknown but assumed to belong to an ambiguity set consisting of all distributions whose mean and covariance lie within norm balls centered at given nominal values. To cope with this ambiguity, we design causal affine control policies to minimize the worst-case expected regret over all distributions in the ambiguity set. The resulting minimax optimal control problem is shown to admit an equivalent reformulation as a tractable convex program, which can be interpreted as a regularized version of the nominal linear-quadratic stochastic control problem. Based on the dual of this convex reformulation, we develop a scalable projected subgradient method for computing optimal controllers to arbitrary accuracy. Numerical experiments are provided to compare the proposed method with state-of-the-art data-driven control design methods.
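
The setup described in the abstract can be written compactly as follows. This is only a sketch in assumed notation: the symbols $w$ (noise trajectory), $\hat{\mu}$ and $\hat{\Sigma}$ (nominal mean and covariance), $r_1$ and $r_2$ (ball radii), $\Pi_{\mathrm{aff}}$ (causal affine policies), and $R(\pi, \mathbb{P})$ (expected regret of policy $\pi$ under distribution $\mathbb{P}$) are introduced here for illustration and may differ from the paper's. The norms and the regret benchmark are left unspecified, since this page does not state them.

```latex
\begin{align*}
\mathcal{P} &= \bigl\{ \mathbb{P} \;:\; \|\mathbb{E}_{\mathbb{P}}[w] - \hat{\mu}\| \le r_1,
  \;\; \|\mathrm{Cov}_{\mathbb{P}}(w) - \hat{\Sigma}\| \le r_2 \bigr\}, \\
\pi^{\star} &\in \operatorname*{arg\,min}_{\pi \in \Pi_{\mathrm{aff}}} \;
  \sup_{\mathbb{P} \in \mathcal{P}} \; R(\pi, \mathbb{P}).
\end{align*}
```

Read this way, the abstract's claim is that the outer minimization, after the inner supremum is resolved, becomes a convex program in the policy parameters.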

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper considers finite-horizon linear-quadratic stochastic control problems where the noise distribution is unknown but lies in a moment-based ambiguity set consisting of norm balls around nominal mean and covariance values. It restricts attention to causal affine policies that minimize the worst-case expected regret over this set, shows that the resulting minimax problem admits an equivalent reformulation as a tractable convex program (interpretable as a regularized nominal LQ problem), derives a scalable projected subgradient algorithm from the dual, and reports numerical comparisons against data-driven control methods.
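
For readers unfamiliar with the algorithm class mentioned in the summary, the following is a minimal sketch of a generic projected subgradient method on a toy problem. It is not the paper's Algorithm 1: the objective (an l1 residual), the feasible set (a Euclidean ball), the step-size rule, and all function names are assumptions chosen only to illustrate the "subgradient step, then project" pattern.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def projected_subgradient(A, b, radius, iters=500):
    """Minimize ||A x - b||_1 over a Euclidean ball (toy problem, hypothetical)."""
    x = np.zeros(A.shape[1])
    best_x, best_val = x.copy(), np.abs(A @ x - b).sum()
    for k in range(1, iters + 1):
        g = A.T @ np.sign(A @ x - b)              # a subgradient of the l1 objective
        step = 1.0 / np.sqrt(k)                   # diminishing step size
        x = project_ball(x - step * g / (np.linalg.norm(g) + 1e-12), radius)
        val = np.abs(A @ x - b).sum()
        if val < best_val:                        # subgradient steps need not descend,
            best_x, best_val = x.copy(), val      # so keep the best iterate seen
    return best_x, best_val

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x_best, f_best = projected_subgradient(A, b, radius=1.0)
```

Tracking the best iterate matters because a subgradient step, unlike a gradient step on a smooth function, is not guaranteed to decrease the objective at every iteration.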

Significance. If the central equivalence holds, the work supplies a computationally attractive convex-optimization route to distributionally robust regret-optimal control, with the regularization interpretation offering conceptual insight and the subgradient method providing a practical implementation path. The numerical experiments add evidence of applicability, but the significance is limited by the unresolved restriction to affine policies.

major comments (2)
  1. [Abstract and reformulation section] The central claim that the minimax problem over causal affine policies is equivalent to a tractable convex program (abstract and the reformulation section) rests on the unproven assertion that affine policies attain the global optimum over the larger class of all causal policies. For the regret objective (realized cost minus distribution-dependent optimal cost) under norm-ball moment ambiguity, the effective cost need not remain quadratic, so the standard dynamic-programming argument for affine optimality does not apply directly; this gap is load-bearing for the tractability and optimality claims.
  2. [Reformulation and dual section] The derivation of the convex program (presumably via dualization of the inner supremum over the ambiguity set) is not accompanied by explicit intermediate steps showing how the regret term is rewritten; without these steps, it is impossible to verify that the resulting program is indeed convex and equivalent to the original minimax problem.
minor comments (2)
  1. [Problem formulation] The definition of the ambiguity set (norm balls on mean and covariance) would benefit from explicit notation distinguishing the two radii and the choice of matrix norms.
  2. [Numerical experiments] Figure captions in the numerical experiments section should state the state/input dimensions, horizon length, and specific ambiguity radii used in each example.
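
To make the first minor comment concrete, here is a minimal sketch of a membership test for a moment-based ambiguity set with two separately named radii and an explicit choice of matrix norm. The function name, argument names, and the default Frobenius norm are assumptions made for illustration; they are not the paper's notation.

```python
import numpy as np

def in_moment_ambiguity_set(mu, Sigma, mu_hat, Sigma_hat, r_mean, r_cov, cov_norm="fro"):
    """Return True if (mu, Sigma) lies in the moment-based ambiguity set.

    cov_norm is passed to np.linalg.norm: "nuc" (Schatten-1), "fro" (Schatten-2),
    or 2 (spectral norm, Schatten-infinity).
    """
    mean_ok = np.linalg.norm(mu - mu_hat) <= r_mean
    cov_ok = np.linalg.norm(Sigma - Sigma_hat, ord=cov_norm) <= r_cov
    psd_ok = np.linalg.eigvalsh(Sigma).min() >= -1e-10   # covariance must be PSD
    return bool(mean_ok and cov_ok and psd_ok)

# Hypothetical nominal moments and a candidate pair of moments.
mu_hat, Sigma_hat = np.zeros(2), np.eye(2)
mu, Sigma = np.array([0.1, 0.0]), 1.1 * np.eye(2)

inside = in_moment_ambiguity_set(mu, Sigma, mu_hat, Sigma_hat, r_mean=0.2, r_cov=0.2)
outside = in_moment_ambiguity_set(mu, Sigma, mu_hat, Sigma_hat, r_mean=0.2, r_cov=0.1)
```

Here the covariance deviation has Frobenius norm of about 0.141, so the candidate is inside the set for `r_cov=0.2` and outside for `r_cov=0.1`.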

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the scope and presentation of our results. We address each major comment below and will revise the manuscript to improve explicitness while preserving the paper's focus on causal affine policies.

  1. Referee: [Abstract and reformulation section] The central claim that the minimax problem over causal affine policies is equivalent to a tractable convex program (abstract and the reformulation section) rests on the unproven assertion that affine policies attain the global optimum over the larger class of all causal policies. For the regret objective (realized cost minus distribution-dependent optimal cost) under norm-ball moment ambiguity, the effective cost need not remain quadratic, so the standard dynamic-programming argument for affine optimality does not apply directly; this gap is load-bearing for the tractability and optimality claims.

    Authors: The manuscript restricts attention to causal affine policies from the outset (see abstract and Section 2), without claiming that they attain the global optimum over all causal policies. The equivalence to the tractable convex program is established specifically within the affine class via dualization of the inner supremum over the moment-based ambiguity set. Because the regret objective prevents a direct quadratic DP argument, we deliberately limit the policy class to affine controllers to retain convexity and scalability. We will revise the abstract, introduction, and reformulation section to state this restriction explicitly and note that optimality over general causal policies is left open. This clarification removes any ambiguity about the scope without altering the technical claims.
    Revision: partial

  2. Referee: [Reformulation and dual section] The derivation of the convex program (presumably via dualization of the inner supremum over the ambiguity set) is not accompanied by explicit intermediate steps showing how the regret term is rewritten; without these steps, it is impossible to verify that the resulting program is indeed convex and equivalent to the original minimax problem.

    Authors: We agree that additional intermediate steps will improve verifiability. In the revised version we will expand the reformulation section to include: (i) the explicit expression for the expected regret under an affine policy, (ii) the rewriting of the worst-case expectation as a function of the mean and covariance deviations, and (iii) the full dualization steps that convert the inner supremum into the convex program. These additions will confirm both convexity and equivalence to the original minimax problem over affine policies.
    Revision: yes
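
Step (ii) in the response above can be illustrated numerically in a simplified setting. The sketch assumes a zero-mean quadratic cost of the form tr(M Σ) with M positive semidefinite and a Frobenius-norm ball of radius r around a nominal covariance; these choices, and all variable names, are assumptions for illustration rather than the paper's derivation. Under them, Cauchy-Schwarz gives the worst-case value tr(M Σ̂) + r‖M‖_F, attained at Σ̂ + r M/‖M‖_F, which is one way to see how a norm penalty (a regularizer) can arise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 4, 0.5

# Hypothetical PSD cost matrix and PSD nominal covariance.
B = rng.standard_normal((n, n)); M = B @ B.T
C = rng.standard_normal((n, n)); Sigma_hat = C @ C.T

# Candidate maximizer: move the nominal covariance in the direction of M.
Sigma_star = Sigma_hat + r * M / np.linalg.norm(M, "fro")
worst_case = np.trace(M @ Sigma_star)

# Closed form: nominal cost plus a Frobenius-norm penalty scaled by the radius.
closed_form = np.trace(M @ Sigma_hat) + r * np.linalg.norm(M, "fro")

# Sanity check: random symmetric perturbations inside the ball never do better.
sampled = []
for _ in range(1000):
    D = rng.standard_normal((n, n)); D = (D + D.T) / 2
    D *= r * rng.uniform() / np.linalg.norm(D, "fro")
    sampled.append(np.trace(M @ (Sigma_hat + D)))
max_sampled = max(sampled)
```

Because M and the nominal covariance are both positive semidefinite here, the maximizer is itself a valid covariance matrix, so the positive-semidefiniteness constraint does not bind in this simplified case.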

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper restricts the policy class to causal affine controllers at the outset and derives the convex reformulation directly from the resulting minimax objective via standard duality for moment-based ambiguity sets. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the equivalence is obtained from the problem definition using convex optimization techniques without circular reduction. The derivation remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from convex optimization and stochastic control; the ambiguity radii are user-specified inputs rather than fitted parameters.

free parameters (1)
  • ambiguity set radii for mean and covariance balls
    User-provided parameters that define the size of the uncertainty set; they are inputs to the problem rather than learned from data.
axioms (2)
  • Domain assumption: causal affine policies achieve the minimax optimum.
    Invoked to reduce the policy search space to tractable affine controllers.
  • Standard math: the moment ambiguity set is convex and compact.
    Norm balls guarantee this property, enabling the convex reformulation.

pith-pipeline@v0.9.0 · 5449 in / 1277 out tokens · 120573 ms · 2026-05-16T23:02:34.210299+00:00 · methodology
