Distributionally Robust Regret Optimal Control Under Moment-Based Ambiguity Sets

Eilyan Bitar; Feras Al Taha

arXiv: 2512.10906 · v2 · submitted 2025-12-11 · 🧮 math.OC · cs.LG · cs.SY · eess.SY

Pith reviewed 2026-05-16 23:02 UTC · model grok-4.3

keywords: distributionally robust control · regret optimal control · linear quadratic control · moment ambiguity sets · convex reformulation · stochastic control · robust control

The pith

Worst-case regret minimization over causal affine policies in linear-quadratic control with distributional ambiguity reduces to a tractable convex program.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper considers finite-horizon linear-quadratic stochastic control where the noise distribution is unknown but belongs to an ambiguity set defined by norm balls around nominal mean and covariance. It designs causal affine control policies that minimize the worst-case expected regret over this set. The central result is that this minimax problem is equivalent to a convex program that can be solved efficiently and interpreted as a regularized version of the standard nominal control problem. This provides a practical way to obtain controllers robust to uncertainty in the noise statistics.
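One standard fact explains why moment-based ambiguity sets pair naturally with affine policies. With linear dynamics and a causal affine policy, states and inputs are affine in the stacked noise vector $\xi$, so the quadratic cost is a quadratic function of $\xi$. Its expectation then depends on the distribution only through the mean $\mu$ and covariance $\Sigma$, because $\mathbb{E}[\xi^\top M \xi] = \operatorname{tr}(M(\Sigma + \mu\mu^\top))$. The Python sketch below checks this identity on an arbitrary non-Gaussian empirical distribution. The matrix `M` is a hypothetical stand-in for a closed-loop cost matrix, and the helper name `expected_quadratic` is illustrative. Linear and constant cost terms, which would add $2q^\top\mu + c$, are omitted.

```python
import numpy as np


def expected_quadratic(M, mu, Sigma):
    """Return E[xi^T M xi] for any distribution with mean mu and covariance Sigma."""
    # E[xi xi^T] = Sigma + mu mu^T, and the quadratic form is linear in E[xi xi^T].
    return float(np.trace(M @ (Sigma + np.outer(mu, mu))))


rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
M = A.T @ A  # hypothetical PSD cost weight

# Arbitrary non-Gaussian distribution: 1000 equally weighted exponential atoms.
atoms = rng.exponential(size=(1000, n))
mu = atoms.mean(axis=0)
Sigma = np.cov(atoms, rowvar=False, bias=True)  # divide by N to match the atom weights

direct = float(np.mean(np.einsum("ki,ij,kj->k", atoms, M, atoms)))
via_moments = expected_quadratic(M, mu, Sigma)
print(direct, via_moments)  # the two numbers agree up to round-off
```

Any two noise distributions that share $(\mu, \Sigma)$ therefore give the same expected cost under a fixed affine policy. This suggests why the worst case over such an ambiguity set can be posed over moment pairs rather than over distributions.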

Core claim

For linear-quadratic control problems with noise distributions in moment-based ambiguity sets, the problem of finding causal affine policies that minimize worst-case expected regret admits an equivalent reformulation as a tractable convex program, interpretable as a regularized nominal linear-quadratic stochastic control problem.

What carries the argument

Causal affine control policies that minimize worst-case expected regret over distributions whose means and covariances lie in norm balls.
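The inner maximization over covariances can be illustrated with a single term $\operatorname{tr}(M\Sigma)$, where $M \succeq 0$ and $\Sigma$ ranges over the Schatten $p$-norm ball $\|\Sigma - \widehat{\Sigma}\|_p \le r_2$. Hölder's inequality for Schatten norms bounds this term by $\operatorname{tr}(M\widehat{\Sigma}) + r_2\|M\|_q$, with $1/p + 1/q = 1$. The bound is attained by a positive semidefinite perturbation aligned with the eigenvectors of $M$, so the maximizer stays in $\mathbb{S}^n_+$ whenever $\widehat{\Sigma}$ does. The sketch below illustrates this under those assumptions and is not the paper's derivation. The function name `worst_case_trace` is hypothetical, and the code assumes $M \neq 0$.

```python
import numpy as np


def worst_case_trace(M, Sigma_hat, r2, p):
    """Maximize tr(M @ Sigma) over the Schatten p-norm ball ||Sigma - Sigma_hat||_p <= r2.

    Assumes M is symmetric PSD and nonzero; returns (maximizer, optimal value).
    """
    lam, U = np.linalg.eigh(M)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off from a PSD matrix
    if p == 1:
        # Dual exponent q = inf: spend the whole budget on the top eigenvector.
        delta = np.zeros_like(lam)
        delta[np.argmax(lam)] = r2
        dual_norm = lam.max()
    elif np.isinf(p):
        # Dual exponent q = 1: raise every eigenvalue by r2.
        delta = np.full_like(lam, r2)
        dual_norm = lam.sum()
    else:
        q = p / (p - 1.0)
        dual_norm = np.linalg.norm(lam, q)
        # Vector Hoelder equality case: delta_i proportional to lam_i^(q-1), ||delta||_p = r2.
        delta = r2 * (lam / dual_norm) ** (q - 1.0)
    Sigma_star = Sigma_hat + U @ np.diag(delta) @ U.T
    value = float(np.trace(M @ Sigma_hat) + r2 * dual_norm)
    return Sigma_star, value
```

For $p = 2$ (Frobenius) the optimal perturbation reduces to $r_2 M/\|M\|_F$.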

Load-bearing premise

Causal affine policies are sufficient to achieve the minimax optimum and the moment-based ambiguity set fully captures the relevant distributional uncertainty.

What would settle it

A non-affine policy or distribution outside the moment balls that produces strictly lower worst-case regret than the value of the computed convex program.

Figures

Figures reproduced from arXiv: 2512.10906 by Eilyan Bitar, Feras Al Taha.

**Figure 2.** (a) Relative duality gap (31) (averaged over ten trials) versus the iteration count for the dual projected subgradient method (Algorithm 1) for different control horizons T. (b) Total execution time (averaged over ten trials) as a function of the control horizon T for the SDP interior point method (red line, square markers) and the dual projected subgradient method (blue line, cross markers). The inverse …

Original abstract

We consider a class of finite-horizon, linear-quadratic stochastic control problems, where the probability distribution governing the noise process is unknown but assumed to belong to an ambiguity set consisting of all distributions whose mean and covariance lie within norm balls centered at given nominal values. To cope with this ambiguity, we design causal affine control policies to minimize the worst-case expected regret over all distributions in the ambiguity set. The resulting minimax optimal control problem is shown to admit an equivalent reformulation as a tractable convex program, which can be interpreted as a regularized version of the nominal linear-quadratic stochastic control problem. Based on the dual of this convex reformulation, we develop a scalable projected subgradient method for computing optimal controllers to arbitrary accuracy. Numerical experiments are provided to compare the proposed method with state-of-the-art data-driven control design methods.
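The solver described in the abstract operates on the dual of the convex reformulation. The reference-graph excerpt on Danskin's theorem indicates how it works. The inner minimizer $K^\star(\Lambda)$ is unique and $\phi$ is linear in $\Lambda$, so the dual function is differentiable with gradient $\nabla_\Lambda \phi(\Lambda, K^\star(\Lambda))$. Each iteration then solves the inner minimization, takes a gradient step, and projects onto the dual feasible set. The sketch below applies that pattern to a toy problem, $\min_x \|x\|_2^2$ subject to $Bx \ge a$, whose dual variable is projected onto $\lambda \ge 0$. The toy problem, the fixed step size $1/L$, and the function name are assumptions for illustration only. The paper's $\phi$, its feasible sets, and its adaptive step-size rule are not reproduced here. Because this toy dual is differentiable, each subgradient is simply the gradient.

```python
import numpy as np


def dual_projected_gradient(B, a, iters=500):
    """Toy dual method for: minimize ||x||^2 subject to B x >= a.

    Dual function g(lam) = min_x ||x||^2 + lam^T (a - B x), maximized over lam >= 0.
    """
    # grad g is Lipschitz with constant L = lambda_max(B B^T) / 2, so use step 1/L.
    L = np.linalg.eigvalsh(B @ B.T).max() / 2.0
    lam = np.zeros(B.shape[0])
    for _ in range(iters):
        x = B.T @ lam / 2.0                     # unique inner minimizer for fixed lam
        grad = a - B @ x                        # Danskin: d/dlam of the Lagrangian at x*(lam)
        lam = np.maximum(lam + grad / L, 0.0)   # ascent step, then project onto lam >= 0
    x = B.T @ lam / 2.0
    dual_value = float(x @ x + lam @ (a - B @ x))
    return lam, x, dual_value


B = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.5]])
a = np.array([1.0, -1.0])
lam, x, dual_value = dual_projected_gradient(B, a)
```

Here only the first constraint is active. The iterates settle at $\lambda = (1.6, 0)$ and $x = (0.8, 0.4, 0)$, where the dual value $0.8$ equals the primal optimum $\|x\|_2^2$.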

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a convex reformulation and subgradient solver for regret-optimal LQ control under moment ambiguity, but the claim that causal affine policies achieve the minimax optimum lacks a clear justification.

This paper takes the regret-optimal control problem for finite-horizon LQ systems, where the noise distribution is only known to lie in a moment-based ambiguity set, and shows that it can be recast as a convex program. The authors then use the dual to build a projected subgradient algorithm that computes the controller to any accuracy. What stands out is the combination of regret and distributional robustness under moment balls, which leads to a regularized version of the usual LQ problem. The subgradient method looks practical for computation, and the experiments help show how it compares with existing data-driven approaches. One area that needs scrutiny is the restriction to causal affine policies. The usual proof that affine policies are optimal relies on a quadratic cost and linear dynamics. Regret, however, subtracts a distribution-dependent optimal cost, which could introduce non-convexity or require more general policies to achieve the true minimax value. If that gap is not zero, the convex reformulation gives only a bound rather than the exact optimum. The abstract presents the reformulation as equivalent, so the full paper should contain the argument for why affine policies suffice here. This is relevant for control theorists working on robust and regret-based methods in uncertain environments, and someone looking for new ways to handle ambiguity in stochastic control would find the algorithmic part useful. I would recommend sending it to peer review: the contribution is focused, and the reformulation is worth having referees check for correctness.

Referee Report

2 major / 2 minor

Summary. The paper considers finite-horizon linear-quadratic stochastic control problems where the noise distribution is unknown but lies in a moment-based ambiguity set consisting of norm balls around nominal mean and covariance values. It restricts attention to causal affine policies that minimize the worst-case expected regret over this set, shows that the resulting minimax problem admits an equivalent reformulation as a tractable convex program (interpretable as a regularized nominal LQ problem), derives a scalable projected subgradient algorithm from the dual, and reports numerical comparisons against data-driven control methods.

Significance. If the central equivalence holds, the work supplies a computationally attractive convex-optimization route to distributionally robust regret-optimal control, with the regularization interpretation offering conceptual insight and the subgradient method providing a practical implementation path. The numerical experiments add evidence of applicability, but the significance is limited by the unresolved restriction to affine policies.

major comments (2)

[Abstract and reformulation section] The central claim that the minimax problem over causal affine policies is equivalent to a tractable convex program (abstract and the reformulation section) rests on the unproven assertion that affine policies attain the global optimum over the larger class of all causal policies. For the regret objective (realized cost minus distribution-dependent optimal cost) under norm-ball moment ambiguity, the effective cost need not remain quadratic, so the standard dynamic-programming argument for affine optimality does not apply directly; this gap is load-bearing for the tractability and optimality claims.
[Reformulation and dual section] The derivation of the convex program (presumably via dualization of the inner supremum over the ambiguity set) is not accompanied by explicit intermediate steps showing how the regret term is rewritten; without these steps, it is impossible to verify that the resulting program is indeed convex and equivalent to the original minimax problem.

minor comments (2)

[Problem formulation] The definition of the ambiguity set (norm balls on mean and covariance) would benefit from explicit notation distinguishing the two radii and the choice of matrix norms.
[Numerical experiments] Figure captions in the numerical experiments section should state the state/input dimensions, horizon length, and specific ambiguity radii used in each example.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the scope and presentation of our results. We address each major comment below and will revise the manuscript to improve explicitness while preserving the paper's focus on causal affine policies.

Referee: [Abstract and reformulation section] The central claim that the minimax problem over causal affine policies is equivalent to a tractable convex program (abstract and the reformulation section) rests on the unproven assertion that affine policies attain the global optimum over the larger class of all causal policies. For the regret objective (realized cost minus distribution-dependent optimal cost) under norm-ball moment ambiguity, the effective cost need not remain quadratic, so the standard dynamic-programming argument for affine optimality does not apply directly; this gap is load-bearing for the tractability and optimality claims.

Authors: The manuscript restricts attention to causal affine policies from the outset (see abstract and Section 2), without claiming that they attain the global optimum over all causal policies. The equivalence to the tractable convex program is established specifically within the affine class via dualization of the inner supremum over the moment-based ambiguity set. Because the regret objective prevents a direct quadratic DP argument, we deliberately limit the policy class to affine controllers to retain convexity and scalability. We will revise the abstract, introduction, and reformulation section to state this restriction explicitly and note that optimality over general causal policies is left open. This clarification removes any ambiguity about the scope without altering the technical claims. revision: partial
Referee: [Reformulation and dual section] The derivation of the convex program (presumably via dualization of the inner supremum over the ambiguity set) is not accompanied by explicit intermediate steps showing how the regret term is rewritten; without these steps, it is impossible to verify that the resulting program is indeed convex and equivalent to the original minimax problem.

Authors: We agree that additional intermediate steps will improve verifiability. In the revised version we will expand the reformulation section to include: (i) the explicit expression for the expected regret under an affine policy, (ii) the rewriting of the worst-case expectation as a function of the mean and covariance deviations, and (iii) the full dualization steps that convert the inner supremum into the convex program. These additions will confirm both convexity and equivalence to the original minimax problem over affine policies. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper restricts the policy class to causal affine controllers at the outset and derives the convex reformulation directly from the resulting minimax objective via standard duality for moment-based ambiguity sets. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the equivalence is obtained from the problem definition using convex optimization techniques without circular reduction. The derivation remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from convex optimization and stochastic control; the ambiguity radii are user-specified inputs rather than fitted parameters.

free parameters (1)

ambiguity set radii for mean and covariance balls
User-provided parameters that define the size of the uncertainty set; they are inputs to the problem rather than learned from data.

axioms (2)

domain assumption: Causal affine policies achieve the minimax optimum
Invoked to reduce the policy search space to tractable affine controllers.
standard math: The moment ambiguity set is convex and compact
Norm balls guarantee this property, enabling the convex reformulation.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]

Maurice Sion

doi: 10.23919/ACC50511.2021.9483023. Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, …

work page doi:10.23919/acc50511.2021.9483023 2021

[2]

Hence, this upper bound equals the optimal value of the maximization problem in $\Sigma$, and (20) reduces to the minimization problem in (14)

It is straightforward to verify that, for each $p \in [1, \infty]$, the covariance matrix $\Sigma^\star$ attains the upper bound in (21) and is feasible for the original maximization problem in $\Sigma$, since it can be shown to satisfy $\|\Sigma^\star - \widehat{\Sigma}\|_p = r_2$ and $\Sigma^\star \in \mathbb{S}^n_+$. Hence, this upper bound equals the optimal value of the maximization problem in $\Sigma$, and (20) reduces to the minimization probl...

work page 2015
[3]

Moreover, as shown in Step 1, the minimizer $K^\star(\Lambda) := \operatorname{argmin}_{K \in \bar{L}} \phi(\Lambda, K)$ is unique for each $\Lambda \in N$

Application of Danskin's theorem. The function $\phi(\Lambda, K)$ is jointly continuous in $(\Lambda, K)$ over the set $N \times \bar{L}$, and the map $\Lambda \mapsto \phi(\Lambda, K)$ is linear (hence concave and differentiable) in $\Lambda$ for each $K \in \bar{L}$. Moreover, as shown in Step 1, the minimizer $K^\star(\Lambda) := \operatorname{argmin}_{K \in \bar{L}} \phi(\Lambda, K)$ is unique for each $\Lambda \in N$. Since $\bar{L}$ is a compact set, Danskin's theorem (Bertsekas, 1999, Propositio...

work page 1999
[4]

Theorem 5. Let $X \in \mathbb{S}^n$ and denote its eigendecomposition by $X = U \operatorname{diag}(\lambda) U^\top$, where $U$ is an orthonormal matrix

Because all Schatten norms are unitarily invariant, the orthogonal projection of a matrix onto the Schatten $p$-norm ball $S^p_r := \{X \in \mathbb{S}^n \mid \|X\|_p \le r\}$ can be expressed in terms of the orthogonal projection of the corresponding vector of singular values onto the $\ell_p$-norm ball $B^p_r := \{x \in \mathbb{R}^n \mid \|x\|_p \le r\}$. The following result, adapted from (Beck, 2017, Theorem 7.18), ...

work page 2017
[5]

for different control horizons $T$. (b) Total execution time (averaged over ten trials) as a function of the control horizon $T$ for the SDP interior point method (red line, square markers) and the dual projected subgradient method (blue line, cross markers). The inverse step size $1/\eta_i$ can be interpreted as a local estimate of the Lipschitz constant of the gra...

work page 2008
[6]

Gradient methods utilizing such adaptive step sizes enjoy convergence guarantees when combined with saturation limits and nonmonotone line search schemes (Wang et al., 2005)

This adaptive step size rule is closely related to the Barzilai-Borwein step sizes used in spectral projected gradient methods (Birgin et al., 2000). Gradient methods utilizing such adaptive step sizes enjoy convergence guarantees when combined with saturation limits and nonmonotone line search schemes (Wang et al., 2005). ...

work page 2000

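The Theorem 5 excerpt reduces projection onto a Schatten $p$-norm ball of symmetric matrices to projecting the eigenvalue vector onto an $\ell_p$-norm ball. The sketch below covers $p \in \{1, 2, \infty\}$, where the vector projection is cheap: sort-and-threshold for $\ell_1$, rescaling for $\ell_2$, and clipping for $\ell_\infty$. The excerpts do not state whether the paper's Algorithm 1 projects onto exactly these sets, and all function names here are hypothetical.

```python
import numpy as np


def project_l1_ball(v, r):
    """Euclidean projection of v onto {x : ||x||_1 <= r} by sorting and soft-thresholding."""
    if np.abs(v).sum() <= r:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]              # magnitudes in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = idx[u - (css - r) / idx > 0][-1]    # number of entries that stay nonzero
    theta = (css[rho - 1] - r) / rho          # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def project_lp_ball(v, r, p):
    """Euclidean projection onto the l_p ball of radius r, for p in {1, 2, inf}."""
    if p == 1:
        return project_l1_ball(v, r)
    if p == 2:
        nrm = np.linalg.norm(v)
        return v.copy() if nrm <= r else v * (r / nrm)
    if np.isinf(p):
        return np.clip(v, -r, r)
    raise NotImplementedError("this sketch only covers p in {1, 2, inf}")


def project_schatten_ball(X, r, p):
    """Project a symmetric matrix X onto {Y symmetric : ||Y||_p <= r}."""
    X = (X + X.T) / 2.0                       # guard against round-off asymmetry
    lam, U = np.linalg.eigh(X)
    return U @ np.diag(project_lp_ball(lam, r, p)) @ U.T
```

Since $\ell_p$ balls are invariant under permutations and sign changes, projecting the eigenvalues and rotating back by $U$ gives the matrix projection.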