Distributionally Robust Regret Optimal Control Under Moment-Based Ambiguity Sets
Pith reviewed 2026-05-16 23:02 UTC · model grok-4.3
The pith
Worst-case regret minimization in linear-quadratic control with distributional ambiguity reduces to a tractable convex program.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For linear-quadratic control problems with noise distributions in moment-based ambiguity sets, the problem of finding causal affine policies that minimize worst-case expected regret admits an equivalent reformulation as a tractable convex program, interpretable as a regularized nominal linear-quadratic stochastic control problem.
What carries the argument
Causal affine control policies that minimize worst-case expected regret over distributions whose means and covariances lie in norm balls.
Load-bearing premise
Causal affine policies are sufficient to achieve the minimax optimum and the moment-based ambiguity set fully captures the relevant distributional uncertainty.
What would settle it
A non-affine causal policy that achieves strictly lower worst-case regret than the optimal value of the computed convex program, or a true noise distribution whose mean or covariance falls outside the norm balls and yields regret above that value.
Original abstract
We consider a class of finite-horizon, linear-quadratic stochastic control problems, where the probability distribution governing the noise process is unknown but assumed to belong to an ambiguity set consisting of all distributions whose mean and covariance lie within norm balls centered at given nominal values. To cope with this ambiguity, we design causal affine control policies to minimize the worst-case expected regret over all distributions in the ambiguity set. The resulting minimax optimal control problem is shown to admit an equivalent reformulation as a tractable convex program, which can be interpreted as a regularized version of the nominal linear-quadratic stochastic control problem. Based on the dual of this convex reformulation, we develop a scalable projected subgradient method for computing optimal controllers to arbitrary accuracy. Numerical experiments are provided to compare the proposed method with state-of-the-art data-driven control design methods.
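The ambiguity set described in the abstract can be illustrated with a small membership check. This is a minimal sketch, not the paper's code: the names `r_mean`, `r_cov`, `mu_hat`, and `Sigma_hat` are hypothetical, and the choice of the Euclidean norm for the mean and a Schatten p-norm for the covariance is an assumption made for illustration.

```python
import numpy as np

def schatten_norm(X, p):
    """Schatten p-norm of a symmetric matrix (l_p norm of its eigenvalues)."""
    eigvals = np.linalg.eigvalsh(X)
    return np.linalg.norm(eigvals, ord=p)

def in_ambiguity_set(mu, Sigma, mu_hat, Sigma_hat, r_mean, r_cov, p=2, tol=1e-9):
    """Check whether the moments (mu, Sigma) lie in the assumed ambiguity set.

    Assumed set: ||mu - mu_hat||_2 <= r_mean, ||Sigma - Sigma_hat||_p <= r_cov,
    and Sigma positive semidefinite.
    """
    Sigma = 0.5 * (Sigma + Sigma.T)  # symmetrize against round-off
    mean_ok = np.linalg.norm(mu - mu_hat) <= r_mean + tol
    cov_ok = schatten_norm(Sigma - Sigma_hat, p) <= r_cov + tol
    psd_ok = np.linalg.eigvalsh(Sigma).min() >= -tol
    return bool(mean_ok and cov_ok and psd_ok)

# Hypothetical nominal moments for a two-dimensional noise vector.
mu_hat = np.zeros(2)
Sigma_hat = np.eye(2)
inside = in_ambiguity_set(np.array([0.1, 0.0]), 1.05 * np.eye(2),
                          mu_hat, Sigma_hat, r_mean=0.2, r_cov=0.1)
outside = in_ambiguity_set(np.array([0.1, 0.0]), 1.2 * np.eye(2),
                           mu_hat, Sigma_hat, r_mean=0.2, r_cov=0.1)
```

Keeping the two radii as separate arguments makes it explicit that the mean ball and the covariance ball can be sized independently.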
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper considers finite-horizon linear-quadratic stochastic control problems where the noise distribution is unknown but lies in a moment-based ambiguity set consisting of norm balls around nominal mean and covariance values. It restricts attention to causal affine policies that minimize the worst-case expected regret over this set, shows that the resulting minimax problem admits an equivalent reformulation as a tractable convex program (interpretable as a regularized nominal LQ problem), derives a scalable projected subgradient algorithm from the dual, and reports numerical comparisons against data-driven control methods.
Significance. If the central equivalence holds, the work supplies a computationally attractive convex-optimization route to distributionally robust regret-optimal control, with the regularization interpretation offering conceptual insight and the subgradient method providing a practical implementation path. The numerical experiments add evidence of applicability, but the significance is limited by the unresolved restriction to affine policies.
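For readers unfamiliar with the algorithmic template mentioned above, the following is a generic projected subgradient iteration on a toy problem. It is an illustration only, under stated assumptions: the objective, the Euclidean-ball constraint, and the diminishing step size are hypothetical choices and are not the paper's dual problem or its step size rule.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def projected_subgradient(f, subgrad, project, x0, n_iters=5000):
    """Minimize a convex function f over a convex set given by its projection.

    Uses the diminishing step size 1/sqrt(k+1) and returns the best iterate,
    since subgradient steps are not guaranteed to decrease f monotonically.
    """
    x = project(np.asarray(x0, dtype=float))
    best_x, best_val = x.copy(), f(x)
    for k in range(n_iters):
        step = 1.0 / np.sqrt(k + 1.0)
        x = project(x - step * subgrad(x))
        val = f(x)
        if val < best_val:
            best_x, best_val = x.copy(), val
    return best_x, best_val

# Toy nonsmooth problem: minimize |x1 - 2| + |x2| over the unit ball.
# The minimizer is (1, 0) with optimal value 1.
f = lambda x: abs(x[0] - 2.0) + abs(x[1])
subgrad = lambda x: np.array([np.sign(x[0] - 2.0), np.sign(x[1])])
best_x, best_val = projected_subgradient(
    f, subgrad, lambda x: project_ball(x, 1.0), x0=[0.0, 0.5]
)
```

Each iteration costs one subgradient evaluation and one projection, which is why methods of this type tend to scale better than interior point solvers when the projection is cheap.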
major comments (2)
- [Abstract and reformulation section] The central claim that the minimax problem over causal affine policies is equivalent to a tractable convex program (abstract and the reformulation section) rests on the unproven assertion that affine policies attain the global optimum over the larger class of all causal policies. For the regret objective (realized cost minus distribution-dependent optimal cost) under norm-ball moment ambiguity, the effective cost need not remain quadratic, so the standard dynamic-programming argument for affine optimality does not apply directly; this gap is load-bearing for the tractability and optimality claims.
- [Reformulation and dual section] The derivation of the convex program (presumably via dualization of the inner supremum over the ambiguity set) is not accompanied by explicit intermediate steps showing how the regret term is rewritten; without these steps, it is impossible to verify that the resulting program is indeed convex and equivalent to the original minimax problem.
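The regret objective that the first major comment refers to can be written out as follows. This is a sketch in assumed notation, not the paper's own: $J(\pi, w)$ denotes the quadratic cost incurred by policy $\pi$ under noise realization $w$, $\Pi_{\mathrm{aff}}$ the class of causal affine policies, $\mathcal{P}$ the moment-based ambiguity set, and the benchmark class in the inner infimum is left unspecified because the report does not state it.

```latex
% Assumed notation (hypothetical, for illustration only):
%   J(\pi, w)           quadratic cost of policy \pi under noise w
%   \Pi_{\mathrm{aff}}  causal affine policies
%   \mathcal{P}         moment-based ambiguity set
\[
  \mathrm{Regret}(\pi, P)
    = \mathbb{E}_{w \sim P}\bigl[ J(\pi, w) \bigr]
    - \inf_{\pi'} \mathbb{E}_{w \sim P}\bigl[ J(\pi', w) \bigr],
\]
\[
  \min_{\pi \in \Pi_{\mathrm{aff}}} \; \sup_{P \in \mathcal{P}} \; \mathrm{Regret}(\pi, P).
\]
```

Because the subtracted benchmark term itself depends on $P$, the inner supremum is taken over a difference of two distribution-dependent quantities, which is the reason the referee gives for why the standard dynamic-programming argument does not carry over directly.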
minor comments (2)
- [Problem formulation] The definition of the ambiguity set (norm balls on mean and covariance) would benefit from explicit notation distinguishing the two radii and the choice of matrix norms.
- [Numerical experiments] Figure captions in the numerical experiments section should state the state/input dimensions, horizon length, and specific ambiguity radii used in each example.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify the scope and presentation of our results. We address each major comment below and will revise the manuscript to improve explicitness while preserving the paper's focus on causal affine policies.
- Referee: [Abstract and reformulation section] The central claim that the minimax problem over causal affine policies is equivalent to a tractable convex program (abstract and the reformulation section) rests on the unproven assertion that affine policies attain the global optimum over the larger class of all causal policies. For the regret objective (realized cost minus distribution-dependent optimal cost) under norm-ball moment ambiguity, the effective cost need not remain quadratic, so the standard dynamic-programming argument for affine optimality does not apply directly; this gap is load-bearing for the tractability and optimality claims.
  Authors: The manuscript restricts attention to causal affine policies from the outset (see abstract and Section 2), without claiming that they attain the global optimum over all causal policies. The equivalence to the tractable convex program is established specifically within the affine class via dualization of the inner supremum over the moment-based ambiguity set. Because the regret objective prevents a direct quadratic DP argument, we deliberately limit the policy class to affine controllers to retain convexity and scalability. We will revise the abstract, introduction, and reformulation section to state this restriction explicitly and note that optimality over general causal policies is left open. This clarification removes any ambiguity about the scope without altering the technical claims.
  revision: partial
- Referee: [Reformulation and dual section] The derivation of the convex program (presumably via dualization of the inner supremum over the ambiguity set) is not accompanied by explicit intermediate steps showing how the regret term is rewritten; without these steps, it is impossible to verify that the resulting program is indeed convex and equivalent to the original minimax problem.
  Authors: We agree that additional intermediate steps will improve verifiability. In the revised version we will expand the reformulation section to include: (i) the explicit expression for the expected regret under an affine policy, (ii) the rewriting of the worst-case expectation as a function of the mean and covariance deviations, and (iii) the full dualization steps that convert the inner supremum into the convex program. These additions will confirm both convexity and equivalence to the original minimax problem over affine policies.
  revision: yes
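Step (ii) in the response above relies on a standard fact: the expectation of a quadratic function of the noise depends on the distribution only through its mean and covariance. The sketch below evaluates that identity. It assumes, for illustration, that the cost under a fixed affine policy has been written as the quadratic form w^T M w + 2 q^T w + c in the noise vector w; the symbols `M`, `q`, and `c` are hypothetical placeholders, not quantities taken from the paper.

```python
import numpy as np

def expected_quadratic(M, q, c, mu, Sigma):
    """Return E[w^T M w + 2 q^T w + c] for any w with mean mu and covariance Sigma.

    Uses the identity E[w^T M w] = trace(M Sigma) + mu^T M mu, which holds
    for every distribution with these first two moments.
    """
    return float(np.trace(M @ Sigma) + mu @ M @ mu + 2.0 * q @ mu + c)

# Hypothetical cost data and moments for a two-dimensional noise vector.
M = np.array([[2.0, 0.0], [0.0, 1.0]])
q = np.array([1.0, 0.0])
c = 3.0
mu = np.array([1.0, -1.0])
Sigma = np.diag([0.5, 2.0])
value = expected_quadratic(M, q, c, mu, Sigma)
```

Under this assumption the supremum over distributions collapses to a finite-dimensional supremum over the pair (mu, Sigma) constrained to the two norm balls, which is the object the dualization in step (iii) would act on.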
Circularity Check
No circularity in derivation chain
The paper restricts the policy class to causal affine controllers at the outset and derives the convex reformulation directly from the resulting minimax objective via standard duality for moment-based ambiguity sets. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the equivalence is obtained from the problem definition using convex optimization techniques without circular reduction. The derivation remains self-contained against the stated assumptions.
Axiom & Free-Parameter Ledger
free parameters (1)
- ambiguity set radii for mean and covariance balls
axioms (2)
- domain assumption: Causal affine policies achieve the minimax optimum
- standard math: The moment ambiguity set is convex and compact