arxiv: 2604.06158 · v1 · submitted 2026-04-07 · 🧮 math.OC · cs.SY · eess.SY

Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity

Pith reviewed 2026-05-10 18:44 UTC · model grok-4.3

classification 🧮 math.OC · cs.SY · eess.SY
keywords distributionally robust optimization · regret optimization · linear quadratic regulator · semidefinite programming · stochastic control · ambiguity sets · Gelbrich distance

The pith

Multistage distributionally robust regret-optimal LQR under common stage-law ambiguity admits an exact SDP reformulation over linear disturbance-feedback policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that distributionally robust regret optimization for finite-horizon linear quadratic regulators becomes tractable when disturbances are independent across time but share an unknown stage law whose mean and covariance lie inside a Gelbrich ball around nominal parameters. A sympathetic reader would care because, when the stage law is shared, past disturbance realizations are informative for future decisions, while standard distributionally robust designs often yield overly cautious controllers. Restricted to linear disturbance-feedback policies, the multistage problem becomes a semidefinite program whose solution is the nominal certainty-equivalent LQR law plus a strictly causal correction term driven by empirical means of past disturbances. If correct, the resulting policy preserves a regret guarantee while often being substantially less conservative than the corresponding distributionally robust optimal controller under the identical ambiguity set.

Core claim

Over linear disturbance-feedback policies the multistage DRRO-LQR problem with common stage-law ambiguity (Gelbrich ball) admits an exact semidefinite programming reformulation; the optimal controller equals the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction. Worst-case distributions realizing the optimal value are nonunique.
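For orientation, the ex-ante regret objective behind this claim can be written schematically as below. The notation is an assumption made here for illustration and may differ from the paper's: $J(\pi, \mathbb{P})$ is the expected finite-horizon quadratic cost of policy $\pi$ when every disturbance is drawn independently from the stage law $\mathbb{P}$, $\Pi_{\mathrm{lin}}$ is the class of linear disturbance-feedback policies, $\mathbb{G}$ is the Gelbrich distance, $(\hat{\mu}, \hat{\Sigma})$ are the nominal parameters, and $\delta$ is the ball radius. The comparator class in the inner infimum (policies chosen with knowledge of $\mathbb{P}$) is likewise an assumption.

```latex
\min_{\pi \in \Pi_{\mathrm{lin}}} \;
\sup_{\mathbb{P} \,:\, \mathbb{G}\big((\mu_{\mathbb{P}}, \Sigma_{\mathbb{P}}),\, (\hat{\mu}, \hat{\Sigma})\big) \le \delta}
\Big\{ J(\pi, \mathbb{P}) - \inf_{\pi'} J(\pi', \mathbb{P}) \Big\}
```

The same $\mathbb{P}$ enters every stage, which is the common stage-law feature. The inner term subtracts the cost of the best policy chosen with knowledge of $\mathbb{P}$, so a controller is charged only for what it loses relative to that benchmark.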

What carries the argument

Linear disturbance-feedback policies combined with the Gelbrich-ball ambiguity set, which jointly convert the multistage regret objective into an exact SDP.
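As a concrete reference for the ambiguity set, here is a minimal NumPy sketch of the standard Gelbrich distance between two mean-covariance pairs, together with a ball-membership check. The function names (`psd_sqrt`, `gelbrich_distance`, `in_gelbrich_ball`) are hypothetical and chosen for illustration; they do not come from the paper.

```python
import numpy as np

def psd_sqrt(mat):
    """Symmetric PSD square root via an eigendecomposition."""
    sym = (mat + mat.T) / 2.0              # symmetrize against round-off
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)        # clip tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def gelbrich_distance(mu1, cov1, mu2, cov2):
    """sqrt(||mu1 - mu2||^2 + tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2}))."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    root2 = psd_sqrt(cov2)
    cross = psd_sqrt(root2 @ cov1 @ root2)
    cov_term = np.trace(cov1 + cov2 - 2.0 * cross)
    squared = np.sum((mu1 - mu2) ** 2) + max(cov_term, 0.0)
    return float(np.sqrt(squared))

def in_gelbrich_ball(mu, cov, mu_nom, cov_nom, radius):
    """True if (mu, cov) lies within `radius` of the nominal pair."""
    return gelbrich_distance(mu, cov, mu_nom, cov_nom) <= radius
```

For commuting covariances the covariance term reduces to the Frobenius distance between the matrix square roots, so `gelbrich_distance([0, 0], np.eye(2), [0, 0], 4 * np.eye(2))` evaluates to the square root of 2.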

If this is right

  • The optimal policy reuses the nominal LQR gain and adds a correction that depends only on past realized disturbances.
  • Relative to DRO under the identical ambiguity set, the DRRO controller is often substantially less conservative while retaining the regret guarantee.
  • Worst-case distributions for the DRRO-optimal policy are nonunique.
  • The correction coefficients in the optimal policy empirically approach the certainty-equivalent feedforward term as horizon length grows.
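The two ingredients named in these bullets can be sketched as follows: the nominal finite-horizon LQR gains from the standard backward Riccati recursion, and a strictly causal empirical mean that at time t uses only disturbances observed before t. This is an illustration under assumptions, not the paper's construction: the dynamics convention x_{t+1} = A x_t + B u_t + w_t, the sign convention u_t = -K_t x_t, and the correction gain `L_t` in `control_input` are all hypothetical, and the paper obtains its correction coefficients from the SDP.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Q_final, horizon):
    """Backward Riccati recursion; returns [K_0, ..., K_{T-1}] for u_t = -K_t x_t."""
    P = Q_final
    gains = []
    for _ in range(horizon):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains.append(K)
    gains.reverse()                        # recursion runs from t = T-1 down to 0
    return gains

def strictly_causal_means(w):
    """Row t is the mean of w_0..w_{t-1}; row 0 is zero because no data exists yet."""
    w = np.asarray(w, float)
    means = np.zeros_like(w)
    if len(w) > 1:
        counts = np.arange(1, len(w))[:, None]
        means[1:] = np.cumsum(w[:-1], axis=0) / counts
    return means

def control_input(K_t, x_t, L_t, past_mean):
    """Nominal feedback plus a correction on the past empirical mean.

    L_t is a hypothetical placeholder gain used only for illustration.
    """
    return -K_t @ x_t + L_t @ past_mean
```

Because row t of `strictly_causal_means` never touches w_t or later disturbances, any policy built from it respects the information structure described above.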

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same SDP route may apply to infinite-horizon or time-varying versions of the problem if the stage-law ambiguity remains common across periods.
  • Regret criteria could reduce conservatism in other multistage control settings where past observations inform future decisions under shared distributional uncertainty.
  • Numerical verification on real plants would test whether the empirical-mean correction measurably improves closed-loop performance over pure certainty-equivalent control.

Load-bearing premise

That linear disturbance-feedback policies are sufficient to achieve both tractability and optimality under the common stage-law Gelbrich-ball model.
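To make the policy class concrete, the sketch below writes a linear disturbance-feedback policy in stacked form, u = v + M w, where M is strictly block lower triangular so that u_t depends only on w_0, ..., w_{t-1}. The stacking convention, the timing convention, and the helper names are assumptions introduced here for illustration, not notation taken from the paper.

```python
import numpy as np

def strictly_causal_mask(horizon, n_u, n_w):
    """Boolean mask that is True only on blocks (t, s) with s < t."""
    block = np.tril(np.ones((horizon, horizon)), k=-1)
    return np.kron(block, np.ones((n_u, n_w))).astype(bool)

def project_to_causal(M, horizon, n_u, n_w):
    """Zero out every block of M that would use current or future disturbances."""
    return np.where(strictly_causal_mask(horizon, n_u, n_w), M, 0.0)

def apply_policy(M, v, w):
    """Stacked inputs u = v + M w for a stacked disturbance vector w."""
    return v + M @ w

# Example: 3 stages, scalar input, 2-dimensional disturbance.
M = project_to_causal(np.ones((3, 6)), horizon=3, n_u=1, n_w=2)
w = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
u = apply_policy(M, np.zeros(3), w)
```

In this example u_0 ignores all disturbances, u_1 sees only w_0, and u_2 sees w_0 and w_1; the final disturbance w_2 never enters any input.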

What would settle it

A concrete instance in which a nonlinear disturbance-feedback policy achieves strictly lower worst-case regret than the SDP-derived linear policy under the same Gelbrich ambiguity set.

Figures

Figures reproduced from arXiv: 2604.06158 by Jose Blanchet, Lukas-Benedikt Fiechtner.

Figure 1. Left: worst-case regret versus ambiguity radius for […]
Figure 3. Comparison of Λ_t and H̄_t for T = 1000. The DRRO row sums lie close to H̄_t, whereas the DRO row sums remain visibly separated.
From Section 6.1 (Worst-Case Regret and Ambiguity-Ball Comparison): for each δ ∈ [0, 1], the paper solves the CE, DRO, and DRRO synthesis problems on the same ambiguity ball and evaluates each controller under its worst-case regret law.
Original abstract

We study, to our knowledge, the first tractable multistage ex-ante distributionally robust regret optimization (DRRO) formulation for stochastic control. We consider finite-horizon LQR under common stage-law ambiguity: disturbances are independent across time but share an unknown stage law whose mean and covariance lie in a Gelbrich ball around nominal parameters. Unlike the single-stage quadratic case, the nominal certainty-equivalent (CE) controller is generally not regret-optimal, because reuse of the stage law makes past disturbances informative for future decisions. Despite the general NP-hardness of DRRO, we show that over linear disturbance-feedback policies the resulting multistage DRRO-LQR problem admits an exact semidefinite programming reformulation. The optimal controller is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction. We also characterize worst-case distributions and show that those for the DRRO-optimal policy are nonunique. Numerical results show that, relative to the corresponding DRO controller under the same ambiguity set, DRRO is often substantially less conservative while preserving the intended regret guarantee, and that its correction coefficients empirically approach the certainty-equivalent feedforward coefficient.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript presents a distributionally robust regret optimization (DRRO) formulation for finite-horizon LQR under common stage-law ambiguity, where disturbances are independent but share an unknown distribution whose mean and covariance lie in a Gelbrich ball around nominal values. The central result is that, when the decision maker is restricted to linear disturbance-feedback policies, the multistage DRRO-LQR problem admits an exact semidefinite programming reformulation. The optimal controller takes the form of the nominal certainty-equivalent LQR law plus a strictly causal correction term driven by the empirical mean. The paper further characterizes the worst-case distributions (showing non-uniqueness for the DRRO-optimal policy) and reports numerical experiments indicating that DRRO is substantially less conservative than the corresponding DRO controller under the same ambiguity set while preserving the regret guarantee.

Significance. If the derivations hold, the result is significant because it delivers the first tractable multistage ex-ante DRRO formulation for stochastic control, converting a generally NP-hard problem into an exact SDP whose solution has an explicit, interpretable structure (nominal CE-LQR plus strictly causal correction). The explicit controller form and the non-uniqueness result for worst-case distributions provide both computational and theoretical value. Numerical evidence of reduced conservatism relative to DRO, while retaining the regret guarantee, suggests practical utility in robust control design under distributional uncertainty.

major comments (2)
  1. [SDP reformulation theorem and proof] The section deriving the SDP reformulation (the main theorem establishing exactness): the multistage extension under common stage-law ambiguity requires explicit verification that the inner supremum over the Gelbrich ball, combined with the regret objective, dualizes to an SDP with no duality gap. The abstract asserts exactness, but the multistage information structure (reuse of the stage law making past disturbances informative) could introduce complications not present in single-stage quadratic cases; the key dualization steps or strong-duality lemma should be highlighted.
  2. [Worst-case distribution analysis] The characterization of worst-case distributions: the claim that they are non-unique for the DRRO-optimal policy is load-bearing for the theoretical contribution. The paper should state whether the non-uniqueness is constructive (explicit families of distributions attaining the supremum) or only existential, and confirm that this does not affect the exactness of the SDP solution.
minor comments (3)
  1. [Introduction] The motivation that the nominal CE controller is generally not regret-optimal should be illustrated with a low-dimensional numerical example or a short analytic counter-example early in the paper, rather than only asserted via the information-structure argument.
  2. [Controller structure] Notation consistency: the abstract uses 'strictly causal empirical-mean correction'; the main text should define this term with an explicit equation (e.g., the form of the correction gain) at first use.
  3. [Numerical experiments] Numerical results: the reported approach of the correction coefficients to the certainty-equivalent feedforward coefficient is interesting, but the paper should report the number of Monte-Carlo trials, seed variability, and quantitative regret or cost differences (not only qualitative 'substantially less conservative') to support the comparison with DRO.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments on our manuscript. We address each major comment below and will make the indicated revisions to improve clarity.

Point-by-point responses
  1. Referee: [SDP reformulation theorem and proof] The section deriving the SDP reformulation (the main theorem establishing exactness): the multistage extension under common stage-law ambiguity requires explicit verification that the inner supremum over the Gelbrich ball, combined with the regret objective, dualizes to an SDP with no duality gap. The abstract asserts exactness, but the multistage information structure (reuse of the stage law making past disturbances informative) could introduce complications not present in single-stage quadratic cases; the key dualization steps or strong-duality lemma should be highlighted.

    Authors: We appreciate the referee's suggestion to enhance the presentation of the proof. The existing derivation already verifies strong duality for the multistage setting by exploiting the convexity and compactness of the Gelbrich ball together with the quadratic regret objective under linear disturbance-feedback policies; the common stage-law ambiguity is handled via a reformulation that accounts for the information structure without introducing a duality gap. To address the comment, we will add a dedicated remark immediately following the main theorem that explicitly outlines the key dualization steps and references the strong-duality result used, thereby highlighting the multistage extension relative to the single-stage case.
    Revision: yes

  2. Referee: [Worst-case distribution analysis] The characterization of worst-case distributions: the claim that they are non-unique for the DRRO-optimal policy is load-bearing for the theoretical contribution. The paper should state whether the non-uniqueness is constructive (explicit families of distributions attaining the supremum) or only existential, and confirm that this does not affect the exactness of the SDP solution.

    Authors: We thank the referee for this observation. Our characterization is constructive: the manuscript exhibits explicit families of distributions (specific mean and covariance perturbations within the Gelbrich ball) that attain the supremum for the DRRO-optimal policy. This construction is used to establish non-uniqueness while confirming that the attained value matches the SDP optimum. We will revise the relevant section to state explicitly that the non-uniqueness is constructive and to add a sentence confirming that it is compatible with (and does not affect) the exactness of the SDP reformulation.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

Full rationale

The paper presents a direct derivation of an exact SDP reformulation for the multistage DRRO-LQR problem restricted to linear disturbance-feedback policies, starting from the DRRO objective and common stage-law Gelbrich ambiguity set. The optimal controller is characterized as the nominal CE-LQR law plus a strictly causal correction term, obtained via the optimization rather than by fitting parameters or renaming inputs. No load-bearing self-citations, self-definitional steps, or reductions of predictions to fitted quantities are indicated in the abstract or claimed results. The characterization of non-unique worst-case distributions and numerical comparisons to DRO are presented as consequences of the reformulation, not as circular inputs. The derivation remains independent of pre-fitted values or prior author-specific uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The formulation rests on standard LQR quadratic costs and SDP duality, plus domain assumptions on the ambiguity set and policy class; no free parameters or invented entities are introduced beyond the Gelbrich ball radius (treated as given).

axioms (2)
  • Domain assumption: disturbances are independent across time but share a common unknown stage law whose mean and covariance lie in a Gelbrich ball around nominal parameters.
    This defines the common stage-law ambiguity central to the DRRO model.
  • Domain assumption: linear disturbance-feedback policies suffice for the exact SDP reformulation and optimality.
    The paper restricts the policy class to obtain tractability.

