Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity
Pith reviewed 2026-05-10 18:44 UTC · model grok-4.3
The pith
Multistage distributionally robust regret-optimal LQR under common stage-law ambiguity admits an exact SDP reformulation over linear disturbance-feedback policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Over linear disturbance-feedback policies the multistage DRRO-LQR problem with common stage-law ambiguity (Gelbrich ball) admits an exact semidefinite programming reformulation; the optimal controller equals the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction. Worst-case distributions realizing the optimal value are nonunique.
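For reference, a nominal certainty-equivalent LQR law with a mean-disturbance feedforward can be computed by a standard backward Riccati recursion. The sketch assumes dynamics x_{t+1} = A x_t + B u_t + w_t with nominal mean E[w_t] = μ̂ and cost Σ_t (x_tᵀ Q x_t + u_tᵀ R u_t) + x_Tᵀ Q_f x_T. These modeling choices and the function name `ce_lqr` are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def ce_lqr(A, B, Q, R, Qf, mu_hat, T):
    """Finite-horizon certainty-equivalent LQR with a nominal disturbance mean.

    Assumed model: x_{t+1} = A x_t + B u_t + w_t, E[w_t] = mu_hat, cost
    sum_t (x_t' Q x_t + u_t' R u_t) + x_T' Qf x_T.
    Returns lists K, k so that the CE law is u_t = -K[t] x_t - k[t].
    """
    P = np.array(Qf, dtype=float)      # quadratic term of the value function
    q = np.zeros(A.shape[0])           # linear term of the value function
    K, k = [None] * T, [None] * T
    for t in reversed(range(T)):
        S = R + B.T @ P @ B
        K[t] = np.linalg.solve(S, B.T @ P @ A)
        # Feedforward compensates the nominal mean and the downstream linear term.
        k[t] = np.linalg.solve(S, B.T @ (P @ mu_hat + q))
        # Update q before P, since both updates use P_{t+1}.
        q = (A - B @ K[t]).T @ (P @ mu_hat + q)
        P = Q + A.T @ P @ (A - B @ K[t])
    return K, k
```

The feedback gains K_t do not depend on μ̂; only the feedforward terms k_t do.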
What carries the argument
Linear disturbance-feedback policies combined with the Gelbrich-ball ambiguity set; together they convert the multistage regret objective into an exact SDP.
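The Gelbrich ball collects (mean, covariance) pairs whose Gelbrich distance to the nominal pair is at most ρ, where G((μ, Σ), (μ̂, Σ̂))² = ‖μ − μ̂‖² + tr(Σ + Σ̂ − 2(Σ̂^{1/2} Σ Σ̂^{1/2})^{1/2}). A minimal NumPy sketch of the distance and a membership test follows; the function names and the radius parameter `rho` are illustrative, not the paper's code.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric PSD square root via an eigendecomposition."""
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    vals = np.clip(vals, 0.0, None)   # drop tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def gelbrich_distance(mu1, Sigma1, mu2, Sigma2):
    """Gelbrich distance between (mu1, Sigma1) and (mu2, Sigma2)."""
    root2 = psd_sqrt(Sigma2)
    cross = psd_sqrt(root2 @ Sigma1 @ root2)
    sq = np.sum((mu1 - mu2) ** 2) + np.trace(Sigma1 + Sigma2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))

def in_gelbrich_ball(mu, Sigma, mu_hat, Sigma_hat, rho):
    """True if (mu, Sigma) lies in the Gelbrich ball of radius rho."""
    return gelbrich_distance(mu, Sigma, mu_hat, Sigma_hat) <= rho
```

The ball constrains only the first two moments, so any stage law with an admissible (mean, covariance) pair belongs to the ambiguity set.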
If this is right
- The optimal policy reuses the nominal LQR gain and adds a correction that depends only on past realized disturbances.
- Relative to DRO under the identical ambiguity set, the DRRO controller is often substantially less conservative while retaining the regret guarantee.
- Worst-case distributions for the DRRO-optimal policy are nonunique.
- The correction coefficients in the optimal policy empirically approach the certainty-equivalent feedforward term as horizon length grows.
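The controller structure in the first bullet can be sketched as follows. The assumed form u_t = −K_t x_t − k_t − G_t (w̄_t − μ̂), with w̄_t the mean of w_0, ..., w_{t−1}, is a hypothetical illustration. The gains `G` stand in for the SDP-derived correction coefficients, and the exact way the paper's correction enters is not stated in this summary.

```python
import numpy as np

def corrected_input(t, x_t, K, k, G, past_w, mu_hat):
    """CE law plus a hypothetical strictly causal empirical-mean correction.

    K[t], k[t]: CE feedback gain and feedforward term.
    G[t]: hypothetical correction gain (a placeholder for SDP-derived coefficients).
    past_w: realized disturbances; only w_0..w_{t-1} are used.
    """
    u = -K[t] @ x_t - k[t]
    if t > 0:
        # Slicing to [:t] enforces strict causality: w_t never affects u_t.
        w_bar = np.mean(np.asarray(past_w[:t]), axis=0)
        u = u - G[t] @ (w_bar - mu_hat)
    return u
```

At t = 0 there is no past data, so the input reduces to the CE law; afterwards the correction reacts to how far the running mean of realized disturbances has drifted from the nominal mean.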
Where Pith is reading between the lines
- The same SDP route may apply to infinite-horizon or time-varying versions of the problem if the stage-law ambiguity remains common across periods.
- Regret criteria could reduce conservatism in other multistage control settings where past observations inform future decisions under shared distributional uncertainty.
- Experiments on physical plants would test whether the empirical-mean correction measurably improves closed-loop performance over pure certainty-equivalent control.
Load-bearing premise
That linear disturbance-feedback policies are sufficient to achieve both tractability and optimality under the common stage-law Gelbrich-ball model.
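The policy class in this premise can be sketched as u_t = v_t + Σ_{s<t} M_{t,s} w_s. The array names `v` and `M` are illustrative. For a fixed disturbance sequence the inputs, and hence the states, are affine in (v, M), which is the usual reason this class leads to convex programs; whether the class also suffices for optimality is exactly the premise stated above.

```python
import numpy as np

def disturbance_feedback_inputs(v, M, w):
    """Apply a linear disturbance-feedback policy u_t = v_t + sum_{s<t} M[t, s] w_s.

    v: (T, m) open-loop offsets; M: (T, T, m, n) gain blocks; w: (T, n) disturbances.
    Blocks with s >= t are ignored, which enforces strict causality.
    """
    T = v.shape[0]
    u = v.astype(float)                # astype returns a copy
    for t in range(T):
        for s in range(t):             # only past disturbances enter u_t
            u[t] += M[t, s] @ w[s]
    return u
```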
What would settle it
A concrete instance in which a nonlinear disturbance-feedback policy achieves strictly lower worst-case regret than the SDP-derived linear policy under the same Gelbrich ambiguity set.
Abstract
We study, to our knowledge, the first tractable multistage ex-ante distributionally robust regret optimization (DRRO) formulation for stochastic control. We consider finite-horizon LQR under common stage-law ambiguity: disturbances are independent across time but share an unknown stage law whose mean and covariance lie in a Gelbrich ball around nominal parameters. Unlike the single-stage quadratic case, the nominal certainty-equivalent (CE) controller is generally not regret-optimal, because reuse of the stage law makes past disturbances informative for future decisions. Despite the general NP-hardness of DRRO, we show that over linear disturbance-feedback policies the resulting multistage DRRO-LQR problem admits an exact semidefinite programming reformulation. The optimal controller is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction. We also characterize worst-case distributions and show that those for the DRRO-optimal policy are nonunique. Numerical results show that, relative to the corresponding DRO controller under the same ambiguity set, DRRO is often substantially less conservative while preserving the intended regret guarantee, and that its correction coefficients empirically approach the certainty-equivalent feedforward coefficient.
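The abstract's point that reusing the stage law makes past disturbances informative can be illustrated with a toy scalar simulation: when every stage draws from the same unknown law, the running empirical mean of past disturbances tracks that law's true mean rather than the nominal one. All numbers here (nominal mean 0, true mean 0.5, unit standard deviation, seed 0) are hypothetical.

```python
import random

def running_means(samples):
    """Empirical mean of samples[:t] for t = 1, ..., len(samples)."""
    means, total = [], 0.0
    for t, w in enumerate(samples, start=1):
        total += w
        means.append(total / t)
    return means

# Hypothetical scalar example: the nominal mean is wrong, the true stage law is shared.
rng = random.Random(0)
true_mean, nominal_mean = 0.5, 0.0
w = [rng.gauss(true_mean, 1.0) for _ in range(10_000)]
m = running_means(w)
print(f"nominal {nominal_mean}, true {true_mean}, running mean after {len(w)} stages {m[-1]:.3f}")
```

A CE controller keeps using the nominal mean at every stage, whereas a policy allowed to react to past disturbances can exploit this drift.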
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a distributionally robust regret optimization (DRRO) formulation for finite-horizon LQR under common stage-law ambiguity, where disturbances are independent but share an unknown distribution whose mean and covariance lie in a Gelbrich ball around nominal values. The central result is that, when the decision maker is restricted to linear disturbance-feedback policies, the multistage DRRO-LQR problem admits an exact semidefinite programming reformulation. The optimal controller takes the form of the nominal certainty-equivalent LQR law plus a strictly causal correction term driven by the empirical mean. The paper further characterizes the worst-case distributions (showing non-uniqueness for the DRRO-optimal policy) and reports numerical experiments indicating that DRRO is substantially less conservative than the corresponding DRO controller under the same ambiguity set while preserving the regret guarantee.
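For orientation, the two objectives being compared can be written as follows. The notation is assumed for illustration rather than taken from the paper. $J(\pi;\mathbb{P})$ is the expected finite-horizon LQR cost of policy $\pi$ when every stage disturbance is drawn independently from the stage law $\mathbb{P}$, and $\Pi_{\mathrm{lin}}$ is the class of linear disturbance-feedback policies. $\mathcal{B}_\rho$ is the Gelbrich ball of radius $\rho$ around $(\hat\mu,\hat\Sigma)$; the benchmark class $\Pi$ in the inner infimum is also an assumption.

```latex
\text{DRRO:}\quad \min_{\pi \in \Pi_{\mathrm{lin}}} \; \sup_{\mathbb{P} \in \mathcal{B}_\rho}
  \Big[ J(\pi;\mathbb{P}) - \inf_{\pi' \in \Pi} J(\pi';\mathbb{P}) \Big],
\qquad
\text{DRO:}\quad \min_{\pi \in \Pi_{\mathrm{lin}}} \; \sup_{\mathbb{P} \in \mathcal{B}_\rho} J(\pi;\mathbb{P}),
\\[4pt]
\mathcal{B}_\rho = \Big\{ \mathbb{P} \;:\; G\big((\mu_{\mathbb{P}},\Sigma_{\mathbb{P}}),(\hat\mu,\hat\Sigma)\big) \le \rho \Big\}.
```

The subtracted clairvoyant benchmark is what separates the two. DRO guards against laws with high absolute cost, while DRRO only penalizes the excess over what a controller that knows $\mathbb{P}$ could achieve, which offers one intuition for the reduced conservatism reported in the numerical results.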
Significance. If the derivations hold, the result is significant because it delivers the first tractable multistage ex-ante DRRO formulation for stochastic control: although DRRO is NP-hard in general, restricting to linear disturbance-feedback policies yields an exact SDP whose solution has an explicit, interpretable structure (nominal CE-LQR plus strictly causal correction). The explicit controller form and the non-uniqueness result for worst-case distributions provide both computational and theoretical value. Numerical evidence of reduced conservatism relative to DRO, while retaining the regret guarantee, suggests practical utility in robust control design under distributional uncertainty.
major comments (2)
- [SDP reformulation theorem and proof] The section deriving the SDP reformulation (the main theorem establishing exactness): the multistage extension under common stage-law ambiguity requires explicit verification that the inner supremum over the Gelbrich ball, combined with the regret objective, dualizes to an SDP with no duality gap. The abstract asserts exactness, but the multistage information structure (reuse of the stage law making past disturbances informative) could introduce complications not present in single-stage quadratic cases; the key dualization steps or strong-duality lemma should be highlighted.
- [Worst-case distribution analysis] The characterization of worst-case distributions: the claim that they are non-unique for the DRRO-optimal policy is load-bearing for the theoretical contribution. The paper should state whether the non-uniqueness is constructive (explicit families of distributions attaining the supremum) or only existential, and confirm that this does not affect the exactness of the SDP solution.
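As a toy check of the inner supremum discussed in the first major comment, consider a scalar disturbance and the loss q·w². Because E[w²] = m² + s² depends only on the mean m and standard deviation s, and the one-dimensional Gelbrich ball is the disk (m − m̂)² + (s − ŝ)² ≤ ρ² in the (m, s) plane, the worst case has a closed form that can be compared with a brute-force grid search. This scalar, single-stage example is an illustration only; it is not the multistage dualization the referee asks about.

```python
import math

def worst_case_second_moment(m_hat, s_hat, rho, q=1.0):
    """Closed form of sup q*E[w^2] over a scalar Gelbrich ball.

    The supremum is q times the squared distance from the origin to the
    farthest point of the disk centered at (m_hat, s_hat) with radius rho.
    For s_hat >= 0 that farthest point also has s >= 0, so it is a valid std.
    """
    return q * (math.hypot(m_hat, s_hat) + rho) ** 2

def grid_check(m_hat, s_hat, rho, q=1.0, n=400):
    """Brute-force the same supremum on a polar grid over the disk (s >= 0)."""
    best = -math.inf
    for i in range(n + 1):
        r = rho * i / n
        for j in range(n):
            th = 2 * math.pi * j / n
            m = m_hat + r * math.cos(th)
            s = s_hat + r * math.sin(th)
            if s >= 0:
                best = max(best, q * (m * m + s * s))
    return best
```

Since the loss depends only on the first two moments, any law with the maximizing (m, s), Gaussian or otherwise, attains the supremum in this toy, which gives one elementary picture of why worst-case distributions over moment-based ambiguity sets need not be unique.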
minor comments (3)
- [Introduction] The motivation that the nominal CE controller is generally not regret-optimal should be illustrated with a low-dimensional numerical example or a short analytic counter-example early in the paper, rather than only asserted via the information-structure argument.
- [Controller structure] Notation consistency: the abstract uses 'strictly causal empirical-mean correction'; the main text should define this term with an explicit equation (e.g., the form of the correction gain) at first use.
- [Numerical experiments] Numerical results: the reported approach of the correction coefficients to the certainty-equivalent feedforward coefficient is interesting, but the paper should report the number of Monte-Carlo trials, seed variability, and quantitative regret or cost differences (not only qualitative 'substantially less conservative') to support the comparison with DRO.
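A minimal sketch of the reporting requested here, assuming a scalar plant and a fixed linear controller: it records the number of trials and seeds and reports the spread of the per-seed mean cost. The plant, controller, and all parameter names are hypothetical; a DRRO-versus-DRO comparison would call the summary once per controller under the same seeds.

```python
import random
import statistics

def closed_loop_cost(a, b, gain, ff, true_mean, sd, T, rng):
    """Realized cost sum_t (x_t^2 + u_t^2) + x_T^2 for u_t = -gain*x_t - ff, x_0 = 0."""
    x, cost = 0.0, 0.0
    for _ in range(T):
        u = -gain * x - ff
        cost += x * x + u * u
        x = a * x + b * u + rng.gauss(true_mean, sd)
    return cost + x * x

def monte_carlo_summary(n_trials, seeds, **kw):
    """Per-seed mean cost over n_trials, then the mean and spread across seeds."""
    per_seed = []
    for seed in seeds:
        rng = random.Random(seed)
        costs = [closed_loop_cost(rng=rng, **kw) for _ in range(n_trials)]
        per_seed.append(statistics.fmean(costs))
    return {
        "n_trials": n_trials,
        "n_seeds": len(seeds),
        "mean": statistics.fmean(per_seed),
        "std_across_seeds": statistics.stdev(per_seed) if len(seeds) > 1 else 0.0,
    }
```

Reporting the difference between two controllers' summaries under identical seeds, rather than a qualitative label, is what the comment asks for.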
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive comments on our manuscript. We address each major comment below and will make the indicated revisions to improve clarity.
Point-by-point responses
- Referee: [SDP reformulation theorem and proof] The section deriving the SDP reformulation (the main theorem establishing exactness): the multistage extension under common stage-law ambiguity requires explicit verification that the inner supremum over the Gelbrich ball, combined with the regret objective, dualizes to an SDP with no duality gap. The abstract asserts exactness, but the multistage information structure (reuse of the stage law making past disturbances informative) could introduce complications not present in single-stage quadratic cases; the key dualization steps or strong-duality lemma should be highlighted.
Authors: We appreciate the referee's suggestion to enhance the presentation of the proof. The existing derivation already verifies strong duality for the multistage setting by exploiting the convexity and compactness of the Gelbrich ball together with the quadratic regret objective under linear disturbance-feedback policies; the common stage-law ambiguity is handled via a reformulation that accounts for the information structure without introducing a duality gap. To address the comment, we will add a dedicated remark immediately following the main theorem that explicitly outlines the key dualization steps and references the strong-duality result used, thereby highlighting the multistage extension relative to the single-stage case.
Revision: yes.
- Referee: [Worst-case distribution analysis] The characterization of worst-case distributions: the claim that they are non-unique for the DRRO-optimal policy is load-bearing for the theoretical contribution. The paper should state whether the non-uniqueness is constructive (explicit families of distributions attaining the supremum) or only existential, and confirm that this does not affect the exactness of the SDP solution.
Authors: We thank the referee for this observation. Our characterization is constructive: the manuscript exhibits explicit families of distributions (specific mean and covariance perturbations within the Gelbrich ball) that attain the supremum for the DRRO-optimal policy. This construction is used to establish non-uniqueness while confirming that the attained value matches the SDP optimum. We will revise the relevant section to state explicitly that the non-uniqueness is constructive and to add a sentence confirming that it is compatible with (and does not affect) the exactness of the SDP reformulation.
Revision: yes.
Circularity Check
No significant circularity; derivation is self-contained
The paper presents a direct derivation of an exact SDP reformulation for the multistage DRRO-LQR problem restricted to linear disturbance-feedback policies, starting from the DRRO objective and common stage-law Gelbrich ambiguity set. The optimal controller is characterized as the nominal CE-LQR law plus a strictly causal correction term, obtained via the optimization rather than by fitting parameters or renaming inputs. No load-bearing self-citations, self-definitional steps, or reductions of predictions to fitted quantities are indicated in the abstract or claimed results. The characterization of non-unique worst-case distributions and numerical comparisons to DRO are presented as consequences of the reformulation, not as circular inputs. The derivation remains independent of pre-fitted values or prior author-specific uniqueness theorems.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Disturbances are independent across time but share a common unknown stage law whose mean and covariance lie in a Gelbrich ball around nominal parameters.
- Domain assumption: Linear disturbance-feedback policies suffice for the exact SDP reformulation and optimality.