A Bayesian Perspective on the Data-Driven LQR
Pith reviewed 2026-05-10 17:05 UTC · model grok-4.3
The pith
A Bayesian formulation of data-driven LQR decomposes expected cost into certainty-equivalence and variance terms, unifying indirect and direct methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under a Bayesian formulation of the data-driven LQR (ddLQR), the expected cost decomposes into a certainty-equivalence term and a variance-dependent term. Under this formulation the indirect and direct ddLQR problems are equivalent, and the direct method can be cast as a tractable semidefinite program whose size is independent of the data length.
What carries the argument
The Bayesian posterior over the unknown system matrices together with the decomposition of the expected quadratic cost into certainty-equivalence and variance-dependent components.
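As a minimal illustration of the kind of identity involved (a one-step sketch with a fixed weight matrix, not the paper's derivation), write \Theta = [A\ B] for the unknown matrices, z = [x; Kx] for the stacked state and input, \bar\Theta for the posterior mean, and \Delta\Theta = \Theta - \bar\Theta for the zero-mean deviation. These symbols are assumptions introduced here and may not match the paper's notation.

```latex
\begin{aligned}
\mathbb{E}\big[(\Theta z)^\top P\,(\Theta z)\big]
&= (\bar\Theta z)^\top P\,(\bar\Theta z)
 + 2\,(\bar\Theta z)^\top P\,\mathbb{E}[\Delta\Theta]\,z
 + \mathbb{E}\big[(\Delta\Theta z)^\top P\,(\Delta\Theta z)\big] \\
&= \underbrace{(\bar\Theta z)^\top P\,(\bar\Theta z)}_{\text{certainty equivalence}}
 + \underbrace{z^\top\,\mathbb{E}\big[\Delta\Theta^\top P\,\Delta\Theta\big]\,z}_{\text{variance-dependent}} .
\end{aligned}
```

The cross term vanishes because \mathbb{E}[\Delta\Theta] = 0. The split is exact only as long as P and z do not themselves depend on \Theta.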
If this is right
- The direct data-driven controller is obtained from a semidefinite program whose decision variables and constraints do not grow with data length.
- Regularization enters the design automatically through the posterior variance term rather than through an added heuristic parameter.
- In the paper's numerical simulations, both optimality gap and closed-loop stability improve relative to certainty-equivalence designs, with the largest gains appearing in low-data regimes.
- Indirect and direct formulations become mathematically equivalent once both are required to minimize the same Bayesian expected cost.
Where Pith is reading between the lines
- The variance term supplies an explicit knob for trading nominal performance against robustness to model uncertainty when the prior and noise statistics are chosen appropriately.
- The same decomposition may be useful for deriving regularized versions of other data-driven controllers that currently rely on certainty equivalence.
- In practice, the benefit in low-data regimes will depend on how accurately the posterior can be computed or approximated from the available measurements.
Load-bearing premise
A well-defined posterior over the system matrices must be available from the data, and the problem must obey linear dynamics and quadratic costs.
What would settle it
On a known linear system driven by noise, collect limited batches of input-output data, compute both the Bayesian and the standard certainty-equivalence controllers, and check whether the Bayesian controller produces a strictly smaller optimality gap and higher empirical stability rate as the number of samples decreases.
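The evaluation side of such an experiment could be sketched as follows, using NumPy only. This is a hedged sketch under stated assumptions: all function names (`dare_gain`, `closed_loop_cost`, `simulate_data`, `certainty_equivalence_gain`, `run_trials`) are hypothetical, the data are taken to be state and input samples, and only the certainty-equivalence baseline is implemented. The paper's Bayesian controller is not reproduced here; any alternative design can be passed in through the `design` argument and scored the same way.

```python
import numpy as np

def dare_gain(A, B, Q, R, iters=500):
    """LQR gain for u = K x, from fixed-point iteration on the discrete Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
    return K

def closed_loop_cost(A, B, K, Q, R, iters=2000):
    """trace(P) for the gain K on the true system, or inf if A + B K is not stable."""
    if not np.all(np.isfinite(K)):
        return np.inf
    Acl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf
    P = np.zeros_like(Q)
    for _ in range(iters):  # Lyapunov fixed-point iteration
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return float(np.trace(P))

def simulate_data(A, B, T, rng, noise_std=0.1):
    """One noisy trajectory of length T driven by random inputs."""
    n, m = B.shape
    X = np.zeros((n, T + 1))
    U = rng.standard_normal((m, T))
    for t in range(T):
        X[:, t + 1] = A @ X[:, t] + B @ U[:, t] + noise_std * rng.standard_normal(n)
    return X[:, :-1], U, X[:, 1:]

def certainty_equivalence_gain(X0, U0, X1, Q, R):
    """Least-squares estimate of [A B], then LQR design on the estimate."""
    n = X0.shape[0]
    theta = X1 @ np.linalg.pinv(np.vstack([X0, U0]))
    return dare_gain(theta[:, :n], theta[:, n:], Q, R)

def run_trials(A, B, Q, R, T, trials, design=certainty_equivalence_gain, seed=0):
    """Empirical stability rate and median relative optimality gap over stable trials."""
    rng = np.random.default_rng(seed)
    J_opt = closed_loop_cost(A, B, dare_gain(A, B, Q, R), Q, R)
    gaps, stable = [], 0
    for _ in range(trials):
        K = design(*simulate_data(A, B, T, rng), Q, R)
        J = closed_loop_cost(A, B, K, Q, R)
        if np.isfinite(J):
            stable += 1
            gaps.append((J - J_opt) / J_opt)
    return stable / trials, (float(np.median(gaps)) if gaps else float("nan"))
```

Calling `run_trials` for a decreasing sequence of `T`, once per controller design, gives the two curves (stability rate and optimality gap versus sample count) that the proposed check compares.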
Original abstract
The data-driven linear quadratic regulator (ddLQR) is a widely studied control method for unknown dynamical systems with disturbance. Existing approaches, both indirect, i.e., those that identify a model followed by model-based design, and direct, which bypasses the identification step, often rely on the certainty-equivalence principle and therefore do not explicitly account for model uncertainty. In this paper, we propose a Bayesian formulation for both indirect and direct ddLQR that incorporates posterior uncertainty into the control design. The resulting expected cost decomposes into a certainty-equivalence term and a variance-dependent term, providing a principled interpretation of regularization. We further show that the indirect and direct formulations are equivalent under this perspective. The resulting direct method admits a tractable semidefinite program whose size is independent of the data length. Numerical simulations demonstrate improved optimality gap and closed-loop stability, particularly in low-data regimes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Bayesian formulation for both indirect and direct data-driven LQR (ddLQR) that incorporates posterior uncertainty over the unknown system matrices into the control design. It claims that the resulting expected cost decomposes exactly into a certainty-equivalence term plus a variance-dependent term (providing a principled view of regularization), that the indirect and direct Bayesian formulations are equivalent, and that the direct formulation reduces to a tractable SDP whose size is independent of data length. Numerical simulations are presented to show improved optimality gap and closed-loop stability, especially in low-data regimes.
Significance. If the exact decomposition, equivalence, and data-length-independent SDP hold under the stated assumptions, the work would offer a theoretically grounded Bayesian approach to uncertainty-aware ddLQR that moves beyond certainty equivalence while remaining computationally scalable. The claimed equivalence between indirect and direct methods and the interpretation of the variance term as regularization are potentially valuable contributions to data-driven control.
major comments (2)
- [§3 (Bayesian expected-cost derivation)] The central claim of an exact decomposition E[J(A,B,K)] = J(E[A,B],K) + variance term (abstract and §3) is load-bearing for the entire contribution, including the SDP reformulation. For infinite-horizon LQR the cost J is determined by the Riccati solution P, which satisfies a nonlinear fixed-point equation in the closed-loop dynamics. Taking the posterior expectation does not in general separate into a term depending only on E[A,B] plus a covariance correction, because E[(A+BK)'P(A+BK)] involves the joint distribution of the random closed-loop matrix and the random P that depends on it. The manuscript must either restrict to finite horizon, invoke a conjugate prior that yields closed-form moments, or explicitly state and justify any approximation (e.g., linearization around the mean). Without this clarification the decomposition and the subsequent SDP are not guaranteed to be exact.
- [§4 (Equivalence of indirect and direct formulations)] The proof of equivalence between the two Bayesian problems relies on the same posterior and the same decomposition. If the decomposition in §3 requires additional assumptions or approximations, the equivalence claim is affected and must be revisited with the same caveats.
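The concern in the first major comment can be illustrated with a scalar toy case. The sketch below is an illustration added here, not taken from the manuscript: a scalar closed-loop coefficient `a` stands in for A + BK, and a two-point distribution stands in for the posterior. For a one-step quadratic cost, the excess of the expected cost over the cost at the mean is exactly the variance. For the infinite-horizon cost, the same ratio changes with the spread, so no fixed "mean plus covariance" formula can reproduce it.

```python
import numpy as np

def one_step_cost(a):
    """One-step quadratic cost a^2 for x+ = a x with x = 1 and unit weight."""
    return a ** 2

def infinite_horizon_cost(a):
    """Sum over t of a^(2t) for x0 = 1, which equals 1 / (1 - a^2) when |a| < 1."""
    return 1.0 / (1.0 - a ** 2)

def excess_per_unit_variance(cost, mean, delta):
    """(E[cost(a)] - cost(mean)) / Var(a) for a uniform on {mean - delta, mean + delta}."""
    samples = np.array([mean - delta, mean + delta])
    return float((np.mean(cost(samples)) - cost(mean)) / delta ** 2)

for delta in (0.1, 0.4):
    print(delta,
          excess_per_unit_variance(one_step_cost, 0.5, delta),
          excess_per_unit_variance(infinite_horizon_cost, 0.5, delta))
```

If the two-point support reached |a| >= 1, the infinite-horizon cost would be unbounded for that sample, which is a second reason the expectation needs care.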
minor comments (2)
- [§5 (Numerical experiments)] The simulations are described only qualitatively. Adding a table with quantitative metrics (optimality gap, closed-loop eigenvalues, success rate over Monte-Carlo trials) together with direct comparisons to standard certainty-equivalence ddLQR and other regularized baselines would make the empirical claims verifiable.
- [Abstract and §1] The abstract and introduction should list the precise assumptions (prior, noise model, finite vs. infinite horizon, exact vs. approximate posterior) under which the decomposition and SDP are derived, so that readers can immediately assess applicability.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. We address each major comment below and will revise the manuscript to resolve the identified issues.
- Referee: [§3 (Bayesian expected-cost derivation)] The central claim of an exact decomposition E[J(A,B,K)] = J(E[A,B],K) + variance term (abstract and §3) is load-bearing for the entire contribution, including the SDP reformulation. For infinite-horizon LQR the cost J is determined by the Riccati solution P, which satisfies a nonlinear fixed-point equation in the closed-loop dynamics. Taking the posterior expectation does not in general separate into a term depending only on E[A,B] plus a covariance correction, because E[(A+BK)'P(A+BK)] involves the joint distribution of the random closed-loop matrix and the random P that depends on it. The manuscript must either restrict to finite horizon, invoke a conjugate prior that yields closed-form moments, or explicitly state and justify any approximation (e.g., linearization around the mean). Without this clarification the decomposition and the subsequent SDP are not guaranteed to be exact.
  Authors: We thank the referee for this precise observation. The derivation in §3 is performed for the finite-horizon LQR cost, where the total cost is a finite sum of quadratic terms. In this setting the posterior expectation separates exactly into the certainty-equivalence term evaluated at the posterior mean plus a term that depends only on the posterior covariance, without requiring the nonlinear Riccati fixed point. We will revise §3, the abstract, and all related statements to explicitly restrict the claims to the finite-horizon case (or, if the infinite-horizon extension is retained, to state the first-order linearization approximation around the mean and justify its use). The SDP reformulation will be updated to reflect the same assumption, so that all claims remain exact under the stated conditions.
  Revision: yes
- Referee: [§4 (Equivalence of indirect and direct formulations)] The proof of equivalence between the two Bayesian problems relies on the same posterior and the same decomposition. If the decomposition in §3 requires additional assumptions or approximations, the equivalence claim is affected and must be revisited with the same caveats.
  Authors: The equivalence proof in §4 is constructed directly from the posterior distribution and the expected-cost decomposition derived in §3. Once the finite-horizon restriction (or approximation) is stated clearly in §3, the same conditions will be carried into §4. We will revise the proof to include explicit cross-references to the assumptions and to restate the equivalence result under those conditions.
  Revision: yes
Circularity Check
No circularity: Bayesian expected-cost decomposition derived from posterior expectation without reduction to inputs
The paper's core claims rest on defining a posterior over system matrices (A, B) from data, then taking the expectation of the LQR cost under that posterior. This yields the stated decomposition into a certainty-equivalence term (evaluated at the posterior mean) plus a variance-dependent correction term by direct application of the expectation operator to the quadratic cost functional. The claimed equivalence between indirect and direct formulations follows from rewriting the same posterior expectation in data-matrix form, and the data-length-independent SDP is obtained by relaxing the resulting matrix inequality using only the first two posterior moments. None of these steps reduce by construction to a fitted parameter renamed as a prediction, nor do they rely on self-citations for the load-bearing algebraic identities. The derivation is self-contained once the prior and noise model are fixed; any limitations on exactness for infinite-horizon Riccati solutions are questions of modeling assumptions rather than circularity.
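The statement that only the first two posterior moments matter can be checked numerically for a single quadratic term. The sketch below is an illustration under assumptions made here, not the paper's construction: the posterior over Theta = [A B] is taken to be matrix normal MN(theta_mean, U, V) with row covariance U and column covariance V, the weight P and the vector z are fixed, and the function names are hypothetical. Under those assumptions the variance term has the closed form trace(P U) * z' V z, which a Monte Carlo average should match.

```python
import numpy as np

def expected_one_step_cost(theta_mean, U, V, P, z):
    """Closed-form split of E[(Theta z)' P (Theta z)] for Theta ~ MN(theta_mean, U, V)."""
    ce_term = float(z @ theta_mean.T @ P @ theta_mean @ z)
    var_term = float(np.trace(P @ U) * (z @ V @ z))
    return ce_term, var_term

def monte_carlo_one_step_cost(theta_mean, U, V, P, z, samples, seed=0):
    """Sample-average estimate of the same expectation."""
    rng = np.random.default_rng(seed)
    Lu, Lv = np.linalg.cholesky(U), np.linalg.cholesky(V)
    G = rng.standard_normal((samples,) + theta_mean.shape)
    thetas = theta_mean + Lu @ G @ Lv.T   # broadcasts over the sample axis
    y = thetas @ z                        # shape (samples, n)
    return float(np.mean(np.einsum("si,ij,sj->s", y, P, y)))
```

The closed form uses only `theta_mean`, `U`, and `V`, so its evaluation cost does not depend on how many data points produced the posterior.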
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the underlying plant is a linear time-invariant system driven by additive disturbances.
- Domain assumption: a posterior distribution over the unknown system matrices exists and can be used to compute an expected cost.
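One standard setting in which the second assumption holds in closed form is Gaussian noise with a Gaussian prior. The sketch below assumes that setting for illustration only; the prior, the noise model, and the name `gaussian_posterior` are choices made here and are not confirmed by the paper. The data model is X1 = Theta Z + W, where Theta = [A B], the columns of Z stack state and input samples, and X1 holds the successor states.

```python
import numpy as np

def gaussian_posterior(X1, Z, noise_var, prior_var):
    """Posterior of Theta = [A B] for X1 = Theta Z + W.

    Assumes i.i.d. N(0, noise_var) noise entries and an i.i.d. N(0, prior_var)
    prior on the entries of Theta (an illustrative choice, not the paper's prior).
    Returns the posterior mean and the covariance shared by every row of Theta.
    """
    p = Z.shape[0]
    # The Gram matrix is (n + m) x (n + m) regardless of the number of samples T.
    gram = Z @ Z.T + (noise_var / prior_var) * np.eye(p)
    mean = np.linalg.solve(gram, Z @ X1.T).T   # equals X1 Z' gram^{-1}
    row_cov = noise_var * np.linalg.inv(gram)
    return mean, row_cov
```

The data enter only through `Z @ Z.T` and `Z @ X1.T`, whose sizes are fixed by the state and input dimensions, which is consistent with a design problem whose size does not grow with the data length.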