A Bayesian Perspective on the Data-Driven LQR

Feiran Zhao; Florian Dörfler; Thierry Schwaller

arxiv: 2604.09074 · v1 · submitted 2026-04-10 · 🧮 math.OC · cs.SY · eess.SY

Pith reviewed 2026-05-10 17:05 UTC · model grok-4.3

classification 🧮 math.OC · cs.SY · eess.SY

keywords data-driven LQR · Bayesian control design · certainty equivalence · posterior uncertainty · semidefinite programming · regularization · indirect and direct methods

The pith

A Bayesian formulation of data-driven LQR decomposes expected cost into certainty-equivalence and variance terms, unifying indirect and direct methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayesian approach to data-driven linear quadratic regulation for unknown systems subject to disturbance. By placing a posterior over the unknown system matrices, the expected cost splits into the standard certainty-equivalence term plus an additional term that depends on the posterior variance. This split supplies a principled account of regularization and proves that the indirect route (identify the model, then design) is equivalent to the direct route (work with data alone). The direct formulation further reduces to a semidefinite program whose size remains fixed regardless of how much data is collected.

Core claim

Under a Bayesian formulation of data-driven LQR, the expected cost decomposes into a certainty-equivalence term and a variance-dependent term. This decomposition shows that indirect and direct ddLQR are equivalent, and allows the direct method to be cast as a tractable semidefinite program independent of data length.

What carries the argument

The Bayesian posterior over the unknown system matrices together with the decomposition of the expected quadratic cost into certainty-equivalence and variance-dependent components.
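
As a rough illustration of the kind of split described above, the sketch below checks a one-step version of it for a scalar system x_next = a*x + b*u under feedback u = k*x. It assumes a Gaussian posterior over (a, b) and a fixed cost weight p; these choices, the numbers, and all function names are illustrative assumptions, not the paper's construction. Under them, E[p*(a + b*k)^2] equals the value at the posterior mean plus p times the posterior variance of a + b*k, and a seeded Monte Carlo estimate is compared against that closed form.

```python
import random


def split_expected_cost(a_mean, b_mean, cov, k, p):
    """Closed-form pieces of E[p * (a + b*k)**2] for (a, b) ~ N(mean, cov)."""
    # Certainty-equivalence piece: the cost evaluated at the posterior mean.
    ce_term = p * (a_mean + b_mean * k) ** 2
    # Variance piece: p * Var(a + b*k) = p * [1, k] cov [1, k]^T.
    var_term = p * (cov[0][0] + 2.0 * k * cov[0][1] + k * k * cov[1][1])
    return ce_term, var_term


def monte_carlo_cost(a_mean, b_mean, cov, k, p, n_samples=200_000, seed=0):
    """Seeded Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance, used to draw correlated (a, b).
    l11 = cov[0][0] ** 0.5
    l21 = cov[0][1] / l11
    l22 = (cov[1][1] - l21 * l21) ** 0.5
    total = 0.0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = a_mean + l11 * z1
        b = b_mean + l21 * z1 + l22 * z2
        total += p * (a + b * k) ** 2
    return total / n_samples


if __name__ == "__main__":
    cov = [[0.04, 0.01], [0.01, 0.09]]  # hypothetical posterior covariance
    ce, var = split_expected_cost(0.9, 0.5, cov, k=-0.6, p=2.0)
    print(ce, var, monte_carlo_cost(0.9, 0.5, cov, k=-0.6, p=2.0))
```

The identity is exact here only because p is held fixed. When the weight itself depends on (a, b), as it does in an infinite-horizon cost, this simple split need not hold.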

If this is right

The direct data-driven controller is obtained from a semidefinite program whose decision variables and constraints do not grow with data length.
Regularization enters the design automatically through the posterior variance term rather than through an added heuristic parameter.
Both optimality gap and closed-loop stability improve relative to certainty-equivalence designs, with the largest gains appearing in low-data regimes.
Indirect and direct formulations become mathematically equivalent once both are required to minimize the same Bayesian expected cost.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The variance term supplies an explicit knob for trading nominal performance against robustness to model uncertainty when the prior and noise statistics are chosen appropriately.
The same decomposition may be useful for deriving regularized versions of other data-driven controllers that currently rely on certainty equivalence.
In practice, the benefit in low-data regimes will depend on how accurately the posterior can be computed or approximated from the available measurements.

Load-bearing premise

A well-defined posterior over the system matrices must be available from the data, and the problem must obey linear dynamics and quadratic costs.

What would settle it

On a known linear system driven by noise, collect limited batches of input-output data, compute both the Bayesian and the standard certainty-equivalence controllers, and check whether the Bayesian controller produces a strictly smaller optimality gap and higher empirical stability rate as the number of samples decreases.
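
The experiment above can be sketched as a small evaluation harness. This is a minimal sketch under stated assumptions: a scalar plant with measured state, Gaussian inputs and noise, unit cost weights, and hypothetical parameter values and function names. Only the certainty-equivalence baseline is implemented, because the Bayesian design rule is not spelled out in the material available here; `design` is a hook where such a rule would be plugged in.

```python
import math
import random


def collect_data(a, b, n_samples, noise_std, rng):
    """Simulate x_next = a*x + b*u + w with random inputs; return (x, u, x_next) triples."""
    data, x = [], 0.0
    for _ in range(n_samples):
        u = rng.gauss(0.0, 1.0)
        x_next = a * x + b * u + rng.gauss(0.0, noise_std)
        data.append((x, u, x_next))
        x = x_next
    return data


def least_squares(data, ridge=1e-8):
    """Estimate (a, b) from the 2x2 normal equations (tiny ridge keeps them solvable)."""
    sxx = sum(x * x for x, _, _ in data) + ridge
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data) + ridge
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    return (suu * sxy - sxu * suy) / det, (sxx * suy - sxu * sxy) / det


def lqr_gain(a, b, q=1.0, r=1.0, iters=500):
    """Scalar discrete-time LQR gain k (u = k*x) via Riccati value iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p * r / (r + b * b * p)
    return -(a * b * p) / (r + b * b * p)


def true_cost(a, b, k, noise_std, q=1.0, r=1.0):
    """Stationary average stage cost of u = k*x on the true plant; inf if not stable."""
    a_cl = a + b * k
    if not abs(a_cl) < 1.0:  # also catches NaN gains
        return float("inf")
    return (q + r * k * k) * noise_std ** 2 / (1.0 - a_cl * a_cl)


def certainty_equivalence(data):
    """Baseline design: plug the least-squares estimate into the LQR formula."""
    a_hat, b_hat = least_squares(data)
    return lqr_gain(a_hat, b_hat)


def evaluate(design, a=0.98, b=0.5, n_samples=4, noise_std=0.3, trials=500, seed=0):
    """Return (empirical stability rate, median relative optimality gap over stable trials)."""
    rng = random.Random(seed)
    best = true_cost(a, b, lqr_gain(a, b), noise_std)
    gaps = []
    for _ in range(trials):
        k = design(collect_data(a, b, n_samples, noise_std, rng))
        cost = true_cost(a, b, k, noise_std)
        if math.isfinite(cost):
            gaps.append((cost - best) / best)
    gaps.sort()
    median_gap = gaps[len(gaps) // 2] if gaps else float("inf")
    return len(gaps) / trials, median_gap


if __name__ == "__main__":
    for n in (4, 8, 32, 128):
        print(n, evaluate(certainty_equivalence, n_samples=n))
```

Any other design rule with the same signature can be passed as `design` and compared with the baseline on the same seeds and sample sizes.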

Figures

Figures reproduced from arXiv: 2604.09074 by Feiran Zhao, Florian Dörfler, Thierry Schwaller.

Original abstract

The data-driven linear quadratic regulator (ddLQR) is a widely studied control method for unknown dynamical systems with disturbance. Existing approaches, both indirect, i.e., those that identify a model followed by model-based design, and direct, which bypasses the identification step, often rely on the certainty-equivalence principle and therefore do not explicitly account for model uncertainty. In this paper, we propose a Bayesian formulation for both indirect and direct ddLQR that incorporates posterior uncertainty into the control design. The resulting expected cost decomposes into a certainty-equivalence term and a variance-dependent term, providing a principled interpretation of regularization. We further show that the indirect and direct formulations are equivalent under this perspective. The resulting direct method admits a tractable semidefinite program whose size is independent of the data length. Numerical simulations demonstrate improved optimality gap and closed-loop stability, particularly in low-data regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames data-driven LQR in Bayesian terms so the expected cost splits into a certainty-equivalence piece plus a variance penalty, and claims indirect and direct versions coincide with a data-length-independent SDP.

The main thing to know is that this work treats the unknown system matrices as random with a posterior and writes the expected LQR cost as the usual certainty-equivalence cost plus an extra term driven by posterior variance. It then shows that the indirect and direct data-driven formulations become equivalent under that view and that the direct version reduces to an SDP whose size does not grow with the number of samples. That last property is practically useful for long data records, because the size of the program stays fixed as samples accumulate.

The variance term supplies a clean story for why regularization appears and how much of it is needed, which is more principled than the ad hoc penalties common in the ddLQR literature. If the derivations hold, this gives a direct route from posterior uncertainty to a convex program without having to sample or linearize. The numerical examples are said to show better optimality gaps and stability in low-data regimes, which is where the method should shine.

The soft spot is the exactness of the decomposition. Infinite-horizon LQR cost is defined through the Riccati solution, which is nonlinear in the closed-loop dynamics, so the expectation of the cost is not automatically the cost of the expectation plus a simple variance correction. The abstract presents the split as clean, so the paper must either restrict to finite horizon, use a conjugate prior that closes the moments, or introduce an approximation. Without seeing the precise assumptions on the prior and noise model, it is hard to judge how general the equivalence and the SDP remain. The citation pattern looks standard for the subfield, with no obvious omissions visible from the abstract.

This paper is for people already working on data-driven or robust LQR who want a Bayesian handle on uncertainty. A reader who cares about low-data regimes or wants to avoid certainty-equivalence controllers will find the formulation and the SDP size claim worth examining.

The work is coherent enough on its own terms to deserve a serious referee, even if the decomposition needs careful checking in review.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Bayesian formulation for both indirect and direct data-driven LQR (ddLQR) that incorporates posterior uncertainty over the unknown system matrices into the control design. It claims that the resulting expected cost decomposes exactly into a certainty-equivalence term plus a variance-dependent term (providing a principled view of regularization), that the indirect and direct Bayesian formulations are equivalent, and that the direct formulation reduces to a tractable SDP whose size is independent of data length. Numerical simulations are presented to show improved optimality gap and closed-loop stability, especially in low-data regimes.

Significance. If the exact decomposition, equivalence, and data-length-independent SDP hold under the stated assumptions, the work would offer a theoretically grounded Bayesian approach to uncertainty-aware ddLQR that moves beyond certainty equivalence while remaining computationally scalable. The claimed equivalence between indirect and direct methods and the interpretation of the variance term as regularization are potentially valuable contributions to data-driven control.

major comments (2)

[§3 (Bayesian expected-cost derivation)] The central claim of an exact decomposition E[J(A,B,K)] = J(E[A,B],K) + variance term (abstract and §3) is load-bearing for the entire contribution, including the SDP reformulation. For infinite-horizon LQR the cost J is determined by the Riccati solution P, which satisfies a nonlinear fixed-point equation in the closed-loop dynamics. Taking the posterior expectation does not in general separate into a term depending only on E[A,B] plus a covariance correction, because E[(A+BK)'P(A+BK)] involves the joint distribution of the random closed-loop matrix and the random P that depends on it. The manuscript must either restrict to finite horizon, invoke a conjugate prior that yields closed-form moments, or explicitly state and justify any approximation (e.g., linearization around the mean). Without this clarification the decomposition and the subsequent SDP are not guaranteed to be exact.
[§4 (Equivalence of indirect and direct formulations)] The proof of equivalence between the two Bayesian problems relies on the same posterior and the same decomposition. If the decomposition in §3 requires additional assumptions or approximations, the equivalence claim is affected and must be revisited with the same caveats.

minor comments (2)

[§5 (Numerical experiments)] The simulations are described only qualitatively. Adding a table with quantitative metrics (optimality gap, closed-loop eigenvalues, success rate over Monte Carlo trials) together with direct comparisons to standard certainty-equivalence ddLQR and other regularized baselines would make the empirical claims verifiable.
[Abstract and §1] The abstract and introduction should list the precise assumptions (prior, noise model, finite vs. infinite horizon, exact vs. approximate posterior) under which the decomposition and SDP are derived, so that readers can immediately assess applicability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address each major comment below and will revise the manuscript to resolve the identified issues.

Referee: [§3 (Bayesian expected-cost derivation)] The central claim of an exact decomposition E[J(A,B,K)] = J(E[A,B],K) + variance term (abstract and §3) is load-bearing for the entire contribution, including the SDP reformulation. For infinite-horizon LQR the cost J is determined by the Riccati solution P, which satisfies a nonlinear fixed-point equation in the closed-loop dynamics. Taking the posterior expectation does not in general separate into a term depending only on E[A,B] plus a covariance correction, because E[(A+BK)'P(A+BK)] involves the joint distribution of the random closed-loop matrix and the random P that depends on it. The manuscript must either restrict to finite horizon, invoke a conjugate prior that yields closed-form moments, or explicitly state and justify any approximation (e.g., linearization around the mean). Without this clarification the decomposition and the subsequent SDP are not guaranteed to be exact.

Authors: We thank the referee for this precise observation. The derivation in §3 is performed for the finite-horizon LQR cost, where the total cost is a finite sum of quadratic terms. Under this setting the posterior expectation separates exactly into the certainty-equivalence term evaluated at the posterior mean plus a term that depends only on the posterior covariance, without requiring the nonlinear Riccati fixed-point. We will revise §3, the abstract, and all related statements to explicitly restrict the claims to the finite-horizon case (or, if the infinite-horizon extension is retained, to state the first-order linearization approximation around the mean and justify its use). The SDP reformulation will be updated to reflect the same assumption, ensuring all claims remain exact under the stated conditions.
revision: yes

Referee: [§4 (Equivalence of indirect and direct formulations)] The proof of equivalence between the two Bayesian problems relies on the same posterior and the same decomposition. If the decomposition in §3 requires additional assumptions or approximations, the equivalence claim is affected and must be revisited with the same caveats.

Authors: The equivalence proof in §4 is constructed directly from the posterior distribution and the expected-cost decomposition derived in §3. Once the finite-horizon restriction (or approximation) is stated clearly in §3, the same conditions will be carried into §4. We will revise the proof to include explicit cross-references to the assumptions and to restate the equivalence result under those conditions.
revision: yes

Circularity Check

0 steps flagged

No circularity: Bayesian expected-cost decomposition derived from posterior expectation without reduction to inputs

The paper's core claims rest on defining a posterior over system matrices (A, B) from data, then taking the expectation of the LQR cost under that posterior. This yields the stated decomposition into a certainty-equivalence term (evaluated at the posterior mean) plus a variance-dependent correction term by direct application of the expectation operator to the quadratic cost functional. The claimed equivalence between indirect and direct formulations follows from rewriting the same posterior expectation in data-matrix form, and the data-length-independent SDP is obtained by relaxing the resulting matrix inequality using only the first two posterior moments. None of these steps reduce by construction to a fitted parameter renamed as a prediction, nor do they rely on self-citations for the load-bearing algebraic identities. The derivation is self-contained once the prior and noise model are fixed; any limitations on exactness for infinite-horizon Riccati solutions are questions of modeling assumptions rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger records the minimal standard assumptions implied by the LQR setting; no free parameters or new entities are identifiable from the given text.

axioms (2)

domain assumption: The underlying plant is a linear time-invariant system driven by additive disturbances.
Required for the standard LQR cost and for the data-driven identification step referenced throughout the abstract.
domain assumption: A posterior distribution over the unknown system matrices exists and can be used to compute an expected cost.
Central to the Bayesian formulation; the abstract does not specify the prior or whether the posterior is exact.
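
To make the second assumption concrete, here is a minimal sketch of one setting in which such a posterior exists in closed form. It assumes a scalar plant x_next = a*x + b*u + w, Gaussian noise with known variance, and an independent zero-mean Gaussian prior on a and b. Since the abstract does not specify the prior, all of these choices and the function name are assumptions for illustration only; the update itself is standard Bayesian linear regression.

```python
def gaussian_posterior(data, noise_var, prior_var):
    """Posterior N(mean, cov) of theta = (a, b) in x_next = a*x + b*u + w.

    Assumes w ~ N(0, noise_var) with known variance and an independent
    zero-mean Gaussian prior with variance prior_var on a and on b.
    `data` is a list of (x, u, x_next) triples.
    """
    # Posterior precision = prior precision + Z^T Z / noise_var, rows z = (x, u).
    pxx = 1.0 / prior_var + sum(x * x for x, _, _ in data) / noise_var
    pxu = sum(x * u for x, u, _ in data) / noise_var
    puu = 1.0 / prior_var + sum(u * u for _, u, _ in data) / noise_var
    # Information vector Z^T y / noise_var.
    hx = sum(x * y for x, _, y in data) / noise_var
    hu = sum(u * y for _, u, y in data) / noise_var
    # Invert the 2x2 precision matrix to obtain the posterior covariance.
    det = pxx * puu - pxu * pxu
    cov = [[puu / det, -pxu / det], [-pxu / det, pxx / det]]
    # Posterior mean = cov @ information vector.
    mean = (cov[0][0] * hx + cov[0][1] * hu, cov[1][0] * hx + cov[1][1] * hu)
    return mean, cov


if __name__ == "__main__":
    samples = [(1.0, 0.0, 0.9), (0.0, 1.0, 0.5)]  # hypothetical (x, u, x_next) triples
    print(gaussian_posterior(samples, noise_var=1.0, prior_var=1.0))
```

With no data the function returns the prior itself, and the covariance shrinks as informative samples are added, which is the quantity a variance-dependent cost term would draw on.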

pith-pipeline@v0.9.0 · 5454 in / 1495 out tokens · 71160 ms · 2026-05-10T17:05:50.443927+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

[1] B. D. Anderson and J. B. Moore, Optimal control: linear quadratic methods. Courier Corporation, 2007.
[2] F. Dörfler, “Data-driven control: Part two of two: Hot take: Why not go with models?” IEEE Control Systems Magazine, vol. 43, no. 6, pp. 27–31, 2023.
[3] H. Mania, S. Tu, and B. Recht, “Certainty equivalence is efficient for linear quadratic control,” Advances in neural information processing systems, vol. 32, 2019.
[4] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2020.
[5] F. Dörfler, J. Coulson, and I. Markovsky, “Bridging direct and indirect data-driven control formulations via regularizations and relaxations,” IEEE Transactions on Automatic Control, vol. 68, no. 2, pp. 883–897, 2022.
[6] F. Dörfler, P. Tesi, and C. De Persis, “On the certainty-equivalence approach to direct data-driven LQR design,” IEEE Transactions on Automatic Control, vol. 68, no. 12, pp. 7989–7996, 2023.
[7] A. Tsiamis, I. Ziemann, N. Matni, and G. J. Pappas, “Statistical learning theory for control: A finite-sample perspective,” IEEE Control Systems Magazine, vol. 43, no. 6, pp. 67–97, 2023.
[8] I. Markovsky and F. Dörfler, “Behavioral systems theory in data-driven analysis, signal processing, and control,” Annual Reviews in Control, vol. 52, pp. 42–64, 2021.
[9] M. Bartos, J. Köhler, F. Dörfler, and M. N. Zeilinger, “Stability of certainty-equivalent adaptive LQR for linear systems with unknown time-varying parameters,” arXiv preprint arXiv:2511.08236, 2025.
[10] F. Zhao, F. Dörfler, A. Chiuso, and K. You, “Data-enabled policy optimization for direct adaptive learning of the LQR,” IEEE Transactions on Automatic Control, 2025.
[11] G. Pillonetto, T. Chen, A. Chiuso, G. De Nicolao, L. Ljung et al., Regularized system identification: Learning dynamic models from data. Springer, 2022.
[12] F. Zhao, A. Chiuso, and F. Dörfler, “Regularization for covariance parameterization of direct data-driven LQR control,” IEEE Control Systems Letters, 2025.
[13] A. Chiuso, M. Fabris, V. Breschi, and S. Formentin, “Harnessing uncertainty for a separation principle in direct data-driven predictive control,” Automatica, vol. 173, p. 112070, 2025.
[14] G. Baggio, R. Carli, R. A. Grimaldi, and G. Pillonetto, “The Bayesian separation principle for data-driven control,” arXiv preprint arXiv:2409.16717, 2024.
[15] J. Coulson, J. Lygeros, and F. Dörfler, “Data-enabled predictive control: In the shallows of the DeePC,” in 2019 18th European control conference (ECC). IEEE, 2019, pp. 307–312.
[16] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
[17] A. K. Gupta and D. K. Nagar, Matrix variate distributions. Chapman and Hall/CRC, 2018.
[18] P. J. Goulart and Y. Chen, “Clarabel: An interior-point solver for conic programs with quadratic objectives,” arXiv preprint arXiv:2405.12762, 2024.
