pith. sign in

arxiv: 2604.09074 · v1 · submitted 2026-04-10 · 🧮 math.OC · cs.SY· eess.SY

A Bayesian Perspective on the Data-Driven LQR

Pith reviewed 2026-05-10 17:05 UTC · model grok-4.3

classification 🧮 math.OC cs.SYeess.SY
keywords data-driven LQRBayesian control designcertainty equivalenceposterior uncertaintysemidefinite programmingregularizationindirect and direct methods
0
0 comments X

The pith

A Bayesian formulation of data-driven LQR decomposes expected cost into certainty-equivalence and variance terms, unifying indirect and direct methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayesian approach to data-driven linear quadratic regulation for unknown systems subject to disturbance. By placing a posterior over the unknown system matrices, the expected cost splits into the standard certainty-equivalence term plus an additional term that depends on the posterior variance. This split supplies a principled account of regularization and proves that the indirect route (identify the model, then design) is equivalent to the direct route (work with data alone). The direct formulation further reduces to a semidefinite program whose size remains fixed regardless of how much data is collected.

Core claim

Under a Bayesian formulation of data-driven LQR, the expected cost decomposes into a certainty-equivalence term and a variance-dependent term. This decomposition shows that indirect and direct ddLQR are equivalent, and allows the direct method to be cast as a tractable semidefinite program independent of data length.

What carries the argument

The Bayesian posterior over the unknown system matrices together with the decomposition of the expected quadratic cost into certainty-equivalence and variance-dependent components.

If this is right

  • The direct data-driven controller is obtained from a semidefinite program whose decision variables and constraints do not grow with data length.
  • Regularization enters the design automatically through the posterior variance term rather than through an added heuristic parameter.
  • Both optimality gap and closed-loop stability improve relative to certainty-equivalence designs, with the largest gains appearing in low-data regimes.
  • Indirect and direct formulations become mathematically equivalent once both are required to minimize the same Bayesian expected cost.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The variance term supplies an explicit knob for trading nominal performance against robustness to model uncertainty when the prior and noise statistics are chosen appropriately.
  • The same decomposition may be useful for deriving regularized versions of other data-driven controllers that currently rely on certainty equivalence.
  • In practice, the benefit in low-data regimes will depend on how accurately the posterior can be computed or approximated from the available measurements.

Load-bearing premise

A well-defined posterior over the system matrices must be available from the data, and the problem must obey linear dynamics and quadratic costs.

What would settle it

On a known linear system driven by noise, collect limited batches of input-output data, compute both the Bayesian and the standard certainty-equivalence controllers, and check whether the Bayesian controller produces a strictly smaller optimality gap and higher empirical stability rate as the number of samples decreases.

Figures

Figures reproduced from arXiv: 2604.09074 by Feiran Zhao, Florian D\"orfler, Thierry Schwaller.

Figure 1
Figure 1. Figure 1: Effects of the regularization for the covariance [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
read the original abstract

The data-driven linear quadratic regulator (ddLQR) is a widely studied control method for unknown dynamical systems with disturbance. Existing approaches, both indirect, i.e., those that identify a model followed by model-based design, and direct, which bypasses the identification step, often rely on the certainty-equivalence principle and therefore do not explicitly account for model uncertainty. In this paper, we propose a Bayesian formulation for both indirect and direct ddLQR that incorporates posterior uncertainty into the control design. The resulting expected cost decomposes into a certainty-equivalence term and a variance-dependent term, providing a principled interpretation of regularization. We further show that the indirect and direct formulations are equivalent under this perspective. The resulting direct method admits a tractable semidefinite program whose size is independent of the data length. Numerical simulations demonstrate improved optimality gap and closed-loop stability, particularly in low-data regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Bayesian formulation for both indirect and direct data-driven LQR (ddLQR) that incorporates posterior uncertainty over the unknown system matrices into the control design. It claims that the resulting expected cost decomposes exactly into a certainty-equivalence term plus a variance-dependent term (providing a principled view of regularization), that the indirect and direct Bayesian formulations are equivalent, and that the direct formulation reduces to a tractable SDP whose size is independent of data length. Numerical simulations are presented to show improved optimality gap and closed-loop stability, especially in low-data regimes.

Significance. If the exact decomposition, equivalence, and data-length-independent SDP hold under the stated assumptions, the work would offer a theoretically grounded Bayesian approach to uncertainty-aware ddLQR that moves beyond certainty equivalence while remaining computationally scalable. The claimed equivalence between indirect and direct methods and the interpretation of the variance term as regularization are potentially valuable contributions to data-driven control.

major comments (2)
  1. [§3 (Bayesian expected-cost derivation)] The central claim of an exact decomposition E[J(A,B,K)] = J(E[A,B],K) + variance term (abstract and §3) is load-bearing for the entire contribution, including the SDP reformulation. For infinite-horizon LQR the cost J is determined by the Riccati solution P, which satisfies a nonlinear fixed-point equation in the closed-loop dynamics. Taking the posterior expectation does not in general separate into a term depending only on E[A,B] plus a covariance correction, because E[(A+BK)'P(A+BK)] involves the joint distribution of the random closed-loop matrix and the random P that depends on it. The manuscript must either restrict to finite horizon, invoke a conjugate prior that yields closed-form moments, or explicitly state and justify any approximation (e.g., linearization around the mean). Without this clarification the decomposition and the subsequent SDP are not guaranteed to be exact.
  2. [§4] §4 (Equivalence of indirect and direct formulations): the proof of equivalence between the two Bayesian problems relies on the same posterior and the same decomposition. If the decomposition in §3 requires additional assumptions or approximations, the equivalence claim is affected and must be revisited with the same caveats.
minor comments (2)
  1. [§5] §5 (Numerical experiments): the simulations are described only qualitatively. Adding a table with quantitative metrics (optimality gap, closed-loop eigenvalues, success rate over Monte-Carlo trials) together with direct comparisons to standard certainty-equivalence ddLQR and other regularized baselines would make the empirical claims verifiable.
  2. [Abstract and §1] The abstract and introduction should list the precise assumptions (prior, noise model, finite vs. infinite horizon, exact vs. approximate posterior) under which the decomposition and SDP are derived, so that readers can immediately assess applicability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address each major comment below and will revise the manuscript to resolve the identified issues.

read point-by-point responses
  1. Referee: [§3 (Bayesian expected-cost derivation)] The central claim of an exact decomposition E[J(A,B,K)] = J(E[A,B],K) + variance term (abstract and §3) is load-bearing for the entire contribution, including the SDP reformulation. For infinite-horizon LQR the cost J is determined by the Riccati solution P, which satisfies a nonlinear fixed-point equation in the closed-loop dynamics. Taking the posterior expectation does not in general separate into a term depending only on E[A,B] plus a covariance correction, because E[(A+BK)'P(A+BK)] involves the joint distribution of the random closed-loop matrix and the random P that depends on it. The manuscript must either restrict to finite horizon, invoke a conjugate prior that yields closed-form moments, or explicitly state and justify any approximation (e.g., linearization around the mean). Without this clarification the decomposition and the subsequent

    Authors: We thank the referee for this precise observation. The derivation in §3 is performed for the finite-horizon LQR cost, where the total cost is a finite sum of quadratic terms. Under this setting the posterior expectation separates exactly into the certainty-equivalence term evaluated at the posterior mean plus a term that depends only on the posterior covariance, without requiring the nonlinear Riccati fixed-point. We will revise §3, the abstract, and all related statements to explicitly restrict the claims to the finite-horizon case (or, if the infinite-horizon extension is retained, to state the first-order linearization approximation around the mean and justify its use). The SDP reformulation will be updated to reflect the same assumption, ensuring all claims remain exact under the stated conditions. revision: yes

  2. Referee: [§4] §4 (Equivalence of indirect and direct formulations): the proof of equivalence between the two Bayesian problems relies on the same posterior and the same decomposition. If the decomposition in §3 requires additional assumptions or approximations, the equivalence claim is affected and must be revisited with the same caveats.

    Authors: The equivalence proof in §4 is constructed directly from the posterior distribution and the expected-cost decomposition derived in §3. Once the finite-horizon restriction (or approximation) is stated clearly in §3, the same conditions will be carried into §4. We will revise the proof to include explicit cross-references to the assumptions and to restate the equivalence result under those conditions. revision: yes

Circularity Check

0 steps flagged

No circularity: Bayesian expected-cost decomposition derived from posterior expectation without reduction to inputs

full rationale

The paper's core claims rest on defining a posterior over system matrices (A, B) from data, then taking the expectation of the LQR cost under that posterior. This yields the stated decomposition into a certainty-equivalence term (evaluated at the posterior mean) plus a variance-dependent correction term by direct application of the expectation operator to the quadratic cost functional. The claimed equivalence between indirect and direct formulations follows from rewriting the same posterior expectation in data-matrix form, and the data-length-independent SDP is obtained by relaxing the resulting matrix inequality using only the first two posterior moments. None of these steps reduce by construction to a fitted parameter renamed as a prediction, nor do they rely on self-citations for the load-bearing algebraic identities. The derivation is self-contained once the prior and noise model are fixed; any limitations on exactness for infinite-horizon Riccati solutions are questions of modeling assumptions rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger records the minimal standard assumptions implied by the LQR setting; no free parameters or new entities are identifiable from the given text.

axioms (2)
  • domain assumption The underlying plant is a linear time-invariant system driven by additive disturbances
    Required for the standard LQR cost and for the data-driven identification step referenced throughout the abstract.
  • domain assumption A posterior distribution over the unknown system matrices exists and can be used to compute an expected cost
    Central to the Bayesian formulation; the abstract does not specify the prior or whether the posterior is exact.

pith-pipeline@v0.9.0 · 5454 in / 1495 out tokens · 71160 ms · 2026-05-10T17:05:50.443927+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1]

    B. D. Anderson and J. B. Moore,Optimal control: linear quadratic methods. Courier Corporation, 2007

  2. [2]

    Data-driven control: Part two of two: Hot take: Why not go with models?

    F. D ¨orfler, “Data-driven control: Part two of two: Hot take: Why not go with models?”IEEE Control Systems Magazine, vol. 43, no. 6, pp. 27–31, 2023

  3. [3]

    Certainty equivalence is efficient for linear quadratic control,

    H. Mania, S. Tu, and B. Recht, “Certainty equivalence is efficient for linear quadratic control,”Advances in neural information processing systems, vol. 32, 2019

  4. [4]

    Formulas for data-driven control: Stabi- lization, optimality, and robustness,

    C. De Persis and P. Tesi, “Formulas for data-driven control: Stabi- lization, optimality, and robustness,”IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2020

  5. [5]

    Bridging direct and indirect data-driven control formulations via regularizations and relaxations,

    F. D ¨orfler, J. Coulson, and I. Markovsky, “Bridging direct and indirect data-driven control formulations via regularizations and relaxations,” IEEE Transactions on Automatic Control, vol. 68, no. 2, pp. 883–897, 2022

  6. [6]

    On the certainty-equivalence approach to direct data-driven lqr design,

    F. D ¨orfler, P. Tesi, and C. De Persis, “On the certainty-equivalence approach to direct data-driven lqr design,”IEEE Transactions on Automatic Control, vol. 68, no. 12, pp. 7989–7996, 2023

  7. [7]

    Statistical learning theory for control: A finite-sample perspective,

    A. Tsiamis, I. Ziemann, N. Matni, and G. J. Pappas, “Statistical learning theory for control: A finite-sample perspective,”IEEE Control Systems Magazine, vol. 43, no. 6, pp. 67–97, 2023

  8. [8]

    Behavioral systems theory in data-driven analysis, signal processing, and control,

    I. Markovsky and F. D ¨orfler, “Behavioral systems theory in data-driven analysis, signal processing, and control,”Annual Reviews in Control, vol. 52, pp. 42–64, 2021

  9. [9]

    Stability of Certainty-Equivalent Adaptive LQR for Linear Systems with Unknown Time-Varying Parameters

    M. Bartos, J. K ¨ohler, F. D ¨orfler, and M. N. Zeilinger, “Stability of certainty-equivalent adaptive lqr for linear systems with unknown time-varying parameters,”arXiv preprint arXiv:2511.08236, 2025

  10. [10]

    Data-enabled policy optimization for direct adaptive learning of the lqr,

    F. Zhao, F. D ¨orfler, A. Chiuso, and K. You, “Data-enabled policy optimization for direct adaptive learning of the lqr,”IEEE Transactions on Automatic Control, 2025

  11. [11]

    Pillonetto, T

    G. Pillonetto, T. Chen, A. Chiuso, G. De Nicolao, L. Ljunget al., Regularized system identification-Learning dynamic models from data. Springer, 2022

  12. [12]

    Regularization for covariance parameterization of direct data-driven lqr control,

    F. Zhao, A. Chiuso, and F. D ¨orfler, “Regularization for covariance parameterization of direct data-driven lqr control,”IEEE Control Systems Letters, 2025

  13. [13]

    Harnessing uncertainty for a separation principle in direct data-driven predictive control,

    A. Chiuso, M. Fabris, V . Breschi, and S. Formentin, “Harnessing uncertainty for a separation principle in direct data-driven predictive control,”Automatica, vol. 173, p. 112070, 2025

  14. [14]

    The bayesian separation principle for data-driven control,

    G. Baggio, R. Carli, R. A. Grimaldi, and G. Pillonetto, “The bayesian separation principle for data-driven control,”arXiv preprint arXiv:2409.16717, 2024

  15. [15]

    Data-enabled predictive control: In the shallows of the deepc,

    J. Coulson, J. Lygeros, and F. D ¨orfler, “Data-enabled predictive control: In the shallows of the deepc,” in2019 18th European control conference (ECC). IEEE, 2019, pp. 307–312

  16. [16]

    A note on persistency of excitation,

    J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,”Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005

  17. [17]

    A. K. Gupta and D. K. Nagar,Matrix variate distributions. Chapman and Hall/CRC, 2018

  18. [18]

    arXiv preprint arXiv:2405.12762 (2 024)

    P. J. Goulart and Y . Chen, “Clarabel: An interior-point solver for conic programs with quadratic objectives,”arXiv preprint arXiv:2405.12762, 2024