
arxiv: 2604.24442 · v2 · pith:GPMBMTWMnew · submitted 2026-04-27 · 📡 eess.SY · cs.SY

The Fragility of Learning LQG Controllers

Pith reviewed 2026-05-21 01:07 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords LQG control · sample complexity · information-theoretic lower bounds · robust control fragility · offline learning · partial observations · Fisher information · certainty equivalence

The pith

Any algorithm learning a stabilizing LQG controller from offline data has excess cost lower-bounded by the product of the LQG cost Hessian and the inverse Fisher information of the exploration policy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves an information-theoretic lower bound on how much worse any learned LQG controller must perform compared to the optimal one, when the controller is obtained from a finite offline dataset. The bound is a local minimax quantity that scales with the sensitivity of the quadratic cost to changes in the system matrices and with how little information the data-generating policy provides about those matrices. If correct, this shows that classic fragile control problems, such as variants of the Doyle LQG example or non-minimum-phase plants, become high-sample-complexity problems for learning methods. A reader cares because the result supplies a concrete reason why certainty-equivalent controllers can be near-optimal yet still require careful data collection or co-design in the partially observed case.
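To make "fragile" concrete, the following is a minimal sketch of the classical continuous-time Doyle (1978) LQG example, not of the paper's variants or of its lower bound. It assumes the standard setup (double-integrator-like plant, state cost weight q, process noise intensity sigma, unit control cost and measurement noise), for which the LQR and Kalman gains have the closed forms used below. The values q = 1000 and sigma = 500 and the function names are illustrative assumptions.

```python
import numpy as np

def doyle_closed_loop(q, sigma, m):
    """Closed-loop matrix of Doyle's LQG example when the plant input gain is m (nominal m = 1)."""
    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    f = 2.0 + np.sqrt(4.0 + q)        # LQR gain scale, K = f * [1, 1]
    d = 2.0 + np.sqrt(4.0 + sigma)    # Kalman gain scale, L = d * [1, 1]^T
    K = f * np.array([[1.0, 1.0]])
    L = d * np.array([[1.0], [1.0]])
    # States: plant state x and observer state x_hat, with u = -K x_hat.
    top = np.hstack([A, -m * B @ K])
    bottom = np.hstack([L @ C, A - B @ K - L @ C])
    return np.vstack([top, bottom])

def is_stable(M):
    """Continuous-time stability: all eigenvalues in the open left half-plane."""
    return bool(np.max(np.linalg.eigvals(M).real) < 0.0)

nominal = is_stable(doyle_closed_loop(1000.0, 500.0, 1.0))     # designed-for plant
perturbed = is_stable(doyle_closed_loop(1000.0, 500.0, 1.01))  # 1% input-gain error
print(nominal, perturbed)
```

In this example the determinant of the closed-loop matrix equals 1 + (1 - m) d f, so any relative gain increase larger than 1/(d f) (about 0.0012 for the values above) destabilizes the loop. A model error of that size is what a learned controller would have to avoid.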

Core claim

We prove an ε-local minimax excess-cost lower bound that applies to any algorithm mapping the offline dataset to a stabilizing linear controller. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher Information induced by the exploration policy. System-theoretic characterizations of these objects enable transparent construction of hard instances, and instantiating the bound on classical fragile robust-control examples demonstrates when robust control fragility translates into high sample complexity for learning-enabled control.

What carries the argument

The ε-local minimax excess-cost lower bound, which lower-bounds the performance gap of any learned stabilizing controller via the Hessian of the LQG cost with respect to parameters times the inverse Fisher information of the fixed linear exploration policy.
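Read schematically, and only as a sketch of the structure described above rather than the paper's exact theorem, the bound has the following shape. The symbols $H(\theta)$ (Hessian of the LQG cost in the model parameters $\theta$), $\mathcal{I}_N(\theta)$ (Fisher information of an $N$-trajectory dataset under the exploration policy), the trace form, and the suppressed constant are assumptions made here for illustration.

```latex
\inf_{\hat K}\ \sup_{\|\theta' - \theta\| \le \varepsilon}\
\mathbb{E}_{\theta'}\!\left[ J(\hat K;\theta') - J^\star(\theta') \right]
\;\gtrsim\;
\operatorname{tr}\!\left( H(\theta)\, \mathcal{I}_N(\theta)^{-1} \right),
\qquad
\mathcal{I}_N(\theta) = N\, \mathcal{I}_1(\theta)
\quad \text{for $N$ i.i.d. trajectories.}
```

Under this reading the right-hand side decays like $1/N$, with a constant that is large whenever the cost is sharply curved in a parameter direction that the data identify poorly.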

If this is right

  • The results suggest that certainty-equivalent synthesis is asymptotically optimal as the dataset size grows.
  • Fragile robust-control problems map directly to high sample-complexity regimes for any learning procedure.
  • Task-directed choice of the exploration policy is required to keep the inverse Fisher information from inflating the lower bound.
  • System co-design that reduces cost sensitivity can lower the sample requirement for learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Exploration policies could be optimized by maximizing the Fisher information projected onto the directions of largest cost Hessian eigenvalues.
  • The same style of bound may supply guidance for choosing identification experiments in other linear control settings where partial observations are present.
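The alignment idea in the first bullet can be illustrated with a toy calculation. The sketch below is not from the paper: it assumes a bound proxy of the form tr(H I_1^{-1}) / N, and the 2x2 matrices are made-up numbers chosen so that both exploration designs have the same total information (equal trace) but distribute it differently.

```python
import numpy as np

def bound_proxy(hessian, fisher_per_traj, n_traj):
    """Hypothetical proxy tr(H I_1^{-1}) / N for the excess-cost lower bound."""
    # solve(I, H) computes I^{-1} H without forming the inverse; the trace is the same.
    return float(np.trace(np.linalg.solve(fisher_per_traj, hessian))) / n_traj

# Cost is 100x more sensitive to the first parameter than to the second.
H = np.diag([100.0, 1.0])

# Same information budget (trace 10), spent on different directions.
fisher_misaligned = np.diag([1.0, 9.0])  # informative about the insensitive direction
fisher_aligned = np.diag([9.0, 1.0])     # informative about the sensitive direction

N = 100
proxy_misaligned = bound_proxy(H, fisher_misaligned, N)
proxy_aligned = bound_proxy(H, fisher_aligned, N)
print(proxy_misaligned, proxy_aligned)
```

With these numbers the misaligned design gives a proxy roughly eight times larger than the aligned one, which is the sense in which exploration should be task-directed rather than uniformly informative.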

Load-bearing premise

Offline trajectories come from a single fixed linear exploration policy whose Fisher information matrix is invertible and whose distribution satisfies the technical conditions required for the local minimax analysis.

What would settle it

An explicit algorithm and finite dataset on a fragile LQG instance (such as a Doyle counterexample) that produces a stabilizing controller whose excess cost falls below the numerical value of the Hessian-inverse-Fisher bound would contradict the claimed lower bound.

Original abstract

Learning methods are increasingly used to synthesize controllers from data, yet existing sample-complexity characterizations for continuous control are sharp only in the fully observed setting. This paper studies the partially observed case by deriving information-theoretic lower bounds for learning Linear Quadratic Gaussian (LQG) controllers from offline trajectories generated by a (linear) exploration policy. We prove an $\varepsilon$-local minimax excess-cost lower bound that applies to any algorithm mapping the offline dataset to a stabilizing linear controller. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher Information induced by the exploration policy. We further provide system-theoretic characterizations of these objects, enabling transparent construction of hard instances. Instantiating the bound on classical fragile robust-control examples, including variants of the Doyle LQG fragility counterexample and non-minimum-phase systems, demonstrates when fragile robust control problems translate into high sample complexity for learning-enabled control. These results suggest the asymptotic optimality of certainty-equivalent synthesis and motivate the importance of both task-directed experiment design and system co-design for sample-efficient learning in partially observed control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript derives an ε-local minimax excess-cost lower bound that applies to any algorithm mapping an offline dataset (generated by a fixed linear exploration policy) to a stabilizing linear controller in the partially observed LQG setting. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters multiplied by the inverse Fisher information induced by the exploration policy. System-theoretic characterizations of the Hessian and Fisher objects are provided to enable construction of hard instances, which are instantiated on fragile robust-control examples (variants of the Doyle LQG counterexample and non-minimum-phase systems) to illustrate high sample complexity and motivate task-directed experiment design.

Significance. If the derivation is correct under the stated assumptions, the result is significant for the field of learning-enabled control. It supplies information-theoretic lower bounds that link robust-control fragility to sample complexity in the partially observed case, where existing characterizations are less sharp. The system-theoretic characterizations of the Hessian and Fisher information are a clear strength, as they permit transparent construction of hard instances rather than opaque parameter choices. This supports the suggestion of asymptotic optimality for certainty-equivalent synthesis and underscores the value of co-design and directed exploration.

major comments (2)
  1. [Abstract / bound derivation] Abstract and bound-derivation paragraph: the ε-local minimax lower bound is obtained by applying a local minimax theorem that requires the induced distribution to satisfy regularity conditions (local identifiability, twice continuous differentiability of the risk in a neighborhood, and positive-definiteness of the Fisher information matrix in the relevant parameter directions). The manuscript asserts that the fixed linear exploration policy has invertible Fisher information and meets the needed technical conditions, but provides no explicit verification or auxiliary lemma confirming that these conditions continue to hold inside the stability region for the fragile instances (Doyle counterexample and non-minimum-phase systems), where the Hessian can become ill-conditioned near stability boundaries.
  2. [System-theoretic characterizations] Section on system-theoretic characterizations (presumably §4): while the Hessian of the LQG cost and the Fisher information are given system-theoretic expressions, the manuscript does not demonstrate that the resulting objects remain well-defined and satisfy the local-minimax regularity conditions uniformly for all stabilizing controllers in the neighborhood of the fragile examples. This is load-bearing for the claim that the bound applies to the motivating high-sample-complexity instances.
minor comments (2)
  1. [Notation and preliminaries] Notation for the Hessian H and Fisher information matrix I could be introduced with explicit definitions and dimensions in the main text (rather than assuming familiarity with the information-theoretic objects) to improve readability for control-theoretic readers.
  2. [Instantiation on examples] The abstract states that the bound demonstrates 'when fragile robust control problems translate into high sample complexity,' but the manuscript would benefit from a short table or numerical example quantifying the scaling of the lower bound for the Doyle instance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below and will revise the manuscript to provide the requested explicit verifications of the regularity conditions for the fragile instances.

Point-by-point responses
  1. Referee: [Abstract / bound derivation] Abstract and bound-derivation paragraph: the ε-local minimax lower bound is obtained by applying a local minimax theorem that requires the induced distribution to satisfy regularity conditions (local identifiability, twice continuous differentiability of the risk in a neighborhood, and positive-definiteness of the Fisher information matrix in the relevant parameter directions). The manuscript asserts that the fixed linear exploration policy has invertible Fisher information and meets the needed technical conditions, but provides no explicit verification or auxiliary lemma confirming that these conditions continue to hold inside the stability region for the fragile instances (Doyle counterexample and non-minimum-phase systems), where the Hessian can become ill-conditioned near stability boundaries.

    Authors: We agree that explicit verification strengthens the result. In the revised manuscript we will add an auxiliary lemma in the appendix confirming local identifiability and positive-definiteness of the Fisher information for the linear exploration policies used in the Doyle counterexample and non-minimum-phase systems, within a neighborhood of the nominal parameters that remains inside the stability region. Twice continuous differentiability of the risk follows from the analyticity of the LQG cost in the interior of the stability set. These additions ensure the local minimax theorem applies directly to the motivating examples.

    revision: yes

  2. Referee: [System-theoretic characterizations] Section on system-theoretic characterizations (presumably §4): while the Hessian of the LQG cost and the Fisher information are given system-theoretic expressions, the manuscript does not demonstrate that the resulting objects remain well-defined and satisfy the local-minimax regularity conditions uniformly for all stabilizing controllers in the neighborhood of the fragile examples. This is load-bearing for the claim that the bound applies to the motivating high-sample-complexity instances.

    Authors: The system-theoretic expressions are derived under stabilizability and detectability, which hold throughout the interior of the stability region. To address the concern we will add a remark together with explicit calculations in the revised manuscript showing that, for the specific fragile examples, both the Hessian and Fisher information remain well-defined and satisfy positive-definiteness in a sufficiently small neighborhood around the nominal parameters. Uniformity over the entire set of stabilizing controllers is neither claimed nor required; the local minimax bound only needs the conditions inside a local ball, which the added calculations will confirm for the high-sample-complexity instances.

    revision: yes

Circularity Check

0 steps flagged

No circularity: lower bound derived from independent information-theoretic quantities

Full rationale

The paper's central result is an ε-local minimax excess-cost lower bound for any algorithm that maps offline trajectories to a stabilizing linear controller. This bound is expressed directly in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher information induced by a fixed linear exploration policy. Both the Hessian and Fisher information are defined independently of the final controller output and of the learning algorithm itself. The derivation invokes standard local minimax analysis under regularity conditions (local identifiability, twice differentiability, positive-definiteness of the information matrix) that are stated as assumptions on the exploration policy and are not constructed from the bound or from any fitted controller. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The result is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard information-theoretic inequalities and linear-Gaussian system assumptions that are not introduced ad hoc by the paper.

axioms (2)
  • [domain assumption] The underlying system is linear time-invariant with Gaussian process and measurement noise.
    Invoked when defining the LQG cost and the data-generating process (abstract).
  • [domain assumption] The exploration policy is linear and produces trajectories whose Fisher information matrix is well-defined and invertible.
    Required for the inverse-Fisher term in the lower bound (abstract).
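The two assumptions above describe a data-generating process that can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's experimental setup: the exploration policy is taken to be static output feedback plus injected Gaussian excitation, and the system matrices, noise levels, and the name `collect_offline_data` are hypothetical.

```python
import numpy as np

def collect_offline_data(A, B, C, K_explore, n_traj, horizon,
                         sigma_w=0.1, sigma_v=0.1, sigma_u=1.0, seed=0):
    """Simulate offline input/output trajectories of a partially observed LTI system.

    Dynamics: x_{t+1} = A x_t + B u_t + w_t,  y_t = C x_t + v_t  (Gaussian w, v).
    Policy:   u_t = K_explore y_t + e_t, with Gaussian excitation e_t (an assumption).
    """
    rng = np.random.default_rng(seed)
    n, m, p = A.shape[0], B.shape[1], C.shape[0]
    Y = np.zeros((n_traj, horizon, p))
    U = np.zeros((n_traj, horizon, m))
    for i in range(n_traj):
        x = np.zeros(n)
        for t in range(horizon):
            y = C @ x + sigma_v * rng.standard_normal(p)          # noisy measurement
            u = K_explore @ y + sigma_u * rng.standard_normal(m)  # linear policy + excitation
            Y[i, t], U[i, t] = y, u
            x = A @ x + B @ u + sigma_w * rng.standard_normal(n)  # state update
    return Y, U

# Illustrative two-state system with one input and one output.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
K_explore = np.array([[-0.1]])
Y, U = collect_offline_data(A, B, C, K_explore, n_traj=5, horizon=50)
print(Y.shape, U.shape)
```

Only the outputs `Y` and inputs `U` are returned, since in the partially observed setting the learner never sees the state. The excitation level `sigma_u` is the knob that, in this sketch, controls how informative the dataset is.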

pith-pipeline@v0.9.0 · 5731 in / 1467 out tokens · 32248 ms · 2026-05-21T01:07:05.208170+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Tag meanings

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


Appendix I: Non-minimum-phase example

Consider the system
\[
A = \begin{bmatrix} 1 & 1 \\ \theta & 1 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad
C = \begin{bmatrix} -\xi & 1 \end{bmatrix}.
\]
The system has a non-minimum-phase zero at $1+\xi$. ...
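The location of the zero in the non-minimum-phase example can be checked numerically. The sketch below assumes the matrices read A = [[1, 1], [θ, 1]], B = [0, 1]^T, C = [-ξ, 1] (a reconstruction from the extracted text) and uses illustrative values θ = 0.5 and ξ = 0.2. The helper `siso_zeros` is a hypothetical name; it relies on the matrix determinant lemma rather than on any control toolbox.

```python
import numpy as np

def siso_zeros(A, B, C):
    """Zeros of the SISO transfer function C (zI - A)^{-1} B.

    By the matrix determinant lemma,
        det(zI - A + B C) = det(zI - A) * (1 + C (zI - A)^{-1} B),
    so the numerator polynomial is charpoly(A - B C) - charpoly(A).
    """
    num = np.poly(A - B @ C) - np.poly(A)
    num = np.where(np.abs(num) < 1e-12, 0.0, num)  # clean round-off in cancelled terms
    return np.roots(np.trim_zeros(num, "f"))       # drop the cancelled leading coefficient

theta, xi = 0.5, 0.2  # illustrative values, not taken from the paper
A = np.array([[1.0, 1.0], [theta, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[-xi, 1.0]])

zeros = siso_zeros(A, B, C)
print(zeros)
```

For any ξ > 0 the zero at 1 + ξ lies outside the unit circle, which is what makes the discrete-time plant non-minimum-phase. With this reconstruction the zero location does not depend on θ.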

as
\[
\dot F = -(B^\top P B + R)^{-1}\bigl(\dot B^\top P (A+BF) + B^\top P(\dot A + \dot B F) + B^\top P'(A+BF)\bigr),
\]
where
\[
\dot P = \mathrm{dlyap}\bigl(A+BF,\ (A+BF)^\top P(\dot A + \dot B F) + (\dot A + \dot B F)^\top P (A+BF)\bigr).
\]
Substituting $\dot A = 0$ and $\dot B = B$, this simplifies to
\[
\dot F = -F - 2\Psi^{-1} B^\top P (A+BF) - \Psi^{-1} B^\top P'(A+BF).
\]
It holds that $\Psi^{-1} B^\top P (A+BF) = \Psi^{-1} R B^\top P A = O(\sigma)$. For the remaining term in the expression of $\dot F$, observe that the second argument defining the Lyapu...