The Fragility of Learning LQG Controllers
Pith reviewed 2026-05-21 01:07 UTC · model grok-4.3
The pith
Any algorithm learning a stabilizing LQG controller from offline data has excess cost lower-bounded by the product of the LQG cost Hessian and the inverse Fisher information of the exploration policy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove an ε-local minimax excess-cost lower bound that applies to any algorithm mapping the offline dataset to a stabilizing linear controller. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher Information induced by the exploration policy. System-theoretic characterizations of these objects enable transparent construction of hard instances, and instantiating the bound on classical fragile robust-control examples demonstrates when robust control fragility translates into high sample complexity for learning-enabled control.
What carries the argument
The ε-local minimax excess-cost lower bound, which lower-bounds the performance gap of any learned stabilizing controller via the Hessian of the LQG cost with respect to parameters times the inverse Fisher information of the fixed linear exploration policy.
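To make the shape of this quantity concrete, the sketch below evaluates a trace-form proxy, tr(H F^{-1}) / N, for a two-parameter toy problem. The trace form, the 1/N scaling, the omitted constants, and the helper names `inv2`, `trace_of_product`, and `bound_proxy` are illustrative assumptions, not taken from the paper; only the two ingredients (cost Hessian and inverse Fisher information) come from the claim above.

```python
# Illustrative only: a trace-form proxy tr(H F^{-1}) / N on a 2-parameter toy.
# The exact form and constants of the paper's bound are NOT reproduced here.

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def trace_of_product(x, y):
    """trace(x @ y) for 2x2 nested lists."""
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))


def bound_proxy(hessian, fisher_per_sample, n_samples):
    """Hypothetical proxy for the bound: tr(H F^{-1}) / N."""
    return trace_of_product(hessian, inv2(fisher_per_sample)) / n_samples


# Toy numbers: the cost is 100x more sensitive to parameter 1,
# which is also the parameter the exploration policy excites least.
H = [[100.0, 0.0], [0.0, 1.0]]
F = [[0.5, 0.0], [0.0, 2.0]]
proxy = bound_proxy(H, F, n_samples=1000)
```

In this toy, almost all of the proxy comes from the first parameter: high cost sensitivity combined with weak excitation is what drives the value up.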
If this is right
- Certainty-equivalent synthesis appears to be asymptotically optimal as the dataset size grows.
- Fragile robust-control problems map directly to high sample-complexity regimes for any learning procedure.
- Task-directed choice of the exploration policy is required to keep the inverse Fisher information from inflating the lower bound.
- System co-design that reduces cost sensitivity can lower the sample requirement for learning.
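The first and last points above can be illustrated on a much simpler problem than the paper's. The sketch below uses a fully observed scalar LQR (an assumption made for brevity; the paper treats partially observed LQG) and measures the excess cost of a certainty-equivalent gain designed for a misestimated parameter `a_hat`. All function names and numerical values are hypothetical.

```python
# Toy, fully observed scalar LQR; NOT the paper's partially observed setting.
# System: x+ = a x + b u + w, stage cost q x^2 + r u^2, controller u = -k x.

def riccati_gain(a, b, q=1.0, r=1.0, iters=500):
    """Optimal gain from fixed-point iteration of the scalar Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def stationary_cost(k, a, b, q=1.0, r=1.0, noise_var=1.0):
    """Long-run average cost of u = -k x on the true system (a, b)."""
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        return float("inf")  # gain does not stabilize the true system
    return (q + r * k * k) * noise_var / (1.0 - a_cl * a_cl)


def excess_cost(a_hat, a_true, b=1.0):
    """Cost gap between the certainty-equivalent gain and the optimal gain."""
    k_ce = riccati_gain(a_hat, b)    # designed as if a_hat were the truth
    k_opt = riccati_gain(a_true, b)
    return stationary_cost(k_ce, a_true, b) - stationary_cost(k_opt, a_true, b)


small = excess_cost(0.91, 0.90)   # parameter error 0.01
large = excess_cost(0.92, 0.90)   # parameter error 0.02
```

Doubling the parameter error roughly quadruples the excess cost here, which is the quadratic, Hessian-governed behavior that a local bound of this kind relies on. A system whose cost is less sensitive to the parameter would show a smaller gap for the same estimation error.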
Where Pith is reading between the lines
- Exploration policies could be optimized by maximizing the Fisher information projected onto the directions of largest cost Hessian eigenvalues.
- The same style of bound may supply guidance for choosing identification experiments in other linear control settings where partial observations are present.
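A minimal sketch of the first idea, under strong simplifying assumptions: the Hessian and the Fisher information are both diagonal, and a fixed information budget can be split freely across parameters. In that case, minimizing the sum of h_i / e_i subject to a budget constraint gives an allocation proportional to sqrt(h_i). The functions `allocate_budget` and `weighted_inverse_information` are hypothetical helpers, not part of the paper.

```python
import math

# Diagonal toy case only: Fisher information per parameter is assumed to be
# freely allocatable under a total budget, which real systems rarely allow.

def allocate_budget(hessian_diag, budget):
    """Split the budget with e_i proportional to sqrt(h_i).

    This minimizes sum_i h_i / e_i subject to sum_i e_i = budget.
    """
    roots = [math.sqrt(h) for h in hessian_diag]
    total = sum(roots)
    return [budget * r / total for r in roots]


def weighted_inverse_information(hessian_diag, fisher_diag):
    """Diagonal analogue of tr(H F^{-1})."""
    return sum(h / f for h, f in zip(hessian_diag, fisher_diag))


h = [100.0, 1.0]                 # cost sensitivities (toy values)
uniform = [1.0, 1.0]             # task-agnostic exploration
directed = allocate_budget(h, budget=2.0)

cost_uniform = weighted_inverse_information(h, uniform)
cost_directed = weighted_inverse_information(h, directed)
```

With the same total budget, the directed allocation spends more information on the cost-sensitive parameter and lowers the weighted quantity relative to uniform exploration.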
Load-bearing premise
Offline trajectories come from a single fixed linear exploration policy whose Fisher information matrix is invertible and whose distribution satisfies the technical conditions required for the local minimax analysis.
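The invertibility requirement is not automatic. As a hedged illustration, the sketch below computes the stationary per-step Fisher information for a fully observed scalar system x+ = a x + b u + w under the linear policy u = k x + eta (a toy stand-in for the paper's partially observed setting; the function names are hypothetical). Without injected exploration noise, u is an exact multiple of x and the matrix is singular.

```python
# Toy model: x+ = a x + b u + w, w ~ N(0, noise_var), x observed directly.
# Exploration policy: u = k x + eta, eta ~ N(0, explore_var).
# Unknown parameters: theta = (a, b).

def fisher_per_step(a, b, k, explore_var, noise_var=1.0):
    """Stationary per-step Fisher information E[z z^T] / noise_var, z = (x, u)."""
    a_cl = a + b * k
    if abs(a_cl) >= 1.0:
        raise ValueError("exploration policy must stabilize the toy system")
    var_x = (b * b * explore_var + noise_var) / (1.0 - a_cl * a_cl)
    exx = var_x                          # E[x^2]
    exu = k * var_x                      # E[x u]
    euu = k * k * var_x + explore_var    # E[u^2]
    return [[exx / noise_var, exu / noise_var],
            [exu / noise_var, euu / noise_var]]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


no_noise = fisher_per_step(a=0.9, b=1.0, k=-0.5, explore_var=0.0)
with_noise = fisher_per_step(a=0.9, b=1.0, k=-0.5, explore_var=0.1)
```

In this toy, the determinant equals var_x * explore_var / noise_var^2, so it vanishes exactly when the policy injects no exploration noise and the two parameters cannot be separated from the data.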
What would settle it
The claimed lower bound would be contradicted by an explicit algorithm and finite dataset size on a fragile LQG instance (such as a Doyle counterexample) for which the learned stabilizing controller's expected excess cost, taken in the worst case over the ε-neighborhood of that instance, falls below the numerical value of the Hessian-inverse-Fisher bound.
Original abstract
Learning methods are increasingly used to synthesize controllers from data, yet existing sample-complexity characterizations for continuous control are sharp only in the fully observed setting. This paper studies the partially observed case by deriving information-theoretic lower bounds for learning Linear Quadratic Gaussian (LQG) controllers from offline trajectories generated by a (linear) exploration policy. We prove an $\varepsilon$-local minimax excess-cost lower bound that applies to any algorithm mapping the offline dataset to a stabilizing linear controller. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher Information induced by the exploration policy. We further provide system-theoretic characterizations of these objects, enabling transparent construction of hard instances. Instantiating the bound on classical fragile robust-control examples, including variants of the Doyle LQG fragility counterexample and non-minimum-phase systems, demonstrates when fragile robust control problems translate into high sample complexity for learning-enabled control. These results suggest the asymptotic optimality of certainty-equivalent synthesis and motivate the importance of both task-directed experiment design and system co-design for sample-efficient learning in partially observed control.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives an ε-local minimax excess-cost lower bound that applies to any algorithm mapping an offline dataset (generated by a fixed linear exploration policy) to a stabilizing linear controller in the partially observed LQG setting. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters multiplied by the inverse Fisher information induced by the exploration policy. System-theoretic characterizations of the Hessian and Fisher objects are provided to enable construction of hard instances, which are instantiated on fragile robust-control examples (variants of the Doyle LQG counterexample and non-minimum-phase systems) to illustrate high sample complexity and motivate task-directed experiment design.
Significance. If the derivation is correct under the stated assumptions, the result is significant for the field of learning-enabled control. It supplies information-theoretic lower bounds that link robust-control fragility to sample complexity in the partially observed case, where existing characterizations are less sharp. The system-theoretic characterizations of the Hessian and Fisher information are a clear strength, as they permit transparent construction of hard instances rather than opaque parameter choices. This supports the suggestion of asymptotic optimality for certainty-equivalent synthesis and underscores the value of co-design and directed exploration.
major comments (2)
- [Abstract / bound derivation] Abstract and bound-derivation paragraph: the ε-local minimax lower bound is obtained by applying a local minimax theorem that requires the induced distribution to satisfy regularity conditions (local identifiability, twice continuous differentiability of the risk in a neighborhood, and positive-definiteness of the Fisher information matrix in the relevant parameter directions). The manuscript asserts that the fixed linear exploration policy has invertible Fisher information and meets the needed technical conditions, but provides no explicit verification or auxiliary lemma confirming that these conditions continue to hold inside the stability region for the fragile instances (Doyle counterexample and non-minimum-phase systems), where the Hessian can become ill-conditioned near stability boundaries.
- [System-theoretic characterizations] Section on system-theoretic characterizations (presumably §4): while the Hessian of the LQG cost and the Fisher information are given system-theoretic expressions, the manuscript does not demonstrate that the resulting objects remain well-defined and satisfy the local-minimax regularity conditions uniformly for all stabilizing controllers in the neighborhood of the fragile examples. This is load-bearing for the claim that the bound applies to the motivating high-sample-complexity instances.
minor comments (2)
- [Notation and preliminaries] Notation for the Hessian H and Fisher information matrix I could be introduced with explicit definitions and dimensions in the main text (rather than assuming familiarity with the information-theoretic objects) to improve readability for control-theoretic readers.
- [Instantiation on examples] The abstract states that the bound demonstrates 'when fragile robust control problems translate into high sample complexity,' but the manuscript would benefit from a short table or numerical example quantifying the scaling of the lower bound for the Doyle instance.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address each major comment below and will revise the manuscript to provide the requested explicit verifications of the regularity conditions for the fragile instances.
Point-by-point responses
- Referee: [Abstract / bound derivation] Abstract and bound-derivation paragraph: the ε-local minimax lower bound is obtained by applying a local minimax theorem that requires the induced distribution to satisfy regularity conditions (local identifiability, twice continuous differentiability of the risk in a neighborhood, and positive-definiteness of the Fisher information matrix in the relevant parameter directions). The manuscript asserts that the fixed linear exploration policy has invertible Fisher information and meets the needed technical conditions, but provides no explicit verification or auxiliary lemma confirming that these conditions continue to hold inside the stability region for the fragile instances (Doyle counterexample and non-minimum-phase systems), where the Hessian can become ill-conditioned near stability boundaries.
  Authors: We agree that explicit verification strengthens the result. In the revised manuscript we will add an auxiliary lemma in the appendix confirming local identifiability and positive-definiteness of the Fisher information for the linear exploration policies used in the Doyle counterexample and non-minimum-phase systems, within a neighborhood of the nominal parameters that remains inside the stability region. Twice continuous differentiability of the risk follows from the analyticity of the LQG cost in the interior of the stability set. These additions ensure the local minimax theorem applies directly to the motivating examples.
  Revision: yes
- Referee: [System-theoretic characterizations] Section on system-theoretic characterizations (presumably §4): while the Hessian of the LQG cost and the Fisher information are given system-theoretic expressions, the manuscript does not demonstrate that the resulting objects remain well-defined and satisfy the local-minimax regularity conditions uniformly for all stabilizing controllers in the neighborhood of the fragile examples. This is load-bearing for the claim that the bound applies to the motivating high-sample-complexity instances.
  Authors: The system-theoretic expressions are derived under stabilizability and detectability, which hold throughout the interior of the stability region. To address the concern we will add a remark together with explicit calculations in the revised manuscript showing that, for the specific fragile examples, both the Hessian and Fisher information remain well-defined and satisfy positive-definiteness in a sufficiently small neighborhood around the nominal parameters. Uniformity over the entire set of stabilizing controllers is neither claimed nor required; the local minimax bound only needs the conditions inside a local ball, which the added calculations will confirm for the high-sample-complexity instances.
  Revision: yes
Circularity Check
No circularity: lower bound derived from independent information-theoretic quantities
Full rationale
The paper's central result is an ε-local minimax excess-cost lower bound for any algorithm that maps offline trajectories to a stabilizing linear controller. This bound is expressed directly in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher information induced by a fixed linear exploration policy. Both the Hessian and Fisher information are defined independently of the final controller output and of the learning algorithm itself. The derivation invokes standard local minimax analysis under regularity conditions (local identifiability, twice differentiability, positive-definiteness of the information matrix) that are stated as assumptions on the exploration policy and are not constructed from the bound or from any fitted controller. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The result is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: The underlying system is linear time-invariant with Gaussian process and measurement noise.
- Domain assumption: The exploration policy is linear and produces trajectories whose Fisher information matrix is well-defined and invertible.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: We prove an ε-local minimax excess-cost lower bound ... expressed in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher Information induced by the exploration policy.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: The bound is expressed in terms of the Hessian ... and the inverse Fisher Information
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix I: Non-minimum phase example

Consider the system $A = \begin{bmatrix} 1 & 1 \\ \theta & 1 \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $C = \begin{bmatrix} -\xi & 1 \end{bmatrix}$. The system has a non-minimum phase zero at $1+\xi$. ...

... as $\dot F = -(B^\top P B + R)^{-1}\big(\dot B^\top P (A+BF) + B^\top P(\dot A + \dot B F) + B^\top P'(A+BF)\big)$, where $\dot P = \mathrm{dlyap}\big(A+BF,\ (A+BF)^\top P(\dot A + \dot B F) + (\dot A + \dot B F)^\top P (A+BF)\big)$. Substituting $\dot A = 0$ and $\dot B = B$, this simplifies to $\dot F = -F - 2\Psi^{-1}B^\top P(A+BF) - \Psi^{-1}B^\top P'(A+BF)$. It holds that $\Psi^{-1}B^\top P(A+BF) = \Psi^{-1}RB^\top PA = O(\sigma)$. For the remaining term in the expression of $\dot F$, observe that the second argument defining the Lyapu...