A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems
Pith reviewed 2026-05-22 10:37 UTC · model grok-4.3
The pith
A PAC-Bayes bound gives high-probability performance guarantees for any learned stochastic controller on unknown linear discrete-time systems, including cases with unbounded quadratic costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a PAC-Bayes analysis produces a data-dependent, high-probability bound on the expected cost of any learned stochastic controller for an unknown stochastic linear discrete-time system whose parameters are drawn from a fixed but unknown distribution, and that this bound remains valid for unbounded quadratic costs. The same analysis yields practical learning algorithms with guarantees for both finite and infinite controller spaces.
What carries the argument
The data-dependent PAC-Bayes bound on the expected quadratic cost of a stochastic controller drawn from a posterior distribution over controller parameters.
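For orientation, here is the generic shape of such a bound. This is the standard PAC-Bayes inequality for sub-Gaussian losses, not necessarily the exact statement of the paper's Theorem 1. The notation is introduced here for illustration: S is a sample of n i.i.d. systems, P is a data-independent prior over controller parameters K, Q is any posterior, \hat J_n(K) is the empirical cost of K on S, J(K) is its expected cost, λ > 0 is fixed before seeing the data, and δ ∈ (0,1). The form assumes that J(K) is finite and that each per-system cost minus J(K) is σ²-sub-Gaussian. The Gibbs posterior Q*_λ minimizes the right-hand side for a fixed λ.

```latex
\Pr_{S}\!\left[\ \forall Q:\;
\mathbb{E}_{K\sim Q}\big[J(K)\big] \;\le\;
\mathbb{E}_{K\sim Q}\big[\hat J_n(K)\big]
+ \frac{\mathrm{KL}(Q\,\|\,P) + \ln(1/\delta)}{\lambda}
+ \frac{\lambda\sigma^2}{2n}\ \right] \;\ge\; 1-\delta,
\qquad
\frac{dQ^\star_\lambda}{dP}(K) \;\propto\; e^{-\lambda \hat J_n(K)} .
```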
If this is right
- The bound supplies a concrete certificate that can be checked after data collection before a controller is deployed.
- The algorithms scale to infinite controller spaces without requiring explicit enumeration.
- Performance guarantees continue to hold when the cost function is an unbounded quadratic form.
- In the special case where LQG is optimal, numerical results suggest that the learned controllers achieve performance comparable to the LQG solution.
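For a finite controller space, the "certificate checked after data collection" can be sketched as follows. This minimal sketch assumes empirical costs have already been computed for each candidate controller. It forms a Gibbs posterior and evaluates the generic sub-Gaussian PAC-Bayes bound, used here as a stand-in for the paper's bound. The function names, the sub-Gaussian parameter `sigma`, and the example numbers are all hypothetical.

```python
import math

def gibbs_posterior(prior, emp_costs, lam):
    """Q(k) proportional to P(k) * exp(-lam * emp_cost(k)) over a finite controller set."""
    # Subtract the smallest relevant cost before exponentiating to avoid underflow;
    # the constant factor cancels when normalising.
    c_min = min(c for p, c in zip(prior, emp_costs) if p > 0)
    weights = [p * math.exp(-lam * (c - c_min)) if p > 0 else 0.0
               for p, c in zip(prior, emp_costs)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(q, p):
    """KL(Q || P) for discrete distributions; entries with q = 0 contribute nothing."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def pac_bayes_bound(q, p, emp_costs, lam, sigma, n, delta):
    """Generic sub-Gaussian PAC-Bayes bound on E_{K~Q}[J(K)]; an illustration, not Theorem 1."""
    emp_term = sum(qi * c for qi, c in zip(q, emp_costs) if qi > 0)
    complexity = (kl_divergence(q, p) + math.log(1.0 / delta)) / lam
    return emp_term + complexity + lam * sigma ** 2 / (2 * n)

# Hypothetical example: four candidate gains, a uniform prior, made-up empirical costs.
prior = [0.25, 0.25, 0.25, 0.25]
emp_costs = [1.2, 0.9, 1.5, 2.0]
q = gibbs_posterior(prior, emp_costs, lam=10.0)
bound = pac_bayes_bound(q, prior, emp_costs, lam=10.0, sigma=1.0, n=200, delta=0.05)
```

Candidates with zero prior mass receive zero posterior mass, and for fixed λ the Gibbs posterior gives the smallest bound of any posterior.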
Where Pith is reading between the lines
- The same PAC-Bayes construction could be used to obtain guarantees for policies that are updated online as new data arrives.
- Extending the framework to nonlinear or partially observed systems would require only changes to the cost and dynamics models inside the bound.
- The approach naturally lends itself to robust variants by replacing the prior with a worst-case distribution over possible parameter laws.
Load-bearing premise
The system parameters are drawn from a fixed but unknown distribution.
What would settle it
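Run many independent trials. In each trial, sample fresh system parameters from the same distribution, learn a controller from the data, estimate its expected cost on new systems drawn from that distribution, and record whether this estimate exceeds the derived bound. The claim fails if exceedances occur in a larger fraction of trials than the bound's failure probability allows.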
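The falsification test above can be sketched on a toy problem. Everything in this block is a hypothetical stand-in:

- a scalar system x_{t+1} = a x_t + u_t + w_t with a ~ Uniform(0.5, 1.0) and unit Gaussian noise;
- four candidate static gains with a uniform prior;
- a Gibbs posterior with a fixed λ;
- the generic sub-Gaussian PAC-Bayes bound, used in place of the paper's Theorem 1 (whose exact form is not reproduced here).

Every candidate gain stabilizes every system in the support, so each cost lies in a bounded interval and Hoeffding's lemma supplies σ. The true expected cost has a closed form here, so each trial's check is exact rather than itself estimated.

```python
import math
import random

# Hypothetical toy setup (not from the paper): x+ = a x + u + w, w ~ N(0, 1), u = k x,
# per-step cost x^2 + u^2, and system parameter a ~ Uniform(A_LO, A_HI).
GAINS = [-0.2, -0.4, -0.6, -0.8]
A_LO, A_HI = 0.5, 1.0
COST_RANGE = 5.0            # every (k, a) pair above yields a cost inside [0, 5]
SIGMA = COST_RANGE / 2.0    # Hoeffding: a variable in [0, C] is (C/2)^2-sub-Gaussian

def cost(k, a):
    """Long-run average cost E[x^2 + u^2] of gain k on system a (requires |a + k| < 1)."""
    return (1.0 + k * k) / (1.0 - (a + k) ** 2)

def true_expected_cost(k):
    """Closed-form E_a[cost(k, a)] for a ~ Uniform(A_LO, A_HI), using d/dy atanh(y) = 1/(1-y^2)."""
    return (1.0 + k * k) * (math.atanh(A_HI + k) - math.atanh(A_LO + k)) / (A_HI - A_LO)

def run_trial(rng, n, lam, delta):
    """One trial: fit a Gibbs posterior on n sampled systems; return (E_Q[J], bound)."""
    systems = [rng.uniform(A_LO, A_HI) for _ in range(n)]
    emp = [sum(cost(k, a) for a in systems) / n for k in GAINS]
    prior = [1.0 / len(GAINS)] * len(GAINS)
    c_min = min(emp)
    weights = [p * math.exp(-lam * (c - c_min)) for p, c in zip(prior, emp)]
    total = sum(weights)
    q = [w / total for w in weights]
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, prior) if qi > 0)
    bound = (sum(qi * c for qi, c in zip(q, emp))
             + (kl + math.log(1.0 / delta)) / lam
             + lam * SIGMA ** 2 / (2 * n))
    true_cost = sum(qi * true_expected_cost(k) for qi, k in zip(q, GAINS))
    return true_cost, bound

def violation_rate(n_trials=200, n=50, lam=20.0, delta=0.05, seed=0):
    """Fraction of independent trials in which E_Q[J] exceeds the certified bound."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_trials):
        true_cost, bound = run_trial(rng, n, lam, delta)
        if true_cost > bound:
            violations += 1
    return violations / n_trials
```

A valid bound should give `violation_rate()` no larger than `delta`, up to sampling noise across trials.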
Original abstract
This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a PAC-Bayes framework for learning stochastic controllers for unknown linear discrete-time systems whose parameters are drawn from a fixed but unknown distribution. It derives a data-dependent high-probability bound on the expected infinite-horizon quadratic cost of any learned controller, proposes efficient algorithms with guarantees for both finite and infinite controller spaces, and claims the bound applies even when the quadratic cost is unbounded (unlike prior work). Numerical experiments indicate that the learned controllers achieve performance comparable to LQG when LQG is optimal.
Significance. If the integrability of the unbounded quadratic cost under the data-dependent posterior is rigorously established, the result would meaningfully extend PAC-Bayes control bounds to the standard LQR/LQG setting with unbounded costs. The provision of algorithms for both finite and infinite controller spaces and the data-dependent nature of the bound are strengths that could support practical learning from trajectory data.
major comments (1)
- [§3.2, Theorem 1] Theorem 1 and the surrounding derivation invoke the high-probability PAC-Bayes statement for the expected quadratic cost. However, the manuscript does not supply an explicit stability or moment lemma showing that E[cost] < ∞ almost surely under the posterior (e.g., via a uniform Schur-stability guarantee or a truncation argument). The posterior is learned from data and the system parameters are random. If the posterior places mass on controllers that destabilize a positive-measure set of systems, the expectation becomes infinite, and the bound is formally inapplicable.
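A minimal sketch shows why this integrability concern reduces to closed-loop stability. Consider a static state-feedback gain u_t = K x_t on x_{t+1} = A x_t + B u_t + w_t with noise covariance W. The long-run average cost E[x'Qx + u'Ru] equals trace((Q + K'RK) Σ), where Σ solves the discrete Lyapunov equation Σ = (A + BK) Σ (A + BK)' + W. The static-gain controller class and this average-cost functional are assumptions made for illustration; the paper's exact setup may differ. With W and Q positive definite, the cost is finite if and only if A + BK is Schur stable (spectral radius below 1). A posterior sample that destabilizes a sampled system therefore contributes an infinite cost.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def average_lqr_cost(A, B, K, Q, R, W):
    """Long-run average E[x'Qx + u'Ru] for x+ = A x + B u + w, u = K x, Cov(w) = W.

    Returns inf when A + B K is not Schur stable: with non-degenerate noise the
    state covariance then grows without bound and the cost has no finite limit.
    """
    A_cl = A + B @ K
    if spectral_radius(A_cl) >= 1.0:
        return float("inf")
    n = A.shape[0]
    # Stationary covariance solves Sigma = A_cl Sigma A_cl^T + W; vectorised,
    # (I - A_cl kron A_cl) vec(Sigma) = vec(W).
    lhs = np.eye(n * n) - np.kron(A_cl, A_cl)
    Sigma = np.linalg.solve(lhs, W.reshape(-1)).reshape(n, n)
    return float(np.trace((Q + K.T @ R @ K) @ Sigma))
```

The Kronecker solve scales as O(n^6) and suits only small state dimensions; `scipy.linalg.solve_discrete_lyapunov` is the usual choice for larger problems.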
minor comments (2)
- [§2] Notation for the closed-loop matrix and the infinite-horizon cost functional should be introduced earlier and used consistently; the current placement after the main theorem makes the unbounded-cost claim harder to follow.
- [§5] The numerical section would benefit from reporting the empirical frequency of closed-loop instability across posterior samples, which would directly address the integrability concern raised above.
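A minimal sketch of the suggested diagnostic, assuming the user can sample gains K from the learned posterior and systems (A, B) from the parameter distribution. The function name, the samplers, and the example distributions below are hypothetical.

```python
import numpy as np

def instability_rate(sample_gain, sample_system, n_samples, rng):
    """Fraction of joint (posterior gain, random system) draws with a non-Schur-stable closed loop."""
    unstable = 0
    for _ in range(n_samples):
        K = sample_gain(rng)            # K ~ learned posterior (user-supplied sampler)
        A, B = sample_system(rng)       # (A, B) ~ system-parameter distribution
        rho = np.max(np.abs(np.linalg.eigvals(A + B @ K)))
        if rho >= 1.0:                  # spectral radius >= 1: closed loop not Schur stable
            unstable += 1
    return unstable / n_samples

# Hypothetical scalar example: gain k ~ N(-0.5, 0.1^2), system a ~ U(0.8, 1.2), b = 1.
rng = np.random.default_rng(0)
rate = instability_rate(
    lambda g: np.array([[g.normal(-0.5, 0.1)]]),
    lambda g: (np.array([[g.uniform(0.8, 1.2)]]), np.array([[1.0]])),
    n_samples=10_000,
    rng=rng,
)
```

Reporting this rate with a confidence interval would show directly how much posterior mass falls on destabilizing controllers.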
Simulated Author's Rebuttal
We thank the referee for their careful review of our manuscript. The major comment raises an important point about ensuring the finiteness of the expected cost under the learned posterior. We address this below and plan to revise the manuscript accordingly to strengthen the presentation.
- Referee: [§3.2, Theorem 1] Theorem 1 and the surrounding derivation invoke the high-probability PAC-Bayes statement for the expected quadratic cost. However, the manuscript does not supply an explicit stability or moment lemma showing that E[cost] < ∞ almost surely under the posterior (e.g., via a uniform Schur-stability guarantee or a truncation argument). The posterior is learned from data and the system parameters are random. If the posterior places mass on controllers that destabilize a positive-measure set of systems, the expectation becomes infinite, and the bound is formally inapplicable.
Authors: We thank the referee for this insightful comment. Our derivation of the PAC-Bayes bound assumes that the expected cost is finite. This is ensured by our choice of a prior distribution supported on controllers that are stabilizing with respect to the distribution of system parameters. Because the posterior is absolutely continuous with respect to the prior, this property carries over to the posterior, guaranteeing that E[cost] < ∞ almost surely. Nevertheless, we agree that an explicit statement would improve clarity. In the revised manuscript, we will add a supporting lemma in §3.2 that establishes the finite-moment condition under our assumptions, possibly using a truncation argument for the cost function. Revision: yes.
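The absolute-continuity step in the reply can be written out explicitly. Let S denote the set of controller parameters that stabilize almost every system drawn from the parameter distribution (notation introduced here for illustration). Then a prior with P(S) = 1 forces the same property on any posterior Q with Q ≪ P:

```latex
Q(\mathcal{S}^{c}) \;=\; \int_{\mathcal{S}^{c}} \frac{dQ}{dP}\, dP \;=\; 0
\qquad \text{because } P(\mathcal{S}^{c}) = 0 .
```

This gives a finite cost for almost every (controller, system) pair. The expectation itself can still be infinite, because the quadratic cost grows without bound as the closed-loop spectral radius approaches 1. A moment bound is therefore still needed, and that is what the promised lemma would have to establish.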
Circularity Check
No circularity; derivation applies standard PAC-Bayes to controller performance with independent stability considerations.
full rationale
The paper derives a data-dependent high-probability bound on controller performance for linear systems with parameters drawn from an unknown distribution, extending prior PAC-Bayes results to unbounded quadratic costs. No quoted equations or steps reduce the bound to a fitted quantity by construction, nor does any self-citation chain serve as the sole justification for a uniqueness claim or ansatz. The central result relies on PAC-Bayes inequalities applied to the expected cost under the posterior, which is an independent application rather than a renaming or self-definition. The provided abstract and skeptic notes raise a potential integrability question for the unbounded cost, but this is a correctness or proof-completeness issue rather than circularity, as no derivation step is shown to be tautological with its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: system parameters are drawn from a fixed but unknown distribution.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller... our bound holds for unbounded quadratic cost.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.