arXiv: 2605.10493 · v2 · pith:JXL4H4BWnew · submitted 2026-05-11 · 🧮 math.OC · cs.SY · eess.SY · stat.ML

A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems

Pith reviewed 2026-05-22 10:37 UTC · model grok-4.3

classification 🧮 math.OC · cs.SY · eess.SY · stat.ML
keywords PAC-Bayes · linear discrete-time systems · stochastic controllers · unknown parameters · quadratic cost · high-probability bounds · learning algorithms

The pith

A PAC-Bayes bound gives high-probability performance guarantees for any learned stochastic controller on unknown linear discrete-time systems, including cases with unbounded quadratic costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a PAC-Bayes framework for learning controllers when the linear system parameters come from a fixed but unknown distribution. It derives a data-dependent high-probability bound on the performance of any stochastic controller obtained from data. This bound applies even when the quadratic cost is unbounded, extending beyond earlier results. The authors also give efficient algorithms that carry theoretical guarantees and work whether the set of candidate controllers is finite or infinite. In the special case where linear-quadratic-Gaussian control is optimal, the learned controllers reach performance close to that of the optimal LQG solution.
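
To make the setting concrete, here is a minimal sketch of estimating the empirical quadratic cost of one fixed controller from n simulated trajectories. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the scalar dynamics x_{t+1} = a x_t + b u_t + w_t, the static gain u_t = k x_t, the Gaussian parameter law inside sample_system, the weights q and r, the choice to draw a fresh system for each trajectory, and all function names.

```python
import random

def sample_system(rng):
    """Draw (a, b) from a hypothetical parameter law (assumed, not from the paper)."""
    return rng.gauss(0.9, 0.05), rng.gauss(1.0, 0.05)

def trajectory_cost(a, b, k, horizon, rng, q=1.0, r=0.1, x0=1.0, noise_std=0.1):
    """Finite-horizon quadratic cost of u_t = k * x_t on x_{t+1} = a x_t + b u_t + w_t."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = k * x
        cost += q * x * x + r * u * u          # stage cost q x^2 + r u^2
        x = a * x + b * u + rng.gauss(0.0, noise_std)
    return cost

def empirical_cost(k, n, horizon, seed=0):
    """Average cost over n trajectories, each run on a freshly sampled system."""
    rng = random.Random(seed)
    costs = [trajectory_cost(*sample_system(rng), k, horizon, rng) for _ in range(n)]
    return sum(costs) / n
```

A call such as empirical_cost(-0.5, n=10, horizon=20) gives the kind of data-dependent quantity that a PAC-Bayes bound then relates to the expected cost.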

Core claim

The central claim is that a PAC-Bayes analysis produces a data-dependent high-probability bound on the expected cost of any learned stochastic controller for an unknown stochastic linear discrete-time system whose parameters are drawn from a fixed unknown distribution, and that this bound remains valid for unbounded quadratic costs; the same analysis yields practical learning algorithms that come with guarantees for both finite and infinite controller spaces.
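
For orientation, the classical PAC-Bayes bound for losses bounded in [0, C] (Catoni's form, as it appears in standard tutorials) is sketched below. This is not the paper's theorem: the paper states that its bound covers unbounded quadratic cost, and its exact form is not reproduced on this page. The symbols are notation chosen here: π is a data-independent prior, ρ any posterior, R the expected cost, r the empirical cost over n i.i.d. samples, λ > 0 a constant fixed before seeing data, and δ ∈ (0, 1) the failure probability.

```latex
\mathbb{P}\left( \forall \rho:\;
  \mathbb{E}_{\theta\sim\rho}\big[R(\theta)\big]
  \;\le\;
  \mathbb{E}_{\theta\sim\rho}\big[r(\theta)\big]
  + \frac{\lambda C^{2}}{8n}
  + \frac{\mathrm{KL}(\rho\,\|\,\pi) + \ln(1/\delta)}{\lambda}
\right) \;\ge\; 1-\delta
```

The constant C enters through a Hoeffding-type step, which is exactly what fails when the cost is an unbounded quadratic; that is the gap the paper claims to close.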

What carries the argument

The data-dependent PAC-Bayes bound on the expected quadratic cost of a stochastic controller drawn from a posterior distribution over controller parameters.
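
For a finite controller set, one standard way to pick a posterior is the Gibbs distribution, which minimizes "expected empirical cost + KL(ρ‖π)/λ" over all ρ. The sketch below assumes that linear-in-KL objective; the paper's own algorithms may optimize a different bound, and the names gibbs_posterior, kl_divergence, objective, and the parameter lam are hypothetical.

```python
import math

def gibbs_posterior(prior, emp_costs, lam):
    """Return rho_i proportional to prior_i * exp(-lam * emp_costs_i)."""
    # Shift by the minimum cost for numerical stability; the shift cancels on normalization.
    c_min = min(emp_costs)
    weights = [p * math.exp(-lam * (c - c_min)) for p, c in zip(prior, emp_costs)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(rho, prior):
    """KL(rho || prior) for discrete distributions, using the convention 0 * log 0 = 0."""
    return sum(r * math.log(r / p) for r, p in zip(rho, prior) if r > 0.0)

def objective(rho, prior, emp_costs, lam):
    """Expected empirical cost under rho plus the KL penalty scaled by 1 / lam."""
    expected = sum(r * c for r, c in zip(rho, emp_costs))
    return expected + kl_divergence(rho, prior) / lam
```

With a uniform prior over two controllers, empirical costs 0 and ln 2, and lam = 1, the Gibbs posterior puts weight 2/3 on the cheaper controller, and its objective value is lower than that of the prior itself.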

If this is right

  • The bound supplies a concrete certificate that can be checked after data collection before a controller is deployed.
  • The algorithms scale to infinite controller spaces without requiring explicit enumeration.
  • Performance guarantees continue to hold when the cost function is an unbounded quadratic form.
  • In the special case where LQG is optimal, the numerical results suggest that the learned controllers achieve performance comparable to the LQG solution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same PAC-Bayes construction could be used to obtain guarantees for policies that are updated online as new data arrives.
  • Extending the framework to nonlinear or partially observed systems would require only changes to the cost and dynamics models inside the bound.
  • The approach naturally lends itself to robust variants by replacing the prior with a worst-case distribution over possible parameter laws.

Load-bearing premise

The system parameters are drawn from a fixed but unknown distribution.

What would settle it

Run many independent trials in which system parameters are sampled from the same distribution, learn a controller from data each time, and check whether the observed cost exceeds the derived bound more frequently than the claimed probability allows.
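
The check described above can be sketched as a simple frequency test. This is an illustration under stated assumptions, not a procedure from the paper: each trial is assumed to be independent, observed_costs[i] is assumed to be an accurate estimate of the expected cost of the controller learned in trial i (for example from many fresh test trajectories), and the function names and the Hoeffding-based tolerance are choices made here.

```python
import math

def violation_rate(observed_costs, bounds):
    """Fraction of trials in which the observed cost exceeds the certified bound."""
    if len(observed_costs) != len(bounds) or not bounds:
        raise ValueError("need two non-empty sequences of equal length")
    violations = sum(c > b for c, b in zip(observed_costs, bounds))
    return violations / len(bounds)

def hoeffding_slack(num_trials, alpha):
    """One-sided Hoeffding tolerance for an empirical frequency over num_trials trials."""
    return math.sqrt(math.log(1.0 / alpha) / (2.0 * num_trials))

def bound_is_refuted(observed_costs, bounds, delta, alpha=0.05):
    """True if violations exceed delta by more than sampling noise plausibly explains."""
    rate = violation_rate(observed_costs, bounds)
    return rate > delta + hoeffding_slack(len(bounds), alpha)
```

If the bound truly fails with probability at most delta, bound_is_refuted returns True with probability at most alpha, so a True result is evidence against the claimed guarantee rather than a sampling fluke.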

Figures

Figures reproduced from arXiv: 2605.10493 by Jingge Zhu, Jonathan H. Manton, Ye Pu, Yujia Luo.

Figure 1: Comparison of PAC-Bayes upper bounds and expected cost, across varying training trajectories per controller, for a time-invariant linear discrete-time system with a finite controller space. Example 2 (Controller evaluation). To further evaluate the controller learned by our PAC-Bayes approach, we consider a modified version of Example 1 in which the classical finite-horizon LQG controller is globally opt…
Figure 3: Comparison of expected costs under the prior P0 and the learned posterior Pθ, together with the PAC-Bayes bound, across varying iterations. … cost of Pθ^(Iter) has dropped from its initial value of about 1290 (corresponding to the starting choice Pθ0 = P0) to around 7 and then remains at this low level in all subsequent iterations. This indicates that Algorithm 2 not only effectively learns a posterior Pθ…
Figure 2: Comparison of the expected costs of the prior P0, the learned posterior P⋆_PAC, and the finite-horizon LQG controller, denoted by C̄(P0), C̄(P⋆_PAC), and J(K^(t)_LQG; A, B), respectively, as the horizon length T varies with a fixed number of training trajectories per controller, n = 10.
Figure 3: Comparison of expected costs under the prior P0 and the learned posterior Pθ as the number of training trajectories per controller varies.
Original abstract

This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper develops a PAC-Bayes framework for learning stochastic controllers for unknown linear discrete-time systems whose parameters are drawn from a fixed but unknown distribution. It derives a data-dependent high-probability bound on the expected quadratic cost of any learned controller, proposes efficient algorithms with guarantees for both finite and infinite controller spaces, and claims the bound applies even when the quadratic cost is unbounded (unlike prior work). Numerical experiments indicate that the learned controllers achieve performance comparable to LQG when LQG is optimal.

Significance. If the integrability of the unbounded quadratic cost under the data-dependent posterior is rigorously established, the result would meaningfully extend PAC-Bayes control bounds to the standard LQR/LQG setting with unbounded costs. The provision of algorithms for both finite and infinite controller spaces and the data-dependent nature of the bound are strengths that could support practical learning from trajectory data.

major comments (1)
  1. [§3.2, Theorem 1] The high-probability PAC-Bayes statement is invoked for the expected quadratic cost, yet the manuscript does not supply an explicit stability or moment lemma showing that E[cost] < ∞ almost surely under the posterior (e.g., via a uniform Schur-stability guarantee or truncation argument). Because the posterior is learned from data and the system parameters are random, mass on destabilizing controllers for a positive-measure set of systems would render the expectation undefined, making the bound formally inapplicable.
minor comments (2)
  1. [§2] Notation for the closed-loop matrix and the cost functional should be introduced earlier and used consistently; the current placement after the main theorem makes the unbounded-cost claim harder to follow.
  2. [§5] The numerical section would benefit from reporting the empirical frequency of closed-loop instability across posterior samples, which would directly address the integrability concern raised above.
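
The diagnostic requested in the second minor comment can be sketched for the scalar case, where Schur stability of the closed loop reduces to |a + b k| < 1. This is a hypothetical illustration: the scalar model, the sign convention u_t = k x_t, and the function names are assumptions made here, and for matrix-valued systems the analogous test would be that the spectral radius of A + BK is below 1.

```python
def is_schur_stable(a, b, k):
    """Scalar closed loop x_{t+1} = (a + b k) x_t is Schur stable iff |a + b k| < 1."""
    return abs(a + b * k) < 1.0

def instability_frequency(systems, gains):
    """Fraction of (system, gain) pairs whose closed loop is not Schur stable.

    systems: iterable of (a, b) pairs sampled from the parameter distribution.
    gains:   iterable of feedback gains k sampled from the posterior.
    """
    pairs = [(a, b, k) for (a, b) in systems for k in gains]
    if not pairs:
        raise ValueError("need at least one system and one gain")
    unstable = sum(not is_schur_stable(a, b, k) for a, b, k in pairs)
    return unstable / len(pairs)
```

A frequency of exactly zero on a finite sample does not prove that the posterior places no mass on destabilizing controllers, but a clearly positive value would confirm the integrability concern.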

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review of our manuscript. The major comment raises an important point about ensuring the finiteness of the expected cost under the learned posterior. We address this below and plan to revise the manuscript accordingly to strengthen the presentation.

Point-by-point responses
  1. Referee: [§3.2, Theorem 1] The high-probability PAC-Bayes statement is invoked for the expected quadratic cost, yet the manuscript does not supply an explicit stability or moment lemma showing that E[cost] < ∞ almost surely under the posterior (e.g., via a uniform Schur-stability guarantee or truncation argument). Because the posterior is learned from data and the system parameters are random, mass on destabilizing controllers for a positive-measure set of systems would render the expectation undefined, making the bound formally inapplicable.

    Authors: We thank the referee for this insightful comment. Our derivation of the PAC-Bayes bound assumes that the expected cost is finite, which is ensured by our choice of a prior distribution over controllers that are stabilizing with respect to the distribution of system parameters. Because the posterior is absolutely continuous with respect to the prior, this property carries over to the posterior, guaranteeing that E[cost] < ∞ almost surely. Nevertheless, we agree that an explicit statement would improve clarity. In the revised manuscript, we will add a supporting lemma in §3.2 establishing the finite moment condition under our assumptions, possibly using a truncation argument for the cost function.

    Revision: yes

Circularity Check

0 steps flagged

No circularity; derivation applies standard PAC-Bayes to controller performance with independent stability considerations.

Full rationale

The paper derives a data-dependent high-probability bound on controller performance for linear systems with parameters drawn from an unknown distribution, extending prior PAC-Bayes results to unbounded quadratic costs. No quoted equations or steps reduce the bound to a fitted quantity by construction, nor does any self-citation chain serve as the sole justification for a uniqueness claim or ansatz. The central result relies on PAC-Bayes inequalities applied to the expected cost under the posterior, which is an independent application rather than a renaming or self-definition. The provided abstract and skeptic notes raise a potential integrability question for the unbounded cost, but this is a correctness or proof-completeness issue rather than circularity, as no derivation step is shown to be tautological with its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the modeling assumption that system parameters are drawn from a fixed but unknown distribution; no free parameters, invented entities, or additional axioms are mentioned in the abstract.

axioms (1)
  • Domain assumption: system parameters are drawn from a fixed but unknown distribution.
    Stated in the abstract as the setting for which the PAC-Bayes bound is derived.

pith-pipeline@v0.9.0 · 5631 in / 1200 out tokens · 28101 ms · 2026-05-22T10:37:38.456054+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
