A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems

Jingge Zhu; Jonathan H. Manton; Ye Pu; Yujia Luo

arxiv: 2605.10493 · v2 · pith:JXL4H4BWnew · submitted 2026-05-11 · 🧮 math.OC · cs.SY · eess.SY · stat.ML

Pith reviewed 2026-05-22 10:37 UTC · model grok-4.3

classification 🧮 math.OC · cs.SY · eess.SY · stat.ML

keywords PAC-Bayes · linear discrete-time systems · stochastic controllers · unknown parameters · quadratic cost · high-probability bounds · learning algorithms

The pith

A PAC-Bayes bound gives high-probability performance guarantees for any learned stochastic controller on unknown linear discrete-time systems, including cases with unbounded quadratic costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a PAC-Bayes framework for learning controllers when the linear system parameters come from a fixed but unknown distribution. It derives a data-dependent high-probability bound on the performance of any stochastic controller obtained from data. This bound applies even when the quadratic cost is unbounded, extending beyond earlier results. The authors also give efficient algorithms that carry theoretical guarantees and work whether the set of candidate controllers is finite or infinite. In the special case where linear-quadratic-Gaussian control is optimal, the learned controllers reach performance close to that of the optimal LQG solution.

Core claim

The central claim is that a PAC-Bayes analysis produces a data-dependent high-probability bound on the expected cost of any learned stochastic controller for an unknown stochastic linear discrete-time system whose parameters are drawn from a fixed unknown distribution, and that this bound remains valid for unbounded quadratic costs; the same analysis yields practical learning algorithms that come with guarantees for both finite and infinite controller spaces.

What carries the argument

The data-dependent PAC-Bayes bound on the expected quadratic cost of a stochastic controller drawn from a posterior distribution over controller parameters.
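
For orientation, a classical PAC-Bayes bound for costs bounded in [0, C] has the form below. It is shown only as a generic template and is not the paper's result: the symbols π (prior over controllers), ρ (posterior), r (empirical cost over n i.i.d. samples), R (expected cost), and the fixed, data-independent parameter λ > 0 are notation assumed here. The paper's own bound covers unbounded quadratic cost, so its complexity term necessarily differs from the C²/8 term in this template.

```latex
\[
\Pr_{S}\!\left( \forall \rho:\;
\mathbb{E}_{\theta\sim\rho}[R(\theta)]
\le \mathbb{E}_{\theta\sim\rho}[r(\theta)]
+ \frac{\lambda C^{2}}{8n}
+ \frac{\mathrm{KL}(\rho\,\|\,\pi) + \log(1/\delta)}{\lambda}
\right) \ge 1-\delta .
\]
```

The right-hand side is computable from data once ρ is chosen, which is what makes a bound of this kind usable as a post-hoc certificate.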

If this is right

The bound supplies a concrete certificate that can be checked after data collection before a controller is deployed.
The algorithms scale to infinite controller spaces without requiring explicit enumeration.
Performance guarantees continue to hold when the cost function is an unbounded quadratic form.
In the LQG setting, the numerical results suggest that the learned controllers achieve performance comparable to that of the optimal LQG solution.
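
For a finite controller space, one standard way to turn a bound of the classical bounded-cost form into a learning rule is the Gibbs posterior, which minimizes empirical cost plus KL(ρ‖π)/λ. The sketch below illustrates that generic technique; it is not the paper's algorithm, and the function names, the bounded-cost constant `C`, and all numbers are hypothetical.

```python
import numpy as np

def gibbs_posterior(prior, emp_costs, lam):
    """Posterior rho(k) proportional to prior(k) * exp(-lam * emp_costs[k]).

    Assumes a strictly positive prior over a finite set of controllers.
    """
    prior = np.asarray(prior, dtype=float)
    emp_costs = np.asarray(emp_costs, dtype=float)
    # Work in log space and subtract the max for numerical stability.
    log_w = np.log(prior) - lam * emp_costs
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()

def kl_divergence(rho, prior):
    """KL(rho || prior) for discrete distributions on the same support."""
    rho = np.asarray(rho, dtype=float)
    prior = np.asarray(prior, dtype=float)
    mask = rho > 0  # terms with rho(k) = 0 contribute zero
    return float(np.sum(rho[mask] * np.log(rho[mask] / prior[mask])))

def bounded_cost_objective(rho, prior, emp_costs, lam, n, C, delta):
    """Right-hand side of the classical bound for costs in [0, C] (assumption)."""
    kl = kl_divergence(rho, prior)
    emp = float(np.dot(rho, emp_costs))
    return emp + lam * C**2 / (8 * n) + (kl + np.log(1.0 / delta)) / lam

# Hypothetical example: four candidate controllers with a uniform prior.
prior = np.full(4, 0.25)
emp_costs = np.array([3.0, 1.0, 2.0, 5.0])  # made-up empirical costs in [0, C]
lam, n, C, delta = 2.0, 10, 5.0, 0.05       # lam must be fixed before seeing data

rho = gibbs_posterior(prior, emp_costs, lam)
print("posterior:", rho)
print("objective at posterior:", bounded_cost_objective(rho, prior, emp_costs, lam, n, C, delta))
print("objective at prior:    ", bounded_cost_objective(prior, prior, emp_costs, lam, n, C, delta))
```

Because the Gibbs posterior minimizes the sum of empirical cost and KL/λ, its objective value can never exceed the value at the prior. For an infinite controller space no such enumeration is possible, which is where a parametric posterior and iterative optimization would come in.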

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same PAC-Bayes construction could be used to obtain guarantees for policies that are updated online as new data arrives.
Extending the framework to nonlinear or partially observed systems would require only changes to the cost and dynamics models inside the bound.
The approach naturally lends itself to robust variants by replacing the prior with a worst-case distribution over possible parameter laws.

Load-bearing premise

The system parameters are drawn from a fixed but unknown distribution.

What would settle it

Run many independent trials in which system parameters are sampled from the same distribution, learn a controller from data each time, and check whether the observed cost exceeds the derived bound more frequently than the claimed probability allows.
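
The check described above can be sketched as a small coverage experiment. Since the paper's bound is not reproduced on this page, the sketch uses a one-sided Hoeffding bound on the mean of Uniform[0, 1] samples as a stand-in; the helper names and all parameters are assumptions made for illustration only.

```python
import numpy as np

def violation_rate(true_values, bounds):
    """Fraction of trials in which the true value exceeds the claimed bound."""
    true_values = np.asarray(true_values, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    return float(np.mean(true_values > bounds))

def hoeffding_upper_bound(samples, delta):
    """Stand-in bound: upper confidence bound on the mean of samples in [0, 1]."""
    n = len(samples)
    return float(np.mean(samples) + np.sqrt(np.log(1.0 / delta) / (2.0 * n)))

rng = np.random.default_rng(0)
delta, n, trials = 0.05, 50, 2000   # hypothetical experiment sizes
true_mean = 0.5                     # exact mean of Uniform[0, 1]

# Each trial draws fresh data and computes its own data-dependent bound.
bounds = [hoeffding_upper_bound(rng.uniform(0.0, 1.0, n), delta) for _ in range(trials)]
rate = violation_rate(np.full(trials, true_mean), bounds)
print(f"violation rate = {rate:.4f}, allowed = {delta}")
```

To test the paper's claim, `hoeffding_upper_bound` would be replaced by the PAC-Bayes bound evaluated on each trial's trajectories, and `true_mean` by an independent Monte Carlo estimate of the learned controller's expected cost. A violation rate clearly above δ would count against the bound.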

Figures

Figures reproduced from arXiv: 2605.10493 by Jingge Zhu, Jonathan H. Manton, Ye Pu, Yujia Luo.

**Figure 1.** Comparison of PAC-Bayes upper bounds and expected cost, across varying training trajectories per controller, for a time-invariant linear discrete-time system with a finite controller space.
Adjacent text (truncated): Example 2 (Controller evaluation). To further evaluate the controller learned by our PAC-Bayes approach, we consider a modified version of Example 1 in which the classical finite-horizon LQG controller is globally opt…

**Figure 3.** Comparison of expected costs under the prior P0 and the learned posterior Pθ, together with the PAC-Bayes bound, across varying iterations.
Adjacent text (truncated): … cost of Pθ^(Iter) has dropped from its initial value of about 1290 (corresponding to the starting choice Pθ0 = P0) to around 7 and then remains at this low level in all subsequent iterations. This indicates that Algorithm 2 not only effectively learns a posterior Pθ…

**Figure 2.** Comparison of the expected costs of the prior P0, the learned posterior P⋆_PAC, and the finite-horizon LQG controller, denoted by C̄(P0), C̄(P⋆_PAC), and J(K_LQG^(t); A, B), respectively, as the horizon length T varies with a fixed number of training trajectories per controller n = 10.

**Figure 3.** Comparison of expected costs under the prior P0 and the learned posterior Pθ as the number of training trajectories per controller varies.

Original abstract

This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PAC-Bayes bound extended to unbounded quadratic costs for linear controllers, with algorithms for infinite spaces, but the stability argument for finite expectations needs checking in the proofs.

The one thing to know is that this work extends PAC-Bayes bounds to unbounded quadratic costs for learning controllers in unknown linear discrete-time systems, and it supplies algorithms that apply to both finite and infinite controller spaces. The authors derive a data-dependent high-probability bound on the performance of a learned stochastic controller and propose learning algorithms with theoretical guarantees. In the special case where LQG is optimal, their numerical results indicate that the learned controllers achieve performance close to LQG, which is a concrete check on the method.

The main soft spot is whether the bound really holds when the cost is unbounded. For the infinite-horizon quadratic cost to have finite expectation, the closed-loop matrix has to be Schur stable. Because the posterior over controllers is data-dependent and the system parameters are drawn from an unknown distribution, the posterior may put positive mass on controllers that make the closed loop unstable for a positive-measure set of systems. If the proof does not include a step that ensures the expectation is finite, either by showing stability with high probability or by some other argument, then the PAC-Bayes inequality cannot be applied directly to those cases. The stress-test note flags exactly this point, and it would be worth verifying in the full derivations.

The rest of the paper appears to follow standard lines for PAC-Bayes in control, with adaptations for the linear system setting. Based on the abstract and the described results, there is no obvious circularity in the claims.

This paper is aimed at researchers in control theory who are interested in learning controllers with distribution-free performance guarantees. A reader who follows work on robust or data-driven control would find the extension to unbounded costs and the infinite-space algorithms useful. It is solid enough on the formal side to deserve a serious referee. I would send it out for peer review.
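
To make the stability concern concrete, consider a scalar sketch (an illustration assumed here, not taken from the paper). Let the closed loop be x_{t+1} = a x_t + w_t, where a plays the role of A + BK, the noise w_t is i.i.d. with zero mean and variance σ², and w_t is independent of x_0. The second moment that drives a quadratic stage cost q x_t² then satisfies:

```latex
\[
\mathbb{E}[x_t^2] = a^{2t}\,\mathbb{E}[x_0^2] + \sigma^2 \sum_{s=0}^{t-1} a^{2s}
\;\xrightarrow[t\to\infty]{}\;
\begin{cases}
\dfrac{\sigma^2}{1-a^2}, & |a| < 1,\\[1ex]
\infty, & |a| \ge 1 \text{ and } \sigma^2 > 0 .
\end{cases}
\]
```

So the long-run stage cost is finite only when |a| < 1. Over a finite horizon T the cost stays finite for each fixed a, but it grows geometrically in T once |a| > 1, so its expectation over a random a can still be very large or infinite depending on the tails of the parameter distribution.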

Referee Report

1 major / 2 minor

Summary. The paper develops a PAC-Bayes framework for learning stochastic controllers for unknown linear discrete-time systems whose parameters are drawn from a fixed but unknown distribution. It derives a data-dependent high-probability bound on the expected infinite-horizon quadratic cost of any learned controller, proposes efficient algorithms with guarantees for both finite and infinite controller spaces, and claims the bound applies even when the quadratic cost is unbounded (unlike prior work). Numerical experiments indicate that the learned controllers achieve performance comparable to LQG when LQG is optimal.

Significance. If the integrability of the unbounded quadratic cost under the data-dependent posterior is rigorously established, the result would meaningfully extend PAC-Bayes control bounds to the standard LQR/LQG setting with unbounded costs. The provision of algorithms for both finite and infinite controller spaces and the data-dependent nature of the bound are strengths that could support practical learning from trajectory data.

major comments (1)

[§3.2, Theorem 1] In Theorem 1 and the surrounding derivation, the high-probability PAC-Bayes statement is invoked for the expected quadratic cost, yet the manuscript does not supply an explicit stability or moment lemma showing that E[cost] < ∞ almost surely under the posterior (e.g., via a uniform Schur-stability guarantee or truncation argument). Because the posterior is learned from data and the system parameters are random, mass on destabilizing controllers for a positive-measure set of systems would render the expectation undefined, making the bound formally inapplicable.

minor comments (2)

[§2] Notation for the closed-loop matrix and the infinite-horizon cost functional should be introduced earlier and used consistently; the current placement after the main theorem makes the unbounded-cost claim harder to follow.
[§5] The numerical section would benefit from reporting the empirical frequency of closed-loop instability across posterior samples, which would directly address the integrability concern raised above.
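
The instability frequency requested in the second minor comment could be estimated along the following lines. This is a minimal sketch under stated assumptions: static state feedback u = K x (so the closed loop is A + B K), a hypothetical Gaussian perturbation model for the system parameters, and a hypothetical Gaussian posterior over gains. None of these choices come from the paper.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def instability_frequency(systems, gains):
    """Fraction of (system, gain) pairs whose closed loop A + B K is not Schur stable."""
    unstable = 0
    total = 0
    for A, B in systems:
        for K in gains:
            total += 1
            if spectral_radius(A + B @ K) >= 1.0:
                unstable += 1
    return unstable / total

rng = np.random.default_rng(0)

# Hypothetical parameter law: a nominal double integrator plus small noise on A.
A_nom = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
systems = [(A_nom + 0.01 * rng.standard_normal((2, 2)), B) for _ in range(200)]

# Hypothetical posterior: Gaussian around a nominal gain that stabilizes (A_nom, B).
K_nom = np.array([[-1.0, -2.0]])
gains = [K_nom + 0.1 * rng.standard_normal((1, 2)) for _ in range(50)]

freq = instability_frequency(systems, gains)
print(f"empirical instability frequency: {freq:.3f}")
```

A frequency of zero across many samples would not prove integrability, but a clearly positive value would indicate that the finiteness of the expected cost needs an explicit argument.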

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review of our manuscript. The major comment raises an important point about ensuring the finiteness of the expected cost under the learned posterior. We address this below and plan to revise the manuscript accordingly to strengthen the presentation.

Referee: [§3.2, Theorem 1] In Theorem 1 and the surrounding derivation, the high-probability PAC-Bayes statement is invoked for the expected quadratic cost, yet the manuscript does not supply an explicit stability or moment lemma showing that E[cost] < ∞ almost surely under the posterior (e.g., via a uniform Schur-stability guarantee or truncation argument). Because the posterior is learned from data and the system parameters are random, mass on destabilizing controllers for a positive-measure set of systems would render the expectation undefined, making the bound formally inapplicable.

Authors: We thank the referee for this insightful comment. Our derivation of the PAC-Bayes bound assumes that the expected cost is finite, which is ensured by our choice of a prior distribution supported on controllers that are stabilizing with respect to the distribution of system parameters. Because the posterior is absolutely continuous with respect to the prior, this property carries over to the posterior, guaranteeing that E[cost] < ∞ almost surely. Nevertheless, we agree that an explicit statement would improve clarity. In the revised manuscript, we will add a supporting lemma in §3.2 establishing the finite moment condition under our assumptions, possibly using a truncation argument for the cost function.

revision: yes

Circularity Check

0 steps flagged

No circularity; derivation applies standard PAC-Bayes to controller performance with independent stability considerations.

The paper derives a data-dependent high-probability bound on controller performance for linear systems with parameters drawn from an unknown distribution, extending prior PAC-Bayes results to unbounded quadratic costs. No quoted equations or steps reduce the bound to a fitted quantity by construction, nor does any self-citation chain serve as the sole justification for a uniqueness claim or ansatz. The central result relies on PAC-Bayes inequalities applied to the expected cost under the posterior, which is an independent application rather than a renaming or self-definition. The provided abstract and skeptic notes raise a potential integrability question for the unbounded cost, but this is a correctness or proof-completeness issue rather than circularity, as no derivation step is shown to be tautological with its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the modeling assumption that system parameters are drawn from a fixed but unknown distribution; no free parameters, invented entities, or additional axioms are mentioned in the abstract.

axioms (1)

domain assumption: System parameters are drawn from a fixed but unknown distribution.
Stated in the abstract as the setting for which the PAC-Bayes bound is derived.

pith-pipeline@v0.9.0 · 5631 in / 1200 out tokens · 28101 ms · 2026-05-22T10:37:38.456054+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: "We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller... our bound holds for unbounded quadratic cost."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1] Adaptive linear quadratic Gaussian control: The cost-biased approach revisited. SIAM J. Control Optim.
[2] Adaptive continuous-time linear quadratic Gaussian control. IEEE Trans. Autom. Control.
[3] Backstepping control of linear time-varying systems with known and unknown parameters. IEEE Trans. Autom. Control.
[4] Regret bounds for robust adaptive control of the linear quadratic regulator. Adv. Neural Inf. Process. Syst.
[5] Learning robust data-based LQG controllers from noisy data. IEEE Trans. Autom. Control.
[6] Adaptive dual control of discrete-time LQG problems with unknown-but-bounded parameter. Asian J. Control.
[7] Deep reinforcement learning for home energy management system control. Energy AI.
[8] Safe learning in robotics: From learning-based control to safe reinforcement learning. Annu. Rev. Control Robot. Auton. Syst.
[9] User-friendly introduction to PAC-Bayes bounds. Found. Trends Mach. Learn.
[10] PAC-Bayesian bounds based on the R… Proc. AISTATS.
[11] PAC-Bayes control: Synthesizing controllers that provably generalize to novel environments. Proc. CoRL.
[12] A PAC-Bayesian framework for optimal control with stability guarantees. Proc. IEEE CDC.
[13] Tight bounds for the expected risk of linear classifiers and PAC-Bayes finite-sample guarantees. Proc. AISTATS.
[14] Grant, Michael and Boyd, Stephen.
[15] A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems. 2026.
[16] Optimal adaptive control of an LQG system. Proc. IEEE CDC.
[17] Sub-Gaussian random variables. Ukr. Math. J.
[18] Optimal Control: Linear Quadratic Methods. 2007.
[19] LQG-obstacles: Feedback control with collision avoidance for mobile robots with motion and sensing uncertainty. Proc. IEEE ICRA.
[20] Optimal path tracking control of autonomous vehicle: Adaptive full-state linear quadratic Gaussian control. IEEE Access.
[21] LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information. Int. J. Robot. Res.
[22] Learning latent representations to co-adapt to humans. Auton. Robots.
[23] Formal verification and control with conformal prediction. arXiv preprint arXiv:2409.00536.
