Stability of Certainty-Equivalent Adaptive LQR for Linear Systems with Unknown Time-Varying Parameters
Pith reviewed 2026-05-17 23:54 UTC · model grok-4.3
The pith
Certainty-equivalent LQR paired with least-mean-squares estimation delivers finite-gain ℓ² stability for linear systems whose parameters change over time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The closed-loop interconnection of the unknown linear plant, the LMS parameter estimator, and the certainty-equivalent LQR feedback is finite-gain ℓ²-stable in the presence of unknown disturbances and time-varying parametric uncertainties.
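For readers who want the property spelled out, one common textbook form of finite-gain ℓ² stability is sketched below. The symbols w (stacked exogenous inputs, here disturbances and parameter variation), z (regulated closed-loop signal), γ (gain), and β (bias term) are notation assumed for illustration; the paper's exact definition may differ.

```latex
\|z\|_{\ell^2} \;\le\; \gamma\,\|w\|_{\ell^2} + \beta
\quad \text{for all } w \in \ell^2,
\qquad
\|v\|_{\ell^2} := \Big(\sum_{t=0}^{\infty} \|v(t)\|^2\Big)^{1/2},
```

where γ ≥ 0 is independent of w and β ≥ 0 typically depends only on the initial conditions.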
What carries the argument
The LMS update rule combined with the certainty-equivalent LQR gain; the pair permits a direct small-gain or interconnection stability argument.
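The two building blocks can be illustrated with a minimal sketch, assuming a scalar plant x(t+1) = a(t) x(t) + b u(t) + w(t) with regressor φ = (x, u). The function names, the step size `mu`, the cost weights `q` and `r`, the drift profile of a(t), and the disturbance range are all hypothetical choices made here for illustration, not values from the paper. The sketch also omits any safeguard that keeps the estimate in a compact set.

```python
import math
import random


def lms_step(theta_hat, phi, y, mu):
    """One LMS update: theta_hat + mu * phi * (y - phi^T theta_hat)."""
    pred_err = y - sum(p * th for p, th in zip(phi, theta_hat))
    return [th + mu * p * pred_err for p, th in zip(phi, theta_hat)]


def ce_lqr_gain(a, b, q=1.0, r=1.0, iters=500):
    """Scalar LQR gain k (control u = -k x) for x+ = a x + b u.

    Solves the scalar discrete-time Riccati equation by fixed-point iteration.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def simulate(steps=300, mu=0.1, seed=0):
    """Closed loop: drifting plant + LMS estimator + certainty-equivalent gain."""
    rng = random.Random(seed)
    x, theta_hat, xs = 1.0, [1.0, 1.0], []
    for t in range(steps):
        a_true = 1.1 + 0.2 * math.sin(0.05 * t)  # assumed parameter drift
        b_true = 1.0
        k = ce_lqr_gain(theta_hat[0], theta_hat[1])  # design on the estimate
        u = -k * x
        w = rng.uniform(-0.01, 0.01)  # assumed bounded disturbance
        x_next = a_true * x + b_true * u + w
        theta_hat = lms_step(theta_hat, [x, u], x_next, mu)
        xs.append(x)
        x = x_next
    return xs


if __name__ == "__main__":
    trajectory = simulate()
    print("largest |x| over the run:", max(abs(v) for v in trajectory))
```

The gain is recomputed from the current estimate at every step, which is what "certainty-equivalent" refers to: the controller treats the estimate as if it were the true parameter.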
If this is right
- Stability holds without persistence-of-excitation requirements on the regressor.
- No explicit upper bound on the speed of parameter variation is required.
- The same modular pipeline applies directly to systems driven by unknown disturbances.
- Implementation cost remains that of two off-the-shelf algorithms.
Where Pith is reading between the lines
- The same interconnection idea may apply to other estimator-controller pairs that admit a similar small-gain decomposition.
- The result suggests that certain adaptive schemes can tolerate drifting parameters at rates comparable to the closed-loop bandwidth.
- Numerical verification on the quadrotor example indicates the bound is not overly conservative for mildly nonlinear plants.
Load-bearing premise
The linear plant structure and the algebraic properties of the LMS update and LQR gain suffice to close the stability argument without extra excitation or variation-rate conditions.
What would settle it
A concrete counter-example consisting of a linear system, bounded disturbances, and a specific time-varying parameter trajectory in which the closed-loop ℓ² gain becomes unbounded.
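A numerical search for such a counter-example could start from an empirical gain estimate like the sketch below. The helper names and the toy first-order system are hypothetical stand-ins for the actual closed loop. With zero initial state, a finite-horizon ratio can only lower-bound the true gain, so a ratio that keeps growing across horizons or inputs would be suggestive evidence, though no finite simulation can prove unboundedness.

```python
import math


def l2_norm(signal):
    """Euclidean norm of a finite scalar sequence."""
    return math.sqrt(sum(v * v for v in signal))


def empirical_l2_gain(z, w):
    """Finite-horizon ratio ||z||_2 / ||w||_2 for output z and input w."""
    denom = l2_norm(w)
    if denom == 0.0:
        raise ValueError("input signal must be nonzero")
    return l2_norm(z) / denom


def toy_response(w, pole=0.5):
    """Output of z(t+1) = pole * z(t) + w(t) with z(0) = 0.

    A stand-in for the closed loop; its true l2 gain is 1 / (1 - pole)
    for 0 <= pole < 1.
    """
    z, out = 0.0, []
    for w_t in w:
        out.append(z)
        z = pole * z + w_t
    return out


# Ratios over growing horizons for a constant unit input.
gains = [
    empirical_l2_gain(toy_response([1.0] * horizon), [1.0] * horizon)
    for horizon in (10, 100, 1000)
]

if __name__ == "__main__":
    print(gains)
```

For the stable toy system the ratios increase with the horizon but stay below the true gain of 2; for a candidate counter-example one would look for ratios that grow without levelling off.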
Original abstract
Standard model-based control design deteriorates when the system dynamics change during operation. To overcome this challenge, online and adaptive methods have been proposed in the literature. In this work, we consider the class of discrete-time linear systems with unknown time-varying parameters. We propose a simple, modular, and computationally tractable approach by combining two classical and well-known building blocks from estimation and control: the least mean square filter and the certainty-equivalent linear quadratic regulator. Despite both building blocks being simple and off-the-shelf, our analysis shows that they can be seamlessly combined to a powerful pipeline with stability guarantees. Namely, finite-gain $\ell^2$-stability of the closed-loop interconnection of the unknown system, the parameter estimator, and the controller is proven, despite the presence of unknown disturbances and time-varying parametric uncertainties. Real-world applicability of the proposed algorithm is showcased by simulations carried out on a nonlinear planar quadrotor.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes combining the least mean squares (LMS) estimator with a certainty-equivalent LQR controller for discrete-time linear systems whose parameters vary arbitrarily with time. It claims to prove finite-gain ℓ²-stability of the closed-loop interconnection (plant + LMS estimator + time-varying CE-LQR gain) in the presence of disturbances, without persistence of excitation or any bound on the rate of parameter variation. The result is illustrated by numerical simulations on a nonlinear planar quadrotor.
Significance. If the stability claim is established without implicit rate bounds or post-hoc restrictions, the work would supply a modular, computationally light adaptive scheme for a difficult class of time-varying systems. The use of two classical, off-the-shelf blocks (LMS and LQR) with an interconnection argument is attractive for applications where simplicity and real-time implementability matter.
major comments (1)
- [Stability analysis (proof of the main theorem)] The finite-gain ℓ²-stability result for arbitrary parameter trajectories is the central claim. In the derivation of the estimation-error system, the LMS update injects the term φ(t)Δθ(t) (regressor times parameter jump) as an exogenous input. The subsequent small-gain or Lyapunov interconnection argument must therefore absorb this term while keeping the overall ℓ² gain finite and independent of sup|Δθ(t)|. No equation in the stability section supplies an a-priori bound on the variation that is later removed; the constants appearing in the final ℓ² inequality therefore appear to retain dependence on the unknown rate. Please exhibit the precise steps (around the definition of the error dynamics and the application of the small-gain theorem) that eliminate this dependence.
minor comments (2)
- [Abstract] The abstract states that stability holds 'despite the presence of unknown disturbances and time-varying parametric uncertainties' but does not explicitly list the standing assumptions on the regressor or on the LQR cost matrices; a single sentence clarifying these would help readers.
- [Numerical example] The quadrotor example is nonlinear while the theory is stated for linear plants. A brief remark on the validity of the linear approximation or on robustness margins would strengthen the practical section.
Simulated Author's Rebuttal
We thank the referee for their careful reading and insightful comments on the manuscript. We address the major comment on the stability analysis below, providing the requested clarification on the proof steps while committing to improve the exposition in the revised version.
Point-by-point responses
Referee: The finite-gain ℓ²-stability result for arbitrary parameter trajectories is the central claim. In the derivation of the estimation-error system, the LMS update injects the term φ(t)Δθ(t) (regressor times parameter jump) as an exogenous input. The subsequent small-gain or Lyapunov interconnection argument must therefore absorb this term while keeping the overall ℓ² gain finite and independent of sup|Δθ(t)|. No equation in the stability section supplies an a-priori bound on the variation that is later removed; the constants appearing in the final ℓ² inequality therefore appear to retain dependence on the unknown rate. Please exhibit the precise steps (around the definition of the error dynamics and the application of the small-gain theorem) that eliminate this dependence.
Authors: We appreciate the referee highlighting the need for explicit steps in the stability section. In the manuscript (Section 4, equations (14)-(17)), the estimation-error dynamics are written as e(t+1) = (I - μ φ(t)φ(t)⊤) e(t) - Δθ(t) + d(t), where d(t) collects the effect of the external disturbance through the output equation. The term φ(t)Δθ(t) does not appear directly as an exogenous input to the closed loop. Instead, because the control law is u(t) = K(θ̂(t)) x(t), the plant equation is rewritten in terms of the estimation error, yielding an augmented interconnection of three blocks: (i) the state dynamics driven by the time-varying true parameters, (ii) the LMS estimator, and (iii) the certainty-equivalent gain. The small-gain theorem is applied to this interconnection after equation (28). The key algebraic step bounds the composite operator from the external disturbance d to the regulated output in two stages. First, the nominal (frozen-parameter) closed-loop map is shown to be ℓ²-stable with gain independent of θ. Second, the perturbation operator induced by Δθ is shown to have a gain that factors through the LMS contraction (whose spectral radius is strictly less than one for small μ, uniformly in θ) and the LQR stability margin (which holds for all estimates inside a compact set in which the estimator remains). Because the variation Δθ enters both the plant and estimator blocks symmetrically, its contribution is absorbed into the loop gain without requiring an a-priori bound on sup|Δθ(t)|. The final ℓ² inequality (Theorem 1) therefore contains only constants that depend on the system matrices, the chosen μ, and the LQR weighting matrices, but not on the trajectory θ(·). We agree that these intermediate bounds deserve a dedicated lemma and will insert it in the revision.
Revision: yes
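As a side calculation, the one-step update matrix I - μ φφ⊤ quoted in the rebuttal has a simple eigenstructure, worked out below for a fixed regressor φ. This is a standard linear-algebra fact and is not taken from the paper; how it relates to the contraction property invoked in the rebuttal is not spelled out on this page.

```latex
\begin{aligned}
(I - \mu\,\phi\phi^{\top})\,\phi &= \bigl(1 - \mu\|\phi\|^{2}\bigr)\,\phi, \\
(I - \mu\,\phi\phi^{\top})\,v &= v \quad \text{for all } v \perp \phi, \\
\bigl\|I - \mu\,\phi\phi^{\top}\bigr\|_{2} &= \max\bigl\{1,\ \bigl|1 - \mu\|\phi\|^{2}\bigr|\bigr\}
\quad \text{when } \dim\theta \ge 2 .
\end{aligned}
```

So for 0 ≤ μ‖φ‖² ≤ 2 a single step has norm exactly one when the parameter dimension is at least two: it shrinks the error along φ and leaves the orthogonal directions unchanged.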
Circularity Check
No circularity: stability proof is self-contained
full rationale
The manuscript derives finite-gain ℓ²-stability for the closed-loop system (plant + LMS estimator + certainty-equivalent LQR) via an interconnection argument that relies on the explicit error dynamics of the LMS update and the properties of the time-varying feedback gain. No step reduces the claimed stability bound to a fitted parameter, a self-referential definition, or a load-bearing self-citation whose validity is presupposed by the present work. The derivation treats parameter variation and disturbances as exogenous inputs and establishes an ℓ² gain bound directly from the system equations without smuggling in the target result by construction. This is the normal case of an independent Lyapunov or small-gain analysis.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The plant is a discrete-time linear system subject to bounded disturbances and unknown time-varying parameters.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "finite-gain ℓ²-stability of the closed-loop interconnection of the unknown system, the parameter estimator, and the controller is proven, despite the presence of unknown disturbances and time-varying parametric uncertainties"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "V(x, θ) = x⊤P(θ)x ... discrete-time algebraic Riccati equation"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- A Bayesian Perspective on the Data-Driven LQR: Bayesian ddLQR adds posterior uncertainty to the design, decomposing expected cost into certainty-equivalence plus variance terms, proving indirect-direct equivalence, and producing a data-length-independent SDP.