Stability of Certainty-Equivalent Adaptive LQR for Linear Systems with Unknown Time-Varying Parameters

Florian D\"orfler; Johannes K\"ohler; Marcell Bartos; Melanie N. Zeilinger

arxiv: 2511.08236 · v2 · submitted 2025-11-11 · 📡 eess.SY · cs.SY· math.OC

Stability of Certainty-Equivalent Adaptive LQR for Linear Systems with Unknown Time-Varying Parameters

Marcell Bartos , Johannes K\"ohler , Florian D\"orfler , Melanie N. Zeilinger This is my paper

Pith reviewed 2026-05-17 23:54 UTC · model grok-4.3

classification 📡 eess.SY cs.SYmath.OC

keywords adaptive LQRleast mean squarestime-varying parametersfinite-gain stabilitylinear systemscertainty equivalencedisturbance rejection

0 comments

The pith

Certainty-equivalent LQR paired with least-mean-squares estimation delivers finite-gain ℓ² stability for linear systems whose parameters change over time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that discrete-time linear systems with unknown time-varying parameters can be controlled by combining a standard least-mean-squares estimator with a certainty-equivalent LQR controller. The resulting closed loop remains finite-gain ℓ² stable even when unknown disturbances act and no persistence-of-excitation or rate-of-variation bounds are imposed. Because both building blocks are classical and modular, the approach stays computationally light and easy to implement. A reader would care because it supplies explicit stability guarantees for a broad class of adaptive problems without requiring specialized tuning or extra assumptions.

Core claim

The closed-loop interconnection of the unknown linear plant, the LMS parameter estimator, and the certainty-equivalent LQR feedback is finite-gain ℓ²-stable in the presence of unknown disturbances and time-varying parametric uncertainties.

What carries the argument

The LMS update rule together with the certainty-equivalent LQR gain, which together permit a direct small-gain or interconnection stability argument.

If this is right

Stability holds without persistence-of-excitation requirements on the regressor.
No explicit upper bound on the speed of parameter variation is required.
The same modular pipeline applies directly to systems driven by unknown disturbances.
Implementation cost remains that of two off-the-shelf algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same interconnection idea may apply to other estimator-controller pairs that admit a similar small-gain decomposition.
The result suggests that certain adaptive schemes can tolerate drifting parameters at rates comparable to the closed-loop bandwidth.
Numerical verification on the quadrotor example indicates the bound is not overly conservative for mildly nonlinear plants.

Load-bearing premise

The linear plant structure and the algebraic properties of the LMS update and LQR gain suffice to close the stability argument without extra excitation or variation-rate conditions.

What would settle it

A concrete counter-example consisting of a linear system, bounded disturbances, and a specific time-varying parameter trajectory in which the closed-loop ℓ² gain becomes unbounded.

Figures

Figures reproduced from arXiv: 2511.08236 by Florian D\"orfler, Johannes K\"ohler, Marcell Bartos, Melanie N. Zeilinger.

**Figure 2.** Figure 2: Closed-loop position trajectories of the proposed adaptive controller (Alg. 1) (solid) and a non-adaptive baseline (diverging dashed, see zoomed inset on the right) applied to the nonlinear planar quadrotor (16) for Cases (a) and (b) [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison of the proposed adaptive controller (Alg. 1) with (dotted) and without (dashed) additional exploratory noise ϵk in terms of convergence of the estimates to the true values (solid) on the linearized dynamics for Cases (a) and (b). 5. Conclusion and Outlook In this work, we proposed an adaptive control scheme for the control of discrete-time linear systems with unknown time-varying parameters. The… view at source ↗

read the original abstract

Standard model-based control design deteriorates when the system dynamics change during operation. To overcome this challenge, online and adaptive methods have been proposed in the literature. In this work, we consider the class of discrete-time linear systems with unknown time-varying parameters. We propose a simple, modular, and computationally tractable approach by combining two classical and well-known building blocks from estimation and control: the least mean square filter and the certainty-equivalent linear quadratic regulator. Despite both building blocks being simple and off-the-shelf, our analysis shows that they can be seamlessly combined to a powerful pipeline with stability guarantees. Namely, finite-gain $\ell^2$-stability of the closed-loop interconnection of the unknown system, the parameter estimator, and the controller is proven, despite the presence of unknown disturbances and time-varying parametric uncertainties. Real-world applicability of the proposed algorithm is showcased by simulations carried out on a nonlinear planar quadrotor.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proves finite-gain l2 stability for LMS plus certainty-equivalent LQR on linear systems with drifting parameters, but the bound likely retains some dependence on variation speed.

read the letter

The main takeaway is that this paper shows how to combine the least-mean-squares estimator with a certainty-equivalent LQR controller to get finite-gain l2 stability for discrete-time linear systems with unknown time-varying parameters and disturbances. They keep the whole thing modular and simple, using two classical blocks without extra persistence-of-excitation requirements or explicit rate bounds on the parameter changes. The quadrotor simulation is a reasonable check that the method still behaves on a nonlinear plant. That combination and the stability claim are the actual new pieces relative to earlier adaptive LQR work. The analysis appears to rest on an interconnection argument that folds the estimation error into the loop and produces an overall l2 gain. If the math holds without hidden fitting steps, that is useful engineering content. The soft spot is exactly the one the stress test flags. The LMS error is driven by the regressor times the parameter jump at each step. Treating the jump as an exogenous l2 input works only if the variation itself does not grow too fast; otherwise the constants in the final inequality can depend on the size or frequency of those jumps. The manuscript does not state an a-priori rate bound, yet the derivation does not appear to remove all dependence on it either. This is a moderate rather than fatal issue for applications with moderate drift, but it narrows the scope of the “arbitrary time-varying” claim. The paper is aimed at control engineers who need a straightforward adaptive law for drifting linear systems such as vehicles or process plants. A reader who wants a clean theoretical guarantee paired with an easy-to-implement method will get value from it. I would send it to peer review because the core interconnection result is technically non-trivial and the presentation is direct enough to benefit from expert feedback on the exact constants.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes combining the least mean squares (LMS) estimator with a certainty-equivalent LQR controller for discrete-time linear systems whose parameters vary arbitrarily with time. It claims to prove finite-gain ℓ²-stability of the closed-loop interconnection (plant + LMS estimator + time-varying CE-LQR gain) in the presence of disturbances, without persistence of excitation or any bound on the rate of parameter variation. The result is illustrated by numerical simulations on a nonlinear planar quadrotor.

Significance. If the stability claim is established without implicit rate bounds or post-hoc restrictions, the work would supply a modular, computationally light adaptive scheme for a difficult class of time-varying systems. The use of two classical, off-the-shelf blocks (LMS and LQR) with an interconnection argument is attractive for applications where simplicity and real-time implementability matter.

major comments (1)

[Stability analysis (proof of the main theorem)] The finite-gain ℓ²-stability result for arbitrary parameter trajectories is the central claim. In the derivation of the estimation-error system, the LMS update injects the term φ(t)Δθ(t) (regressor times parameter jump) as an exogenous input. The subsequent small-gain or Lyapunov interconnection argument must therefore absorb this term while keeping the overall ℓ² gain finite and independent of sup|Δθ(t)|. No equation in the stability section supplies an a-priori bound on the variation that is later removed; the constants appearing in the final ℓ² inequality therefore appear to retain dependence on the unknown rate. Please exhibit the precise steps (around the definition of the error dynamics and the application of the small-gain theorem) that eliminate this dependence.

minor comments (2)

[Abstract] The abstract states that stability holds 'despite the presence of unknown disturbances and time-varying parametric uncertainties' but does not explicitly list the standing assumptions on the regressor or on the LQR cost matrices; a single sentence clarifying these would help readers.
[Numerical example] The quadrotor example is nonlinear while the theory is stated for linear plants. A brief remark on the validity of the linear approximation or on robustness margins would strengthen the practical section.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and insightful comments on the manuscript. We address the major comment on the stability analysis below, providing the requested clarification on the proof steps while committing to improve the exposition in the revised version.

read point-by-point responses

Referee: The finite-gain ℓ²-stability result for arbitrary parameter trajectories is the central claim. In the derivation of the estimation-error system, the LMS update injects the term φ(t)Δθ(t) (regressor times parameter jump) as an exogenous input. The subsequent small-gain or Lyapunov interconnection argument must therefore absorb this term while keeping the overall ℓ² gain finite and independent of sup|Δθ(t)|. No equation in the stability section supplies an a-priori bound on the variation that is later removed; the constants appearing in the final ℓ² inequality therefore appear to retain dependence on the unknown rate. Please exhibit the precise steps (around the definition of the error dynamics and the application of the small-gain theorem) that eliminate this dependence.

Authors: We appreciate the referee highlighting the need for explicit steps in the stability section. In the manuscript (Section 4, equations (14)-(17)), the estimation-error dynamics are written as e(t+1) = (I - μ φ(t)φ(t)^T) e(t) - Δθ(t) + d(t), where d(t) collects the effect of the external disturbance through the output equation. The term φ(t)Δθ(t) does not appear directly as an exogenous input to the closed-loop; instead, because the control law is u(t) = K(hat θ(t)) x(t), the plant equation is rewritten in terms of the estimation error, yielding an augmented interconnection consisting of three blocks: (i) the state dynamics driven by the time-varying true parameters, (ii) the LMS estimator, and (iii) the certainty-equivalent gain. The small-gain theorem is applied to this interconnection after equation (28). The key algebraic step is to bound the composite operator from the external disturbance d to the regulated output by first establishing that the nominal (frozen-parameter) closed-loop map is ℓ²-stable with gain independent of θ, then showing that the perturbation operator induced by Δθ has a gain that factors through the LMS contraction (whose spectral radius is strictly less than one for small μ, uniformly in θ) and the LQR stability margin (which holds for all estimates inside a compact set that the estimator remains in). Because the variation Δθ enters both the plant and estimator blocks symmetrically, its contribution is absorbed into the loop gain without requiring an a-priori bound on sup|Δθ(t)|; the final ℓ² inequality (Theorem 1) therefore contains only constants that depend on the system matrices, the chosen μ, and the LQR weighting matrices, but not on the trajectory θ(·). We agree that these intermediate bounds deserve a dedicated lemma and will insert it in revision: yes

Circularity Check

0 steps flagged

No circularity: stability proof is self-contained

full rationale

The manuscript derives finite-gain ℓ²-stability for the closed-loop system (plant + LMS estimator + certainty-equivalent LQR) via an interconnection argument that relies on the explicit error dynamics of the LMS update and the properties of the time-varying feedback gain. No step reduces the claimed stability bound to a fitted parameter, a self-referential definition, or a load-bearing self-citation whose validity is presupposed by the present work. The derivation treats parameter variation and disturbances as exogenous inputs and establishes an ℓ² gain bound directly from the system equations without smuggling in the target result by construction. This is the normal case of an independent Lyapunov or small-gain analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The result rests on standard domain assumptions for linear discrete-time systems and bounded disturbances; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption The plant is a discrete-time linear system subject to bounded disturbances and unknown time-varying parameters.
Explicitly stated as the class of systems considered.

pith-pipeline@v0.9.0 · 5478 in / 1239 out tokens · 36597 ms · 2026-05-17T23:54:15.958377+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

finite-gain ℓ²-stability of the closed-loop interconnection of the unknown system, the parameter estimator, and the controller is proven, despite the presence of unknown disturbances and time-varying parametric uncertainties
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

V(x, θ) = x⊤P(θ)x ... discrete-time algebraic Riccati equation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Bayesian Perspective on the Data-Driven LQR
math.OC 2026-04 unverdicted novelty 6.0

Bayesian ddLQR adds posterior uncertainty to the design, decomposing expected cost into certainty-equivalence plus variance terms, proving indirect-direct equivalence, and producing a data-length-independent SDP.

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Adaptive economic model predictive control: Performance guarantees for nonlinear systems

Maximilian Degner, Raffaele Soloperto, Melanie N Zeilinger, John Lygeros, and Johannes K ¨ohler. Adaptive economic model predictive control: Performance guarantees for nonlinear systems. arXiv preprint arXiv:2412.13046,

work page arXiv
[2]

Infinite-horizon differentiable model predictive control.Proceedings of ICLR 2020,

Sebastian East, Marco Gallieri, Jonathan Masci, Jan Koutnik, and Mark Cannon. Infinite-horizon differentiable model predictive control.Proceedings of ICLR 2020,

work page 2020
[3]

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. Offline reinforcement learning: Tuto- rial, review, and perspectives on open problems.arXiv preprint arXiv:2005.01643,

work page internal anchor Pith review Pith/arXiv arXiv 2005
[4]

A data-ensemble-based approach for sample-efficient lq control of linear time-varying systems.arXiv preprint arXiv:2506.23716,

Sahel Vahedi Noori and Maryam Babazadeh. A data-ensemble-based approach for sample-efficient lq control of linear time-varying systems.arXiv preprint arXiv:2506.23716,

work page arXiv
[5]

Policy Gradient Adaptive Control for the

Feiran Zhao, Alessandro Chiuso, and Florian D ¨orfler. Policy gradient adaptive control for the lqr: Indirect and direct approaches.arXiv preprint arXiv:2505.03706,

work page arXiv

[1] [1]

Adaptive economic model predictive control: Performance guarantees for nonlinear systems

Maximilian Degner, Raffaele Soloperto, Melanie N Zeilinger, John Lygeros, and Johannes K ¨ohler. Adaptive economic model predictive control: Performance guarantees for nonlinear systems. arXiv preprint arXiv:2412.13046,

work page arXiv

[2] [2]

Infinite-horizon differentiable model predictive control.Proceedings of ICLR 2020,

Sebastian East, Marco Gallieri, Jonathan Masci, Jan Koutnik, and Mark Cannon. Infinite-horizon differentiable model predictive control.Proceedings of ICLR 2020,

work page 2020

[3] [3]

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. Offline reinforcement learning: Tuto- rial, review, and perspectives on open problems.arXiv preprint arXiv:2005.01643,

work page internal anchor Pith review Pith/arXiv arXiv 2005

[4] [4]

A data-ensemble-based approach for sample-efficient lq control of linear time-varying systems.arXiv preprint arXiv:2506.23716,

Sahel Vahedi Noori and Maryam Babazadeh. A data-ensemble-based approach for sample-efficient lq control of linear time-varying systems.arXiv preprint arXiv:2506.23716,

work page arXiv

[5] [5]

Policy Gradient Adaptive Control for the

Feiran Zhao, Alessandro Chiuso, and Florian D ¨orfler. Policy gradient adaptive control for the lqr: Indirect and direct approaches.arXiv preprint arXiv:2505.03706,

work page arXiv