Data-Driven Linear Quadratic Control Using Output-Feedback via Non-Minimal Realization
Pith reviewed 2026-05-19 21:24 UTC · model grok-4.3
The pith
An augmented system from Kreisselmeier's adaptive filter recovers the optimal state-feedback gain for the original plant in data-driven LQ control.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The optimal gain of the augmented system explicitly recovers the optimal gain associated with the canonical non-minimal realization, and hence achieves the optimal state-feedback solution of the original plant. Exploiting this relation and the known structure of the augmented input matrix, a data-driven value iteration algorithm is developed within the adaptive dynamic programming framework.
What carries the argument
The augmented system obtained by applying Kreisselmeier's adaptive filter, which preserves the input-output response while making state trajectories accessible for the non-minimal realization.
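A minimal sketch of the kind of filter-based non-minimal realization this refers to, written for a single-input single-output plant of order n. The symbols F, g, θ, ω₁, ω₂ and ζ are assumptions introduced here for illustration; the paper's own construction and notation may differ.

```latex
% Illustrative symbols only (not the paper's notation):
% F is an n-by-n Hurwitz matrix, (F, g) is controllable, theta = (theta_1, theta_2) in R^{2n}.
\begin{align}
\dot{\omega}_1 &= F\,\omega_1 + g\,y, \qquad \dot{\omega}_2 = F\,\omega_2 + g\,u, \\
y(t) &= \theta_1^{\top}\omega_1(t) + \theta_2^{\top}\omega_2(t) + \varepsilon(t),
        \qquad \varepsilon(t) \to 0 \ \text{exponentially}, \\
\dot{\zeta} &=
  \begin{bmatrix} F + g\,\theta_1^{\top} & g\,\theta_2^{\top} \\ 0 & F \end{bmatrix} \zeta
  + \begin{bmatrix} 0 \\ g \end{bmatrix} u,
  \qquad y = \theta^{\top}\zeta,
  \qquad \zeta = \begin{bmatrix} \omega_1 \\ \omega_2 \end{bmatrix}.
\end{align}
```

The first line filters the measured output and input, the second expresses the output through the filter states up to a decaying term, and the third is the 2n-dimensional realization obtained by dropping that term. Its state is built from measurable signals only, which is what makes the trajectories accessible.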
If this is right
- The resulting controller is implementable directly from input-output data without requiring state measurements.
- The data-driven value iteration converges to the optimal LQ solution for the original unknown system.
- The approach applies to continuous-time systems with unknown matrices.
- Performance is validated through simulations on example systems.
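To make the value-iteration claim concrete, here is a model-based sketch of the type of recursion that continuous-time value iteration relies on: repeated small steps along the Riccati residual, starting from P = 0. It is not the paper's data-driven algorithm. The plant matrices, the constant step size, and the function name `value_iteration_lqr` are assumptions chosen for illustration, and the model is used directly where the paper would use input-output data.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def value_iteration_lqr(A, B, Q, R, step=0.01, iters=5000):
    """Model-based value iteration sketch: Euler steps on the Riccati residual from P = 0."""
    P = np.zeros_like(A, dtype=float)
    R_inv = np.linalg.inv(R)
    for _ in range(iters):
        # Riccati residual; it vanishes exactly at a solution of the algebraic Riccati equation.
        residual = A.T @ P + P @ A + Q - P @ B @ R_inv @ B.T @ P
        P = P + step * residual
    K = R_inv @ B.T @ P
    return P, K


# Hypothetical second-order plant (an assumption, not taken from the paper).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P_vi, K_vi = value_iteration_lqr(A, B, Q, R)

# Reference solution from a standard Riccati solver.
P_are = solve_continuous_are(A, B, Q, R)
K_are = np.linalg.solve(R, B.T @ P_are)

vi_gap = np.max(np.abs(K_vi - K_are))
print("largest gain discrepancy:", vi_gap)
```

A data-driven variant would have to estimate the residual from measured trajectories instead of evaluating it with A and B, which is where an excitation condition on the data enters.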
Where Pith is reading between the lines
- This framework might extend to other optimal control problems like LQG or H-infinity by similar augmentation.
- Testing on a low-order plant with known optimal gain would confirm the recovery property numerically.
- Future work could explore discrete-time versions or robustness to noise in the data.
Load-bearing premise
Kreisselmeier's adaptive filter admits an observer interpretation that leads to an augmented system preserving the input-output response of the realization and providing accessible state trajectories.
What would settle it
Apply the data-driven algorithm to a known linear system, compute the gain from the augmented system, and check if it equals the analytically known optimal state-feedback gain for the original plant.
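A hedged sketch of such a check on a scalar plant, restricted to the non-minimal realization itself rather than the paper's augmented system. The plant dx/dt = a x + b u with y = x, the filter pole -lam, and the resulting parameter vector theta = (lam + a, b) are assumptions worked out here for a first-order filter pair; the gains are computed from the model, not learned from data.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical scalar plant dx/dt = a*x + b*u, y = x, with cost integral of q*y^2 + r*u^2.
a, b = 1.0, 1.0
q, r = 1.0, 1.0
lam = 2.0  # assumed filter pole at -lam

# Optimal state-feedback gain of the original plant.
p = solve_continuous_are(
    np.array([[a]]), np.array([[b]]), np.array([[q]]), np.array([[r]])
)[0, 0]
k_star = b * p / r

# Non-minimal realization with state zeta = (w1, w2), where
# dw1/dt = -lam*w1 + y, dw2/dt = -lam*w2 + u, and y = theta @ zeta.
theta = np.array([lam + a, b])
A_z = np.array([[a, b], [0.0, -lam]])
B_z = np.array([[0.0], [1.0]])
Q_z = q * np.outer(theta, theta)  # the same output cost, written in zeta coordinates
R_z = np.array([[r]])

P_z = solve_continuous_are(A_z, B_z, Q_z, R_z)
K_z = np.linalg.solve(R_z, B_z.T @ P_z).ravel()

# Recovery property in this example: K_z should equal k_star * theta, so u = -k_star * y.
recovery_gap = np.max(np.abs(K_z - k_star * theta))
print("recovery gap:", recovery_gap)
```

In this example the gain on the realization acts on the filter states only through the combination theta @ zeta, that is, through the plant output. A learned gain from the data-driven algorithm could be compared against `k_star * theta` in the same way.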
Original abstract
In this paper, we investigate a continuous-time linear quadratic control problem for systems with unknown matrices, where only input-output data are available. We propose an output-feedback learning framework based on a canonical non-minimal realization constructed through Kreisselmeier's adaptive filter. The filter admits an observer interpretation, which leads to an augmented system that preserves the input-output response of the realization and provides accessible state trajectories. We show that the optimal gain of this augmented system explicitly recovers the optimal gain associated with the canonical non-minimal realization, and hence achieves the optimal state-feedback solution of the original plant. Exploiting this relation and the known structure of the augmented input matrix, we develop a data-driven value iteration algorithm within the adaptive dynamic programming framework. The resulting controller is implementable from input-output data, and its performance is validated via simulations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a data-driven output-feedback framework for continuous-time linear-quadratic regulation of unknown plants. Using only input-output trajectories, it constructs a canonical non-minimal realization via Kreisselmeier’s adaptive filter, interprets the filter as an observer to obtain an augmented state-space system that preserves the original input-output map, proves that the optimal LQR gain computed on the augmented pair recovers the optimal gain of the non-minimal realization (and hence the original plant), and derives a value-iteration algorithm that exploits the known structure of the augmented input matrix to learn the gain from data.
Significance. If the recovery relation is rigorously established, the work supplies a concrete route from raw I/O data to an optimal output-feedback controller without requiring state measurements or explicit system identification. The explicit link between the augmented Riccati solution and the original LQ optimum, together with the data-driven ADP implementation, would constitute a useful addition to the literature on non-minimal realizations and adaptive dynamic programming.
Major comments (2)
- §3 (Augmented system and gain recovery): the proof that the Riccati solution on (A_aug, B_aug) yields a gain whose closed-loop cost equals that of the canonical non-minimal realization must be checked for exact quadratic-cost equivalence. I/O preservation alone does not automatically guarantee that the filter-induced modes contribute zero (or correctly absorbed) cost; an explicit transformation relating the two quadratic forms or an invariance argument is required.
- §4 (Data-driven value iteration): the algorithm statement should include a precise statement of the persistence-of-excitation condition on the collected I/O data and a convergence guarantee (or error bound) for the learned gain relative to the true optimal gain of the augmented system.
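One way the transformation requested in the first comment could look, as a sketch under stated assumptions and not the paper's proof. Suppose a full-row-rank matrix T maps the state ζ of the non-minimal realization to the plant state and intertwines the dynamics, and suppose the cost on the realization is the pull-back of the plant cost. The symbols T, A_ζ, B_ζ, Q_ζ, P_ζ and K_ζ are introduced here for illustration.

```latex
% Assumptions (illustrative): T A_\zeta = A T, \quad T B_\zeta = B, \quad Q_\zeta = T^{\top} Q T,
% and P solves A^{\top}P + PA - PBR^{-1}B^{\top}P + Q = 0 with gain K = R^{-1}B^{\top}P.
\begin{align}
P_\zeta &:= T^{\top} P\, T, \\
A_\zeta^{\top} P_\zeta + P_\zeta A_\zeta
  - P_\zeta B_\zeta R^{-1} B_\zeta^{\top} P_\zeta + Q_\zeta
  &= T^{\top}\bigl(A^{\top}P + PA - PBR^{-1}B^{\top}P + Q\bigr)T = 0, \\
K_\zeta &= R^{-1} B_\zeta^{\top} P_\zeta = R^{-1} B^{\top} P\, T = K\,T .
\end{align}
```

Under these assumptions the quadratic form on the realization is the plant's quadratic form evaluated at Tζ, so directions in the kernel of T carry no cost. For P_ζ to be the stabilizing solution, the modes of A_ζ restricted to the kernel of T must additionally be stable, which would hold if they are the filter modes.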
Minor comments (2)
- Figure 1 (block diagram of the augmented system): the signal-flow arrows between the filter states and the original plant outputs should be labeled with the exact matrix dimensions to avoid ambiguity.
- Notation: the symbol for the augmented state vector is occasionally overloaded with the original state; a distinct symbol (e.g., x_aug) would improve readability.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the constructive comments, which help strengthen the manuscript. We address each major comment below and will incorporate the suggested clarifications in the revised version.
- Referee: §3 (Augmented system and gain recovery): the proof that the Riccati solution on (A_aug, B_aug) yields a gain whose closed-loop cost equals that of the canonical non-minimal realization must be checked for exact quadratic-cost equivalence. I/O preservation alone does not automatically guarantee that the filter-induced modes contribute zero (or correctly absorbed) cost; an explicit transformation relating the two quadratic forms or an invariance argument is required.
  Authors: We appreciate this careful remark. The proof in Section 3 proceeds by first establishing that the augmented system is a non-minimal realization that exactly reproduces the input-output map of the original plant, then showing that the optimal LQR gain for the augmented pair (A_aug, B_aug) recovers the optimal gain for the canonical non-minimal realization because the filter states are linearly related to past inputs and outputs. The quadratic cost equivalence follows from the fact that the additional modes are unobservable from the regulated output and are driven by the same input that appears in the original cost; however, we acknowledge that the current write-up relies on this structural property without spelling out an explicit invariance of the quadratic form. In the revision we will insert a short lemma that constructs the similarity transformation between the two quadratic forms and verifies that the filter-induced contribution is identically zero under the optimal policy, thereby making the cost equivalence fully rigorous.
  Revision: yes
- Referee: §4 (Data-driven value iteration): the algorithm statement should include a precise statement of the persistence-of-excitation condition on the collected I/O data and a convergence guarantee (or error bound) for the learned gain relative to the true optimal gain of the augmented system.
  Authors: We agree that the current description of the data-driven value iteration in Section 4 would benefit from an explicit persistence-of-excitation (PE) assumption and a convergence statement. In the revised manuscript we will add a precise PE condition on the collected input-output trajectories (requiring that the regressor matrix formed by the filtered signals and inputs has full rank) together with a theorem that establishes convergence of the iterated gain to the optimal augmented gain, including a finite-data error bound that depends on the level of excitation and the number of samples. These additions will be placed immediately after the algorithm statement.
  Revision: yes
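The full-rank condition mentioned in this response can be checked numerically on sampled data. The sketch below is an illustration only: the regressor columns are hypothetical stand-ins for filtered signals and inputs, the function names and the tolerance are assumptions, and the paper's actual regressor is not reproduced here.

```python
import numpy as np


def excitation_level(Phi):
    """Smallest singular value of Phi / sqrt(N) for an N-by-m data matrix Phi."""
    n_samples, n_cols = Phi.shape
    if n_samples < n_cols:
        return 0.0  # fewer samples than columns: full column rank is impossible
    singular_values = np.linalg.svd(Phi / np.sqrt(n_samples), compute_uv=False)
    return float(singular_values[-1])


def has_full_column_rank(Phi, tol=1e-6):
    """Rank test with an explicit numerical tolerance (the tolerance is a design choice)."""
    return excitation_level(Phi) > tol


t = np.linspace(0.0, 10.0, 2001)

# Hypothetical regressor built from two distinct frequencies: columns are independent.
rich = np.column_stack([np.sin(t), np.cos(t), np.sin(3 * t), np.cos(3 * t)])

# Hypothetical regressor with two collinear constant columns: rank deficient.
poor = np.column_stack([np.ones_like(t), 2 * np.ones_like(t), np.sin(t), np.cos(t)])

rich_ok = has_full_column_rank(rich)
poor_ok = has_full_column_rank(poor)
print("rich data exciting:", rich_ok, "| poor data exciting:", poor_ok)
```

Reporting the smallest singular value rather than a bare rank gives a quantitative excitation level, which is the kind of quantity a finite-data error bound would be expected to depend on.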
Circularity Check
No circularity: derivation relies on standard system-theoretic equivalence and ADP structures
The paper shows that the optimal gain of the augmented system explicitly recovers the optimal gain of the canonical non-minimal realization. The argument rests on the observer interpretation of Kreisselmeier's filter, which preserves the input-output map by construction of the filter dynamics. This equivalence is established through algebraic properties of the augmented (A_aug, B_aug) pair and the quadratic cost, independent of any fitted parameters or self-referential definitions. The subsequent data-driven value iteration applies standard ADP to the accessible augmented states, without renaming or smuggling in prior results as new predictions. No load-bearing step reduces to its inputs by definition; the central claim is a verifiable first-principles result in linear systems theory.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: The plant is a continuous-time linear time-invariant system with unknown matrices.
- Domain assumption: Kreisselmeier's adaptive filter admits an observer interpretation yielding an augmented system that preserves input-output response and supplies accessible states.