
arxiv: 2605.16752 · v1 · pith:GZEZPV4Hnew · submitted 2026-05-16 · 🧮 math.OC

Data-Driven Linear Quadratic Control Using Output-Feedback via Non-Minimal Realization

Pith reviewed 2026-05-19 21:24 UTC · model grok-4.3

classification 🧮 math.OC
keywords: data-driven control · linear quadratic control · output feedback · non-minimal realization · adaptive filter · adaptive dynamic programming · value iteration

The pith

An augmented system from Kreisselmeier's adaptive filter recovers the optimal state-feedback gain for the original plant in data-driven LQ control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method for solving the linear quadratic control problem when the system matrices are unknown and only input and output data are available. It constructs a non-minimal realization using Kreisselmeier's adaptive filter, which is interpreted as an observer to create an augmented system with accessible states that match the original input-output behavior. The key result is that the optimal gain computed for this augmented system directly gives the optimal state-feedback controller for the underlying plant. This enables a data-driven value iteration algorithm in the adaptive dynamic programming setting to learn the controller without explicit model knowledge. A sympathetic reader would care because it bridges output-feedback control with data-driven methods for systems where full state measurement is impractical.

Core claim

The optimal gain of the augmented system explicitly recovers the optimal gain associated with the canonical non-minimal realization, and hence achieves the optimal state-feedback solution of the original plant. Exploiting this relation and the known structure of the augmented input matrix, a data-driven value iteration algorithm is developed within the adaptive dynamic programming framework.
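One way to see why a gain computed on a non-minimal realization can recover the plant's state-feedback gain is sketched below. The symbols $T$, $A_z$, $B_z$, $C_z$, $Q_y$, $P_x$, $P_z$, $K_x$, $K_z$ and the output-weighted cost are assumptions introduced here for illustration, and both LQ problems are assumed well posed (stabilizable and detectable). The paper's own construction and proof may differ.

```latex
% Plant and a non-minimal realization linked by a full-column-rank map T (assumed).
\dot{x} = A x + B u, \quad y = C x,
\qquad
\dot{z} = A_z z + B_z u, \quad y = C_z z,
\qquad
T A = A_z T, \quad T B = B_z, \quad C_z T = C .

% Shared cost on input and output.
J = \int_0^\infty \left( y^\top Q_y\, y + u^\top R\, u \right) dt .

% The mismatch e = z - T x evolves autonomously:
\dot{e} = A_z z + B_z u - T A x - T B u = A_z e ,
% so z(0) = T x(0) implies z(t) = T x(t) for every input u and all t \ge 0.

% Equal trajectories of (u, y) give equal optimal costs:
x_0^\top P_x\, x_0 = (T x_0)^\top P_z\, (T x_0)
\;\;\Longrightarrow\;\;
P_x = T^\top P_z T .

% Gain recovery:
K_x = R^{-1} B^\top P_x
    = R^{-1} B^\top T^\top P_z T
    = R^{-1} B_z^\top P_z T
    = K_z T .
```

Under these assumptions, the feedback $u = -K_z z$ applied along $z = T x$ coincides with $u = -K_x x$.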

What carries the argument

The augmented system obtained by applying Kreisselmeier's adaptive filter, which preserves the input-output response while making state trajectories accessible for the non-minimal realization.

If this is right

  • The resulting controller is implementable directly from input-output data without requiring state measurements.
  • The data-driven value iteration converges to the optimal LQ solution for the original unknown system.
  • The approach applies to continuous-time systems with unknown matrices.
  • Performance is validated through simulations on example systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This framework might extend to other optimal control problems like LQG or H-infinity by similar augmentation.
  • Testing on a low-order plant with known optimal gain would confirm the recovery property numerically.
  • Future work could explore discrete-time versions or robustness to noise in the data.

Load-bearing premise

Kreisselmeier's adaptive filter admits an observer interpretation that leads to an augmented system preserving the input-output response of the realization and providing accessible state trajectories.

What would settle it

Apply the data-driven algorithm to a known linear system, compute the gain from the augmented system, and check if it equals the analytically known optimal state-feedback gain for the original plant.
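The reference side of such a check can be sketched in a few lines. The example below is a minimal sketch, not the paper's algorithm: it assumes a double-integrator plant with cost weights `Q = C^T C` and `R = 1`, for which the optimal gain is known analytically to be `[1, sqrt(2)]`, and it computes the gain from the Hamiltonian matrix using NumPy only. The names `lqr_gain` and `gain_matches` are hypothetical helpers. A gain learned from data would be passed in as `K_candidate`, after being mapped to plant coordinates in whatever way the paper's construction prescribes.

```python
import numpy as np

def lqr_gain(A, B, Q, R):
    """Return (P, K) from the stabilizing solution of the continuous-time ARE.

    Uses the stable invariant subspace of the Hamiltonian matrix.
    """
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T],
                  [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]      # eigenvectors with Re < 0
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    P = 0.5 * (P + P.T)                        # remove rounding asymmetry
    K = Rinv @ B.T @ P
    return P, K

def gain_matches(K_candidate, K_reference, tol=1e-6):
    """True when the candidate gain agrees with the reference entrywise."""
    return bool(np.allclose(K_candidate, K_reference, atol=tol))

# Illustrative plant (an assumption, not taken from the paper).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = C.T @ C
R = np.array([[1.0]])

P_star, K_star = lqr_gain(A, B, Q, R)
K_analytic = np.array([[1.0, np.sqrt(2.0)]])   # closed-form gain for this plant

print(gain_matches(K_star, K_analytic))
```

An entrywise comparison is the strictest form of the check. Comparing closed-loop costs would be a looser alternative if the learned gain is only expected to be near-optimal.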

Figures

Figures reproduced from arXiv: 2605.16752 by Bowen Yi, Hai Lin, Panos J. Antsaklis, Weijian Li.

Figure 1: The proposed framework for solving Problem 1.
Figure 2: Convergence of Algorithm 2. (a) Trajectory of
Figure 3: Convergence of the states x and Z.
[Plot text captured with Figure 3: panels (a) Input signal u(t) (deg) and (b) Output signal y(t) (deg/s); horizontal axis Time (sec); legend u = −K*_z z and u = −K* x.]
Figure 4: The input and output trajectories of (1).
Original abstract

In this paper, we investigate a continuous-time linear quadratic control problem for systems with unknown matrices, where only input-output data are available. We propose an output-feedback learning framework based on a canonical nonminimal realization constructed through Kreisselmeier's adaptive filter. The filter admits an observer interpretation, which leads to an augmented system that preserves the input-output response of the realization and provides accessible state trajectories. We show that the optimal gain of this augmented system explicitly recovers the optimal gain associated with the canonical non-minimal realization, and hence achieves the optimal state-feedback solution of the original plant. Exploiting this relation and the known structure of the augmented input matrix, we develop a data-driven value iteration algorithm within the adaptive dynamic programming framework. The resulting controller is implementable from input-output data, and its performance is validated via simulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a data-driven output-feedback framework for continuous-time linear-quadratic regulation of unknown plants. Using only input-output trajectories, it constructs a canonical non-minimal realization via Kreisselmeier’s adaptive filter, interprets the filter as an observer to obtain an augmented state-space system that preserves the original input-output map, proves that the optimal LQR gain computed on the augmented pair recovers the optimal gain of the non-minimal realization (and hence the original plant), and derives a value-iteration algorithm that exploits the known structure of the augmented input matrix to learn the gain from data.

Significance. If the recovery relation is rigorously established, the work supplies a concrete route from raw I/O data to an optimal output-feedback controller without requiring state measurements or explicit system identification. The explicit link between the augmented Riccati solution and the original LQ optimum, together with the data-driven ADP implementation, would constitute a useful addition to the literature on non-minimal realizations and adaptive dynamic programming.

major comments (2)
  1. [§3] §3 (Augmented system and gain recovery): the proof that the Riccati solution on (A_aug, B_aug) yields a gain whose closed-loop cost equals that of the canonical non-minimal realization must be checked for exact quadratic-cost equivalence. I/O preservation alone does not automatically guarantee that the filter-induced modes contribute zero (or correctly absorbed) cost; an explicit transformation relating the two quadratic forms or an invariance argument is required.
  2. [§4] §4 (Data-driven value iteration): the algorithm statement should include a precise statement of the persistence-of-excitation condition on the collected I/O data and a convergence guarantee (or error bound) for the learned gain relative to the true optimal gain of the augmented system.
minor comments (2)
  1. [Figure 1] Figure 1 (block diagram of the augmented system): the signal-flow arrows between the filter states and the original plant outputs should be labeled with the exact matrix dimensions to avoid ambiguity.
  2. [Notation] Notation: the symbol for the augmented state vector is occasionally overloaded with the original state; a distinct symbol (e.g., x_aug) would improve readability.
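The concern in major comment 1 can be illustrated on a toy example. The sketch below is not the paper's augmented system: it assumes a double-integrator plant with a known Riccati solution and appends one hypothetical stable mode (`F`, `G`) driven by the input, standing in for a filter state. With zero cost weight on the extra mode, the zero-padded matrix `P_aug` satisfies the augmented Riccati equation and the gain is `[K, 0]`. With a nonzero weight on that mode, the same `P_aug` no longer satisfies the equation, which is the situation the referee asks the authors to rule out.

```python
import numpy as np

def are_residual(A, B, Q, R, P):
    """Residual of the continuous-time algebraic Riccati equation at P."""
    return A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P + Q

# Double-integrator plant with output-weighted cost (illustrative choice).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])
R = np.array([[1.0]])
s2 = np.sqrt(2.0)
P = np.array([[s2, 1.0], [1.0, s2]])       # known ARE solution for this plant
K = np.linalg.inv(R) @ B.T @ P             # equals [1, sqrt(2)]

# Hypothetical extra stable mode driven by the input (stand-in for a filter state).
F = np.array([[-3.0]])
G = np.array([[1.0]])
A_aug = np.block([[A, np.zeros((2, 1))], [np.zeros((1, 2)), F]])
B_aug = np.vstack([B, G])

def padded(M, extra_weight):
    """Embed M in the top-left block; put extra_weight on the added mode."""
    out = np.zeros((3, 3))
    out[:2, :2] = M
    out[2, 2] = extra_weight
    return out

P_aug = padded(P, 0.0)
K_aug = np.linalg.inv(R) @ B_aug.T @ P_aug

# Zero weight on the extra mode: the padded P solves the augmented ARE.
res_zero_weight = np.abs(are_residual(A_aug, B_aug, padded(Q, 0.0), R, P_aug)).max()
# Unit weight on the extra mode: the same P no longer solves it.
res_with_weight = np.abs(are_residual(A_aug, B_aug, padded(Q, 1.0), R, P_aug)).max()
closed_loop_eigs = np.linalg.eigvals(A_aug - B_aug @ K_aug)

print(K_aug, res_zero_weight, res_with_weight)
```

The closed-loop eigenvalues are computed as well because a Riccati solution only yields the optimal gain when it is the stabilizing one.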

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the constructive comments, which help strengthen the manuscript. We address each major comment below and will incorporate the suggested clarifications in the revised version.

  1. Referee: [§3] §3 (Augmented system and gain recovery): the proof that the Riccati solution on (A_aug, B_aug) yields a gain whose closed-loop cost equals that of the canonical non-minimal realization must be checked for exact quadratic-cost equivalence. I/O preservation alone does not automatically guarantee that the filter-induced modes contribute zero (or correctly absorbed) cost; an explicit transformation relating the two quadratic forms or an invariance argument is required.

    Authors: We appreciate this careful remark. The proof in Section 3 proceeds by first establishing that the augmented system is a non-minimal realization that exactly reproduces the input-output map of the original plant, then showing that the optimal LQR gain for the augmented pair (A_aug, B_aug) recovers the optimal gain for the canonical non-minimal realization because the filter states are linearly related to past inputs and outputs. The quadratic cost equivalence follows from the fact that the additional modes are unobservable from the regulated output and are driven by the same input that appears in the original cost; however, we acknowledge that the current write-up relies on this structural property without spelling out an explicit invariance of the quadratic form. In the revision we will insert a short lemma that constructs the similarity transformation between the two quadratic forms and verifies that the filter-induced contribution is identically zero under the optimal policy, thereby making the cost equivalence fully rigorous.
    Revision: yes

  2. Referee: [§4] §4 (Data-driven value iteration): the algorithm statement should include a precise statement of the persistence-of-excitation condition on the collected I/O data and a convergence guarantee (or error bound) for the learned gain relative to the true optimal gain of the augmented system.

    Authors: We agree that the current description of the data-driven value iteration in Section 4 would benefit from an explicit persistence-of-excitation (PE) assumption and a convergence statement. In the revised manuscript we will add a precise PE condition on the collected input-output trajectories (requiring that the regressor matrix formed by the filtered signals and inputs has full rank) together with a theorem that establishes convergence of the iterated gain to the optimal augmented gain, including a finite-data error bound that depends on the level of excitation and the number of samples. These additions will be placed immediately after the algorithm statement.
    Revision: yes
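The full-rank requirement mentioned in the second response can be checked numerically once data are collected. The sketch below is only an illustration under stated assumptions: the regressor is taken to be a plain stack of sampled filter states and inputs, whereas the paper's actual regressor is not given here, and the filter matrices `F` and `g`, the forward-Euler simulation, and the helper names are hypothetical. It contrasts a multi-sine input with a zero input.

```python
import numpy as np

def simulate_filter(u, dt, F, g):
    """Forward-Euler simulation of z' = F z + g u from z(0) = 0."""
    z = np.zeros((len(u), F.shape[0]))
    for k in range(len(u) - 1):
        z[k + 1] = z[k] + dt * (F @ z[k] + g * u[k])
    return z

def regressor_rank(z, u):
    """Rank and smallest singular value of the stacked regressor [z, u]."""
    Phi = np.column_stack([z, u])
    sv = np.linalg.svd(Phi, compute_uv=False)
    return int(np.linalg.matrix_rank(Phi)), float(sv[-1])

# Hypothetical stable second-order filter (eigenvalues -1 and -2).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
g = np.array([0.0, 1.0])

dt = 0.02
t = np.arange(0.0, 10.0, dt)
u_rich = np.sin(t) + np.sin(2.3 * t) + np.sin(5.1 * t)   # multi-sine excitation
u_poor = np.zeros_like(t)                                # no excitation

rank_rich, sigma_min_rich = regressor_rank(simulate_filter(u_rich, dt, F, g), u_rich)
rank_poor, sigma_min_poor = regressor_rank(simulate_filter(u_poor, dt, F, g), u_poor)

print(rank_rich, rank_poor)
```

The smallest singular value is returned alongside the rank because a finite-data error bound of the kind the authors promise would typically depend on how far the regressor is from rank deficiency, not only on whether it is full rank.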

Circularity Check

0 steps flagged

No circularity: derivation relies on standard system-theoretic equivalence and ADP structures


The paper derives an explicit recovery of the optimal gain for the augmented system from the canonical non-minimal realization via the observer interpretation of Kreisselmeier's filter, which preserves the input-output map by construction of the filter dynamics. This equivalence is established through algebraic properties of the augmented (A_aug, B_aug) pair and the quadratic cost, independent of any fitted parameters or self-referential definitions. The subsequent data-driven value iteration applies standard ADP to the accessible augmented states, without renaming or smuggling prior results as new predictions. No load-bearing step reduces to its inputs by definition; the central claim is a verifiable first-principles result in linear systems theory.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard linear-systems assumptions and the observer property of the chosen filter; no free parameters or new entities are introduced in the abstract.

axioms (2)
  • domain assumption The plant is a continuous-time linear time-invariant system with unknown matrices.
    Explicitly stated as the setting for the LQR problem in the abstract.
  • domain assumption Kreisselmeier's adaptive filter admits an observer interpretation yielding an augmented system that preserves input-output response and supplies accessible states.
    Central premise used to connect the augmented system to the original plant (abstract description of the filter).


