Data-Driven Linear Quadratic Control Using Output-Feedback via Non-Minimal Realization

Bowen Yi; Hai Lin; Panos J. Antsaklis; Weijian Li

arxiv: 2605.16752 · v1 · pith:GZEZPV4Hnew · submitted 2026-05-16 · 🧮 math.OC


Pith reviewed 2026-05-19 21:24 UTC · model grok-4.3

classification 🧮 math.OC

keywords data-driven control · linear quadratic control · output feedback · non-minimal realization · adaptive filter · adaptive dynamic programming · value iteration


The pith

An augmented system from Kreisselmeier's adaptive filter recovers the optimal state-feedback gain for the original plant in data-driven LQ control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method for solving the linear quadratic control problem when the system matrices are unknown and only input and output data are available. It constructs a non-minimal realization using Kreisselmeier's adaptive filter, which is interpreted as an observer to create an augmented system with accessible states that match the original input-output behavior. The key result is that the optimal gain computed for this augmented system directly gives the optimal state-feedback controller for the underlying plant. This enables a data-driven value iteration algorithm in the adaptive dynamic programming setting to learn the controller without explicit model knowledge. A sympathetic reader would care because it bridges output-feedback control with data-driven methods for systems where full state measurement is impractical.

Core claim

The optimal gain of the augmented system explicitly recovers the optimal gain associated with the canonical non-minimal realization, and hence achieves the optimal state-feedback solution of the original plant. Exploiting this relation and the known structure of the augmented input matrix, a data-driven value iteration algorithm is developed within the adaptive dynamic programming framework.

What carries the argument

The augmented system obtained by applying Kreisselmeier's adaptive filter, which preserves the input-output response while making state trajectories accessible for the non-minimal realization.
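To make the filter idea concrete, here is a minimal sketch for a scalar plant. It is an illustration under stated assumptions, not the paper's construction: the plant parameters `a` and `b`, the filter pole `lam`, the input signal, and the Euler step are all hypothetical. For the plant x' = a x + b u with y = x, the two filters zy' = -lam zy + y and zu' = -lam zu + u satisfy x = (a + lam) zy + b zu up to a mismatch that decays like exp(-lam t), so the unmeasured state is a linear function of signals that can be generated from input-output data alone.

```python
import numpy as np

# Hypothetical scalar plant x' = a*x + b*u with y = x (illustration only).
a, b = 0.5, 1.0
lam = 2.0            # filter pole, a design choice (assumption)
h, N = 1e-3, 10000   # forward-Euler step and number of steps

x, zy, zu = 1.0, 0.0, 0.0   # plant state and the two filter states
mismatch = np.empty(N)
for k in range(N):
    t = k * h
    u = np.sin(t) + 0.5 * np.sin(3.0 * t)   # arbitrary exciting input
    y = x
    # Mismatch between the true state and its filter-based reconstruction.
    mismatch[k] = x - ((a + lam) * zy + b * zu)
    # Simultaneous Euler update of plant and filters.
    x, zy, zu = (x + h * (a * x + b * u),
                 zy + h * (-lam * zy + y),
                 zu + h * (-lam * zu + u))

print("initial mismatch:", mismatch[0])
print("final mismatch:  ", mismatch[-1])
```

The mismatch obeys e' = -lam e regardless of the input, which is why the reconstruction holds even though the chosen plant is open-loop unstable.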

If this is right

The resulting controller is implementable directly from input-output data without requiring state measurements.
The data-driven value iteration converges to the optimal LQ solution for the original unknown system.
The approach applies to continuous-time systems with unknown matrices.
Performance is validated through simulations on example systems.
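For orientation, the following is a model-based sketch of continuous-time value iteration on the Riccati equation, in the style of the value iteration literature cited as [13]. It is not the paper's data-driven algorithm: it uses the matrices `A` and `B` directly, whereas a data-driven version would replace the model-dependent terms with quantities estimated from trajectories. The double-integrator plant, the weights, and the constant step size are assumptions chosen because the optimal gain is known in closed form, K* = [1, sqrt(3)].

```python
import numpy as np

# Hypothetical test plant: double integrator with Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def value_iteration(A, B, Q, R, step=0.01, iters=5000):
    """Iterate P <- P + step * (A'P + PA + Q - P B R^-1 B' P) from P = 0.

    A constant step is a simplification made here for brevity.
    """
    P = np.zeros_like(A)
    Rinv = np.linalg.inv(R)
    for _ in range(iters):
        residual = A.T @ P + P @ A + Q - P @ B @ Rinv @ B.T @ P
        P = P + step * residual
    return P, Rinv @ B.T @ P

P_vi, K_vi = value_iteration(A, B, Q, R)
K_exact = np.array([[1.0, np.sqrt(3.0)]])   # closed-form gain for this plant
print("gain from value iteration:", K_vi)
print("closed-form gain:         ", K_exact)
```

A fixed point of the update is exactly a solution of the algebraic Riccati equation, so the iterate can be checked against the closed-form gain without any additional solver.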

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This framework might extend to other optimal control problems like LQG or H-infinity by similar augmentation.
Testing on a low-order plant with known optimal gain would confirm the recovery property numerically.
Future work could explore discrete-time versions or robustness to noise in the data.

Load-bearing premise

Kreisselmeier's adaptive filter admits an observer interpretation that leads to an augmented system preserving the input-output response of the realization and providing accessible state trajectories.

What would settle it

Apply the data-driven algorithm to a known linear system, compute the gain from the augmented system, and check if it equals the analytically known optimal state-feedback gain for the original plant.
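A check of this kind can be sketched numerically on a scalar plant. Everything below is an assumption made for illustration: the plant x' = a x + b u, the filter pole `lam`, the weights `q` and `r`, and in particular the augmented pair (`A_z`, `B_z`), which is the filter-state dynamics restricted to the set where x = M z with M = [a + lam, b]. The paper's augmented system may be built differently. The gain is computed with a model-based Riccati solver (stable subspace of the Hamiltonian), standing in for the learned gain.

```python
import numpy as np

def lqr_gain(A, B, Q, R):
    """LQR gain from the stable invariant subspace of the Hamiltonian matrix."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    Vs = V[:, w.real < 0]                       # the n stable eigenvectors
    P = np.real(Vs[n:, :] @ np.linalg.inv(Vs[:n, :]))
    P = 0.5 * (P + P.T)                         # symmetrize against round-off
    return Rinv @ B.T @ P

# Hypothetical scalar plant and weights.
a, b, lam, q, r = 0.5, 1.0, 2.0, 1.0, 1.0
A, B = np.array([[a]]), np.array([[b]])
Q, R = np.array([[q]]), np.array([[r]])

# Assumed non-minimal realization in filter coordinates, with x = M z.
M = np.array([[a + lam, b]])
A_z = np.array([[a, b], [0.0, -lam]])
B_z = np.array([[0.0], [1.0]])
Q_z = M.T @ Q @ M                               # state cost pulled back to z

K_plant = lqr_gain(A, B, Q, R)
K_aug = lqr_gain(A_z, B_z, Q_z, R)
print("plant gain K:      ", K_plant)
print("augmented gain K_z:", K_aug)
print("K @ M:             ", K_plant @ M)
```

If the recovery property holds in this setting, `K_aug` should coincide with `K_plant @ M`, so that u = -K_z z reproduces u = -K x.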

Figures

Figures reproduced from arXiv: 2605.16752 by Bowen Yi, Hai Lin, Panos J. Antsaklis, Weijian Li.

**Figure 2.** Convergence of Algorithm 2. (a) Trajectory of

**Figure 3.** Convergence of the states x and Z. Panels: (a) input signal u(t) (deg); (b) output signal y(t) (deg/s). Horizontal axis: time (sec). Legend: u = −K*_z z and u = −K* x.

**Figure 4.** The input and output trajectories of (1).

read the original abstract

In this paper, we investigate a continuous-time linear quadratic control problem for systems with unknown matrices, where only input-output data are available. We propose an output-feedback learning framework based on a canonical nonminimal realization constructed through Kreisselmeier's adaptive filter. The filter admits an observer interpretation, which leads to an augmented system that preserves the input-output response of the realization and provides accessible state trajectories. We show that the optimal gain of this augmented system explicitly recovers the optimal gain associated with the canonical non-minimal realization, and hence achieves the optimal state-feedback solution of the original plant. Exploiting this relation and the known structure of the augmented input matrix, we develop a data-driven value iteration algorithm within the adaptive dynamic programming framework. The resulting controller is implementable from input-output data, and its performance is validated via simulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a workable data-driven route to continuous-time output-feedback LQR by augmenting with Kreisselmeier's filter and running value iteration on the resulting system, but the exact cost equivalence in the recovery step needs explicit checking.


The main point is a synthesis that turns input-output data into an output-feedback LQR controller for continuous-time plants with unknown dynamics. They build a non-minimal realization via Kreisselmeier's filter, interpret it as an observer to get usable states, and then show that the optimal gain computed on this augmented system recovers the optimal state-feedback gain for the original plant. From there they set up a data-driven value iteration scheme that only needs the known structure of the augmented input matrix. Simulations are used to check closed-loop behavior on examples. This is the concrete advance: a direct link from the filter-augmented LQR solution back to the original problem without estimating the full state-space matrices first. The approach sits inside standard ADP and LQR machinery, which keeps the algorithm steps straightforward to implement from data. That structure is what the paper does cleanly.

The soft spot is the cost side of the recovery claim. Input-output equivalence is one requirement, but the quadratic cost on output and input must translate to an equivalent quadratic form on the augmented state. If the filter modes contribute extra terms that are not zero or correctly folded into the augmented Q matrix, the Riccati solution on the augmented pair optimizes a slightly different problem. The abstract states the recovery relation but does not lay out the derivation or error bounds here, so the full manuscript needs to show that the closed-loop cost matches exactly rather than approximately. Minor issues like missing data details or simulation parameters can be fixed in revision, but this equivalence step is central.

The work is aimed at control researchers who already use adaptive dynamic programming or data-driven LQR and want an output-feedback version that stays implementable from measurements alone. A reader familiar with Kreisselmeier filters and continuous-time ADP will see the synthesis quickly and can judge the proofs directly.

It deserves a serious referee. The framework is concrete, the algorithm is spelled out, and the simulations provide a starting point for checking performance. Peer review can verify the cost equivalence derivation and tighten any assumptions on the filter. I would send it out rather than desk reject.
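The cost-equivalence step flagged in this letter can be written out under one set of assumptions. The derivation below is a sketch, not the paper's proof: it assumes a plant x' = A x + B u with cost weights Q and R, a non-minimal state z with z' = A_z z + B_z u, and a matrix M (a hypothetical symbol) such that x = M z, M A_z = A M, and M B_z = B. Whether the paper's augmented system satisfies exactly these relations is what a referee would need to confirm.

```latex
% Cost in augmented coordinates, using x = M z:
J = \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt
  = \int_0^\infty \left( z^\top M^\top Q M z + u^\top R u \right) dt .

% Let P solve the plant Riccati equation
A^\top P + P A - P B R^{-1} B^\top P + Q = 0 ,
% and define P_z = M^\top P M. Using M A_z = A M and M B_z = B:
A_z^\top P_z + P_z A_z - P_z B_z R^{-1} B_z^\top P_z + M^\top Q M
  = M^\top \left( A^\top P + P A - P B R^{-1} B^\top P + Q \right) M = 0 .

% Hence the augmented gain factors through the plant gain K = R^{-1} B^\top P:
K_z = R^{-1} B_z^\top P_z = R^{-1} B^\top P M = K M ,
\qquad u = -K_z z = -K x .
```

Under these assumptions no extra cost term appears from the filter modes, because the augmented weight M^T Q M vanishes on the kernel of M. If the intertwining relations hold only asymptotically, the transient would need separate treatment.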

Referee Report

2 major / 2 minor

Summary. The manuscript develops a data-driven output-feedback framework for continuous-time linear-quadratic regulation of unknown plants. Using only input-output trajectories, it constructs a canonical non-minimal realization via Kreisselmeier’s adaptive filter, interprets the filter as an observer to obtain an augmented state-space system that preserves the original input-output map, proves that the optimal LQR gain computed on the augmented pair recovers the optimal gain of the non-minimal realization (and hence the original plant), and derives a value-iteration algorithm that exploits the known structure of the augmented input matrix to learn the gain from data.

Significance. If the recovery relation is rigorously established, the work supplies a concrete route from raw I/O data to an optimal output-feedback controller without requiring state measurements or explicit system identification. The explicit link between the augmented Riccati solution and the original LQ optimum, together with the data-driven ADP implementation, would constitute a useful addition to the literature on non-minimal realizations and adaptive dynamic programming.

major comments (2)

[§3] §3 (Augmented system and gain recovery): the proof that the Riccati solution on (A_aug, B_aug) yields a gain whose closed-loop cost equals that of the canonical non-minimal realization must be checked for exact quadratic-cost equivalence. I/O preservation alone does not automatically guarantee that the filter-induced modes contribute zero (or correctly absorbed) cost; an explicit transformation relating the two quadratic forms or an invariance argument is required.
[§4] §4 (Data-driven value iteration): the algorithm statement should include a precise statement of the persistence-of-excitation condition on the collected I/O data and a convergence guarantee (or error bound) for the learned gain relative to the true optimal gain of the augmented system.

minor comments (2)

[Figure 1] Figure 1 (block diagram of the augmented system): the signal-flow arrows between the filter states and the original plant outputs should be labeled with the exact matrix dimensions to avoid ambiguity.
[Notation] Notation: the symbol for the augmented state vector is occasionally overloaded with the original state; a distinct symbol (e.g., x_aug) would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the constructive comments, which help strengthen the manuscript. We address each major comment below and will incorporate the suggested clarifications in the revised version.


Referee: [§3] §3 (Augmented system and gain recovery): the proof that the Riccati solution on (A_aug, B_aug) yields a gain whose closed-loop cost equals that of the canonical non-minimal realization must be checked for exact quadratic-cost equivalence. I/O preservation alone does not automatically guarantee that the filter-induced modes contribute zero (or correctly absorbed) cost; an explicit transformation relating the two quadratic forms or an invariance argument is required.

Authors: We appreciate this careful remark. The proof in Section 3 proceeds by first establishing that the augmented system is a non-minimal realization that exactly reproduces the input-output map of the original plant, then showing that the optimal LQR gain for the augmented pair (A_aug, B_aug) recovers the optimal gain for the canonical non-minimal realization because the filter states are linearly related to past inputs and outputs. The quadratic cost equivalence follows from the fact that the additional modes are unobservable from the regulated output and are driven by the same input that appears in the original cost; however, we acknowledge that the current write-up relies on this structural property without spelling out an explicit invariance of the quadratic form. In the revision we will insert a short lemma that constructs the similarity transformation between the two quadratic forms and verifies that the filter-induced contribution is identically zero under the optimal policy, thereby making the cost equivalence fully rigorous. revision: yes
Referee: [§4] §4 (Data-driven value iteration): the algorithm statement should include a precise statement of the persistence-of-excitation condition on the collected I/O data and a convergence guarantee (or error bound) for the learned gain relative to the true optimal gain of the augmented system.

Authors: We agree that the current description of the data-driven value iteration in Section 4 would benefit from an explicit persistence-of-excitation (PE) assumption and a convergence statement. In the revised manuscript we will add a precise PE condition on the collected input-output trajectories (requiring that the regressor matrix formed by the filtered signals and inputs has full rank) together with a theorem that establishes convergence of the iterated gain to the optimal augmented gain, including a finite-data error bound that depends on the level of excitation and the number of samples. These additions will be placed immediately after the algorithm statement. revision: yes
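A full-rank condition of the kind described in this response can be checked numerically before running any learning step. The sketch below is an assumption-laden illustration: the function names, the tolerance, and the synthetic signals are hypothetical, and the regressor here is a generic matrix of sampled signals rather than the specific regressor the authors have in mind. It measures excitation by the smallest singular value, normalized by the square root of the number of samples.

```python
import numpy as np

def excitation_level(Phi):
    """Smallest singular value of the regressor, scaled by sqrt(number of rows)."""
    s = np.linalg.svd(Phi, compute_uv=False)
    return s[-1] / np.sqrt(Phi.shape[0])

def is_exciting(Phi, tol=1e-6):
    """True if Phi has at least as many rows as columns and full column rank."""
    return Phi.shape[0] >= Phi.shape[1] and excitation_level(Phi) > tol

t = np.linspace(0.0, 10.0, 500)

# Columns with distinct frequency content: linearly independent.
rich = np.column_stack([np.sin(t),
                        np.cos(2.0 * t),
                        np.sin(3.0 * t) + 0.5 * np.sin(0.7 * t)])

# Second column is a multiple of the first: rank deficient.
poor = np.column_stack([np.sin(t), 2.0 * np.sin(t), np.cos(2.0 * t)])

print("rich data exciting:", is_exciting(rich), excitation_level(rich))
print("poor data exciting:", is_exciting(poor), excitation_level(poor))
```

Reporting the excitation level itself, rather than only a pass or fail flag, is useful because a finite-data error bound of the kind promised here would typically depend on that quantity.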

Circularity Check

0 steps flagged

No circularity: derivation relies on standard system-theoretic equivalence and ADP structures

full rationale

The paper derives an explicit recovery of the optimal gain for the augmented system from the canonical non-minimal realization via the observer interpretation of Kreisselmeier's filter, which preserves the input-output map by construction of the filter dynamics. This equivalence is established through algebraic properties of the augmented (A_aug, B_aug) pair and the quadratic cost, independent of any fitted parameters or self-referential definitions. The subsequent data-driven value iteration applies standard ADP to the accessible augmented states, without renaming or smuggling prior results as new predictions. No load-bearing step reduces to its inputs by definition; the central claim is a verifiable first-principles result in linear systems theory.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard linear-systems assumptions and the observer property of the chosen filter; no free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption The plant is a continuous-time linear time-invariant system with unknown matrices.
Explicitly stated as the setting for the LQR problem in the abstract.
domain assumption Kreisselmeier's adaptive filter admits an observer interpretation yielding an augmented system that preserves input-output response and supplies accessible states.
Central premise used to connect the augmented system to the original plant (abstract description of the filter).



Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA: MIT Press, 2018.
[2] P. Werbos, “Beyond regression: New tools for prediction and analysis in the behavioral sciences,” PhD thesis, Committee on Applied Mathematics, Harvard University, Cambridge, MA, 1974.
[3] B. R. Kiran, I. Sobh, V. Talpaert, P. Mannion, A. A. Al Sallab, S. Yogamani, and P. Pérez, “Deep reinforcement learning for autonomous driving: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 6, pp. 4909–4926, 2021.
[4] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[5] I. Markovsky, L. Huang, and F. Dörfler, “Data-driven control based on the behavioral approach from theory to applications in power systems,” IEEE Control Systems Magazine, vol. 43, no. 5, pp. 28–68, 2023.
[6] B. Recht, “A tour of reinforcement learning: The view from continuous control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, no. 1, pp. 253–279, 2019.
[7] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[8] Y. Jiang and Z.-P. Jiang, Robust Adaptive Dynamic Programming. Hoboken, NJ, USA: Wiley-IEEE Press, 2017.
[9] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. John Wiley & Sons, 2012.
[10] D. Kleinman, “On an iterative technique for Riccati equation computations,” IEEE Transactions on Automatic Control, vol. 13, no. 1, pp. 114–115, 1968.
[11] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009.
[12] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012.
[13] T. Bian and Z.-P. Jiang, “Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design,” Automatica, vol. 71, pp. 348–360, 2016.
[14] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International Conference on Machine Learning. PMLR, 2018, pp. 1467–1476.
[15] H. Mohammadi, A. Zare, M. Soltanolkotabi, and M. R. Jovanović, “Convergence and sample complexity of gradient methods for the model-free linear–quadratic regulator problem,” IEEE Transactions on Automatic Control, vol. 67, no. 5, pp. 2435–2450, 2021.
[16] S. A. A. Rizvi and Z. Lin, Output feedback reinforcement learning control for linear systems. Springer, 2023.
[17] I. Fatkhullin and B. Polyak, “Optimizing static linear feedback: Gradient method,” SIAM Journal on Control and Optimization, vol. 59, no. 5, pp. 3887–3911, 2021.
[18] F. L. Lewis and K. G. Vamvoudakis, “Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 41, no. 1, pp. 14–25, 2010.
[19] S. A. A. Rizvi and Z. Lin, “Output feedback Q-learning control for the discrete-time linear quadratic regulator problem,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 5, pp. 1523–1536, 2018.
[20] J. Duan, W. Cao, Y. Zheng, and L. Zhao, “On the optimization landscape of dynamic output feedback linear quadratic control,” IEEE Transactions on Automatic Control, vol. 69, no. 2, pp. 920–935, 2023.
[21] L. M. Zhu, H. Modares, G. O. Peen, F. L. Lewis, and B. Yue, “Adaptive suboptimal output-feedback control for linear systems using integral reinforcement learning,” IEEE Transactions on Control Systems Technology, vol. 23, no. 1, pp. 264–273, 2014.
[22] S. A. A. Rizvi and Z. Lin, “Reinforcement learning-based linear quadratic regulation of continuous-time systems using dynamic output feedback,” IEEE Transactions on Cybernetics, vol. 50, no. 11, pp. 4670–4679, 2019.
[23] C. Chen, L. Xie, K. Xie, F. L. Lewis, and S. Xie, “Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning,” Automatica, vol. 146, p. 110581, 2022.
[24] L. Lin, H. Lin, and J. Huang, “A new approach to the data-driven output-based LQR problem of continuous-time linear systems,” arXiv preprint arXiv:2509.18819, 2025.
[25] A. Bosso, M. Borghesi, A. Iannelli, G. Notarstefano, and A. R. Teel, “Data-driven control of continuous-time LTI systems via non-minimal realizations,” IEEE Transactions on Automatic Control, pp. 1–16, 2026, early access.
[26] A. Bosso, M. Borghesi, A. Iannelli, B. Yi, and G. Notarstefano, “Data-driven stabilization of continuous-time LTI systems from noisy input-output data,” European Control Conference, 2026, see arXiv:2511.11417.
[27] H. Gao, A. Bosso, L. Wang, D. Saussié, and B. Yi, “Input-output data-driven stabilization of continuous-time linear MIMO systems,” European Control Conference, 2026, see arXiv:2511.06524.
[28] G. Kreisselmeier, “The generation of adaptive law structures for globally convergent adaptive observers,” IEEE Transactions on Automatic Control, vol. 24, no. 3, pp. 510–513, 1979.
[29] V. Andrieu, B. Jayawardhana, and L. Praly, “Transverse exponential stability and applications,” IEEE Transactions on Automatic Control, vol. 61, no. 11, pp. 3396–3411, 2016.
[30] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control,” Automatica, vol. 43, no. 3, pp. 473–481, 2007.
[31] K. G. Vamvoudakis and F. L. Lewis, “Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations,” Automatica, vol. 47, no. 8, pp. 1556–1569, 2011.
[32] B. L. Stevens, F. L. Lewis, and E. N. Johnson, Aircraft Control and Simulation: Dynamics, Controls Design, and Autonomous Systems. John Wiley & Sons, 2015.
