Direct Data-Driven Linear Quadratic Tracking via Policy Optimization
Pith reviewed 2026-05-20 19:42 UTC · model grok-4.3
The pith
Reference decoupling renders data-driven linear quadratic tracking exactly equivalent to certainty-equivalence control.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a reference-decoupled reformulation of LQT is exactly equivalent to the indirect certainty-equivalence LQT solution. This reformulation accommodates the covariance parameterization with decision variables whose dimension stays fixed independent of data horizon. It supports development of offline and online DeePO algorithms, which achieve global linear convergence in the offline case via local gradient dominance and smoothness, and linear decay of the optimality gap up to an SNR-dependent bias in the online case.
What carries the argument
The reference-decoupled reformulation of LQT, which decouples the time-varying reference from the feedback-feedforward policy to enable fixed-dimension sample-covariance parameterization.
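For orientation, the sample-covariance parameterization used for LQR in the DeePO line of work can be sketched as below. The notation ($X_0$, $U_0$, $X_1$, $D_0$, $\Phi$, $V$, state dimension $n$, input dimension $m$, data length $t$) is an assumption borrowed from that LQR setting and does not appear on this page; the LQT version in the paper may differ in detail. The point of the sketch is only that $V$ has $(n+m)\times n$ entries regardless of $t$.

```latex
% Noise-free data from x_{k+1} = A x_k + B u_k:
%   X_0 = [x_0 ... x_{t-1}],  U_0 = [u_0 ... u_{t-1}],  X_1 = [x_1 ... x_t]
\[
D_0 = \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}, \qquad
\Phi = \tfrac{1}{t}\, D_0 D_0^\top, \qquad
\bar X_1 = \tfrac{1}{t}\, X_1 D_0^\top = \begin{bmatrix} B & A \end{bmatrix}\Phi .
\]
\[
\begin{bmatrix} K \\ I_n \end{bmatrix} = \Phi V,
\quad V \in \mathbb{R}^{(n+m)\times n}
\;\Longrightarrow\;
A + BK = \begin{bmatrix} B & A \end{bmatrix}\Phi V = \bar X_1 V .
\]
```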
Load-bearing premise
The linear system and quadratic cost structure allow the time-varying reference to be fully decoupled from the policy without any loss of optimality.
What would settle it
Apply the proposed DeePO algorithm to a low-dimensional linear system with a known closed-form LQT solution and verify whether the achieved cost equals that of the indirect certainty-equivalence controller or whether observed convergence deviates from the predicted linear rate.
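A baseline for such a check could be the model-based LQT solution computed by dynamic programming. The sketch below is a minimal version under stated assumptions: a finite-horizon cost with terminal weight Q, a hypothetical two-state plant, and a sinusoidal reference, none of which come from the paper. The DeePO update itself is not implemented, since this page does not specify it. The names `lqt_backward` and `rollout` are illustrative.

```python
import numpy as np

def lqt_backward(A, B, Q, R, refs):
    """Finite-horizon LQT by dynamic programming.

    Cost: sum_{t<N} [(x_t-r_t)'Q(x_t-r_t) + u_t'R u_t] + (x_N-r_N)'Q(x_N-r_N),
    with refs = [r_0, ..., r_N]. Returns feedback gains K_t and feedforward
    terms k_t such that u_t = -K_t x_t + k_t.
    """
    N = len(refs) - 1
    P = Q.copy()          # quadratic part of the cost-to-go
    q = Q @ refs[N]       # linear part of the cost-to-go (carries the reference)
    gains, ffs = [None] * N, [None] * N
    for t in range(N - 1, -1, -1):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)
        k = np.linalg.solve(S, B.T @ q)
        gains[t], ffs[t] = K, k
        Acl = A - B @ K
        q = Q @ refs[t] + Acl.T @ q   # linear in the references
        P = Q + A.T @ P @ Acl         # Riccati step, independent of the references
    return gains, ffs

def rollout(A, B, Q, R, refs, x0, gains, ffs):
    """Simulate u_t = -K_t x_t + k_t and return (total cost, inputs)."""
    x, cost, inputs = x0.copy(), 0.0, []
    for t, (K, k) in enumerate(zip(gains, ffs)):
        u = -K @ x + k
        e = x - refs[t]
        cost += e @ Q @ e + u @ R @ u
        inputs.append(u)
        x = A @ x + B @ u
    e = x - refs[-1]
    return cost + e @ Q @ e, np.array(inputs)

# Hypothetical test system (not from the paper).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)
N = 30
steps = np.arange(N + 1)
refs = np.column_stack([np.sin(0.3 * steps), np.zeros(N + 1)])
x0 = np.array([1.0, 0.0])

gains, ffs = lqt_backward(A, B, Q, R, refs)
cost, inputs = rollout(A, B, Q, R, refs, x0, gains, ffs)

# Changing the reference changes only the feedforward terms, not the gains.
gains_alt, ffs_alt = lqt_backward(A, B, Q, R, 2.0 * refs + 1.0)
same_gains = all(np.allclose(K1, K2) for K1, K2 in zip(gains, gains_alt))
print(cost, same_gains)
```

In this finite-horizon setting the feedback gains depend only on (A, B, Q, R), while the reference enters through the feedforward recursion alone. A data-driven controller could be compared against `cost` on the same plant and reference.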
Original abstract
Direct data-driven optimal control provides an elegant end-to-end paradigm, yet its real-time applicability is often hindered by the growing dimensionality of online decision variables. Recent breakthroughs, notably Data-EnablEd Policy Optimization (DeePO), overcome this bottleneck for the Linear Quadratic Regulator (LQR) through sample-covariance parameterization; however, extending this paradigm to Linear Quadratic Tracking (LQT) poses a fundamental challenge. The core difficulty stems from the intricate coupling between time-varying references and the feedback-feedforward policy structure, which prevents a direct application of constant-dimension parameterization. We first introduce a reference-decoupled reformulation of LQT that naturally accommodates the covariance parameterization, guaranteeing a fixed dimension of decision variables independent of data horizon. This formulation is proven to be exactly equivalent to the indirect certainty-equivalence LQT solution. Leveraging this characterization, we develop offline and online DeePO algorithms. Theoretically, we prove global linear convergence for the offline algorithm using local gradient dominance and smoothness, and show that in the online setting the optimality gap decays linearly up to a bias term that scales inversely with the signal-to-noise ratio (SNR). Numerical simulations varify the theoretical results and illustrate the superior tracking performance of the proposed method.
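The "fixed dimension independent of data horizon" property can be illustrated numerically for the LQR-style covariance parameterization. The sketch below is a minimal check under stated assumptions: the plant, the gain `K`, and the helper `covariance_form` are hypothetical, the data are noise-free, and the construction follows the LQR setting rather than the paper's LQT reformulation, which this page does not spell out.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 1
A = np.array([[0.9, 0.2], [0.0, 0.8]])   # hypothetical plant, not from the paper
B = np.array([[0.0], [1.0]])
K = np.array([[-0.3, -0.5]])             # arbitrary illustrative gain, u = K x

def covariance_form(T):
    """Collect T noise-free samples and return (Phi, X1_bar, V) for gain K."""
    X = np.zeros((n, T + 1))
    U = rng.standard_normal((m, T))      # random exciting inputs
    for k in range(T):
        X[:, k + 1] = A @ X[:, k] + B @ U[:, k]
    D0 = np.vstack([U, X[:, :T]])        # (n+m) x T, grows with T
    Phi = D0 @ D0.T / T                  # (n+m) x (n+m), fixed size
    X1_bar = X[:, 1:] @ D0.T / T         # n x (n+m), fixed size
    V = np.linalg.solve(Phi, np.vstack([K, np.eye(n)]))  # [K; I] = Phi V
    return Phi, X1_bar, V

for T in (20, 200, 2000):
    Phi, X1_bar, V = covariance_form(T)
    # V keeps the same shape for every T, and X1_bar @ V gives the closed loop A + B K.
    print(T, V.shape, np.allclose(X1_bar @ V, A + B @ K))
```

Only the raw data matrix `D0` grows with `T`; the quantities an optimizer would work with (`Phi`, `X1_bar`, `V`) do not.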
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a reference-decoupled reformulation of the linear quadratic tracking (LQT) problem that is proven equivalent to the indirect certainty-equivalence LQT solution. This reformulation enables covariance parameterization of the policy with dimension independent of the data horizon, allowing development of offline and online Data-EnablEd Policy Optimization (DeePO) algorithms. The authors prove global linear convergence of the offline algorithm via local gradient dominance and smoothness, and show linear decay of the optimality gap up to an SNR-dependent bias in the online setting. Numerical simulations are used to verify the theoretical claims and demonstrate improved tracking performance.
Significance. If the equivalence holds without hidden restrictions on the reference class, the work provides a scalable direct data-driven extension of DeePO from LQR to LQT, with fixed-dimensional parameterization and explicit convergence rates. This could enable more practical real-time tracking controllers from data, strengthening the case for end-to-end data-driven methods in linear systems with time-varying references.
major comments (2)
- [Abstract and reformulation section] The equivalence between the reference-decoupled reformulation and the indirect certainty-equivalence LQT solution is the load-bearing claim for both the fixed-dimension parameterization and the convergence results. The abstract states this equivalence is proven, but the decoupling conditions for arbitrary time-varying r_t under the quadratic cost (including whether the feedforward component is exactly recovered) require explicit statement and verification to rule out implicit restrictions on the reference class.
- [Online convergence theorem] The online result claims linear decay of the optimality gap up to a bias scaling inversely with SNR. The derivation of this bias term and its dependence on data statistics (e.g., how it arises from the online bias term) should be cross-checked against the simulation quantification to confirm it does not undermine the linear rate claim for practical SNR values.
minor comments (2)
- [Abstract] Abstract contains the typo 'varify' which should be corrected to 'verify'.
- [Numerical simulations] The manuscript would benefit from a table summarizing the offline vs. online DeePO convergence rates and bias terms for direct comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment point by point below, providing clarifications on the equivalence and convergence results while making targeted revisions to improve explicitness and verification.
- Referee: [Abstract and reformulation section] The equivalence between the reference-decoupled reformulation and the indirect certainty-equivalence LQT solution is the load-bearing claim for both the fixed-dimension parameterization and the convergence results. The abstract states this equivalence is proven, but the decoupling conditions for arbitrary time-varying r_t under the quadratic cost (including whether the feedforward component is exactly recovered) require explicit statement and verification to rule out implicit restrictions on the reference class.
  Authors: The equivalence holds for arbitrary bounded time-varying references r_t under the standard quadratic cost, with no implicit restrictions on the reference class beyond system stabilizability and the boundedness of r_t. Theorem 1 establishes that the reference-decoupled reformulation is exactly equivalent to the indirect certainty-equivalence LQT solution, exactly recovering both the feedback gain and the feedforward component. To make the decoupling conditions fully explicit, we have revised the abstract to reference Theorem 1 directly and added a clarifying remark in Section III stating the conditions and confirming exact feedforward recovery.
  Revision: yes
- Referee: [Online convergence theorem] The online result claims linear decay of the optimality gap up to a bias scaling inversely with SNR. The derivation of this bias term and its dependence on data statistics (e.g., how it arises from the online bias term) should be cross-checked against the simulation quantification to confirm it does not undermine the linear rate claim for practical SNR values.
  Authors: The bias term in the online convergence result (Theorem 3) arises from the persistent covariance estimation error in the online data-driven gradient step, which is inversely proportional to SNR due to the additive noise variance in the collected trajectories. We have cross-checked the derivation against the simulation results in Section V; for practical SNR values (above approximately 15 dB), the plots show clear linear decay of the optimality gap until the predicted bias floor is reached, without undermining the linear rate. In the revision we have expanded the discussion following Theorem 3 to explicitly trace the bias to the covariance estimation error and the data statistics, and added SNR-sweep simulation figures to quantify the effect.
  Revision: yes
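The qualitative shape described in this response, linear decay followed by a floor, matches a scalar recursion of the form gap_{k+1} <= rho * gap_k + bias. The sketch below iterates that recursion with equality and recovers the rate from a log-linear fit. Everything numeric in it is an assumption for illustration: the contraction factor `rho`, the constant `c`, the bias model `bias = c / SNR`, and the SNR values. None of these are taken from the paper's Theorem 3.

```python
import numpy as np

def gap_trajectory(gap0, rho, bias, steps):
    """Iterate gap_{k+1} = rho * gap_k + bias (worst case of the inequality)."""
    gaps = np.empty(steps + 1)
    gaps[0] = gap0
    for k in range(steps):
        gaps[k + 1] = rho * gaps[k] + bias
    return gaps

def fitted_rate(gaps, floor):
    """Estimate rho from the slope of log(gap - floor) over the decaying part."""
    excess = gaps - floor
    mask = excess > 1e-9 * excess[0]     # drop points that have hit the floor
    k = np.arange(len(gaps))[mask]
    slope = np.polyfit(k, np.log(excess[mask]), 1)[0]
    return np.exp(slope)

rho, c = 0.9, 1.0                        # hypothetical contraction factor and constant
for snr_db in (15, 25, 35):
    snr = 10 ** (snr_db / 10)            # power ratio from decibels
    bias = c / snr                       # assumed bias model: inversely proportional to SNR
    gaps = gap_trajectory(10.0, rho, bias, 200)
    floor = bias / (1 - rho)             # fixed point of the recursion
    print(snr_db, floor, gaps[-1], fitted_rate(gaps[:60], floor))
```

Applied to measured optimality gaps, the same fit would show whether the decay above the floor is consistent with a single linear rate, and whether the floor shrinks as SNR grows.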
Circularity Check
Reference-decoupled LQT reformulation equivalence derived via internal proof without reduction to inputs or self-citation chains.
The paper presents the reference-decoupled reformulation as a new characterization of LQT, followed by an explicit proof of exact equivalence to the indirect certainty-equivalence solution. This equivalence is used to enable covariance parameterization of fixed dimension. Subsequent offline global linear convergence (via local gradient dominance) and online linear decay results are derived from standard policy optimization analysis applied to the reformulated problem. No equations or claims reduce by construction to fitted parameters, prior self-citations, or ansatzes; the derivation chain is self-contained and relies on the linear-quadratic structure and data-driven covariance properties as independent inputs. The SNR-dependent bias term is an explicit output of the online analysis rather than an implicit fit.
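The step from gradient dominance and smoothness to a linear rate is the standard Polyak-Lojasiewicz argument. A generic sketch follows, with $J$ the cost, $J^\star$ its minimum, $\mu$ the gradient-dominance constant, $L$ the smoothness constant, and $\eta$ the step size. These symbols are assumptions for illustration, not the paper's notation. A local version, as claimed in the paper, would additionally require the iterates to remain in the sublevel set where both properties hold.

```latex
\begin{align*}
J(\theta_{k+1})
  &\le J(\theta_k) - \eta\Bigl(1 - \tfrac{L\eta}{2}\Bigr)\,\|\nabla J(\theta_k)\|^2
  && \text{(smoothness, } \theta_{k+1} = \theta_k - \eta\,\nabla J(\theta_k)\text{)} \\
  &\le J(\theta_k) - \tfrac{\eta}{2}\,\|\nabla J(\theta_k)\|^2
  && (\eta \le 1/L) \\
\tfrac{1}{2}\,\|\nabla J(\theta_k)\|^2
  &\ge \mu\,\bigl(J(\theta_k) - J^\star\bigr)
  && \text{(gradient dominance)} \\
\Longrightarrow\quad
J(\theta_{k+1}) - J^\star
  &\le (1 - \eta\mu)\,\bigl(J(\theta_k) - J^\star\bigr).
\end{align*}
```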
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The underlying system is linear time-invariant with quadratic costs.