Data-driven Linear Quadratic Integral Control: A Convex Formulation and Policy Gradient Approach
Pith reviewed 2026-05-10 10:54 UTC · model grok-4.3
The pith
A data-driven closed-loop parameterization of augmented dynamics enables convex optimization of optimal LQI controllers from input-state-output measurements alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors derive a data-driven closed-loop parameterization of the augmented dynamics incorporating the integral state based solely on input-state-output measurements. This leads to a convex optimization problem whose solution gives the optimal LQR feedback gain for the augmented system, enabling data-driven optimal tracking control without explicit state augmentation in data collection.
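For concreteness, the augmented (LQI) system behind this claim can be sketched in Python. The sketch assumes a common convention: an integral state xi with xi' = r - y, and a gain acting on the augmented state, u = K [x; xi]. The plant matrices, weights, and helper names (`augment`, `lqi_gain`) are hypothetical. This is the model-based baseline the data-driven method is meant to reproduce, not the paper's method itself.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical plant (illustration only): x' = A x + B u, y = C x
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])


def augment(A, B, C):
    """Append the integral state xi (xi' = r - y); r = 0 for the regulator design."""
    n, m = B.shape
    q = C.shape[0]
    A_aug = np.block([[A, np.zeros((n, q))], [-C, np.zeros((q, q))]])
    B_aug = np.vstack([B, np.zeros((q, m))])
    return A_aug, B_aug


def lqi_gain(A, B, C, Q, R):
    """Model-based LQI gain K = [K_x, K_xi] for u = K [x; xi], from the CARE."""
    A_aug, B_aug = augment(A, B, C)
    P = solve_continuous_are(A_aug, B_aug, Q, R)
    return -np.linalg.solve(R, B_aug.T @ P)


K = lqi_gain(A, B, C, Q=np.eye(3), R=np.eye(1))
A_aug, B_aug = augment(A, B, C)
print(K)
print(np.linalg.eigvals(A_aug + B_aug @ K))  # expected: all real parts negative
```

The augmentation is stabilizable here because the hypothetical plant has no transmission zero at s = 0, which is the usual prerequisite for integral action.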
What carries the argument
The data-driven closed-loop parameterization of the augmented dynamics, which incorporates the integral state and allows formulation of the convex data-driven LQR problem without system matrices.
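The sketch below shows one way such a parameterization can work. It is one construction consistent with the claim and is not necessarily the paper's. It assumes noise-free samples of the state derivative and a De Persis-Tesi-style argument. If [X0; U0] has full row rank, any G with X0 G = [I, 0] and U0 G = K gives A_aug + B_aug K = [X1; -Y0] G. Under this construction, the integral state never has to be recorded. The plant and the function name `closed_loop_from_data` are hypothetical, and A, B, C are used only to generate the data and to verify the result.

```python
import numpy as np

# Hypothetical plant, used only to GENERATE data and to VERIFY the result.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, m, q, T = 2, 1, 1, 10

rng = np.random.default_rng(0)
X0 = rng.standard_normal((n, T))  # state samples x(t_k)
U0 = rng.standard_normal((m, T))  # input samples u(t_k)
X1 = A @ X0 + B @ U0              # state-derivative samples (assumed noise-free)
Y0 = C @ X0                       # output samples y(t_k); no integral state is recorded


def closed_loop_from_data(K, X0, U0, X1, Y0):
    """Augmented closed-loop matrix for u = K [x; xi], built from data only.

    Solves [X0; U0] G = [[I, 0], [K]]; then A_aug + B_aug K = [X1; -Y0] G.
    """
    n, q = X0.shape[0], Y0.shape[0]
    W = np.vstack([X0, U0])
    target = np.vstack([np.hstack([np.eye(n), np.zeros((n, q))]), K])
    G = np.linalg.pinv(W) @ target  # exact when W has full row rank
    return np.vstack([X1, -Y0]) @ G


K = np.array([[-9.0, -3.0, 6.0]])  # places the augmented poles at -1, -2, -3
A_cl_data = closed_loop_from_data(K, X0, U0, X1, Y0)

# Verification against the model (the data-based formula above never used A, B, C).
A_aug = np.block([[A, np.zeros((n, q))], [-C, np.zeros((q, q))]])
B_aug = np.vstack([B, np.zeros((q, m))])
print(np.allclose(A_cl_data, A_aug + B_aug @ K))  # True
```

The identity follows from X1 G = A X0 G + B U0 G = [A + B K_x, B K_xi] and -Y0 G = -C X0 G = [-C, 0]. Together these rows are exactly the augmented closed-loop matrix.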
If this is right
- Optimal LQI controllers for reference tracking can be obtained via convex optimization from data alone.
- The method applies to continuous-time systems and avoids explicit state augmentation during data collection.
- A policy gradient flow provides an alternative way to compute the optimal controller within the space of stabilizing gains.
- The approach is demonstrated to work on a distributed generation unit in a DC microgrid.
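The policy gradient idea can be illustrated on a known augmented model. The sketch assumes the standard continuous-time LQR cost J(K) = tr(P_K Sigma0) with u = K x, whose gradient is 2 (R K + B' P_K) Y_K. It discretizes the flow K' = -grad J(K) with forward-Euler steps and Armijo backtracking, so that every iterate stays stabilizing. Because it reads A_aug and B_aug directly, this is not the paper's data-driven flow. The plant, weights, initial gain, and function names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical augmented model: 2-state plant plus one integral state (xi' = -y).
A_aug = np.array([[0.0, 1.0, 0.0], [-2.0, -3.0, 0.0], [-1.0, 0.0, 0.0]])
B_aug = np.array([[0.0], [1.0], [0.0]])
Q, R, Sigma0 = np.eye(3), np.eye(1), np.eye(3)


def is_stable(K):
    return np.max(np.linalg.eigvals(A_aug + B_aug @ K).real) < 0


def cost_and_grad(K):
    """J(K) = tr(P_K Sigma0) and its gradient; K must be stabilizing."""
    A_cl = A_aug + B_aug @ K
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))  # A_cl'P + P A_cl = -(Q + K'RK)
    Y = solve_continuous_lyapunov(A_cl, -Sigma0)               # A_cl Y + Y A_cl' = -Sigma0
    return np.trace(P @ Sigma0), 2.0 * (R @ K + B_aug.T @ P) @ Y


def policy_gradient(K, tol=1e-6, max_iter=50000):
    """Euler-discretized flow K' = -grad J(K) with Armijo backtracking.

    Backtracking rejects any step that leaves the set of stabilizing gains.
    """
    J, g = cost_and_grad(K)
    eta = 1.0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        eta = min(1.0, 2.0 * eta)  # allow the step to grow after successful steps
        while True:
            K_new = K - eta * g
            if is_stable(K_new):
                J_new, g_new = cost_and_grad(K_new)
                if J_new <= J - 1e-4 * eta * np.sum(g * g):
                    break
            eta *= 0.5
            if eta < 1e-12:  # stalled at floating-point precision
                return K
        K, J, g = K_new, J_new, g_new
    return K


K0 = np.array([[-9.0, -3.0, 6.0]])  # stabilizing start: poles at -1, -2, -3
K_pg = policy_gradient(K0)

P_are = solve_continuous_are(A_aug, B_aug, Q, R)
K_star = -np.linalg.solve(R, B_aug.T @ P_are)
print(np.max(np.abs(K_pg - K_star)))  # should be small
```

Backtracking is one simple way to keep a discretized flow inside the stabilizing set. The continuous-time flow itself needs an invariance argument instead.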
Where Pith is reading between the lines
- The parameterization technique could extend to other augmented controller structures beyond integral action.
- In practice this might allow direct deployment of optimal tracking controllers on hardware where only input-output data is accessible.
- Similar data-driven convex formulations might apply to discrete-time or sampled-data LQI problems.
Load-bearing premise
A data-driven closed-loop parameterization of the augmented dynamics incorporating the integral state can be derived relying solely on input-state-output measurements of the underlying system.
What would settle it
Applying the proposed data-driven method to a linear system with known dynamics, computing the resulting controller gain, and comparing it to the optimal LQI gain obtained from the model-based Riccati solution; a mismatch beyond numerical tolerance would falsify the claim that the parameterization yields the optimal feedback.
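This test can be scripted. In the sketch below, the paper's convex program is deliberately replaced by a simple certainty-equivalence step, because the convex program itself would need an SDP solver. The step recovers A_aug and B_aug from the data-based closed-loop map and then solves the Riccati equation. `gain_from_data` is a hypothetical placeholder where the data-driven method under test would be plugged in. The plant is hypothetical, and noise-free derivative samples are assumed.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical ground-truth plant: known, so the optimal LQI gain is available.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, m, q, T = 2, 1, 1, 10
Q, R = np.eye(n + q), np.eye(m)


def riccati_gain(A_aug, B_aug, Q, R):
    P = solve_continuous_are(A_aug, B_aug, Q, R)
    return -np.linalg.solve(R, B_aug.T @ P)


def gain_from_data(X0, U0, X1, Y0, Q, R):
    """Hypothetical stand-in for the data-driven method under test.

    Certainty equivalence, NOT the paper's convex program: recover
    A_aug = [[A, 0], [-C, 0]] and B_aug = [B; 0] from data, then solve the CARE.
    """
    n, q = X0.shape[0], Y0.shape[0]
    W_pinv = np.linalg.pinv(np.vstack([X0, U0]))
    Z = np.vstack([X1, -Y0])  # samples of [x'; xi'] with r = 0
    A_aug = np.hstack([Z @ W_pinv[:, :n], np.zeros((n + q, q))])
    B_aug = Z @ W_pinv[:, n:]
    return riccati_gain(A_aug, B_aug, Q, R)


# Noise-free data from the known plant (derivative samples assumed available).
rng = np.random.default_rng(0)
X0 = rng.standard_normal((n, T))
U0 = rng.standard_normal((m, T))
X1 = A @ X0 + B @ U0
Y0 = C @ X0

K_data = gain_from_data(X0, U0, X1, Y0, Q, R)

A_aug = np.block([[A, np.zeros((n, q))], [-C, np.zeros((q, q))]])
B_aug = np.vstack([B, np.zeros((q, m))])
K_model = riccati_gain(A_aug, B_aug, Q, R)

mismatch = np.max(np.abs(K_data - K_model))
print(mismatch)  # a value far above round-off would falsify the claim
```

With noisy data, the tolerance would have to be set relative to the noise level rather than to round-off.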
Figures
Original abstract
This paper studies the data-driven synthesis of linear quadratic integral (LQI) controllers for continuous-time systems. The objective is to achieve optimal state-feedback control with integral action for reference tracking using only measured data. To this end, we derive a data-driven closed-loop parameterization of the augmented dynamics that incorporates the integral state while relying solely on input-state-output measurements of the underlying system. Based on this parameterization, a data-driven convex optimization problem is formulated whose solution yields the optimal linear quadratic regulator (LQR) feedback gain for the augmented system without explicit knowledge of the system matrices. In addition, a policy gradient flow is derived to compute the optimal controller within the space of stabilizing gains. The proposed approach enables data-driven optimal tracking control while avoiding explicit state augmentation in the data collection phase. The effectiveness of the method is demonstrated through a numerical example involving a distributed generation unit (DGU) in a DC microgrid.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a data-driven method for synthesizing optimal linear quadratic integral (LQI) controllers for continuous-time systems. It derives a closed-loop parameterization of the augmented (plant plus integral) dynamics that uses only input-state-output trajectories of the unaugmented plant, formulates a convex optimization problem whose solution is the optimal augmented LQR feedback gain, and supplies a policy-gradient flow to compute the gain within the set of stabilizing controllers. The approach is illustrated on a distributed generation unit in a DC microgrid.
Significance. If the parameterization is algebraically correct and the convex program recovers the true optimal augmented gain, the work would extend data-driven LQR methods to reference-tracking problems that require integral action, without requiring system identification or explicit augmentation of the collected data. This is a practically relevant extension for systems where zero steady-state error is mandatory.
major comments (2)
- [Derivation of the data-driven parameterization] The central technical step is the data-driven closed-loop parameterization of the augmented dynamics (plant + integral state). The manuscript must supply the explicit algebraic derivation showing how the integral-state evolution and the reference-tracking error are encoded using only the measured input-state-output trajectories of the unaugmented plant, together with the precise persistence-of-excitation conditions that guarantee uniqueness of the parameterization.
- [Convex optimization formulation] The convex program is asserted to yield the optimal augmented LQR gain. It must be shown that the quadratic cost and the linear constraints in the program are exactly equivalent to the infinite-horizon LQI cost for the augmented closed-loop system; any hidden dependence on the unknown system matrices or on the integral state would invalidate the claim that the solution is model-free.
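The kind of rank condition the referee asks for can be checked directly on the data matrices. The sketch assumes the condition commonly used in De Persis-Tesi-style parameterizations, rank [X0; U0] = n + m. This condition makes X0 G = [I, 0], U0 G = K solvable for every K. It also makes [X1; -Y0] G independent of which solution G is chosen. The paper's exact condition may differ. The data below are random stand-ins rather than simulated trajectories, and the function name is hypothetical.

```python
import numpy as np


def rank_condition_holds(X0, U0):
    """True if the stacked data matrix [X0; U0] has full row rank n + m."""
    n, m = X0.shape[0], U0.shape[0]
    return np.linalg.matrix_rank(np.vstack([X0, U0])) == n + m


rng = np.random.default_rng(1)
n, m = 2, 1

# Too few samples: only T = 2 < n + m columns.
X_short, U_short = rng.standard_normal((n, 2)), rng.standard_normal((m, 2))

# Enough samples with an independent random input.
X_rich, U_rich = rng.standard_normal((n, 10)), rng.standard_normal((m, 10))

# Enough samples, but the input is a static state feedback u = F x,
# so the input row is a linear combination of the state rows.
F = np.array([[-1.0, -0.5]])
U_fb = F @ X_rich

print(rank_condition_holds(X_short, U_short))  # False
print(rank_condition_holds(X_rich, U_rich))    # True
print(rank_condition_holds(X_rich, U_fb))      # False
```

The third case shows why data collected purely under static feedback, without extra excitation, cannot support this kind of parameterization.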
minor comments (2)
- [Policy gradient approach] The policy-gradient flow section should prove that the set of stabilizing gains is invariant under the flow and, for any discretized implementation, state the step-size restrictions that preserve this invariance.
- [Numerical example] The numerical example would benefit from a direct comparison of the data-driven gain against the model-based LQI solution and from reporting the closed-loop eigenvalues or tracking error norms.
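Once a gain is available, the quantities the referee asks for are easy to compute. The sketch below uses a hypothetical second-order plant, not the DGU model from the paper. It also assumes the model-based LQI gain, a constant reference r, and the integral convention xi' = r - y. It reports the closed-loop eigenvalues, the steady-state output, and the tracking error from the exact matrix-exponential solution.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_are

# Hypothetical plant (not the paper's DGU model).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, q = 2, 1

A_aug = np.block([[A, np.zeros((n, q))], [-C, np.zeros((q, q))]])
B_aug = np.vstack([B, np.zeros((q, 1))])
E = np.vstack([np.zeros((n, q)), np.eye(q)])  # r enters only through xi' = r - y

P = solve_continuous_are(A_aug, B_aug, np.eye(n + q), np.eye(1))
K = -B_aug.T @ P  # R = I, so K = -R^{-1} B_aug' P = -B_aug' P
A_cl = A_aug + B_aug @ K
eigs = np.linalg.eigvals(A_cl)

r = np.array([1.0])
x_ss = np.linalg.solve(A_cl, -E @ r)  # equilibrium of x_aug' = A_cl x_aug + E r


def tracking_error(t, x_init):
    """|r - y(t)| for the closed loop started at x_init (exact LTI solution)."""
    x_t = x_ss + expm(A_cl * t) @ (x_init - x_ss)
    return np.abs(r - C @ x_t[:n])


x_init = np.zeros(n + q)
print(eigs)
print(C @ x_ss[:n])  # equals r: the integral row forces r - C x_ss = 0
print(tracking_error(1.0, x_init), tracking_error(60.0, x_init))
```

Zero steady-state error holds for any stabilizing K of this form, because the last row of the equilibrium equation reads r - C x_ss = 0.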
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. The suggestions identify opportunities to improve the exposition of the technical derivations. We address each major comment below and will make the indicated revisions to strengthen the paper.
Point-by-point responses
- Referee: [Derivation of the data-driven parameterization] The central technical step is the data-driven closed-loop parameterization of the augmented dynamics (plant + integral state). The manuscript must supply the explicit algebraic derivation showing how the integral-state evolution and the reference-tracking error are encoded using only the measured input-state-output trajectories of the unaugmented plant, together with the precise persistence-of-excitation conditions that guarantee uniqueness of the parameterization.
  Authors: We agree that an explicit algebraic derivation is necessary for full transparency. In the revised manuscript we will insert a dedicated subsection that walks through the derivation step by step, showing precisely how the integral-state evolution and reference-tracking error are expressed solely in terms of the measured input-state-output trajectories of the unaugmented plant. We will also state the exact persistence-of-excitation rank conditions that guarantee uniqueness of the resulting parameterization. Revision: yes.
- Referee: [Convex optimization formulation] The convex program is asserted to yield the optimal augmented LQR gain. It must be shown that the quadratic cost and the linear constraints in the program are exactly equivalent to the infinite-horizon LQI cost for the augmented closed-loop system; any hidden dependence on the unknown system matrices or on the integral state would invalidate the claim that the solution is model-free.
  Authors: We will add a new proposition and its proof that establishes the exact equivalence between the quadratic cost and linear constraints of the convex program and the infinite-horizon LQI cost of the augmented closed-loop system. The proof will rely only on the data-driven parameterization already derived and will explicitly verify the absence of any hidden dependence on the system matrices or the integral state, thereby confirming the model-free character of the formulation. Revision: yes.
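The following is a numerical sanity check of the identity that such an equivalence rests on. For a fixed stabilizing gain, the infinite-horizon cost of the augmented closed loop, the integral of x_a' (Q + K' R K) x_a dt, should equal x_a(0)' P x_a(0), where P solves the closed-loop Lyapunov equation. The sketch checks this identity on a hypothetical augmented plant by integrating the running cost numerically. It does not verify the paper's convex program itself.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical augmented model and a fixed stabilizing gain (poles at -1, -2, -3).
A_aug = np.array([[0.0, 1.0, 0.0], [-2.0, -3.0, 0.0], [-1.0, 0.0, 0.0]])
B_aug = np.array([[0.0], [1.0], [0.0]])
Q, R = np.eye(3), np.eye(1)
K = np.array([[-9.0, -3.0, 6.0]])

A_cl = A_aug + B_aug @ K
M = Q + K.T @ R @ K  # x'Qx + u'Ru with u = K x

# Closed form: J = x0' P x0 with A_cl'P + P A_cl = -M.
P = solve_continuous_lyapunov(A_cl.T, -M)
x0 = np.array([1.0, -0.5, 0.2])
J_lyap = x0 @ P @ x0


def rhs(t, z):
    """Closed-loop state plus the running cost as an extra state."""
    x = z[:3]
    return np.concatenate([A_cl @ x, [x @ M @ x]])


# Integrate long enough for the slowest mode (rate 1) to become negligible.
sol = solve_ivp(rhs, (0.0, 40.0), np.concatenate([x0, [0.0]]),
                method="DOP853", rtol=1e-10, atol=1e-12)
J_sim = sol.y[3, -1]
print(J_lyap, J_sim)
```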
Circularity Check
No circularity: parameterization derived independently from data
full rationale
The paper derives a closed-loop parameterization of the augmented (plant + integral) dynamics directly from input-state-output trajectories of the unaugmented system, then uses that parameterization to pose a convex program whose solution is the optimal augmented LQR gain. No equation is shown that defines the target gain in terms of itself, renames a fitted quantity as a prediction, or reduces the central claim to a self-citation chain. The policy-gradient flow is presented as an alternative solver within the same parameterized space. The derivation is therefore self-contained against external model-based LQI benchmarks; the only potential issue is algebraic correctness of the parameterization, which is a correctness question rather than a circularity question.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Formulas for data-driven control: Stabilization, optimality, and robustness,
C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,”IEEE Transactions on Automatic Control , no. 3, 2020
work page 2020
-
[2]
An approach to the linear multivariable servomechanism problem,
P. C. Young and J. C. Willems, “An approach to the linear multivariable servomechanism problem,”International Journal of Control , no. 5, 1972. eprint: https://doi. org/10.1080/00207177208932211
-
[3]
Design of linear quadratic regulator (LQR) control system for flight stability of LSU-05,
H. Purnawan, Mardlijah, and E. B. Purwanto, “Design of linear quadratic regulator (LQR) control system for flight stability of LSU-05,” in Journal of Physics: Conference Series, IOP Publishing, 2017
work page 2017
-
[4]
B. Kedjar and K. Al-Haddad, “LQR with integral action applied to a wind energy conversion system based on doubly fed induction generator,” in 2011 24th Canadian Conference on Electrical and Computer Engineering(CCECE) , ISSN: 0840-7789, 2011
work page 2011
-
[5]
Comparison of two methods of incorporating an integral action in linear quadratic regulator,
H. G. Malkapure and M. Chidambaram, “Comparison of two methods of incorporating an integral action in linear quadratic regulator,”IFAC Proceedings Volumes, no. 1, 2014, 3rd International Conference on Advances in Control and Optimization of Dynamical Systems (2014)
work page 2014
-
[6]
Numerical Methods for H2 Related Prob- lems,
E. Feron et al. , “Numerical Methods for H2 Related Prob- lems,” in 1992 American Control Conference , 1992
work page 1992
-
[7]
Connections Be- tween Duality in Control Theory and Convex Optimization,
V . Balakrishnan and L. Vandenberghe, “Connections Be- tween Duality in Control Theory and Convex Optimization,” in Proceedings of 1995 American Control Conference - ACC’95, 1995
work page 1995
-
[8]
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator,
M. Fazel et al. , “Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator,” in Proceedings of the 35th ICML , 2018
work page 2018
-
[9]
Toward a theoretical foundation of policy optimization for learning control policies,
B. Hu et al. , “Toward a theoretical foundation of policy optimization for learning control policies,” Annual Review of Control, Robotics, and Autonomous Systems , no. V olume 6, 2023, 2023
work page 2023
-
[10]
PID Equivalent of Optimal Regulator,
S Mukhopadhyay, “PID Equivalent of Optimal Regulator,” Electronics Letters, no. 25, 1978
work page 1978
-
[11]
PID Control for Multivariable Pro- cesses,
Q.-G. Wang et al. , “PID Control for Multivariable Pro- cesses,” en,
-
[12]
New Criteria for Tuning PID Controllers,
B. T. Polyak and M. V . Khlebnikov, “New Criteria for Tuning PID Controllers,” Automation and Remote Control , no. 11, 2022
work page 2022
-
[13]
A. Datta, M.-T. Ho, and S. P. Bhattacharyya, Structure and synthesis of PID controllers . Springer Science & Business Media, 1999
work page 1999
-
[14]
On the optimization landscape of dynamic output feedback linear quadratic control,
J. Duan et al. , “On the optimization landscape of dynamic output feedback linear quadratic control,” IEEE Transactions on Automatic Control , no. 2, 2024
work page 2024
-
[15]
Data-Enabled Policy Optimization for Direct Adaptive Learning of the LQR,
F. Zhao et al., “Data-Enabled Policy Optimization for Direct Adaptive Learning of the LQR,” IEEE Transactions on Automatic Control, 2025
work page 2025
-
[16]
Regularization for Covariance Parameterization of Direct Data-Driven LQR Control,
F. Zhao, A. Chiuso, and F. D ¨orfler, “Regularization for Covariance Parameterization of Direct Data-Driven LQR Control,” IEEE Control Systems Letters , 2025
work page 2025
-
[17]
V . G. Lopez and M. A. M ¨uller, Data-based control of continuous-time linear systems with performance specifica- tions, 2025. arXiv: 2403.00424 [eess.SY]
work page internal anchor Pith review arXiv 2025
-
[18]
Data-Enabled Policy Optimization for the Linear Quadratic Regulator,
F. Zhao, F. D ¨orfler, and K. You, “Data-Enabled Policy Optimization for the Linear Quadratic Regulator,” in 2023 62nd IEEE Conference on Decision and Control (CDC) , 2023
work page 2023
-
[19]
J. Bu, A. Mesbahi, and M. Mesbahi, On Topological and Metrical Properties of Stabilizing Feedback Gains: the MIMO Case, 2019. arXiv: 1904.02737 [cs.SY]
work page Pith review arXiv 2019
-
[20]
J. Bu, A. Mesbahi, and M. Mesbahi, Policy Gradient-based Algorithms for Continuous-time Linear Quadratic Control ,
- [21]
-
[22]
An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem,
V . G. Lopez and M. A. M ¨uller, “An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem,” in 2023 62nd IEEE Conference on Decision and Control (CDC) , 2023
work page 2023
-
[23]
J. P. Hespanha, Linear Systems Theory , 2nd ed. Princeton, USA: Princeton University Press, 2018
work page 2018
-
[24]
I. R. Shafarevich and A. O. Remizov, Linear Algebra and Geometry. Springer Science & Business Media, 2013
work page 2013
-
[25]
Old and New Matrix Algebra Useful for Statis- tics,
T. Minka, “Old and New Matrix Algebra Useful for Statis- tics,” 2000
work page 2000
-
[26]
R. A. Horn and C. R. Johnson, Matrix Analysis , 2nd ed. Cambridge University Press, 2012
work page 2012
-
[27]
A passivity-based approach to voltage stabilization in dc microgrids with zip loads,
P. Nahata et al. , “A passivity-based approach to voltage stabilization in dc microgrids with zip loads,” Automatica, 2020
work page 2020
-
[28]
A. Gießler et al., Dynamic State-Feedback Control for LPV Systems: Ensuring Stability and LQR Performance , 2025. arXiv: 2505.22248 [eess.SY]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.