Data-driven Linear Quadratic Integral Control: A Convex Formulation and Policy Gradient Approach

Armin Gießler; Pol Jané-Soneira; Sören Hohmann

arxiv: 2604.14905 · v1 · submitted 2026-04-16 · 📡 eess.SY · cs.SY


Pith reviewed 2026-05-10 10:54 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords data-driven control · linear quadratic integral · LQI · convex optimization · policy gradient · reference tracking · closed-loop parameterization


The pith

A data-driven closed-loop parameterization of augmented dynamics enables convex optimization of optimal LQI controllers from input-state-output measurements alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that optimal linear quadratic integral (LQI) control for reference tracking can be synthesized directly from measured data, without knowledge of the system matrices. It does this by deriving a parameterization of the augmented closed-loop dynamics that includes the integral state. A sympathetic reader would care because this avoids both explicit model identification and state augmentation during data collection, making optimal tracking control more practical for real systems with uncertain dynamics. The paper also derives a policy gradient flow that computes the optimal controller within the set of stabilizing gains.

Core claim

The authors derive a data-driven closed-loop parameterization of the augmented dynamics incorporating the integral state based solely on input-state-output measurements. This leads to a convex optimization problem whose solution gives the optimal LQR feedback gain for the augmented system, enabling data-driven optimal tracking control without explicit state augmentation in data collection.

What carries the argument

The data-driven closed-loop parameterization of the augmented dynamics, which incorporates the integral state and allows formulation of the convex data-driven LQR problem without system matrices.
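For orientation, the textbook LQI setup that such a parameterization has to reproduce can be written as follows. This is a sketch in assumed notation: the symbols z, \xi, A_aug, B_aug, K_x, K_z and the sign convention \dot{z} = r - y are not taken from the paper, whose own equations may be arranged differently.

```latex
\begin{align}
  \dot{x}(t) &= A x(t) + B u(t), & y(t) &= C x(t), \\
  \dot{z}(t) &= r(t) - y(t), \\
  \dot{\xi}(t) &=
    \underbrace{\begin{bmatrix} A & 0 \\ -C & 0 \end{bmatrix}}_{A_{\mathrm{aug}}} \xi(t)
    + \underbrace{\begin{bmatrix} B \\ 0 \end{bmatrix}}_{B_{\mathrm{aug}}} u(t)
    + \begin{bmatrix} 0 \\ I \end{bmatrix} r(t),
  & \xi &= \begin{bmatrix} x \\ z \end{bmatrix}, \\
  u(t) &= -K \xi(t) = -K_x x(t) - K_z z(t), \\
  J &= \int_0^\infty \bigl( \xi^\top Q \xi + u^\top R u \bigr)\, \mathrm{d}t .
\end{align}
```

The optimal LQI gain is the LQR gain of the pair (A_aug, B_aug); the data-driven question is how to obtain it when A, B and C are unknown.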

If this is right

Optimal LQI controllers for reference tracking can be obtained via convex optimization from data alone.
The method applies to continuous-time systems and avoids explicit state augmentation during data collection.
A policy gradient flow provides an alternative way to compute the optimal controller within the space of stabilizing gains.
The approach is demonstrated to work on a distributed generation unit in a DC microgrid.
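To make the policy gradient idea concrete, here is a minimal model-based sketch of a gradient flow on the LQR cost of an augmented system, discretized by forward Euler with a step-halving safeguard. It is not the paper's data-driven or projected flow: the plant matrices, the weights Q, R, Sigma, the initial gain K0 and all function names are hypothetical, and the gradient expression is the standard continuous-time LQR policy gradient for u = -K xi.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical second-order plant (not the DGU model from the paper).
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, m, p = 2, 1, 1

# Augmented dynamics with integral state z, dz/dt = r - y (r = 0 for design).
A_aug = np.block([[A, np.zeros((n, p))], [-C, np.zeros((p, p))]])
B_aug = np.vstack([B, np.zeros((p, m))])
Q, R = np.eye(n + p), np.eye(m)
Sigma = np.eye(n + p)  # assumed weighting of initial conditions


def is_stabilizing(K):
    """True if A_aug - B_aug K is Hurwitz."""
    return np.max(np.linalg.eigvals(A_aug - B_aug @ K).real) < 0


def cost_and_grad(K):
    """LQR cost tr(P Sigma) and its gradient 2 (R K - B^T P) Y."""
    Acl = A_aug - B_aug @ K
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    Y = solve_continuous_lyapunov(Acl, -Sigma)
    return np.trace(P @ Sigma), 2.0 * (R @ K - B_aug.T @ P) @ Y


def gradient_flow(K0, iters=5000, step0=0.05):
    """Euler-discretized flow; halve the step until the new gain is
    stabilizing and the cost decreases, so iterates stay stabilizing."""
    K = K0.copy()
    f, G = cost_and_grad(K)
    for _ in range(iters):
        if np.linalg.norm(G) < 1e-9:
            break
        step = step0
        while True:
            K_new = K - step * G
            if is_stabilizing(K_new):
                f_new, G_new = cost_and_grad(K_new)
                if f_new < f:
                    break
            step *= 0.5
            if step < 1e-12:
                return K  # no further decrease possible numerically
        K, f, G = K_new, f_new, G_new
    return K


# Riccati solution as ground truth for the augmented system.
P_star = solve_continuous_are(A_aug, B_aug, Q, R)
K_star = np.linalg.solve(R, B_aug.T @ P_star)

K0 = np.array([[0.0, 0.0, -0.5]])  # stabilizing, integrator-only initial gain
K_final = gradient_flow(K0)
residual = np.linalg.norm(K_final - K_star) / np.linalg.norm(K0 - K_star)
print(f"normalized residual: {residual:.2e}")
```

The printed quantity is the same kind of normalized residual the paper plots for its own flow; the step-halving rule is a design choice made here to keep every iterate inside the set of stabilizing gains.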

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The parameterization technique could extend to other augmented controller structures beyond integral action.
In practice this might allow direct deployment of optimal tracking controllers on hardware where only measured input-state-output data, and no model, is available.
Similar data-driven convex formulations might apply to discrete-time or sampled-data LQI problems.

Load-bearing premise

A data-driven closed-loop parameterization of the augmented dynamics incorporating the integral state can be derived relying solely on input-state-output measurements of the underlying system.
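One way such a parameterization can arise is sketched below. This derivation is an assumption based on standard data-driven control identities, not a reproduction of the paper's result; the data matrices X_0, U_0, X_1, Y_0 and the decision variable G are notation introduced here, with u = -K_x x - K_z z and \dot{z} = r - y.

```latex
% Data collected from the unaugmented plant at sampling instants t_1, ..., t_T:
%   X_0 = [x(t_1) ... x(t_T)],  U_0 = [u(t_1) ... u(t_T)],
%   X_1 = [\dot{x}(t_1) ... \dot{x}(t_T)],  Y_0 = [y(t_1) ... y(t_T)].
\begin{align}
  X_1 &= A X_0 + B U_0, \qquad Y_0 = C X_0, \\
  \begin{bmatrix} X_0 \\ U_0 \end{bmatrix} G
      &= \begin{bmatrix} I & 0 \\ -K_x & -K_z \end{bmatrix},
      \qquad G \in \mathbb{R}^{T \times (n+p)}, \\
  X_1 G &= A \begin{bmatrix} I & 0 \end{bmatrix}
          + B \begin{bmatrix} -K_x & -K_z \end{bmatrix}
        = \begin{bmatrix} A - B K_x & -B K_z \end{bmatrix}, \\
  Y_0 G &= C X_0 G = \begin{bmatrix} C & 0 \end{bmatrix}, \\
  \begin{bmatrix} A - B K_x & -B K_z \\ -C & 0 \end{bmatrix}
      &= \begin{bmatrix} X_1 G \\ -Y_0 G \end{bmatrix}.
\end{align}
```

Under the rank condition rank [X_0; U_0] = n + m, a matrix G satisfying the second line exists for every gain, and the last line expresses the augmented closed-loop matrix through plant data alone. The integral state never appears in the data, which is the property the premise relies on.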

What would settle it

Applying the proposed data-driven method to a linear system with known dynamics, computing the resulting controller gain, and comparing it to the analytically known optimal LQI gain from the model-based Riccati solution; a mismatch would falsify the claim that the parameterization yields the optimal feedback.
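The comparison described above can be scripted in a few lines. The sketch below is only a scaffold for that test: the plant is hypothetical, the derivative samples are assumed noise-free, and the data-based gain comes from an indirect least-squares fit followed by a Riccati solve, which stands in for the paper's direct convex formulation and is not the authors' method.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)

# Hypothetical plant, used to generate data and the model-based benchmark.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, m, p = 2, 1, 1
Q, R = np.eye(n + p), np.eye(m)


def lqi_gain(A, B, C):
    """Riccati-based LQI gain for u = -K [x; z] with dz/dt = r - y."""
    A_aug = np.block([[A, np.zeros((n, p))], [-C, np.zeros((p, p))]])
    B_aug = np.vstack([B, np.zeros((p, m))])
    P = solve_continuous_are(A_aug, B_aug, Q, R)
    return np.linalg.solve(R, B_aug.T @ P)


# Model-based benchmark gain.
K_model = lqi_gain(A, B, C)

# Twenty samples of state, input, state derivative and output.
X0 = rng.standard_normal((n, 20))
U0 = rng.standard_normal((m, 20))
X1 = A @ X0 + B @ U0  # assumes exact derivative measurements
Y0 = C @ X0

# Indirect stand-in for the data-driven design: least squares, then Riccati.
AB_hat = X1 @ np.linalg.pinv(np.vstack([X0, U0]))
C_hat = Y0 @ np.linalg.pinv(X0)
K_data = lqi_gain(AB_hat[:, :n], AB_hat[:, n:], C_hat)

mismatch = np.linalg.norm(K_data - K_model) / np.linalg.norm(K_model)
print(f"relative gain mismatch: {mismatch:.2e}")
```

Replacing the three lines that compute `K_data` with the paper's convex program would turn this scaffold into the falsification test itself; a mismatch well above numerical precision on noise-free data would then point to a gap in the parameterization.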

Figures

Figures reproduced from arXiv: 2604.14905 by Armin Gießler, Pol Jané-Soneira, Sören Hohmann.

**Figure 4.** Trajectories of the voltage v(t) for a time-varying load and different LQI controllers. Y = 0.02 S, achieves fast tracking without overshoot. In contrast, K1 provides fast tracking due to its large integrator gain but results in significant overshoot at t = 0.5 s and t = 2.5 s. The gain K2 eliminates overshoot but yields slower reference tracking. The gain K3 contains only an integrator term, resulting in …

**Figure 3.** Normalized residuals ∥K(t) − K⋆∥_F / ∥K(0) − K⋆∥_F of the projected gradient flow for initial gains K(0) ∈ {K1, K2, K3}.

Original abstract

This paper studies the data-driven synthesis of linear quadratic integral (LQI) controllers for continuous-time systems. The objective is to achieve optimal state-feedback control with integral action for reference tracking using only measured data. To this end, we derive a data-driven closed-loop parameterization of the augmented dynamics that incorporates the integral state while relying solely on input-state-output measurements of the underlying system. Based on this parameterization, a data-driven convex optimization problem is formulated whose solution yields the optimal linear quadratic regulator (LQR) feedback gain for the augmented system without explicit knowledge of the system matrices. In addition, a policy gradient flow is derived to compute the optimal controller within the space of stabilizing gains. The proposed approach enables data-driven optimal tracking control while avoiding explicit state augmentation in the data collection phase. The effectiveness of the method is demonstrated through a numerical example involving a distributed generation unit (DGU) in a DC microgrid.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper extends data-driven LQR to continuous-time LQI by deriving an augmented closed-loop parameterization from unaugmented plant data, yielding a convex program and policy gradient method, but the parameterization's correctness is the make-or-break point.


The main thing to know is that this work supplies a data-driven route to optimal LQI control for continuous-time linear systems. The authors derive a closed-loop parameterization of the plant-plus-integrator dynamics that uses only input-state-output trajectories from the original system, then turn that into a convex optimization whose solution is the optimal augmented LQR gain. They also give a policy gradient flow over stabilizing gains as an alternative. The DC microgrid example with a distributed generation unit illustrates tracking with zero steady-state error in simulation.

This keeps data collection straightforward, since you never have to measure or store the integral state itself. That is a practical advantage over naive augmentation. The formulation stays within the existing data-driven LQR paradigm but adds the integral piece cleanly enough to be usable for reference tracking problems. The algebra appears to reconstruct the effect of the integrator through the data matrices without explicit model knowledge, which is the step that makes the convex program work. The policy gradient derivation follows standard lines but is adapted to the augmented case.

The soft spot is exactly the parameterization step highlighted in the stress test. Because the integral state is generated internally by the controller and is absent from the collected data, the mapping has to encode its dynamics, the reference, and the continuous-time evolution correctly. If that algebra has any gap, the convex solution will not match the true optimal gain. The abstract is silent on persistence-of-excitation conditions and noise robustness, so those need to be stated sharply in the paper.

The contribution is incremental rather than foundational, but the application focus is reasonable. This is for researchers and engineers already working on data-driven control or microgrid regulation who want a model-free tracking method. A reader comfortable with data-driven LQR papers will follow the extension and can test the example themselves. It deserves peer review because the claim is specific, the method is implementable, and the example gives referees something concrete to examine. Send it.

Referee Report

2 major / 2 minor

Summary. The paper develops a data-driven method for synthesizing optimal linear quadratic integral (LQI) controllers for continuous-time systems. It derives a closed-loop parameterization of the augmented (plant plus integral) dynamics that uses only input-state-output trajectories of the unaugmented plant, formulates a convex optimization problem whose solution is the optimal augmented LQR feedback gain, and supplies a policy-gradient flow to compute the gain within the set of stabilizing controllers. The approach is illustrated on a distributed generation unit in a DC microgrid.

Significance. If the parameterization is algebraically correct and the convex program recovers the true optimal augmented gain, the work would extend data-driven LQR methods to reference-tracking problems that require integral action, without requiring system identification or explicit augmentation of the collected data. This is a practically relevant extension for systems where zero steady-state error is mandatory.

major comments (2)

[Derivation of the data-driven parameterization] The central technical step is the data-driven closed-loop parameterization of the augmented dynamics (plant + integral state). The manuscript must supply the explicit algebraic derivation showing how the integral-state evolution and the reference-tracking error are encoded using only the measured input-state-output trajectories of the unaugmented plant, together with the precise persistence-of-excitation conditions that guarantee uniqueness of the parameterization.
[Convex optimization formulation] The convex program is asserted to yield the optimal augmented LQR gain. It must be shown that the quadratic cost and the linear constraints in the program are exactly equivalent to the infinite-horizon LQI cost for the augmented closed-loop system; any hidden dependence on the unknown system matrices or on the integral state would invalidate the claim that the solution is model-free.

minor comments (2)

[Policy gradient approach] The policy-gradient flow section should state the step-size restrictions and the invariance of the stabilizing-gain set under the flow.
[Numerical example] The numerical example would benefit from a direct comparison of the data-driven gain against the model-based LQI solution and from reporting the closed-loop eigenvalues or tracking error norms.
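The quantities requested in the last comment are cheap to report. As a rough illustration, assuming a hypothetical plant and a Riccati-based LQI gain (neither taken from the paper), the closed-loop eigenvalues and the steady-state tracking error for a constant reference could be computed like this:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical plant and weights (not the DGU model from the paper).
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, m, p = 2, 1, 1
Q, R = np.eye(n + p), np.eye(m)

# Augmented dynamics with integral state z, dz/dt = r - y.
A_aug = np.block([[A, np.zeros((n, p))], [-C, np.zeros((p, p))]])
B_aug = np.vstack([B, np.zeros((p, m))])
B_ref = np.vstack([np.zeros((n, p)), np.eye(p)])  # reference drives the integrator

# LQI gain for u = -K [x; z].
P = solve_continuous_are(A_aug, B_aug, Q, R)
K = np.linalg.solve(R, B_aug.T @ P)

# Closed-loop eigenvalues.
A_cl = A_aug - B_aug @ K
eigs = np.linalg.eigvals(A_cl)

# Steady state for a constant reference: 0 = A_cl xi + B_ref r.
r = np.array([1.0])
xi_ss = -np.linalg.solve(A_cl, B_ref @ r)
y_ss = C @ xi_ss[:n]

print("closed-loop eigenvalues:", np.round(eigs, 3))
print("steady-state tracking error:", float(abs(y_ss - r)[0]))
```

The zero steady-state error follows from the integrator row of the closed loop, so this check confirms stability and correct wiring of the integral action rather than optimality of the gain.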

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The suggestions identify opportunities to improve the exposition of the technical derivations. We address each major comment below and will make the indicated revisions to strengthen the paper.


Referee: [Derivation of the data-driven parameterization] The central technical step is the data-driven closed-loop parameterization of the augmented dynamics (plant + integral state). The manuscript must supply the explicit algebraic derivation showing how the integral-state evolution and the reference-tracking error are encoded using only the measured input-state-output trajectories of the unaugmented plant, together with the precise persistence-of-excitation conditions that guarantee uniqueness of the parameterization.

Authors: We agree that an explicit algebraic derivation is necessary for full transparency. In the revised manuscript we will insert a dedicated subsection that walks through the derivation step by step, showing precisely how the integral-state evolution and reference-tracking error are expressed solely in terms of the measured input-state-output trajectories of the unaugmented plant. We will also state the exact persistence-of-excitation rank conditions that guarantee uniqueness of the resulting parameterization. revision: yes
Referee: [Convex optimization formulation] The convex program is asserted to yield the optimal augmented LQR gain. It must be shown that the quadratic cost and the linear constraints in the program are exactly equivalent to the infinite-horizon LQI cost for the augmented closed-loop system; any hidden dependence on the unknown system matrices or on the integral state would invalidate the claim that the solution is model-free.

Authors: We will add a new proposition and its proof that establishes the exact equivalence between the quadratic cost and linear constraints of the convex program and the infinite-horizon LQI cost of the augmented closed-loop system. The proof will rely only on the data-driven parameterization already derived and will explicitly verify the absence of any hidden dependence on the system matrices or the integral state, thereby confirming the model-free character of the formulation. revision: yes

Circularity Check

0 steps flagged

No circularity: parameterization derived independently from data


The paper derives a closed-loop parameterization of the augmented (plant + integral) dynamics directly from input-state-output trajectories of the unaugmented system, then uses that parameterization to pose a convex program whose solution is the optimal augmented LQR gain. No equation is shown that defines the target gain in terms of itself, renames a fitted quantity as a prediction, or reduces the central claim to a self-citation chain. The policy-gradient flow is presented as an alternative solver within the same parameterized space. The derivation is therefore self-contained against external model-based LQI benchmarks; the only potential issue is algebraic correctness of the parameterization, which is a correctness question rather than a circularity question.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. Standard assumptions of linear time-invariant dynamics, stabilizability, and sufficient data richness are implicitly required but not enumerated.
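"Sufficient data richness" is commonly checked through a rank condition on the stacked state and input data. The sketch below assumes that condition, full row rank of [X0; U0], which is standard in the data-driven control literature; the paper's exact requirement is not stated here, and the function name and tolerance are illustrative.

```python
import numpy as np


def is_sufficiently_rich(X0, U0, tol=1e-8):
    """True if the stacked data matrix [X0; U0] has full row rank n + m."""
    Z = np.vstack([X0, U0])
    return bool(np.linalg.matrix_rank(Z, tol=tol) == Z.shape[0])


rng = np.random.default_rng(1)
X0 = rng.standard_normal((2, 15))  # 15 state samples, n = 2

# Independently excited input: the stacked matrix has rank n + m = 3.
rich = is_sufficiently_rich(X0, rng.standard_normal((1, 15)))

# Input generated by pure static feedback u = F x: rank stays at n = 2.
F = np.array([[0.3, -0.7]])
poor = is_sufficiently_rich(X0, F @ X0)

print(rich, poor)
```

The second case shows why closed-loop data without additional excitation is uninformative: the input rows are linear combinations of the state rows, so the input matrix cannot be separated from the state matrix.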



Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

[1] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, no. 3, 2020.
[2] P. C. Young and J. C. Willems, “An approach to the linear multivariable servomechanism problem,” International Journal of Control, no. 5, 1972. eprint: https://doi.org/10.1080/00207177208932211
[3] H. Purnawan, Mardlijah, and E. B. Purwanto, “Design of linear quadratic regulator (LQR) control system for flight stability of LSU-05,” in Journal of Physics: Conference Series, IOP Publishing, 2017.
[4] B. Kedjar and K. Al-Haddad, “LQR with integral action applied to a wind energy conversion system based on doubly fed induction generator,” in 2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), ISSN: 0840-7789, 2011.
[5] H. G. Malkapure and M. Chidambaram, “Comparison of two methods of incorporating an integral action in linear quadratic regulator,” IFAC Proceedings Volumes, no. 1, 2014, 3rd International Conference on Advances in Control and Optimization of Dynamical Systems (2014).
[6] E. Feron et al., “Numerical Methods for H2 Related Problems,” in 1992 American Control Conference, 1992.
[7] V. Balakrishnan and L. Vandenberghe, “Connections Between Duality in Control Theory and Convex Optimization,” in Proceedings of 1995 American Control Conference - ACC’95, 1995.
[8] M. Fazel et al., “Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator,” in Proceedings of the 35th ICML, 2018.
[9] B. Hu et al., “Toward a theoretical foundation of policy optimization for learning control policies,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 6, 2023.
[10] S. Mukhopadhyay, “PID Equivalent of Optimal Regulator,” Electronics Letters, no. 25, 1978.
[11] Q.-G. Wang et al., “PID Control for Multivariable Processes.”
[12] B. T. Polyak and M. V. Khlebnikov, “New Criteria for Tuning PID Controllers,” Automation and Remote Control, no. 11, 2022.
[13] A. Datta, M.-T. Ho, and S. P. Bhattacharyya, Structure and synthesis of PID controllers. Springer Science & Business Media, 1999.
[14] J. Duan et al., “On the optimization landscape of dynamic output feedback linear quadratic control,” IEEE Transactions on Automatic Control, no. 2, 2024.
[15] F. Zhao et al., “Data-Enabled Policy Optimization for Direct Adaptive Learning of the LQR,” IEEE Transactions on Automatic Control, 2025.
[16] F. Zhao, A. Chiuso, and F. Dörfler, “Regularization for Covariance Parameterization of Direct Data-Driven LQR Control,” IEEE Control Systems Letters, 2025.
[17] V. G. Lopez and M. A. Müller, Data-based control of continuous-time linear systems with performance specifications, 2025. arXiv: 2403.00424 [eess.SY]
[18] F. Zhao, F. Dörfler, and K. You, “Data-Enabled Policy Optimization for the Linear Quadratic Regulator,” in 2023 62nd IEEE Conference on Decision and Control (CDC), 2023.
[19] J. Bu, A. Mesbahi, and M. Mesbahi, On Topological and Metrical Properties of Stabilizing Feedback Gains: the MIMO Case, 2019. arXiv: 1904.02737 [cs.SY]
[20] J. Bu, A. Mesbahi, and M. Mesbahi, Policy Gradient-based Algorithms for Continuous-time Linear Quadratic Control,
[21] arXiv: 2006.09178 [eess.SY]
[22] V. G. Lopez and M. A. Müller, “An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem,” in 2023 62nd IEEE Conference on Decision and Control (CDC), 2023.
[23] J. P. Hespanha, Linear Systems Theory, 2nd ed. Princeton, USA: Princeton University Press, 2018.
[24] I. R. Shafarevich and A. O. Remizov, Linear Algebra and Geometry. Springer Science & Business Media, 2013.
[25] T. Minka, “Old and New Matrix Algebra Useful for Statistics,” 2000.
[26] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. Cambridge University Press, 2012.
[27] P. Nahata et al., “A passivity-based approach to voltage stabilization in dc microgrids with zip loads,” Automatica, 2020.
[28] A. Gießler et al., Dynamic State-Feedback Control for LPV Systems: Ensuring Stability and LQR Performance, 2025. arXiv: 2505.22248 [eess.SY]
