
arxiv: 2604.14905 · v1 · submitted 2026-04-16 · 📡 eess.SY · cs.SY

Data-driven Linear Quadratic Integral Control: A Convex Formulation and Policy Gradient Approach

Pith reviewed 2026-05-10 10:54 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords data-driven control · linear quadratic integral · LQI · convex optimization · policy gradient · reference tracking · closed-loop parameterization

The pith

A data-driven closed-loop parameterization of augmented dynamics enables convex optimization of optimal LQI controllers from input-state-output measurements alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that optimal linear quadratic integral (LQI) control for reference tracking can be synthesized directly from measured data without knowledge of the system matrices. It does this by deriving a parameterization of the augmented closed-loop dynamics that includes the integral state. A sympathetic reader would care because this avoids explicit model identification and state augmentation during data collection, making optimal tracking control more practical for real systems with uncertain dynamics. The approach also includes a policy gradient flow that computes the optimal controller within the set of stabilizing gains.
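For orientation, a common way to set up the augmented LQI dynamics is sketched below. The sign of the integral state and the symbols A_a, B_a, K_x, K_I are assumptions of this sketch, and the paper's notation may differ.

```latex
\begin{aligned}
\dot x &= A x + B u, \qquad y = C x,\\
\dot x_I &= r - y,\\
\dot z &= A_a z + B_a u + \begin{bmatrix} 0 \\ I \end{bmatrix} r,\qquad
z = \begin{bmatrix} x \\ x_I \end{bmatrix},\quad
A_a = \begin{bmatrix} A & 0 \\ -C & 0 \end{bmatrix},\quad
B_a = \begin{bmatrix} B \\ 0 \end{bmatrix},\\
u &= K z = K_x x + K_I x_I .
\end{aligned}
```

For a constant r, the deviations of (z, u) from their equilibrium obey the same pair (A_a, B_a). Consequently, any gain K for which A_a + B_a K is Hurwitz forces the integral-state derivative to zero in steady state, that is, y = r. Computing the LQR gain for (A_a, B_a) therefore yields an LQI controller.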

Core claim

The authors derive a data-driven closed-loop parameterization of the augmented dynamics incorporating the integral state based solely on input-state-output measurements. This leads to a convex optimization problem whose solution gives the optimal LQR feedback gain for the augmented system, enabling data-driven optimal tracking control without explicit state augmentation in data collection.
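The abstract does not spell out the convex program. The classical model-based convex reformulation of LQR (not necessarily the paper's data-driven program) expresses the cost of a stabilizing gain K through the closed-loop Gramian Y and the substitution L = KY. Under this substitution the Lyapunov equation becomes affine in (Y, L), and the cost tr(QY) + tr(R L Y⁻¹ Lᵀ) is convex via a Schur complement. The sketch below assumes u = Kz, an initial-state covariance Σ0 = I, and a made-up plant. It only checks numerically that the three cost expressions coincide and that the affine constraint holds; it does not solve the semidefinite program.

```python
import numpy as np

def lyap(A, W):
    """Solve A X + X A^T + W = 0 via the Kronecker (vectorized) form."""
    n = A.shape[0]
    I = np.eye(n)
    x = np.linalg.solve(np.kron(I, A) + np.kron(A, I), -W.reshape(-1, order="F"))
    X = x.reshape(n, n, order="F")
    return 0.5 * (X + X.T)

# Illustrative augmented plant (x_I' = r - y), weights, and a stabilizing gain
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Aa = np.block([[A, np.zeros((2, 1))], [-C, np.zeros((1, 1))]])
Ba = np.vstack([B, np.zeros((1, 1))])
Q, R, S0 = np.eye(3), np.eye(1), np.eye(3)
K = np.array([[0.0, 0.0, 1.0]])          # closed loop: s^3 + 3 s^2 + 2 s + 1 (Hurwitz)

Acl = Aa + Ba @ K
P = lyap(Acl.T, Q + K.T @ R @ K)          # value matrix: Acl^T P + P Acl + Q + K^T R K = 0
Y = lyap(Acl, S0)                         # Gramian:      Acl Y + Y Acl^T + S0 = 0
L = K @ Y                                 # change of variables L = K Y

J_value = np.trace(P @ S0)
J_gramian = np.trace((Q + K.T @ R @ K) @ Y)
J_convex = np.trace(Q @ Y) + np.trace(R @ L @ np.linalg.solve(Y, L.T))

# In the variables (Y, L) the Lyapunov constraint is affine
constraint = Aa @ Y + Y @ Aa.T + Ba @ L + L.T @ Ba.T + S0
print(J_value, J_gramian, J_convex, np.abs(constraint).max())
```

In a convex program, Y and L become decision variables and the gain is recovered as K = L Y⁻¹.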

What carries the argument

The data-driven closed-loop parameterization of the augmented dynamics, which incorporates the integral state and allows the convex data-driven LQR problem to be formulated without knowledge of the system matrices.
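The sketch below shows one way such a parameterization could be built. It is an illustrative construction in the style of De Persis and Tesi's formulas, not necessarily the one used in the paper. The assumptions are noise-free samples of the unaugmented plant (states X0, inputs U0, state derivatives X1, outputs Y0) and the integral-state convention ẋ_I = r − y. The key observation is that the integral-state row of the closed loop needs only the output data Y0, so no integral state has to be recorded during the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, T = 3, 1, 1, 20   # states, inputs, outputs, number of samples (illustrative)

# "True" plant, used only to generate data and to check the result
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

# Sampled data of the UNaugmented plant
X0 = rng.standard_normal((n, T))   # states
U0 = rng.standard_normal((m, T))   # inputs
X1 = A @ X0 + B @ U0               # state derivatives at the same instants
Y0 = C @ X0                        # outputs

# An arbitrary augmented gain K = [Kx, KI]
Kx = rng.standard_normal((m, n))
KI = rng.standard_normal((m, p))

# Solve [X0; U0] G = [[I, 0], [Kx, KI]] for G = [Gx, GI] (minimum-norm solution)
D = np.vstack([X0, U0])
rhs = np.block([[np.eye(n), np.zeros((n, p))], [Kx, KI]])
G = np.linalg.pinv(D) @ rhs
Gx, GI = G[:, :n], G[:, n:]

# Data-based augmented closed loop: X1 Gx = A + B Kx, X1 GI = B KI, Y0 Gx = C
Acl_data = np.block([[X1 @ Gx, X1 @ GI], [-Y0 @ Gx, np.zeros((p, p))]])

# Model-based reference A_a + B_a K (unknown in practice, used here only to compare)
Aa = np.block([[A, np.zeros((n, p))], [-C, np.zeros((p, p))]])
Ba = np.vstack([B, np.zeros((p, m))])
Acl_model = Aa + Ba @ np.hstack([Kx, KI])

print(np.allclose(Acl_data, Acl_model))  # True when [X0; U0] has full row rank
```

The identity relies on the full row rank of [X0; U0], which guarantees that a G matching any prescribed gain exists.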

If this is right

  • Optimal LQI controllers for reference tracking can be obtained via convex optimization from data alone.
  • The method applies to continuous-time systems and avoids explicit state augmentation during data collection.
  • A policy gradient flow provides an alternative way to compute the optimal controller within the space of stabilizing gains.
  • The approach is demonstrated through a numerical example involving a distributed generation unit (DGU) in a DC microgrid.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The parameterization technique could extend to other augmented controller structures beyond integral action.
  • In practice this might allow direct deployment of optimal tracking controllers on hardware where input-state-output measurements are available but an accurate model is not.
  • Similar data-driven convex formulations might apply to discrete-time or sampled-data LQI problems.

Load-bearing premise

A data-driven closed-loop parameterization of the augmented dynamics incorporating the integral state can be derived relying solely on input-state-output measurements of the underlying system.

What would settle it

Applying the proposed data-driven method to a linear system with known dynamics, computing the resulting controller gain, and comparing it to the analytically known optimal LQI gain from the model-based Riccati solution; a mismatch would falsify the claim that the parameterization yields the optimal feedback.
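Below is a minimal sketch of the model-based side of that test. It assumes the standard augmentation ẋ_I = r − y, the convention u = Kz, illustrative weights Q = I and R = 1, and a made-up second-order plant rather than the paper's DGU. The benchmark gain K⋆ is obtained from the stabilizing solution of the augmented algebraic Riccati equation. The Hamiltonian-eigenvector method is used here so that only NumPy is needed. A data-driven gain (hypothetical here, produced by whichever solver is under test) would then be compared with K⋆ through the relative Frobenius error.

```python
import numpy as np

def care_stabilizing(A, B, Q, R):
    """Stabilizing solution P of A^T P + P A - P B R^{-1} B^T P + Q = 0,
    taken from the stable invariant subspace of the Hamiltonian matrix."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    Vs = V[:, w.real < 0]
    assert Vs.shape[1] == n, "no stabilizing solution (check stabilizability/detectability)"
    P = np.real(Vs[n:] @ np.linalg.inv(Vs[:n]))
    return 0.5 * (P + P.T)

# Illustrative plant (not the DGU model) and its LQI augmentation x_I' = r - y
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Aa = np.block([[A, np.zeros((2, 1))], [-C, np.zeros((1, 1))]])
Ba = np.vstack([B, np.zeros((1, 1))])
Q, R = np.eye(3), np.eye(1)

P = care_stabilizing(Aa, Ba, Q, R)
K_star = -np.linalg.solve(R, Ba.T @ P)   # optimal LQI gain for u = K_star @ [x; x_I]

riccati_residual = Aa.T @ P + P @ Aa - P @ Ba @ np.linalg.solve(R, Ba.T @ P) + Q
print(np.abs(riccati_residual).max())     # close to machine precision
print(np.linalg.eigvals(Aa + Ba @ K_star))

def relative_gain_error(K_dd, K_ref):
    """Relative Frobenius error between a data-driven gain K_dd and the benchmark."""
    return np.linalg.norm(K_dd - K_ref) / np.linalg.norm(K_ref)
```

A relative error at the level of numerical tolerance would support the claim. A clear mismatch on noise-free, sufficiently rich data would count against it.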

Figures

Figures reproduced from arXiv: 2604.14905 by Armin Gießler, Pol Jané-Soneira, Sören Hohmann.

Figure 1: DGU in closed loop with the LQI controller.
Figure 4: Trajectories of the voltage v(t) for a time-varying load and different LQI controllers. … Y = 0.02 S, achieves fast tracking without overshoot. In contrast, K1 provides fast tracking due to its large integrator gain but results in significant overshoot at t = 0.5 s and t = 2.5 s. The gain K2 eliminates overshoot but yields slower reference tracking. The gain K3 contains only an integrator term, resulting in …
Figure 3: Normalized residuals ∥K(t) − K⋆∥_F / ∥K(0) − K⋆∥_F of the projected gradient flow for initial gains K(0) ∈ {K1, K2, K3}.
Original abstract

This paper studies the data-driven synthesis of linear quadratic integral (LQI) controllers for continuous-time systems. The objective is to achieve optimal state-feedback control with integral action for reference tracking using only measured data. To this end, we derive a data-driven closed-loop parameterization of the augmented dynamics that incorporates the integral state while relying solely on input-state-output measurements of the underlying system. Based on this parameterization, a data-driven convex optimization problem is formulated whose solution yields the optimal linear quadratic regulator (LQR) feedback gain for the augmented system without explicit knowledge of the system matrices. In addition, a policy gradient flow is derived to compute the optimal controller within the space of stabilizing gains. The proposed approach enables data-driven optimal tracking control while avoiding explicit state augmentation in the data collection phase. The effectiveness of the method is demonstrated through a numerical example involving a distributed generation unit (DGU) in a DC microgrid.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a data-driven method for synthesizing optimal linear quadratic integral (LQI) controllers for continuous-time systems. It derives a closed-loop parameterization of the augmented (plant plus integral) dynamics that uses only input-state-output trajectories of the unaugmented plant, formulates a convex optimization problem whose solution is the optimal augmented LQR feedback gain, and supplies a policy-gradient flow to compute the gain within the set of stabilizing controllers. The approach is illustrated on a distributed generation unit in a DC microgrid.

Significance. If the parameterization is algebraically correct and the convex program recovers the true optimal augmented gain, the work would extend data-driven LQR methods to reference-tracking problems that require integral action, without requiring system identification or explicit augmentation of the collected data. This is a practically relevant extension for systems where zero steady-state error is mandatory.

major comments (2)
  1. [Derivation of the data-driven parameterization] The central technical step is the data-driven closed-loop parameterization of the augmented dynamics (plant + integral state). The manuscript must supply the explicit algebraic derivation showing how the integral-state evolution and the reference-tracking error are encoded using only the measured input-state-output trajectories of the unaugmented plant, together with the precise persistence-of-excitation conditions that guarantee uniqueness of the parameterization.
  2. [Convex optimization formulation] The convex program is asserted to yield the optimal augmented LQR gain. It must be shown that the quadratic cost and the linear constraints in the program are exactly equivalent to the infinite-horizon LQI cost for the augmented closed-loop system; any hidden dependence on the unknown system matrices or on the integral state would invalidate the claim that the solution is model-free.
minor comments (2)
  1. [Policy gradient approach] The policy-gradient flow section should establish that the set of stabilizing gains is invariant under the flow and, for any discrete-time implementation of the flow, state the admissible step sizes.
  2. [Numerical example] The numerical example would benefit from a direct comparison of the data-driven gain against the model-based LQI solution and from reporting the closed-loop eigenvalues or tracking error norms.
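Because the flow is a continuous-time ODE, a step size only enters once the flow is discretized. Below is a minimal sketch that assumes the model-based augmented cost J(K) = tr(P_K Σ0) with u = Kz and Σ0 = I, on a made-up plant rather than the DGU model. It takes forward-Euler steps with Armijo backtracking, which keeps every iterate stabilizing. It also tracks the normalized residual ∥K_k − K⋆∥_F / ∥K_0 − K⋆∥_F, the same metric plotted in Figure 3. This is an unprojected, model-based gradient descent, not the paper's projected data-driven flow. The reference K⋆ comes from Kleinman's iteration.

```python
import numpy as np

def lyap(A, W):
    """Solve A X + X A^T + W = 0 (Kronecker form; adequate for small n)."""
    n = A.shape[0]
    I = np.eye(n)
    x = np.linalg.solve(np.kron(I, A) + np.kron(A, I), -W.reshape(-1, order="F"))
    X = x.reshape(n, n, order="F")
    return 0.5 * (X + X.T)

def is_stable(M):
    return np.max(np.linalg.eigvals(M).real) < 0

def lqr_cost(K):
    Acl = Aa + Ba @ K
    return np.trace(lyap(Acl.T, Q + K.T @ R @ K) @ S0)

def lqr_grad(K):
    """grad J(K) = 2 (R K + B^T P_K) Y_K for the convention u = K z."""
    Acl = Aa + Ba @ K
    P = lyap(Acl.T, Q + K.T @ R @ K)
    Y = lyap(Acl, S0)
    return 2.0 * (R @ K + Ba.T @ P) @ Y

# Illustrative augmented plant (x_I' = r - y); not the paper's DGU model
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Aa = np.block([[A, np.zeros((2, 1))], [-C, np.zeros((1, 1))]])
Ba = np.vstack([B, np.zeros((1, 1))])
Q, R, S0 = np.eye(3), np.eye(1), np.eye(3)
K0 = np.array([[0.0, 0.0, 1.0]])        # stabilizing: s^3 + 3 s^2 + 2 s + 1 is Hurwitz

# Reference optimum K_star via Kleinman's (Newton) iteration from the same K0
K_star = K0.copy()
for _ in range(50):
    P = lyap((Aa + Ba @ K_star).T, Q + K_star.T @ R @ K_star)
    K_star = -np.linalg.solve(R, Ba.T @ P)

# Forward-Euler steps on dK/dt = -grad J(K); backtracking halves the step until the
# new gain is stabilizing AND the cost decreases (Armijo), so iterates stay stabilizing
K = K0.copy()
residuals = [1.0]                       # ||K_k - K_star||_F / ||K_0 - K_star||_F
for _ in range(3000):
    J, G = lqr_cost(K), lqr_grad(K)
    eta = 1.0
    for _ in range(60):
        K_new = K - eta * G
        if is_stable(Aa + Ba @ K_new) and lqr_cost(K_new) <= J - 1e-4 * eta * np.sum(G**2):
            break
        eta *= 0.5
    else:
        break                           # no acceptable step: gradient numerically zero
    K = K_new
    residuals.append(np.linalg.norm(K - K_star) / np.linalg.norm(K0 - K_star))

print(residuals[-1])
```

Backtracking is one simple way to enforce invariance of the stabilizing set after discretization. A fixed step would need an explicit bound that depends on the sublevel set of the initial cost.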

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The suggestions identify opportunities to improve the exposition of the technical derivations. We address each major comment below and will make the indicated revisions to strengthen the paper.

Point-by-point responses
  1. Referee: [Derivation of the data-driven parameterization] The central technical step is the data-driven closed-loop parameterization of the augmented dynamics (plant + integral state). The manuscript must supply the explicit algebraic derivation showing how the integral-state evolution and the reference-tracking error are encoded using only the measured input-state-output trajectories of the unaugmented plant, together with the precise persistence-of-excitation conditions that guarantee uniqueness of the parameterization.

    Authors: We agree that an explicit algebraic derivation is necessary for full transparency. In the revised manuscript we will insert a dedicated subsection that walks through the derivation step by step, showing precisely how the integral-state evolution and reference-tracking error are expressed solely in terms of the measured input-state-output trajectories of the unaugmented plant. We will also state the exact persistence-of-excitation rank conditions that guarantee uniqueness of the resulting parameterization. revision: yes

  2. Referee: [Convex optimization formulation] The convex program is asserted to yield the optimal augmented LQR gain. It must be shown that the quadratic cost and the linear constraints in the program are exactly equivalent to the infinite-horizon LQI cost for the augmented closed-loop system; any hidden dependence on the unknown system matrices or on the integral state would invalidate the claim that the solution is model-free.

    Authors: We will add a new proposition and its proof that establishes the exact equivalence between the quadratic cost and linear constraints of the convex program and the infinite-horizon LQI cost of the augmented closed-loop system. The proof will rely only on the data-driven parameterization already derived and will explicitly verify the absence of any hidden dependence on the system matrices or the integral state, thereby confirming the model-free character of the formulation. revision: yes

Circularity Check

0 steps flagged

No circularity: parameterization derived independently from data

Full rationale

The paper derives a closed-loop parameterization of the augmented (plant + integral) dynamics directly from input-state-output trajectories of the unaugmented system, then uses that parameterization to pose a convex program whose solution is the optimal augmented LQR gain. No equation is shown that defines the target gain in terms of itself, renames a fitted quantity as a prediction, or reduces the central claim to a self-citation chain. The policy-gradient flow is presented as an alternative solver within the same parameterized space. The derivation is therefore self-contained, and its output can be checked against external model-based LQI benchmarks. The only potential issue is the algebraic correctness of the parameterization, which is a correctness question rather than a circularity question.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. Standard assumptions of linear time-invariant dynamics, stabilizability, and sufficient data richness are implicitly required but not enumerated.
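The ledger mentions sufficient data richness without stating the condition. In the data-driven LQR literature (for example, De Persis and Tesi's formulas), a typical requirement is that the stacked data matrix [X0; U0] has full row rank n + m. Whether the paper uses exactly this condition is an assumption here. The sketch below checks the condition on two hypothetical data sets. In the first, the inputs contain an independent probing component. In the second, the inputs come from a pure static state feedback, which cannot satisfy the rank test.

```python
import numpy as np

def is_persistently_exciting(X0, U0):
    """Check the assumed data-richness condition rank([X0; U0]) == n + m."""
    D = np.vstack([X0, U0])
    return np.linalg.matrix_rank(D) == D.shape[0]

rng = np.random.default_rng(1)
n, m, T = 3, 1, 12                     # illustrative sizes; T >= n + m is necessary

X0 = rng.standard_normal((n, T))       # sampled states
U_probe = rng.standard_normal((m, T))  # inputs with an independent probing component
K = rng.standard_normal((m, n))
U_fb = K @ X0                          # inputs from pure static state feedback

print(is_persistently_exciting(X0, U_probe))  # True for generic random data
print(is_persistently_exciting(X0, U_fb))     # False: input rows lie in the row space of X0
```

The second case illustrates why data-driven experiments usually add probing noise to the input.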

pith-pipeline@v0.9.0 · 5468 in / 1116 out tokens · 25374 ms · 2026-05-10T10:54:43.938977+00:00 · methodology

