pith. sign in

arxiv: 2604.10635 · v1 · submitted 2026-04-12 · 📡 eess.SY · cs.SY

On the Optimization Landscape of Observer-based Dynamic Linear Quadratic Control

Pith reviewed 2026-05-10 16:09 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords observer-based controldynamic LQRoptimization landscapeSylvester equationsoutput-feedbackseparation principlepolicy gradient
0
0 comments X

The pith

In observer-based dynamic LQR, the stationary points satisfy a pair of coupled symmetric discrete-time Sylvester equations rather than the usual separated LQR and Kalman designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies the optimization landscape of observer-based dynamic output-feedback control for linear quadratic regulation. When finite-horizon transient costs cannot be neglected, the separation principle no longer holds and the gradients with respect to controller and observer parameters become coupled. The paper shows that the standard infinite-horizon LQR gain paired with the steady-state minimum-variance observer is not, in general, a stationary point of the overall closed-loop cost. It derives an explicit pair of discrete-time Sylvester equations with symmetric structure that together characterize all stationary points. These equations supply concrete optimality conditions and a structural foundation for policy-gradient algorithms that learn controllers from reconstructed state information.

Core claim

The stationary points of the observer-based dynamic LQR objective are characterized by a pair of discrete-time Sylvester equations that share the same set of matrix elements and possess symmetric structure. These equations arise directly from setting the coupled gradients of the closed-loop quadratic cost to zero and replace the decoupled Riccati solutions that appear when the separation principle applies. The analysis therefore supplies analytical insight into the optimality conditions without relying on independent design of controller and observer.

What carries the argument

A pair of discrete-time Sylvester equations with symmetric structure, both involving the same matrix elements, that together characterize the stationary points of the observer-based dynamic LQR performance objective.

If this is right

  • The combination of the standard LQR controller and the observer minimizing accumulated estimation-error covariance is not a stationary point in general.
  • Numerical policy-gradient methods for learning observer-based controllers can be derived from the gradient expressions whose zero set is defined by the Sylvester pair.
  • The symmetric structure of the equations supplies an explicit test for local optimality of any candidate observer-controller pair.
  • Joint optimization over observer and controller parameters is required rather than sequential or alternating design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The symmetric Sylvester structure may permit specialized numerical solvers that are more efficient than generic gradient descent on the joint parameter space.
  • Similar coupled equations are likely to appear in other partially observed control settings where separation fails, such as risk-sensitive or finite-horizon problems.
  • Reinforcement-learning agents that must learn dynamic controllers from partial observations should optimize the joint observer-controller parameters simultaneously rather than in alternation.

Load-bearing premise

Transient quadratic performance terms cannot be neglected, which prevents the gradients with respect to observer and controller parameters from decoupling.

What would settle it

For a concrete linear system, solve the pair of Sylvester equations for the stationary gains and check whether those gains coincide with the standard infinite-horizon LQR gain and steady-state Kalman filter gain; any mismatch shows that the separated design is not stationary.

Figures

Figures reproduced from arXiv: 2604.10635 by Guofa Li, Jie Li, Jingliang Duan, Lin Zhao, Liping Zhang, Liye Tang, Shengbo Eben Li, Yinsong Ma.

Figure 1
Figure 1. Figure 1: Results of general initial state correlation [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Results of special initial state correlation [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
read the original abstract

Understanding the optimization landscape of linear quadratic regulation (LQR) problems is fundamental to the design of efficient reinforcement learning solutions. Recent work has made significant progress in characterizing the landscape of static output-feedback control and linear quadratic Gaussian (LQG) control. For LQG, much of the analysis leverages the separation principle, which allows the controller and estimator to be designed independently. However, this simplification breaks down when the gradients with respect to the estimator and controller parameters are inherently coupled, leading to a more intricate analysis. This paper investigates the optimization landscape of observer-based dynamic output-feedback control of LQR problems. We derive the optimal observer-controller pair in settings where transient quadratic performance cannot be neglected. Our analysis reveals that, in general, the combination of the standard LQR controller and the observer that minimizes the trace of the accumulated estimation error covariance does not correspond to a stationary point of the overall closed-loop performance objective. Moreover, we derive a pair of discrete-time Sylvester equations with symmetric structure, both involving the same set of matrix elements, that characterize the stationary point of the observer-based dynamic LQR problem. These equations offer analytical insight into the structure of the optimality conditions and provide a foundation for developing numerical policy gradient methods aimed at learning complex controllers that rely on reconstructed state information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes the optimization landscape of observer-based dynamic output-feedback LQR control. It shows that the standard LQR controller combined with the observer minimizing the trace of the accumulated estimation error covariance is not, in general, a stationary point of the finite-horizon closed-loop quadratic cost. The authors derive a pair of discrete-time Sylvester equations with symmetric structure, both involving the same matrix elements, that characterize the stationary points of the joint (controller, observer) optimization problem.

Significance. If the derivations hold, the work clarifies why the separation principle fails once transient quadratic costs are retained, extending prior landscape analyses of static LQR and LQG. The structured Sylvester equations supply explicit first-order optimality conditions and a foundation for policy-gradient methods that optimize dynamic controllers using reconstructed state information.

major comments (2)
  1. [Derivation of the stationarity conditions (around the Sylvester equations)] The central claim that the standard LQR+observer pair fails to be stationary rests on the completeness of the gradient of the closed-loop cost with respect to the observer gain. The derivation of the pair of Sylvester equations must retain every cross term that arises when the state-estimate trajectory depends simultaneously on both the controller gain K and the observer gain L. Please supply the explicit expression for the gradient (prior to simplification into the Sylvester form) so that the absence of omitted coupling terms can be verified.
  2. [Problem formulation and main theorem] The result is stated for settings 'where transient quadratic performance cannot be neglected.' Clarify whether the Sylvester characterization holds for arbitrary finite horizons or only under additional assumptions on the horizon length, initial conditions, or system matrices that make the transient contribution dominant.
minor comments (2)
  1. [Main result] Ensure that all matrix dimensions and symmetry properties are stated explicitly when the pair of Sylvester equations is introduced.
  2. [Discussion] Add a brief remark on how the derived equations reduce to the known LQG solution when the horizon tends to infinity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper to incorporate the requested clarifications and explicit derivations.

read point-by-point responses
  1. Referee: [Derivation of the stationarity conditions (around the Sylvester equations)] The central claim that the standard LQR+observer pair fails to be stationary rests on the completeness of the gradient of the closed-loop cost with respect to the observer gain. The derivation of the pair of Sylvester equations must retain every cross term that arises when the state-estimate trajectory depends simultaneously on both the controller gain K and the observer gain L. Please supply the explicit expression for the gradient (prior to simplification into the Sylvester form) so that the absence of omitted coupling terms can be verified.

    Authors: We appreciate the referee's request to verify the completeness of the gradient derivation. The state-estimate trajectory depends on both K and L, and our derivation accounts for all resulting cross terms in the closed-loop cost gradient with respect to L. To address this explicitly, we will add the unsimplified gradient expression (involving the coupled state and estimate trajectories, the quadratic cost matrices, and the finite-horizon summation) in the revised manuscript prior to its reduction to the pair of symmetric Sylvester equations. This will confirm that no coupling terms were omitted. revision: yes

  2. Referee: [Problem formulation and main theorem] The result is stated for settings 'where transient quadratic performance cannot be neglected.' Clarify whether the Sylvester characterization holds for arbitrary finite horizons or only under additional assumptions on the horizon length, initial conditions, or system matrices that make the transient contribution dominant.

    Authors: The Sylvester characterization holds for arbitrary finite horizons. The derivation begins directly from the finite-horizon closed-loop quadratic cost without imposing restrictions on horizon length, initial conditions, or system matrices beyond standard assumptions (e.g., stabilizability and detectability). The abstract phrasing emphasizes the contrast with the infinite-horizon case, where separation applies and the standard LQR+observer design is optimal; it does not imply that transients must dominate. We will add a clarifying remark in the problem formulation and theorem statement to make this explicit. revision: yes

Circularity Check

0 steps flagged

Derivations start from closed-loop cost and standard matrix equations; no reduction to fitted inputs or self-citations

full rationale

The paper begins from the finite-horizon closed-loop quadratic cost for observer-based dynamic output feedback, differentiates with respect to the gains K and L while retaining transient coupling (explicitly breaking separation), and arrives at a pair of symmetric discrete-time Sylvester equations. These steps use standard Lyapunov/Sylvester identities and the chain rule on the quadratic form along the coupled state-estimate trajectory; no parameter is fitted to data and then renamed a prediction, no load-bearing premise rests on a self-citation chain, and the final equations are not definitionally equivalent to the input cost. The abstract and derivation outline therefore remain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

Review based on abstract only; the paper appears to rely on standard linear-system and quadratic-cost assumptions without introducing new free parameters or invented entities.

axioms (3)
  • domain assumption The plant is a linear time-invariant discrete-time system.
    Required for the standard LQR and observer setup described.
  • domain assumption The performance index is a quadratic cost that includes transient terms.
    Central to the claim that separation fails.
  • standard math Solutions to the derived discrete-time Sylvester equations exist and are unique under stabilizability and detectability.
    Implicit for the equations to characterize stationary points.

pith-pipeline@v0.9.0 · 5546 in / 1533 out tokens · 81200 ms · 2026-05-10T16:09:16.512784+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

  1. [1]

    F. L. Lewis, D. Vrabie, and V . L. Syrmos,Optimal control. John Wiley & Sons, 2012

  2. [2]

    Analysis of the optimization landscape of linear quadratic gaussian (LQG) control,

    Y . Tang, Y . Zheng, and N. Li, “Analysis of the optimization landscape of linear quadratic gaussian (LQG) control,”Mathematical Programming, vol. 202, no. 1, pp. 399–444, 2023

  3. [3]

    Global convergence of policy gradient methods for the linear quadratic regulator,

    M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” inInterna- tional Conference on Machine Learning, pp. 1467–1476, PMLR, 2018

  4. [4]

    Derivative-free methods for policy optimization: Guar- antees for linear quadratic systems,

    D. Malik, A. Pananjady, K. Bhatia, K. Khamaru, P. Bartlett, and M. Wainwright, “Derivative-free methods for policy optimization: Guar- antees for linear quadratic systems,”Journal of Machine Learning Research, vol. 21, pp. 1–51, 2020

  5. [5]

    Convergence and sample complexity of gradient methods for the model-free linear–quadratic regulator problem,

    H. Mohammadi, A. Zare, M. Soltanolkotabi, and M. R. Jovanovi ´c, “Convergence and sample complexity of gradient methods for the model-free linear–quadratic regulator problem,”IEEE Transactions on Automatic Control, vol. 67, no. 5, pp. 2435–2450, 2021

  6. [6]

    Asynchronous parallel policy gradient methods for the linear quadratic regulator,

    F. Zhao, X. Sha, and K. You, “Asynchronous parallel policy gradient methods for the linear quadratic regulator,”IEEE Transactions on Automatic Control, 2025

  7. [7]

    Global optimality of single-timescale actor-critic under continuous state-action space: A study on linear quadratic regulator,

    X. Chen, J. Duan, and L. Zhao, “Global optimality of single-timescale actor-critic under continuous state-action space: A study on linear quadratic regulator,” inProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pp. 3816–3824, 2024

  8. [8]

    Optimization landscape of policy gradient methods for discrete-time static output feedback,

    J. Duan, J. Li, X. Chen, K. Zhao, S. E. Li, and L. Zhao, “Optimization landscape of policy gradient methods for discrete-time static output feedback,”IEEE Transactions on Cybernetics, vol. 54, no. 6, pp. 3588– 3601, 2023

  9. [9]

    Novel results on output-feedback LQR design,

    A. Ilka and N. Murgovski, “Novel results on output-feedback LQR design,”IEEE Transactions on Automatic Control, vol. 68, no. 9, pp. 5187–5200, 2022

  10. [10]

    Policy gradient methods for designing dynamic output feedback controllers,

    T. Sadamoto and T. Hirai, “Policy gradient methods for designing dynamic output feedback controllers,”European Journal of Control, vol. 79, p. 101081, 2024

  11. [11]

    On the optimization landscape of dynamic output feedback linear quadratic control,

    J. Duan, W. Cao, Y . Zheng, and L. Zhao, “On the optimization landscape of dynamic output feedback linear quadratic control,”IEEE Transactions on Automatic Control, vol. 69, no. 2, pp. 920–935, 2024

  12. [12]

    Globally convergent policy gradient methods for linear quadratic control of partially observed systems,

    F. Zhao, X. Fu, and K. You, “Globally convergent policy gradient methods for linear quadratic control of partially observed systems,” IFAC-PapersOnLine, vol. 56, no. 2, pp. 5506–5511, 2023

  13. [13]

    Connectivity of the feasible and sublevel sets of dynamic output feedback control with robustness constraints,

    B. Hu and Y . Zheng, “Connectivity of the feasible and sublevel sets of dynamic output feedback control with robustness constraints,”IEEE Control Systems Letters, vol. 7, pp. 442–447, 2022

  14. [14]

    On the global optimality of direct policy search for nonsmoothh ∞ output-feedback control,

    Y . Tang and Y . Zheng, “On the global optimality of direct policy search for nonsmoothh ∞ output-feedback control,” in2023 62nd IEEE Conference on Decision and Control, pp. 6148–6153, IEEE, 2023

  15. [15]

    Optimal output feed- back learning control for continuous-time linear quadratic regulation,

    K. Xie, M. Guay, M. Lu, S. Wang, and F. Deng, “Optimal output feed- back learning control for continuous-time linear quadratic regulation,” IEEE Transactions on Automatic Control, 2025

  16. [16]

    On the lack of gradient domination for linear quadratic gaussian problems with incomplete state information,

    H. Mohammadi, M. Soltanolkotabi, and M. R. Jovanovic, “On the lack of gradient domination for linear quadratic gaussian problems with incomplete state information,” in2021 60th IEEE Conference on Decision and Control, pp. 2562–2568, IEEE, 2021

  17. [17]

    Optimal dynamic output feedback design for continuous-time linear time-invariant systems,

    Y . Kumar, P. V . Chanekar, and S. B. Roy, “Optimal dynamic output feedback design for continuous-time linear time-invariant systems,” IEEE Control Systems Letters, 2024

  18. [18]

    Primal-dual Q-learning framework for LQR design,

    D. Lee and J. Hu, “Primal-dual Q-learning framework for LQR design,” IEEE Transactions on Automatic Control, vol. 64, no. 9, pp. 3756–3763, 2018

  19. [19]

    Toward a theoretical foundation of policy optimization for learning control policies,

    B. Hu, K. Zhang, N. Li, M. Mesbahi, M. Fazel, and T. Bas ¸ar, “Toward a theoretical foundation of policy optimization for learning control policies,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 6, pp. 123–158, 2023

  20. [20]

    Formulas for data-driven control: Stabilization, optimality, and robustness,

    C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,”IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2019

  21. [21]

    Policy optimization forH 2 linear control withH ∞ robustness guarantee: Implicit regularization and global convergence,

    K. Zhang, B. Hu, and T. Basar, “Policy optimization forH 2 linear control withH ∞ robustness guarantee: Implicit regularization and global convergence,” inLearning for Dynamics and Control, pp. 179– 190, PMLR, 2020

  22. [22]

    On topological properties of the set of stabilizing feedback gains,

    J. Bu, A. Mesbahi, and M. Mesbahi, “On topological properties of the set of stabilizing feedback gains,”IEEE Transactions on Automatic Control, vol. 66, no. 2, pp. 730–744, 2020

  23. [23]

    Gu,Discrete-time linear systems: theory and design with applica- tions

    G. Gu,Discrete-time linear systems: theory and design with applica- tions. Springer Science & Business Media, 2012

  24. [24]

    Datta,Numerical methods for linear control systems

    B. Datta,Numerical methods for linear control systems. Elsevier, 2004

  25. [25]

    Guaranteed margins for LQG regulators,

    J. C. Doyle, “Guaranteed margins for LQG regulators,”IEEE Transac- tions on Automatic Control, vol. 23, no. 4, pp. 756–757, 1978. 9 APPENDIX PROOF OFLEMMA5 Proof.We define the cost difference between arbitrary gains(K, L)and the optimal gains(K ‡, L‡)as J(K, L)−J(K ‡, L‡) =Tr (SK,L −S ‡)Y =Tr(∆S·Y). First, we establish the relationship between∆Sand the pa...