On the Optimization Landscape of Observer-based Dynamic Linear Quadratic Control
Pith reviewed 2026-05-10 16:09 UTC · model grok-4.3
The pith
In observer-based dynamic LQR, the stationary points are characterized by a pair of coupled discrete-time Sylvester equations with symmetric structure, not by the separately designed LQR gain and Kalman filter.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The stationary points of the observer-based dynamic LQR objective are characterized by a pair of discrete-time Sylvester equations that share the same set of matrix elements and possess symmetric structure. These equations arise directly from setting the coupled gradients of the closed-loop quadratic cost to zero and replace the decoupled Riccati solutions that appear when the separation principle applies. The analysis therefore supplies analytical insight into the optimality conditions without relying on independent design of controller and observer.
What carries the argument
A pair of discrete-time Sylvester equations with symmetric structure, both involving the same matrix elements, that together characterize the stationary points of the observer-based dynamic LQR performance objective.
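For orientation, a discrete-time Sylvester equation in an unknown matrix $X$ has the generic form sketched below. The coefficient names $M$, $N$, and $\Gamma$ are placeholders chosen for this illustration; they are not the paper's matrices, which are not reproduced on this page.

```latex
% Generic discrete-time Sylvester (Stein) equation; M, N, \Gamma are placeholder coefficients.
\begin{align*}
M X N - X + \Gamma &= 0, \\
% Special case N = M^{\top} with symmetric \Gamma: the discrete-time Lyapunov equation.
M X M^{\top} - X + \Gamma &= 0.
\end{align*}
```

The generic equation has a unique solution exactly when no product of an eigenvalue of $M$ with an eigenvalue of $N$ equals 1, which holds in particular when both matrices have spectral radius below 1.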
If this is right
- The combination of the standard LQR controller and the observer minimizing accumulated estimation-error covariance is not a stationary point in general.
- Numerical policy-gradient methods for learning observer-based controllers can be derived from the gradient expressions whose zero set is defined by the Sylvester pair.
- The symmetric structure of the equations supplies an explicit first-order test: any candidate observer-controller pair that fails to satisfy both equations is not stationary and therefore cannot be locally optimal.
- Joint optimization over observer and controller parameters is required rather than sequential or alternating design.
Where Pith is reading between the lines
- The symmetric Sylvester structure may permit specialized numerical solvers that are more efficient than generic gradient descent on the joint parameter space.
- Similar coupled equations are likely to appear in other partially observed control settings where separation fails, such as risk-sensitive or finite-horizon problems.
- Reinforcement-learning agents that must learn dynamic controllers from partial observations should optimize the joint observer-controller parameters simultaneously rather than in alternation.
Load-bearing premise
Transient quadratic performance terms cannot be neglected, which prevents the gradients with respect to observer and controller parameters from decoupling.
What would settle it
For a concrete linear system, solve the pair of Sylvester equations for the stationary gains and check whether those gains coincide with the standard infinite-horizon LQR gain and steady-state Kalman filter gain; any mismatch shows that the separated design is not stationary.
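A rough numerical version of this check can be sketched in Python with NumPy and SciPy. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy plant `A`, `B`, `C`, the weights `Q`, `R`, the noise covariances `W`, `V` used only to pick a Kalman gain, the noise-free infinite-horizon transient cost, and the helper names. Instead of solving the paper's Sylvester pair, which this page does not reproduce, the sketch evaluates the closed-loop cost at the separated design and estimates its gradients by central finite differences.

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

# Hypothetical toy plant (not from the paper): x+ = A x + B u, y = C x.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)   # assumed cost weights
W, V = np.eye(2), np.eye(1)   # assumed noise covariances, used only to pick a Kalman gain
Z = np.zeros((2, 2))

def lqr_gain(A, B, Q, R):
    """Infinite-horizon LQR gain K (u = -K x) and the Riccati solution P."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A), P

def kalman_predictor_gain(A, C, W, V):
    """Steady-state predictor-form Kalman gain via the dual Riccati equation."""
    Sigma = solve_discrete_are(A.T, C.T, W, V)
    return A @ Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + V)

def closed_loop_cost(K, L, Y):
    """Tr(S Y) for z = (x, e), e = x - xhat, with S from a Lyapunov equation."""
    Acl = np.block([[A - B @ K, B @ K], [Z, A - L @ C]])
    G = np.hstack([-K, K])                      # u = G z
    Qcl = np.block([[Q, Z], [Z, Z]]) + G.T @ R @ G
    S = solve_discrete_lyapunov(Acl.T, Qcl)     # Acl' S Acl - S + Qcl = 0
    return float(np.trace(S @ Y))

def fd_gradient(f, M, eps=1e-4):
    """Central finite-difference gradient of scalar f with respect to matrix M."""
    grad = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        step = np.zeros_like(M)
        step[idx] = eps
        grad[idx] = (f(M + step) - f(M - step)) / (2 * eps)
    return grad

K_lqr, P = lqr_gain(A, B, Q, R)
L_kf = kalman_predictor_gain(A, C, W, V)

X0, E0 = np.eye(2), np.eye(2)                   # assumed initial second moments
Y_no_err = np.block([[X0, Z], [Z, Z]])          # perfect initial estimate
Y_err = np.block([[X0, Z], [Z, E0]])            # uncertain initial estimate

J_no_err = closed_loop_cost(K_lqr, L_kf, Y_no_err)
J_err = closed_loop_cost(K_lqr, L_kf, Y_err)
grad_K_no_err = fd_gradient(lambda K: closed_loop_cost(K, L_kf, Y_no_err), K_lqr)
grad_K = fd_gradient(lambda K: closed_loop_cost(K, L_kf, Y_err), K_lqr)
grad_L = fd_gradient(lambda L: closed_loop_cost(K_lqr, L, Y_err), L_kf)

print("cost, perfect initial estimate:  ", J_no_err)
print("cost, uncertain initial estimate:", J_err)
print("||dJ/dK||, perfect estimate:     ", np.linalg.norm(grad_K_no_err))
print("||dJ/dK||, uncertain estimate:   ", np.linalg.norm(grad_K))
print("||dJ/dL||, uncertain estimate:   ", np.linalg.norm(grad_L))
```

With a perfect initial estimate the estimation error stays at zero, so the cost reduces to the ordinary LQR cost and the gradient with respect to `K` should vanish up to finite-difference error. Gradient norms in the uncertain-estimate case that are clearly above that error level would indicate, for this assumed setup only, that the separated pair is not stationary; the sketch does not assert what those values are.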
Original abstract
Understanding the optimization landscape of linear quadratic regulation (LQR) problems is fundamental to the design of efficient reinforcement learning solutions. Recent work has made significant progress in characterizing the landscape of static output-feedback control and linear quadratic Gaussian (LQG) control. For LQG, much of the analysis leverages the separation principle, which allows the controller and estimator to be designed independently. However, this simplification breaks down when the gradients with respect to the estimator and controller parameters are inherently coupled, leading to a more intricate analysis. This paper investigates the optimization landscape of observer-based dynamic output-feedback control of LQR problems. We derive the optimal observer-controller pair in settings where transient quadratic performance cannot be neglected. Our analysis reveals that, in general, the combination of the standard LQR controller and the observer that minimizes the trace of the accumulated estimation error covariance does not correspond to a stationary point of the overall closed-loop performance objective. Moreover, we derive a pair of discrete-time Sylvester equations with symmetric structure, both involving the same set of matrix elements, that characterize the stationary point of the observer-based dynamic LQR problem. These equations offer analytical insight into the structure of the optimality conditions and provide a foundation for developing numerical policy gradient methods aimed at learning complex controllers that rely on reconstructed state information.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the optimization landscape of observer-based dynamic output-feedback LQR control. It shows that the standard LQR controller combined with the observer minimizing the trace of the accumulated estimation error covariance is not, in general, a stationary point of the finite-horizon closed-loop quadratic cost. The authors derive a pair of discrete-time Sylvester equations with symmetric structure, both involving the same matrix elements, that characterize the stationary points of the joint (controller, observer) optimization problem.
Significance. If the derivations hold, the work clarifies why the separation principle fails once transient quadratic costs are retained, extending prior landscape analyses of static LQR and LQG. The structured Sylvester equations supply explicit first-order optimality conditions and a foundation for policy-gradient methods that optimize dynamic controllers using reconstructed state information.
major comments (2)
- [Derivation of the stationarity conditions (around the Sylvester equations)] The central claim that the standard LQR+observer pair fails to be stationary rests on the completeness of the gradient of the closed-loop cost with respect to the observer gain. The derivation of the pair of Sylvester equations must retain every cross term that arises when the state-estimate trajectory depends simultaneously on both the controller gain K and the observer gain L. Please supply the explicit expression for the gradient (prior to simplification into the Sylvester form) so that the absence of omitted coupling terms can be verified.
- [Problem formulation and main theorem] The result is stated for settings 'where transient quadratic performance cannot be neglected.' Clarify whether the Sylvester characterization holds for arbitrary finite horizons or only under additional assumptions on the horizon length, initial conditions, or system matrices that make the transient contribution dominant.
minor comments (2)
- [Main result] Ensure that all matrix dimensions and symmetry properties are stated explicitly when the pair of Sylvester equations is introduced.
- [Discussion] Add a brief remark on how the derived equations reduce to the known LQG solution when the horizon tends to infinity.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper to incorporate the requested clarifications and explicit derivations.
- Referee: [Derivation of the stationarity conditions (around the Sylvester equations)] The central claim that the standard LQR+observer pair fails to be stationary rests on the completeness of the gradient of the closed-loop cost with respect to the observer gain. The derivation of the pair of Sylvester equations must retain every cross term that arises when the state-estimate trajectory depends simultaneously on both the controller gain K and the observer gain L. Please supply the explicit expression for the gradient (prior to simplification into the Sylvester form) so that the absence of omitted coupling terms can be verified.
  Authors: We appreciate the referee's request to verify the completeness of the gradient derivation. The state-estimate trajectory depends on both K and L, and our derivation accounts for all resulting cross terms in the closed-loop cost gradient with respect to L. To address this explicitly, we will add the unsimplified gradient expression (involving the coupled state and estimate trajectories, the quadratic cost matrices, and the finite-horizon summation) in the revised manuscript prior to its reduction to the pair of symmetric Sylvester equations. This will confirm that no coupling terms were omitted.
  Revision: yes
- Referee: [Problem formulation and main theorem] The result is stated for settings 'where transient quadratic performance cannot be neglected.' Clarify whether the Sylvester characterization holds for arbitrary finite horizons or only under additional assumptions on the horizon length, initial conditions, or system matrices that make the transient contribution dominant.
  Authors: The Sylvester characterization holds for arbitrary finite horizons. The derivation begins directly from the finite-horizon closed-loop quadratic cost without imposing restrictions on horizon length, initial conditions, or system matrices beyond standard assumptions (e.g., stabilizability and detectability). The abstract phrasing emphasizes the contrast with the infinite-horizon case, where separation applies and the standard LQR+observer design is optimal; it does not imply that transients must dominate. We will add a clarifying remark in the problem formulation and theorem statement to make this explicit.
  Revision: yes
Circularity Check
The derivations start from the closed-loop cost and standard matrix equations; nothing reduces to fitted inputs or self-citations.
The paper begins from the finite-horizon closed-loop quadratic cost for observer-based dynamic output feedback, differentiates with respect to the gains K and L while retaining transient coupling (explicitly breaking separation), and arrives at a pair of symmetric discrete-time Sylvester equations. These steps use standard Lyapunov/Sylvester identities and the chain rule on the quadratic form along the coupled state-estimate trajectory. No parameter is fitted to data and then renamed a prediction, no load-bearing premise rests on a self-citation chain, and the final equations are not definitionally equivalent to the input cost. Judged from the abstract and the derivation outline, the argument is therefore self-contained and shows no circularity.
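One way to picture the starting point of such a derivation is to write the closed loop in state and estimation-error coordinates. The sketch below assumes a noise-free plant $x_{t+1} = A x_t + B u_t$, $y_t = C x_t$, a Luenberger-type observer with gain $L$, the control law $u_t = -K\hat{x}_t$, a horizon $N$, and an initial second-moment matrix $Y$. These modeling choices and the symbols $e_t$, $z_t$, $A_{\mathrm{cl}}$, $Q_{\mathrm{cl}}$, and $S_t$ are illustrative assumptions, not the paper's stated formulation.

```latex
% Illustrative sketch under assumed notation; not the paper's own equations.
\begin{align*}
e_t &= x_t - \hat{x}_t, \qquad
z_t = \begin{bmatrix} x_t \\ e_t \end{bmatrix}, \qquad
u_t = -K\hat{x}_t = \begin{bmatrix} -K & K \end{bmatrix} z_t, \\
z_{t+1} &= A_{\mathrm{cl}}\, z_t, \qquad
A_{\mathrm{cl}} = \begin{bmatrix} A - BK & BK \\ 0 & A - LC \end{bmatrix}, \\
Q_{\mathrm{cl}} &= \begin{bmatrix} Q & 0 \\ 0 & 0 \end{bmatrix}
  + \begin{bmatrix} -K & K \end{bmatrix}^{\top} R \begin{bmatrix} -K & K \end{bmatrix}, \\
J_N(K, L) &= \mathbb{E} \sum_{t=0}^{N-1} z_t^{\top} Q_{\mathrm{cl}}\, z_t = \mathrm{Tr}(S_0 Y), \qquad
Y = \mathbb{E}\big[z_0 z_0^{\top}\big], \\
S_t &= Q_{\mathrm{cl}} + A_{\mathrm{cl}}^{\top} S_{t+1} A_{\mathrm{cl}}, \qquad S_N = 0.
\end{align*}
```

In this sketch $A_{\mathrm{cl}}$ depends on both gains and $Q_{\mathrm{cl}}$ couples $x_t$ and $e_t$ through $K$, so differentiating $\mathrm{Tr}(S_0 Y)$ with respect to either gain involves the off-diagonal blocks of $S_t$. That is where cross terms between controller and observer would enter under these assumptions.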
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption: The plant is a linear time-invariant discrete-time system.
- domain assumption: The performance index is a quadratic cost that includes transient terms.
- standard math: Solutions to the derived discrete-time Sylvester equations exist and are unique under stabilizability and detectability.
Appendix: Proof of Lemma 5
Proof. We define the cost difference between arbitrary gains $(K, L)$ and the optimal gains $(K^{\ddagger}, L^{\ddagger})$ as $J(K, L) - J(K^{\ddagger}, L^{\ddagger}) = \mathrm{Tr}\big((S_{K,L} - S^{\ddagger})\,Y\big) = \mathrm{Tr}(\Delta S \cdot Y)$. First, we establish the relationship between $\Delta S$ and the pa...