3 Pith papers cite this work.
Year: 2026 (3 representative citing papers)
Citing papers explorer
- Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
  HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as the number of continuous action dimensions increases.
- Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control
  The local linearity of LLM layers enables LQR-based closed-loop activation steering with theoretical tracking guarantees.
- On Piecewise Quadratic Terminal Costs for MPC
  A synthesis method for piecewise quadratic terminal costs, with configuration-constrained polytopic terminal regions, for linear MPC; the resulting terminal costs match the infinite-horizon LQR cost near the origin.