2 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 2
verdicts
UNVERDICTED: 2
representative citing papers
citing papers explorer
- Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as continuous dimensions increase.
- Ready from Day 1: Population-Aware Coordination for Large-Scale Constrained Multi-Agent Systems
Learned primal and dual maps conditioned on population summaries enable reliable coordination across composition shifts in large multi-agent systems, cutting forecast error 16-19% and violations 20-51% in a supply-chain case study.