pith:SW3MFUY7
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
Hybrid Policy Optimization mixes pathwise and score-function gradients to keep policy updates unbiased in hybrid discrete-continuous action spaces.
arxiv:2605.14297 v1 · 2026-05-14 · cs.LG · cs.AI · math.OC · stat.ML
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{SW3MFUY7E3GKINHIMFFDXQQHKX}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We propose Hybrid Policy Optimization (HPO), which backpropagates through the simulator wherever smoothness permits, using a mixed gradient estimator that combines pathwise and score-function (SF) gradients while maintaining unbiasedness. We also show how problems with action discontinuities can be reformulated in hybrid form... Empirically, HPO substantially outperforms PPO on inventory control and switched linear-quadratic regulator problems, with performance gaps increasing as the continuous action dimension grows.
The mixed gradient estimator remains unbiased despite combining pathwise and score-function components, and the simulator allows backpropagation wherever smoothness permits without introducing bias from discrete actions.
HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as continuous dimensions increase.
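To make the idea of a mixed estimator concrete, here is a minimal single-step sketch. It is not the paper's HPO algorithm: the toy reward, the `TARGETS` values, and every function name are assumptions made for illustration. A discrete mode `d` is drawn from a Bernoulli policy with logit `theta`, and a continuous action `a` is drawn by reparameterization around `mu`. The gradient with respect to `mu` is taken pathwise through the smooth reward, while the gradient with respect to `theta` uses the score function. Both are compared against the closed-form gradient of this toy objective.

```python
import math
import random

# Hypothetical toy problem: the value the continuous action should track in each mode.
TARGETS = (-1.0, 2.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(d, a):
    """Reward that is smooth in the continuous action a for each discrete mode d."""
    return -(a - TARGETS[d]) ** 2


def mixed_gradient(theta, mu, sigma=0.5, n=100_000, seed=0):
    """Monte Carlo estimate of (dJ/dtheta, dJ/dmu) for J = E[reward(d, a)]."""
    rng = random.Random(seed)
    p = sigmoid(theta)
    g_theta = 0.0
    g_mu = 0.0
    for _ in range(n):
        d = 1 if rng.random() < p else 0      # discrete action, not differentiable
        a = mu + sigma * rng.gauss(0.0, 1.0)  # reparameterized continuous action
        # Pathwise term: d(reward)/da * da/dmu, with da/dmu = 1.
        g_mu += -2.0 * (a - TARGETS[d])
        # Score-function term: reward * d(log P(d))/dtheta = reward * (d - p).
        g_theta += reward(d, a) * (d - p)
    return g_theta / n, g_mu / n


def exact_gradient(theta, mu):
    """Closed-form gradient of the same toy objective, used as a reference."""
    p = sigmoid(theta)
    d_theta = p * (1.0 - p) * ((mu - TARGETS[0]) ** 2 - (mu - TARGETS[1]) ** 2)
    d_mu = -2.0 * (p * (mu - TARGETS[1]) + (1.0 - p) * (mu - TARGETS[0]))
    return d_theta, d_mu


if __name__ == "__main__":
    print("estimate:", mixed_gradient(0.3, 0.0))
    print("exact:   ", exact_gradient(0.3, 0.0))
```

In this sketch the two estimates agree with the closed form up to sampling noise, which is what unbiasedness predicts. A multi-step simulator, as described in the claims above, would additionally have to propagate pathwise gradients through the state transitions, which this example does not attempt.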
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:10.132625Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
95b6c2d31f26cca434e8614a3bc20755fd77a64cc1c201bead4661616441c705
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SW3MFUY7E3GKINHIMFFDXQQHKX \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 95b6c2d31f26cca434e8614a3bc20755fd77a64cc1c201bead4661616441c705
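The same check can be scripted without `curl` or `jq`. The following is a minimal sketch that mirrors the serialization used in the one-liner above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes, SHA-256). The function names `canonical_sha256` and `verify` are illustrative and not part of any Pith tooling, and the sketch assumes the record has already been fetched and is available as JSON text.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def verify(record_json_text, expected_hex):
    """Return True when the recomputed digest equals the published hex digest."""
    record = json.loads(record_json_text)
    return canonical_sha256(record) == expected_hex.lower()


if __name__ == "__main__":
    # Key order in the input does not matter, because keys are sorted before hashing.
    a = {"source": {"kind": "arxiv", "id": "2605.14297", "version": 1}}
    b = {"source": {"version": 1, "id": "2605.14297", "kind": "arxiv"}}
    print(canonical_sha256(a) == canonical_sha256(b))
```

The dictionaries in the demo are fragments of the record, so their digest is not expected to equal the canonical hash. To compare against the published value, pass the full `canonical_record` JSON text and the expected digest to `verify`.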
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e02d514969680dbb6e652b72a68618d3483c1c5330c1af21e75c6a72910658bf",
    "cross_cats_sorted": [
      "cs.AI",
      "math.OC",
      "stat.ML"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T02:59:45Z",
    "title_canon_sha256": "e1114d9c38f5a4309d09384a406f3e6b004dc368b29ae41ae882ac0db629b4f5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14297",
    "kind": "arxiv",
    "version": 1
  }
}