pith:7DDKR4XU
Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy
Q-Flow stabilizes training of expressive flow-based policies in reinforcement learning by propagating terminal values backward along deterministic flow paths.
arxiv:2605.13435 v1 · 2026-05-13 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{7DDKR4XUTVPMLMVNBPZAA2FTN5}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Q-Flow leverages the deterministic nature of flow dynamics to explicitly propagate terminal trajectory value to intermediate latent states along the policy-induced flow, enabling stable policy optimization using intermediate value gradients without unrolling the numerical solver.
The approach assumes that propagating terminal trajectory value to intermediate latent states along the flow yields reliable gradients for policy optimization, without introducing bias or instability from the flow matching process itself.
Q-Flow enables stable optimization of expressive flow-based policies in RL by propagating terminal values along deterministic flow dynamics to intermediate states for gradient updates without solver unrolling.
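The propagation idea in the claims above can be illustrated with a toy sketch. This is one reading of the claim, not the paper's algorithm: `velocity`, `terminal_value`, `rollout`, and `value_targets` are hypothetical names, the one-dimensional velocity field and critic are invented for illustration, and plain Euler integration stands in for whatever solver the authors use. Because the flow is deterministic, every state on a path reaches the same endpoint, so each intermediate state can be labeled with that endpoint's value.

```python
def velocity(z, t):
    """Hypothetical velocity field: drifts the latent z toward 1.0."""
    return 1.0 - z


def terminal_value(action):
    """Hypothetical critic on the final action; largest at action = 1.0."""
    return -(action - 1.0) ** 2


def rollout(z0, n_steps=100):
    """Euler-integrate dz/dt = velocity(z, t) over t in [0, 1]; return the path."""
    dt = 1.0 / n_steps
    path = [(0.0, z0)]
    z = z0
    for k in range(n_steps):
        z = z + dt * velocity(z, k * dt)
        path.append(((k + 1) * dt, z))
    return path


def value_targets(z0, n_steps=100):
    """Label every state on one path with that path's terminal value."""
    path = rollout(z0, n_steps)
    q_end = terminal_value(path[-1][1])
    return [(t, z, q_end) for t, z in path]


# Regression targets (t, z_t, value) for one flow path starting at z0 = -0.5.
targets = value_targets(z0=-0.5)
```

Under these assumptions, an intermediate value function fitted to such `(t, z_t, value)` triples could supply a gradient with respect to `z_t` at any time `t`, which is one way to avoid differentiating through every solver step.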
Receipt and verification
| First computed | 2026-05-18T02:44:47.114249Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
f8c6a8f2f49d5ec5b2ad0bf20068b36f4e0483f4f460a5f7fd10d6ada7f80dcb
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7DDKR4XUTVPMLMVNBPZAA2FTN5 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f8c6a8f2f49d5ec5b2ad0bf20068b36f4e0483f4f460a5f7fd10d6ada7f80dcb
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "d3f3b4c1a13b3c5ce7999a8811fedb2756df7e952a1be1811ec7abc65d2dd2fc",
"cross_cats_sorted": [
"cs.AI"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-05-13T12:31:02Z",
"title_canon_sha256": "5d8746ad1b114d8b8f642d3fc3e2b0905a72d645b9d317e72b75b91c473bb228"
},
"schema_version": "1.0",
"source": {
"id": "2605.13435",
"kind": "arxiv",
"version": 1
}
}
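The hashing step of the verification command can also be run offline against the record above. The following is a minimal sketch that mirrors the serialization settings in the one-liner (`sort_keys=True`, compact separators, `ensure_ascii=False`, UTF-8 encoding); `canonical_sha256` is a hypothetical helper name. Whether the digest matches the published canonical hash depends on the pasted record being identical to what the service returns, so the result should be compared by hand rather than assumed.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "d3f3b4c1a13b3c5ce7999a8811fedb2756df7e952a1be1811ec7abc65d2dd2fc",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T12:31:02Z",
        "title_canon_sha256": "5d8746ad1b114d8b8f642d3fc3e2b0905a72d645b9d317e72b75b91c473bb228",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13435", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the input JSON, which is why the same record hashes identically however it was formatted.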