pith:MFMFMCOO
Flow Matching for Offline Reinforcement Learning with Discrete Actions
Flow matching with continuous-time Markov chains recovers the optimal policy for offline RL with discrete actions under idealized conditions.
arxiv:2602.06138 v2 · 2026-02-05 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MFMFMCOOW6LMWSYDKBNN2HV4BW}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We theoretically show that, under idealized conditions, optimizing this objective recovers the optimal policy.
The recovery of the optimal policy via the Q-weighted flow matching objective on continuous-time Markov chains holds only under the idealized conditions that the theory assumes.
Flow matching is adapted to discrete actions via continuous-time Markov chains and Q-weighted objectives, recovering optimal policies under idealized conditions and outperforming baselines in multi-agent and multi-objective offline RL experiments.
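As a rough illustration of the idea behind these claims, the sketch below evolves a distribution over a small discrete action set with a continuous-time Markov chain until it reaches a Q-weighted target. It is not the paper's objective or algorithm: the target form (behavior probabilities reweighted by `exp(Q / temperature)`), the linear mixture path, and the names `q_weighted_target` and `evolve_marginal` are all assumptions chosen for illustration.

```python
import math


def q_weighted_target(behavior, q_values, temperature=1.0):
    """Hypothetical target: behavior(a) * exp(Q(a) / temperature), normalized.

    This weighting is an assumption for illustration, not the paper's definition.
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [b * math.exp((q - q_max) / temperature)
               for b, q in zip(behavior, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def evolve_marginal(p0, p1, num_steps=50):
    """Euler-integrate the marginal of a CTMC that carries p0 to p1.

    Assumed jump rate: u_t(y | x) = p1(y) / (1 - t) for y != x. Plugging this
    into the Kolmogorov forward equation gives dp_t/dt = (p1 - p_t) / (1 - t),
    whose solution is the mixture path p_t = (1 - t) * p0 + t * p1.
    """
    p = list(p0)
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h
        p = [pi + h * (p1i - pi) / (1.0 - t) for pi, p1i in zip(p, p1)]
    return p


if __name__ == "__main__":
    behavior = [0.25, 0.25, 0.5]   # made-up behavior policy over 3 actions
    q_values = [1.0, 2.0, 0.0]     # made-up Q-values for one state
    target = q_weighted_target(behavior, q_values)
    final = evolve_marginal(behavior, target)
    print("target:", target)
    print("final: ", final)
```

In a learned setting the rate would come from a trained model rather than from the known target; the closed-form rate here only shows how a CTMC can move probability mass between discrete actions.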
Receipt and verification
| First computed | 2026-05-18T03:09:23.801072Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
61585609ceb796cb4b03505add1ebc0db5caf821c4e12b7a9affbd03d2823f50
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MFMFMCOOW6LMWSYDKBNN2HV4BW \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 61585609ceb796cb4b03505add1ebc0db5caf821c4e12b7a9affbd03d2823f50
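The same check can be written as a small standalone Python function, which may be easier to read than the one-liner. This is a sketch that mirrors the serialization settings used in the command above (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding); the function name `canonical_sha256` is hypothetical and not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a record the same way the one-liner above does.

    Keys are sorted and whitespace is removed so that the digest does not
    depend on key order or formatting of the input JSON.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Two dicts with the same content but different key order hash identically.
    a = {"source": {"id": "2602.06138", "kind": "arxiv", "version": 2}}
    b = {"source": {"version": 2, "kind": "arxiv", "id": "2602.06138"}}
    print(canonical_sha256(a) == canonical_sha256(b))
```

To verify this record, load the canonical record JSON with `json.loads` and compare the returned digest with the canonical hash listed on this page.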
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "7873ca2cb6c902caf3f985d1bb5ba63fdb961700bedac3b36377cb032ad74207",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-05T19:13:44Z",
    "title_canon_sha256": "edd1c32da07207bd2adb1ae628d3824dd48dd46a905ebe0c5d9504ee225cd208"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.06138",
    "kind": "arxiv",
    "version": 2
  }
}