pith.
Pith Number

pith:DM6LXZ55

pith:2026:DM6LXZ552J4LVENK4XZ3IUT3PX
not attested · not anchored · not stored · refs resolved

Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry

Carlee Joe-Wong, Tian Lan, Zuyuan Zhang

Positive semidefinite matrix descriptors of trajectory segments let reinforcement learning agents reuse local transition geometry across tasks.

arxiv:2605.14304 v1 · 2026-05-14 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DM6LXZ552J4LVENK4XZ3IUT3PX}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
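The page does not spell out the merge algorithm. As a hypothetical illustration of how two mirrors could reach the same state regardless of the order in which they receive events, the sketch content-addresses each event with the same canonical-JSON SHA-256 used by the verification command on this page, drops duplicates, sorts by (timestamp, hash), and folds the events over the canonical record. The event fields `ts`, `field`, and `value` and the functions `event_id` and `merge` are invented for the example, and signature checking is left out.

```python
import hashlib
import json
import random

def event_id(event):
    # Content address of an event: SHA-256 over canonical JSON
    # (sorted keys, compact separators, UTF-8).
    b = json.dumps(event, sort_keys=True, separators=(",", ":"),
                   ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(b).hexdigest()

def merge(canonical, events):
    # Hypothetical deterministic merge:
    # 1. deduplicate events by content hash,
    # 2. order by (timestamp, content hash) so ties break identically everywhere,
    # 3. fold over a copy of the canonical record (later events overwrite fields).
    # Signature verification is intentionally omitted in this sketch.
    unique = {event_id(e): e for e in events}
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["ts"], kv[0]))
    state = dict(canonical)
    for _, event in ordered:
        state[event["field"]] = event["value"]
    return state

record = {"id": "2605.14304"}
events = [
    {"ts": 1, "field": "x", "value": "a"},
    {"ts": 2, "field": "x", "value": "b"},
    {"ts": 2, "field": "y", "value": "c"},
]

# A second "mirror" receives the same events duplicated and in another order.
shuffled = random.Random(0).sample(events * 2, k=6)
print(merge(record, events) == merge(record, shuffled))  # True
```

Breaking timestamp ties by content hash is what makes the fold independent of arrival order; the trade-off in this sketch is that two byte-identical events are treated as one.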

Claims

C1 · strongest claim

We prove that the descriptor is well defined up to coordinate gauge, complete for the induced low-order additive signal class, additive under valid segment composition, and minimally sufficient among admissible additive descriptors. We further show that conditioning value functions on the trajectory-segment matrix yields a first-order smooth approximation of action values, enabling source-learned matrix-to-value mappings to bootstrap learning in new tasks. Empirically, MSRL achieves the best average finite-budget target AUC of 0.73.
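The value-bootstrapping part of C1 can be illustrated with a toy model. The sketch assumes, for simplicity, that action values are exactly linear in the descriptor (the first-order term, a Frobenius inner product ⟨W, M⟩), uses random PSD matrices as stand-in descriptors, and treats "bootstrapping" as ridge regression on a few target samples shrunk toward weights learned on a related source task. The helpers `features`, `fit_linear_value`, and `sample_descriptors` are hypothetical; this is not the MSRL training procedure.

```python
import numpy as np

def features(M):
    # Upper-triangle entries of a symmetric descriptor. A Frobenius inner
    # product <W, M> with symmetric W is a linear function of these entries.
    return M[np.triu_indices(M.shape[0])]

def fit_linear_value(X, y, lam=1e-2, w_prior=None):
    # Ridge regression for y ~ X @ w that shrinks toward w_prior (zero if None):
    # argmin_w ||y - X w||^2 + lam * ||w - w_prior||^2.
    d = X.shape[1]
    w0 = np.zeros(d) if w_prior is None else w_prior
    A = X.T @ X + lam * np.eye(d)
    return w0 + np.linalg.solve(A, X.T @ (y - X @ w0))

rng = np.random.default_rng(1)
k = 4                    # descriptor size (illustrative)
d = k * (k + 1) // 2     # number of upper-triangle features

def sample_descriptors(n, T=6):
    # Random PSD stand-ins for segment descriptors: M = Z^T Z, Z of shape (T, k).
    Z = rng.normal(size=(n, T, k))
    return np.stack([features(z.T @ z) for z in Z])

# Synthetic first-order value weights; the target task is a small
# perturbation of the source task.
w_source = rng.normal(size=d)
w_target = w_source + 0.05 * rng.normal(size=d)

Xs = sample_descriptors(500)                     # plentiful source data
ys = Xs @ w_source + 0.1 * rng.normal(size=500)
Xt = sample_descriptors(5)                       # tiny target budget (< d samples)
yt = Xt @ w_target + 0.01 * rng.normal(size=5)
Xtest = sample_descriptors(200)
ytest = Xtest @ w_target

w_src = fit_linear_value(Xs, ys)                       # source-learned mapping
w_scratch = fit_linear_value(Xt, yt)                   # target data only
w_transfer = fit_linear_value(Xt, yt, w_prior=w_src)   # target, bootstrapped

def rmse(w):
    return float(np.sqrt(np.mean((Xtest @ w - ytest) ** 2)))

err_scratch, err_transfer = rmse(w_scratch), rmse(w_transfer)
print(f"target RMSE from scratch:      {err_scratch:.3f}")
print(f"target RMSE with source prior: {err_transfer:.3f}")
```

The few target samples only constrain the weights inside their row space; in this toy setting the remaining directions fall back to the source estimate instead of to zero, which is where the reuse comes from.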

C2 · weakest assumption

That the positive semidefinite matrix descriptors aggregating first- and second-order statistics of lifted one-step transitions actually expose shared hidden structure that supports valid algebraic composition and useful transfer across tasks.
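A minimal sketch of such a descriptor, assuming a hypothetical lifting map z = [1, s, a, s' - s] for each one-step transition and defining the segment matrix as the sum of outer products z zᵀ. Under that assumption the matrix is PSD by construction, its first row holds the first-order sums (with the transition count in the corner), concatenating contiguous segments adds their matrices, and an invertible linear change of lifted coordinates acts by congruence (M ↦ G M Gᵀ), which is one concrete reading of "well defined up to coordinate gauge". The names `lift`, `segment_descriptor`, and `rollout` are illustrative, and the paper's actual lifting and validity conditions for composition may differ.

```python
import numpy as np

def lift(s, a, s_next):
    # Hypothetical lifting map for one transition (s, a, s'):
    # a constant 1 (so first-order sums appear in the matrix), the state,
    # the action, and the state increment s' - s.
    return np.concatenate(([1.0], s, a, s_next - s))

def segment_descriptor(transitions):
    # Sum of outer products z z^T over the lifted transitions of a segment.
    # M[0, 0] is the transition count, M[0, 1:] holds first-order sums, and
    # M[1:, 1:] holds uncentered second-order sums. Always PSD.
    Z = np.stack([lift(*t) for t in transitions])  # shape (T, d)
    return Z.T @ Z

def rollout(rng, T, ds=3, da=2):
    # Synthetic contiguous trajectory with linear dynamics (illustration only).
    A = 0.9 * np.eye(ds)
    B = rng.normal(size=(ds, da))
    s = rng.normal(size=ds)
    traj = []
    for _ in range(T):
        a = rng.normal(size=da)
        s_next = A @ s + B @ a + 0.01 * rng.normal(size=ds)
        traj.append((s, a, s_next))
        s = s_next
    return traj

rng = np.random.default_rng(0)
traj = rollout(rng, T=20)
seg1, seg2 = traj[:12], traj[12:]   # seg1 ends where seg2 starts

M1, M2 = segment_descriptor(seg1), segment_descriptor(seg2)
M12 = segment_descriptor(seg1 + seg2)

# Additivity under composition of contiguous segments.
print(np.allclose(M12, M1 + M2))
# Positive semidefiniteness, up to floating-point error.
print(np.linalg.eigvalsh(M12).min() >= -1e-9)

# Gauge: a linear change of lifted coordinates z -> G z transforms the
# descriptor by congruence, M -> G M G^T.
G = np.eye(M12.shape[0]) + 0.1 * rng.normal(size=M12.shape)
Zg = np.stack([G @ lift(*t) for t in traj])
print(np.allclose(Zg.T @ Zg, G @ M12 @ G.T))
```

Summing outer products (rather than, say, averaging) is the choice that makes composition plain matrix addition; a mean-based descriptor would need the segment lengths to combine.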

C3 · one line summary

MSRL represents trajectory segments as PSD matrices, proves that these descriptors compose additively, and uses them to bootstrap value functions for transfer, reaching an AUC of 0.73 versus 0.57-0.65 for the baselines.
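The page does not define "finite-budget target AUC". A common construction, assumed here rather than taken from the paper, is to min-max normalize returns with reference values, integrate the learning curve over a fixed interaction budget with the trapezoid rule, and divide by the budget, so that 1.0 means reaching the reference return immediately. The function `finite_budget_auc` and its parameters are hypothetical.

```python
import numpy as np

def finite_budget_auc(steps, returns, budget, r_min, r_max):
    # Hypothetical metric: normalize returns to [0, 1] using reference
    # values r_min / r_max, integrate the learning curve over [0, budget]
    # with the trapezoid rule, and divide by the budget.
    # Assumes the curve is sampled at step 0 and at step == budget.
    steps = np.asarray(steps, dtype=float)
    norm = (np.asarray(returns, dtype=float) - r_min) / (r_max - r_min)
    norm = np.clip(norm, 0.0, 1.0)
    keep = steps <= budget
    x, y = steps[keep], norm[keep]
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    return float(area / budget)

steps = [0, 25, 50, 75, 100]
fast = finite_budget_auc(steps, [0, 50, 80, 90, 100], budget=100, r_min=0, r_max=100)
slow = finite_budget_auc(steps, [0, 10, 30, 60, 100], budget=100, r_min=0, r_max=100)
print(f"fast learner AUC: {fast:.3f}")
print(f"slow learner AUC: {slow:.3f}")
```

Both example curves end at the same return but score 0.675 and 0.375, which is why an AUC over a finite budget rewards learning speed rather than only final performance.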

References

18 extracted · 18 resolved · 10 Pith anchors

[1] Proximal Policy Optimization Algorithms · arXiv:1707.06347
[2] Cochain Perspectives on Temporal-Difference Signals for Learning Beyond Markov Dynamics · arXiv:2602.06939
[3] Progressive Neural Networks · arXiv:1606.04671
[4] Operator-Guided Invariance Learning for Continuous Reinforcement Learning · arXiv:2605.06500
[5] Eigenoption Discovery through the Deep Successor Representation · arXiv:1710.11089

Formal links

3 machine-checked theorem links

Receipt and verification
First computed: 2026-05-17T23:39:10.064483Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

1b3cbbe7bdd278ba91aae5f3b4527b7dc709cbc0b7f29ee7b798669a79dc29be

Aliases

arxiv: 2605.14304 · arxiv_version: 2605.14304v1 · doi: 10.48550/arxiv.2605.14304 · pith_short_12: DM6LXZ552J4L · pith_short_16: DM6LXZ552J4LVENK · pith_short_8: DM6LXZ55
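The values on this page are consistent with the Pith Number being the first 26 Base32 characters (RFC 4648 alphabet) of the raw SHA-256 canonical hash, with the pith_short_8, pith_short_12, and pith_short_16 aliases being its prefixes. This relationship is inferred from the displayed hash and number, not taken from Pith documentation; `pith_number_from_hash` is a hypothetical helper.

```python
import base64

CANONICAL_HASH = "1b3cbbe7bdd278ba91aae5f3b4527b7dc709cbc0b7f29ee7b798669a79dc29be"

def pith_number_from_hash(hex_digest, length=26):
    # Base32-encode (alphabet A-Z, 2-7) the raw 32-byte SHA-256 digest and
    # keep the leading characters; 26 characters cover the first 130 bits.
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]

full = pith_number_from_hash(CANONICAL_HASH)
short = {n: full[:n] for n in (8, 12, 16)}   # pith_short_8 / _12 / _16
print(full)   # DM6LXZ552J4LVENK4XZ3IUT3PX
print(short)
```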
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DM6LXZ552J4LVENK4XZ3IUT3PX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1b3cbbe7bdd278ba91aae5f3b4527b7dc709cbc0b7f29ee7b798669a79dc29be
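The same check can be written as a standalone Python script, which avoids the jq dependency. The canonicalization mirrors the one-liner above (sorted keys, compact separators, non-ASCII kept, UTF-8, SHA-256). The names `canonical_sha256`, `fetch_canonical_record`, and `displayed` are illustrative; the offline comparison only succeeds if the record printed under "Canonical record JSON" matches the served record exactly.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/DM6LXZ552J4LVENK4XZ3IUT3PX"
EXPECTED = "1b3cbbe7bdd278ba91aae5f3b4527b7dc709cbc0b7f29ee7b798669a79dc29be"

def canonical_sha256(record):
    # Mirrors the python3 one-liner: sorted keys, compact separators,
    # non-ASCII left as-is, UTF-8 encoding, then SHA-256.
    b = json.dumps(record, sort_keys=True, separators=(",", ":"),
                   ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(b).hexdigest()

def fetch_canonical_record(url=PITH_URL):
    # Same request as the curl call (JSON-LD Accept header), followed by the
    # ".canonical_record" selection that jq performs. Needs network access.
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["canonical_record"]

# Offline: the record as printed on this page.
displayed = {
    "metadata": {
        "abstract_canon_sha256": "f9eee20125e076820ee708c6ede9be132a01769f7dc8b5e52a5020269aa707b8",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T03:12:29Z",
        "title_canon_sha256": "9ddb57cf15e2a03801f6e7d72fc85a4e06a4675eca41fe42b7bf00f536a27170",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14304", "kind": "arxiv", "version": 1},
}
print(canonical_sha256(displayed) == EXPECTED)
```

With network access, `canonical_sha256(fetch_canonical_record()) == EXPECTED` should repeat the check against the live record.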
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f9eee20125e076820ee708c6ede9be132a01769f7dc8b5e52a5020269aa707b8",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T03:12:29Z",
    "title_canon_sha256": "9ddb57cf15e2a03801f6e7d72fc85a4e06a4675eca41fe42b7bf00f536a27170"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14304",
    "kind": "arxiv",
    "version": 1
  }
}