Pith Number

pith:BDSGYFRK

pith:2026:BDSGYFRKE5JTBJCZHINOEC5ZYF

not attested · not anchored · not stored · refs resolved

ASH: Agents that Self-Hone via Embodied Learning

Benjamin Schneider, Sun Sun, Victor Zhong, Xavier Schneider

ASH learns long-horizon policies in complex games by training an inverse dynamics model on its own trajectories to label unlabeled internet videos.

arxiv:2605.14211 v1 · 2026-05-14 · cs.AI · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{BDSGYFRKE5JTBJCZHINOEC5ZYF}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

ASH reaches an average of 11.2/12 milestones in Pokemon Emerald and 9.9/12 in Legend of Zelda, while the strongest baseline gets stuck in both environments at an average of 6.5/12 and 6.0/12 milestones, respectively.

C2 · weakest assumption

The method assumes that an inverse dynamics model trained only on the agent's own noisy, self-generated trajectories will produce sufficiently accurate action labels when applied to unrelated, low-quality internet video clips.

C3 · one-line summary

ASH reaches 11.2/12 milestones in Pokemon Emerald and 9.9/12 in Zelda by self-improving via an IDM trained on its own trajectories to label internet video, while baselines plateau at roughly 6/12.
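The core idea in the claims above (train an inverse dynamics model, or IDM, on the agent's own trajectories, then use it to assign actions to unlabeled video) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the 1-D world, the tabular IDM, and all function names are hypothetical and are not taken from the paper, which works with game frames and learned models.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy world: a state is an integer position,
# and each action shifts it by a fixed amount.
ACTIONS = {"left": -1, "stay": 0, "right": 1}

def step(state, action):
    """Apply one action to the toy state."""
    return state + ACTIONS[action]

def collect_own_trajectory(length, rng):
    """Roll out a random policy, recording (state, action, next_state) triples."""
    state, triples = 0, []
    for _ in range(length):
        action = rng.choice(sorted(ACTIONS))
        nxt = step(state, action)
        triples.append((state, action, nxt))
        state = nxt
    return triples

def train_idm(triples):
    """Tabular IDM: the most frequent action seen for each state change."""
    counts = defaultdict(Counter)
    for state, action, nxt in triples:
        counts[nxt - state][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in counts.items()}

def label_video(idm, frames):
    """Pseudo-label each consecutive frame pair; None if the change was never seen."""
    return [idm.get(b - a) for a, b in zip(frames, frames[1:])]

rng = random.Random(0)
idm = train_idm(collect_own_trajectory(200, rng))

# An "internet video": states only, with no recorded actions.
video = [10, 11, 12, 12, 11]
labels = label_video(idm, video)
print(labels)
```

The labeled pairs could then feed ordinary behavioral cloning. The weakest assumption listed above corresponds to the case this toy cannot show: real video whose transitions differ from anything in the agent's own data, where the IDM's labels may be wrong rather than merely missing.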

References

53 extracted · 53 resolved · 12 Pith anchors

[1] Rethinking memory mechanisms of foundation agents in the second half. arXiv preprint arXiv:2602.06052, 2026

[2] Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, and Yonatan Bisk 2024

[3] Behavioral cloning from observation 2018 · doi:10.24963/ijcai.2018/

[4] URL https://doi.org/10.24963/ijcai.2018/687 2018 · doi:10.24963/ijcai.2018/687

[5] Video pretraining (vpt): Learning to act by watching unlabeled online videos 2022

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-17T23:39:10.927621Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

08e46c162a275330a4593a1ae20bb9c1424ede9a82d3c41f2581b542e10b7dc7

Aliases

arxiv: 2605.14211 · arxiv_version: 2605.14211v1 · doi: 10.48550/arxiv.2605.14211 · pith_short_12: BDSGYFRKE5JT · pith_short_16: BDSGYFRKE5JTBJCZ · pith_short_8: BDSGYFRK
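For this record, the short aliases are prefixes of the full 26-character identifier, and the identifier itself matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks that relationship. It is an observation about this one record, not documented behavior of the scheme, and the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "08e46c162a275330a4593a1ae20bb9c1424ede9a82d3c41f2581b542e10b7dc7"

def pith_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the leading characters.

    Assumption: the 26-character length is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

def short_alias(pith_number, n):
    """Short aliases in this record are plain prefixes of the full identifier."""
    return pith_number[:n]

derived = pith_from_hash(CANONICAL_HASH)
aliases = {n: short_alias(derived, n) for n in (8, 12, 16)}
print(derived)
print(aliases)
```

If the relationship holds in general, a client could recompute the identifier from the canonical hash alone, without a resolver round trip.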

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/BDSGYFRKE5JTBJCZHINOEC5ZYF \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 08e46c162a275330a4593a1ae20bb9c1424ede9a82d3c41f2581b542e10b7dc7
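The same canonicalization can be run offline on a record that has already been downloaded. The sketch below mirrors the `json.dumps` settings from the one-liner above (sorted keys, compact separators, non-ASCII kept as-is). The function name `canonical_sha256` and the two trimmed sample records are illustrative assumptions, so their digest is not the one expected for the full record.

```python
import hashlib
import json

def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Two trimmed, illustrative records with the same content in different key order.
record_a = {
    "schema_version": "1.0",
    "source": {"id": "2605.14211", "kind": "arxiv", "version": 1},
}
record_b = {
    "source": {"version": 1, "kind": "arxiv", "id": "2605.14211"},
    "schema_version": "1.0",
}

digest_a = canonical_sha256(record_a)
digest_b = canonical_sha256(record_b)
print(digest_a == digest_b)
```

Sorting keys is what makes the hash independent of how a mirror happens to order the fields, which is why the comparison above holds.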

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "6be4fe88e54af10342b12e874c2f2d2bf795f752421d702ce88c40a6790b8793",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-14T00:10:12Z",
    "title_canon_sha256": "2bf8aac2c5e4296dfd181fa0a4c12d9155bbfb1628e86d21cdcc23b948f40d28"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14211",
    "kind": "arxiv",
    "version": 1
  }
}