Pith Number

pith:FHMXRTDT

pith:2025:FHMXRTDTXNYDEM4HQO32OER25V

not attested · not anchored · not stored · refs resolved

FLARE: Robot Learning with Implicit World Modeling

Avnish Narayan, Fengyuan Hu, Furong Huang, Guanzhi Wang, Jan Kautz, Jiannan Xiang, Jing Wang, Joel Jang, Johan Bjorck, Kaushil Kundalia, Linxi Fan, Loic Magne, Qi Wang, Ruijie Zheng, Scott Reed, Seonghyeon Ye, Yinzhen Xu, You Liang Tan, Yu Fang, Yuke Zhu, Zongyu Lin

Aligning a diffusion transformer's features with future observation latents lets robot policies anticipate long-term consequences during action generation.

arxiv:2505.15659 v1 · 2025-05-21 · cs.RO · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{FHMXRTDTXNYDEM4HQO32OER25V}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
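The page does not spell out the merge algorithm. As a rough illustration of why a deterministic merge lets any mirror reach the same state, the Python sketch below sorts events by a stable key before folding them into the record. The event fields (`ts`, `id`, `field`, `value`), the sample timestamps, and the `merge_state` function are all hypothetical; they are not Pith's actual schema or algorithm.

```python
import copy

def merge_state(canonical_record, events):
    """Fold events into a copy of the record in an order that ignores arrival order.

    Hypothetical sketch: real Pith events are signed and follow their own schema.
    """
    state = copy.deepcopy(canonical_record)
    # Sorting by (timestamp, event id) gives a total order, so two mirrors that
    # received the same events in different orders still apply them identically.
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]
    return state

record = {"source": {"kind": "arxiv", "id": "2505.15659", "version": 1}}
events = [  # made-up events for illustration only
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "e1", "ts": "2026-05-17T23:38:13Z", "field": "author_claim", "value": "open"},
]

# Same events, opposite arrival order, same merged state.
print(merge_state(record, events) == merge_state(record, list(reversed(events))))
```

The design point is that the result depends only on the set of events, never on the order in which a mirror happened to receive them.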

Claims

C1 · strongest claim

FLARE aligns features from a diffusion transformer policy with latent embeddings of future observations, so the policy learns to anticipate those future latents and can reason about long-term consequences while generating actions. Across two challenging multitask simulation imitation learning benchmarks spanning single-arm and humanoid tabletop manipulation, FLARE achieves state-of-the-art performance, outperforming prior policy learning baselines by up to 26%.

C2 · weakest assumption

That adding a few tokens for future-latent alignment to existing vision-language-action (VLA) diffusion models is sufficient to produce reliable long-horizon reasoning, without additional supervision or architectural changes that would alter the core diffusion process.

C3 · one-line summary

FLARE integrates predictive latent world modeling into diffusion transformer policies for robots, delivering up to 26% gains on multitask manipulation benchmarks and enabling co-training with action-free human videos.
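To make the alignment idea in C1 concrete, here is a minimal Python sketch of a future-latent alignment term added to an action loss. It assumes a negative cosine similarity objective and a weighting coefficient `lam`. Both the loss form and every name here (`alignment_loss`, `total_loss`, `lam`) are illustrative assumptions, not details confirmed by this record.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_loss(policy_features, future_latents):
    """Mean negative cosine similarity over paired token vectors.

    policy_features: features read out of the policy at its extra "future" tokens.
    future_latents:  embeddings of the future observation from a fixed encoder.
    """
    sims = [cosine(f, z) for f, z in zip(policy_features, future_latents)]
    return -sum(sims) / len(sims)

def total_loss(action_loss, policy_features, future_latents, lam=0.5):
    """Action-generation loss plus a weighted alignment term (lam is arbitrary here)."""
    return action_loss + lam * alignment_loss(policy_features, future_latents)

# Toy values: two future tokens with 3-dimensional features.
features = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
latents = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(alignment_loss(features, latents))  # identical directions give -1.0
```

Because cosine similarity ignores vector length, this term only pushes the policy's features toward the direction of the future embedding; a real system would compute it on batched tensors inside the training loop.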

References

69 extracted · 69 resolved · 19 Pith anchors

[1] H. Wu, Y. Jing, C. Cheang, G. Chen, J. Xu, X. Li, M. Liu, H. Li, and T. Kong. Unleashing large-scale video generative pre-training for visual robot manipulation. In The Twelfth International Confere 2024

[3] Unified Video Action Model 2025 · arXiv:2503.00200

[4] C. Zhu, R. Yu, S. Feng, B. Burchfiel, P. Shah, and A. Gupta. Unified world models: Coupling video and action diffusion for pretraining on large robotic datasets. 2025

[5] CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models 2025 · arXiv:2503.22020

[6] Y. Du, S. Yang, B. Dai, H. Dai, O. Nachum, J. B. Tenenbaum, D. Schuurmans, and P. Abbeel. Learning universal policies via text-guided video generation. In Thirty-seventh Conference on Neural Informat 2023

Formal links

2 machine-checked theorem links

Cited by

18 papers in Pith

EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control

AffordVLA: Injecting Affordance Representations into Vision-Language-Action Models via Implicit Feature Alignment

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs

Receipt and verification

First computed	2026-05-17T23:38:13.649055Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

29d978cc73bb7032338783b7a7123aed6e038a20251717a494b8f4a7ed7a00eb

Aliases

arxiv: 2505.15659 · arxiv_version: 2505.15659v1 · doi: 10.48550/arxiv.2505.15659 · pith_short_12: FHMXRTDTXNYD · pith_short_16: FHMXRTDTXNYDEM4H · pith_short_8: FHMXRTDT
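The identifiers on this page are related in a way that can be checked locally. In the sketch below, the short aliases are prefixes of the full Pith Number, and the full number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The Base32 relationship is an observation about this record's values, not a rule stated on the page, so treat `derive_pith_number` as a hypothetical helper.

```python
import base64

CANONICAL_HASH = "29d978cc73bb7032338783b7a7123aed6e038a20251717a494b8f4a7ed7a00eb"
PITH_NUMBER = "FHMXRTDTXNYDEM4HQO32OER25V"

def derive_pith_number(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]

def short_aliases(pith_number):
    """Prefix aliases of the lengths listed under Aliases (8, 12, 16)."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(derive_pith_number(CANONICAL_HASH) == PITH_NUMBER)
print(short_aliases(PITH_NUMBER))
```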

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/FHMXRTDTXNYDEM4HQO32OER25V \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 29d978cc73bb7032338783b7a7123aed6e038a20251717a494b8f4a7ed7a00eb

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "07a761b92ad3ee2eeafe57fca02e6460a8b6c175372e9f4a7319ade349de37e0",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-05-21T15:33:27Z",
    "title_canon_sha256": "f3cd028e9eb07f663460c4832adf41299156087f6f54c3a54047e302a28fe422"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.15659",
    "kind": "arxiv",
    "version": 1
  }
}
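
The shell pipeline under "Verify this Pith Number yourself" can also be written as a small Python function that works on a record you already have locally. This sketch applies the same canonicalization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and hashes the result. `canonical_sha256` is an illustrative name, and whether the record printed above reproduces the expected hash byte for byte should be confirmed by running it.

```python
import hashlib
import json

def canonical_sha256(record):
    """SHA-256 of the record serialized with sorted keys and no extra whitespace."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# The canonical record as shown on this page.
record = json.loads("""
{
  "metadata": {
    "abstract_canon_sha256": "07a761b92ad3ee2eeafe57fca02e6460a8b6c175372e9f4a7319ade349de37e0",
    "cross_cats_sorted": ["cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-05-21T15:33:27Z",
    "title_canon_sha256": "f3cd028e9eb07f663460c4832adf41299156087f6f54c3a54047e302a28fe422"
  },
  "schema_version": "1.0",
  "source": {"id": "2505.15659", "kind": "arxiv", "version": 1}
}
""")

expected = "29d978cc73bb7032338783b7a7123aed6e038a20251717a494b8f4a7ed7a00eb"
digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == expected)
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON was pretty-printed or in what order its keys were written.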