Pith Number

pith:FHMXRTDT

pith:2025:FHMXRTDTXNYDEM4HQO32OER25V
not attested · not anchored · not stored · refs resolved

FLARE: Robot Learning with Implicit World Modeling

Avnish Narayan, Fengyuan Hu, Furong Huang, Guanzhi Wang, Jan Kautz, Jiannan Xiang, Jing Wang, Joel Jang, Johan Bjorck, Kaushil Kundalia, Linxi Fan, Loic Magne, Qi Wang, Ruijie Zheng, Scott Reed, Seonghyeon Ye, Yinzhen Xu, You Liang Tan, Yu Fang, Yuke Zhu, Zongyu Lin

Aligning a diffusion transformer's features with future observation latents lets robot policies anticipate long-term consequences during action generation.

arxiv:2505.15659 v1 · 2025-05-21 · cs.RO · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FHMXRTDTXNYDEM4HQO32OER25V}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
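
The record does not spell out the merge algorithm, so the following is only a minimal sketch of what "deterministic" means here: any mirror that holds the same set of events reaches the same state regardless of arrival order. The event fields (`at`, `field`, `value`), the hash-based event ID, and the last-writer-wins rule are all assumptions made for illustration, and signature checking is omitted.

```python
import hashlib
import json


def event_id(event):
    """Hypothetical event ID: SHA-256 of the event's sorted-key JSON."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def merge(events):
    """Fold events into a state dict in a fixed, input-order-independent order."""
    # Deduplicate by content hash so replayed events have no effect.
    unique = {event_id(e): e for e in events}
    state = {}
    # Sort by (timestamp, event ID); the ID breaks timestamp ties deterministically.
    for eid in sorted(unique, key=lambda i: (unique[i]["at"], i)):
        e = unique[eid]
        state[e["field"]] = e["value"]  # assumed last-writer-wins rule
    return state


events = [
    {"at": "2026-01-01T00:00:00Z", "field": "citations", "value": 1},
    {"at": "2026-02-01T00:00:00Z", "field": "citations", "value": 2},
    {"at": "2026-01-15T00:00:00Z", "field": "replications", "value": 0},
]
print(merge(events) == merge(list(reversed(events)) + events[:1]))
```

Sorting on a total order before folding is what makes the result independent of where or when a mirror received each event.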

Claims

C1 strongest claim

By aligning features from a diffusion transformer with latent embeddings of future observations, FLARE enables a diffusion transformer policy to anticipate latent representations of future observations, allowing it to reason about long-term consequences while generating actions. Across two challenging multitask simulation imitation learning benchmarks spanning single-arm and humanoid tabletop manipulation, FLARE achieves state-of-the-art performance, outperforming prior policy learning baselines by up to 26%.
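
As a rough illustration of the alignment idea in this claim, the sketch below adds an auxiliary term to an action loss that rewards agreement between a policy's predicted future tokens and the latent embeddings of future observations. The claim text does not specify the objective, so the cosine-similarity form, the function names, and the default weight of 0.2 are assumptions rather than details confirmed by this record.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(predicted_tokens, future_latents):
    """Mean negative cosine similarity over paired tokens (assumed objective)."""
    sims = [cosine_similarity(p, f) for p, f in zip(predicted_tokens, future_latents)]
    return -sum(sims) / len(sims)


def total_loss(action_loss, predicted_tokens, future_latents, weight=0.2):
    """Action loss plus a weighted alignment term; the weight is hypothetical."""
    return action_loss + weight * alignment_loss(predicted_tokens, future_latents)


# Two predicted tokens that point the same way as their targets: loss is -1.
predicted = [[1.0, 0.0], [0.0, 1.0]]
targets = [[2.0, 0.0], [0.0, 3.0]]
print(alignment_loss(predicted, targets))
```

Because cosine similarity ignores magnitude, the term only constrains the direction of the predicted features, which leaves the action-generation loss free to shape their scale.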

C2 weakest assumption

That adding a few tokens for future-latent alignment to existing VLA diffusion models is sufficient to produce reliable long-horizon reasoning without additional supervision or architectural changes that would alter the core diffusion process.

C3 one line summary

FLARE integrates predictive latent world modeling into diffusion transformer policies for robots, delivering up to 26% gains on multitask manipulation benchmarks and enabling co-training with action-free human videos.

References

69 extracted · 69 resolved · 19 Pith anchors

[1] H. Wu, Y. Jing, C. Cheang, G. Chen, J. Xu, X. Li, M. Liu, H. Li, and T. Kong. Unleashing large-scale video generative pre-training for visual robot manipulation. In The Twelfth International Confere 2024
[3] Unified Video Action Model 2025 · arXiv:2503.00200
[4] C. Zhu, R. Yu, S. Feng, B. Burchfiel, P. Shah, and A. Gupta. Unified world models: Coupling video and action diffusion for pretraining on large robotic datasets. 2025
[5] CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models 2025 · arXiv:2503.22020
[6] Y. Du, S. Yang, B. Dai, H. Dai, O. Nachum, J. B. Tenenbaum, D. Schuurmans, and P. Abbeel. Learning universal policies via text-guided video generation. In Thirty-seventh Conference on Neural Informat 2023

Formal links

2 machine-checked theorem links

Cited by

18 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.649055Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

29d978cc73bb7032338783b7a7123aed6e038a20251717a494b8f4a7ed7a00eb

Aliases

arxiv: 2505.15659 · arxiv_version: 2505.15659v1 · doi: 10.48550/arxiv.2505.15659 · pith_short_12: FHMXRTDTXNYD · pith_short_16: FHMXRTDTXNYDEM4H · pith_short_8: FHMXRTDT
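
The identifiers on this page appear to be related in a simple way: Base32-encoding the bytes of the canonical hash and keeping the first 26 characters reproduces the Pith Number, and the short aliases are prefixes of it. The page does not state this rule, so treat the sketch below as an observation about this one record, not a documented scheme; `pith_number` is a hypothetical helper name.

```python
import base64

CANONICAL_HASH = "29d978cc73bb7032338783b7a7123aed6e038a20251717a494b8f4a7ed7a00eb"


def pith_number(hex_digest, length=26):
    """Base32-encode the digest bytes and keep the leading characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)  # FHMXRTDTXNYDEM4HQO32OER25V
for name, value in aliases.items():
    print(name, value)
```
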
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FHMXRTDTXNYDEM4HQO32OER25V \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 29d978cc73bb7032338783b7a7123aed6e038a20251717a494b8f4a7ed7a00eb
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "07a761b92ad3ee2eeafe57fca02e6460a8b6c175372e9f4a7319ade349de37e0",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-05-21T15:33:27Z",
    "title_canon_sha256": "f3cd028e9eb07f663460c4832adf41299156087f6f54c3a54047e302a28fe422"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.15659",
    "kind": "arxiv",
    "version": 1
  }
}
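
The hashing step of the shell pipeline above can also be run offline against the record as printed here. The sketch below mirrors the same serialization choices (sorted keys, compact separators, UTF-8, no ASCII escaping). It assumes the JSON shown on this page is identical to what the API returns under `canonical_record`; if so, the printed digest should match the canonical hash listed above.

```python
import hashlib
import json

record = {
    "metadata": {
        "abstract_canon_sha256": "07a761b92ad3ee2eeafe57fca02e6460a8b6c175372e9f4a7319ade349de37e0",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2025-05-21T15:33:27Z",
        "title_canon_sha256": "f3cd028e9eb07f663460c4832adf41299156087f6f54c3a54047e302a28fe422",
    },
    "schema_version": "1.0",
    "source": {"id": "2505.15659", "kind": "arxiv", "version": 1},
}


def canonical_sha256(obj):
    """SHA-256 over sorted-key, compact, UTF-8 JSON, as in the shell one-liner."""
    data = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(data).hexdigest()


digest = canonical_sha256(record)
print(digest)
```

Sorting keys before hashing is what makes the digest independent of the order in which a client happens to receive or store the fields.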