Pith Number

pith:55FVLLS2

pith:2026:55FVLLS257ANMDE2F5DU2B3ESM

not attested · not anchored · not stored · refs resolved

BlockVLA: Accelerating Autoregressive VLA via Block Diffusion Finetuning

Badong Chen, Haoran Zhang, Ruiheng Wang, Shuanghao Bai, Xiangyu Xu

BlockVLA accelerates autoregressive VLA models by 3.3x using block diffusion finetuning, with faster training convergence and better early performance on long-horizon robotic tasks.

arxiv:2605.13382 v1 · 2026-05-13 · cs.RO

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{55FVLLS257ANMDE2F5DU2B3ESM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
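As a rough illustration of what a deterministic merge can look like, the sketch below folds a set of events into a state after sorting them by a fixed total order, so any mirror holding the same events computes the same result regardless of arrival order. The event fields (`ts`, `id`, `key`, `value`) and the last-writer-wins rule are assumptions made for illustration; the actual Pith event schema and merge rule are not described on this page.

```python
from typing import Iterable


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state dict in a fixed total order.

    Hypothetical event shape: {"ts": str, "id": str, "key": str, "value": object}.
    Assumes an id identifies exactly one event, so replays are harmless.
    """
    state: dict = {}
    # Deduplicate by event id so a replayed event cannot change the outcome.
    unique = {e["id"]: e for e in events}
    # Sorting by (ts, id) removes any dependence on arrival order.
    for event in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[event["key"]] = event["value"]  # last writer wins (assumed rule)
    return state


events = [
    {"ts": "2026-05-18T03:00:00Z", "id": "b", "key": "status", "value": "resolved"},
    {"ts": "2026-05-18T02:00:00Z", "id": "a", "key": "status", "value": "pending"},
]
print(merge_events(events) == merge_events(list(reversed(events))))
```

The comparison holds because both calls process the events in the same sorted order, whichever order they were supplied in.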

Claims

C1 strongest claim

BlockVLA achieves a 3.3× inference acceleration over standard discrete diffusion baselines and exhibits superior training efficiency with significant performance gains in the early stages of training on complex, long-horizon tasks.

C2 weakest assumption

That maintaining autoregressive dependencies only at the block level while performing parallel denoising inside blocks preserves the original model's reasoning capabilities and does not introduce new modes of error accumulation during long-horizon execution.

C3 one line summary

BlockVLA accelerates autoregressive VLA models by 3.3x using block diffusion finetuning, with faster training convergence and better early performance on long-horizon robotic tasks.
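The block-level structure named in C2 can be sketched as a toy decoding loop: blocks are produced left to right, and inside each block all positions start masked and are filled over a small number of parallel refinement steps. Everything here is a stand-in. `toy_denoiser`, the unmasking schedule, and the block size are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Optional, Tuple

Token = Optional[int]  # None marks a still-masked position


def toy_denoiser(prefix: List[int], block: List[Token]) -> List[int]:
    """Hypothetical stand-in for one model call: proposes a token for every slot."""
    base = len(prefix)
    return [base + i for i in range(len(block))]


def block_decode(
    num_blocks: int,
    block_size: int,
    steps: int,
    denoise: Callable[[List[int], List[Token]], List[int]] = toy_denoiser,
) -> Tuple[List[int], int]:
    """Return (tokens, number_of_model_calls) for a toy block-wise decoder."""
    prefix: List[int] = []
    calls = 0
    for _ in range(num_blocks):  # autoregressive across blocks
        block: List[Token] = [None] * block_size
        for step in range(steps):  # parallel refinement inside one block
            proposal = denoise(prefix, block)
            calls += 1
            # Reveal an even share of positions per step; the last step reveals the rest.
            reveal_upto = block_size * (step + 1) // steps
            for i in range(reveal_upto):
                if block[i] is None:
                    block[i] = proposal[i]
        prefix.extend(t for t in block if t is not None)
    return prefix, calls


tokens, calls = block_decode(num_blocks=2, block_size=4, steps=2)
print(len(tokens), calls)
```

With 2 blocks of 4 tokens and 2 steps per block, the loop makes 4 model calls for 8 tokens, where strictly token-by-token decoding would make 8. The call count only illustrates the structure and says nothing about the measured speedups reported for BlockVLA.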

Receipt and verification

First computed	2026-05-18T02:44:47.824098Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

ef4b55ae5aefc0d60c9a2f474d0764930420d10ebd71b3f97da89fdac16f3e56

Aliases

arxiv: 2605.13382 · arxiv_version: 2605.13382v1 · doi: 10.48550/arxiv.2605.13382 · pith_short_12: 55FVLLS257AN · pith_short_16: 55FVLLS257ANMDE2 · pith_short_8: 55FVLLS2
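The values on this page are consistent with a simple relationship: Base32-encoding the canonical hash bytes (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number, and the `pith_short_*` aliases are prefixes of it. The sketch below checks this for this record only. Treating it as the general derivation rule is an assumption, since the page does not state one, and the function name `pith_id_from_hash` is illustrative.

```python
import base64

CANONICAL_HASH = "ef4b55ae5aefc0d60c9a2f474d0764930420d10ebd71b3f97da89fdac16f3e56"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters (assumed rule)."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


pith_id = pith_id_from_hash(CANONICAL_HASH)
# The short aliases listed above are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}
print(pith_id)
print(aliases)
```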

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/55FVLLS257ANMDE2F5DU2B3ESM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ef4b55ae5aefc0d60c9a2f474d0764930420d10ebd71b3f97da89fdac16f3e56
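The same check can be written as a small Python function, which makes the canonicalization step explicit: keys are sorted and whitespace is removed before hashing, so the digest does not depend on how the JSON was originally laid out. This sketch works on an already-parsed record and performs no network request; the function name `canonical_sha256` is illustrative.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 over compact, key-sorted JSON, mirroring the one-liner above."""
    encoded = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Two layouts of the same data give the same digest.
a = {"source": {"id": "2605.13382", "kind": "arxiv", "version": 1}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 1, "kind": "arxiv", "id": "2605.13382"}}
print(canonical_sha256(a) == canonical_sha256(b))
```

To verify a downloaded record, pass the parsed `canonical_record` object to `canonical_sha256` and compare the result with the canonical hash shown on this page.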

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ecb922ceebb7025b32744b266b68f0d834cf5fc491a3a208020e76698e38da41",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-05-13T11:37:51Z",
    "title_canon_sha256": "b117477d7afc8cf4c8c6746c53e675be06c5f931688f7bea381ae57838da0a66"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13382",
    "kind": "arxiv",
    "version": 1
  }
}