Pith Number

pith:7NLFFWOB

pith:2024:7NLFFWOBUFPWWKQRD4NXRATHGW

not attested · not anchored · not stored · refs resolved

TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

Chaomin Shen, Feifei Feng, Jian Tang, Jinming Li, Junjie Wen, Kun Wu, Minjie Zhu, Ning Liu, Ran Cheng, Yaxin Peng, Yichen Zhu, Zhiyuan Xu

TinyVLA reaches OpenVLA-level performance on robot tasks by initializing from fast multimodal models and adding a diffusion action decoder, removing the pre-training stage entirely.

arxiv:2409.12514 v5 · 2024-09-19 · cs.RO · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{7NLFFWOBUFPWWKQRD4NXRATHGW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
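The page does not spell out the merge algorithm. As a hedged illustration of what "deterministic" buys a mirror, the sketch below assumes a hypothetical event shape (an `id`, an ISO-8601 `ts` timestamp, and a `set` dict of fields to overwrite) and folds events in a reproducible (timestamp, id) order. Under those assumptions, any mirror holding the same events reaches the same state regardless of the order in which the events arrived. The function name `merge_events` and the event fields are illustrative, not Pith's schema.

```python
def merge_events(record, events):
    # Hypothetical merge, NOT Pith's documented algorithm or event schema.
    # Assumed event shape: {"id": str, "ts": ISO-8601 str, "set": dict}.
    # Deduplicate by id; assumes duplicate copies of an event are identical.
    unique = {event["id"]: event for event in events}
    # (timestamp, id) is a total order every mirror can reproduce,
    # so arrival order no longer affects the result.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = dict(record)  # copy: never mutate the canonical record itself
    for event in ordered:
        state.update(event["set"])  # later events overwrite earlier fields
    return state


# Hypothetical events; field names and values are illustrative only.
record = {"source": "arxiv:2409.12514"}
events = [
    {"id": "e2", "ts": "2026-05-18T09:00:00Z", "set": {"status": "claimed"}},
    {"id": "e1", "ts": "2026-05-17T23:38:13Z", "set": {"status": "open"}},
    {"id": "e1", "ts": "2026-05-17T23:38:13Z", "set": {"status": "open"}},
]
print(merge_events(record, events))
# {'source': 'arxiv:2409.12514', 'status': 'claimed'}
```

Feeding the same events in reverse order yields the identical state, which is the property a mirror needs in order to "recompute the same current state".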

Claims

C1 · strongest claim

Our approach significantly outperforms the state-of-the-art VLA model, OpenVLA, in terms of speed and data efficiency, while delivering comparable or superior performance.

C2 · weakest assumption

That initializing the policy backbone with existing high-speed multimodal models plus a diffusion decoder during fine-tuning is sufficient to eliminate the pre-training stage while preserving or improving task performance and generalization.

C3 · one-line summary

TinyVLA achieves faster inference and higher data efficiency than OpenVLA on robotic manipulation tasks by initializing from high-speed multimodal models and adding a diffusion policy decoder, without any pre-training phase.

References

46 extracted · 46 resolved · 11 Pith anchors

[1] Roboagent: Generalization and efficiency in robot manipulation via semantic augmentations and action chunking, 2024

[2] Bridge data: Boosting generalization of robotic skills with cross-domain datasets, 2022

[3] Diffusion policy: Visuomotor policy learning via action diffusion, 2023

[4] 3D diffusion policy: Generalizable visuomotor policy learning via simple 3D representations, 2024

[5] Llama 2: Open Foundation and Fine-Tuned Chat Models, 2023 · arXiv:2307.09288

Formal links

1 machine-checked theorem link

Cited by

24 papers in Pith

A Survey on Vision-Language-Action Models for Embodied AI

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

VLAs are Confined yet Capable of Generalizing to Novel Instructions

CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models

Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation

Receipt and verification

First computed	2026-05-17T23:38:13.621854Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

fb5652d9c1a15f6b2a111f1b788267358946b746507846aac867f099aa1f0551

Aliases

arxiv: 2409.12514 · arxiv_version: 2409.12514v5 · doi: 10.48550/arxiv.2409.12514 · pith_short_12: 7NLFFWOBUFPW · pith_short_16: 7NLFFWOBUFPWWKQR · pith_short_8: 7NLFFWOB
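The Pith Number itself appears to be derived from the canonical hash: base32-encoding (RFC 4648 alphabet) the SHA-256 digest listed above reproduces `7NLFFWOBUFPWWKQRD4NXRATHGW` as its first 26 characters (the leading 130 bits), and the `pith_short_*` aliases are prefixes of that string. This relationship is inferred from the values on this page rather than taken from the `pith-number/v1.0` schema, so treat the helper `pith_number` below as a hypothetical sketch.

```python
import base64


def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    # Hypothetical helper inferred from this page: base32 (RFC 4648)
    # encode the 32-byte SHA-256 digest and keep the first `length`
    # characters; 26 characters carry the leading 130 bits.
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


CANONICAL_HASH = "fb5652d9c1a15f6b2a111f1b788267358946b746507846aac867f099aa1f0551"

full = pith_number(CANONICAL_HASH)
# The short aliases are plain prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)  # 7NLFFWOBUFPWWKQRD4NXRATHGW
print(aliases["pith_short_8"])  # 7NLFFWOB
```

Because each alias is a prefix, a short form can be checked against the full number without any lookup, though shorter prefixes naturally carry a higher chance of collision.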

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/7NLFFWOBUFPWWKQRD4NXRATHGW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fb5652d9c1a15f6b2a111f1b788267358946b746507846aac867f099aa1f0551

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "4b8dc1f54b57add6d3130db258de0181fe6b46834717ccd6fdea4538564fde06",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2024-09-19T07:10:18Z",
    "title_canon_sha256": "29fb64fb48e6a5eb3ad63b2ee904bee527a38ffc0b152de0baa1808b18c63dc6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.12514",
    "kind": "arxiv",
    "version": 5
  }
}
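For readers without `jq`, the hashing half of the verification pipeline can be reproduced in Python alone. The sketch below applies the same canonicalization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to a copy of the canonical record shown above; the helper name `canonical_sha256` is hypothetical. Only the served `canonical_record` is authoritative, so the code compares the digest with the expected hash rather than assuming the copied dict matches byte for byte.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    # Same canonical form as the curl | jq | python3 one-liner:
    # sorted keys, no whitespace, non-ASCII kept, UTF-8 bytes.
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


EXPECTED = "fb5652d9c1a15f6b2a111f1b788267358946b746507846aac867f099aa1f0551"

# Copy of the canonical record JSON shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "4b8dc1f54b57add6d3130db258de0181fe6b46834717ccd6fdea4538564fde06",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2024-09-19T07:10:18Z",
        "title_canon_sha256": "29fb64fb48e6a5eb3ad63b2ee904bee527a38ffc0b152de0baa1808b18c63dc6",
    },
    "schema_version": "1.0",
    "source": {"id": "2409.12514", "kind": "arxiv", "version": 5},
}

digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch: fetch the served record")
```

Sorting keys is what makes the digest independent of the order in which a server or mirror emits fields, so reordering the record's keys leaves `digest` unchanged.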