Pith Number

pith:7NLFFWOB

pith:2024:7NLFFWOBUFPWWKQRD4NXRATHGW
not attested · not anchored · not stored · refs resolved

TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

Chaomin Shen, Feifei Feng, Jian Tang, Jinming Li, Junjie Wen, Kun Wu, Minjie Zhu, Ning Liu, Ran Cheng, Yaxin Peng, Yichen Zhu, Zhiyuan Xu

TinyVLA reaches OpenVLA-level performance on robot tasks by initializing from fast multimodal models and adding a diffusion action decoder, removing the pre-training stage entirely.

arxiv:2409.12514 v5 · 2024-09-19 · cs.RO · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{7NLFFWOBUFPWWKQRD4NXRATHGW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Our approach significantly outperforms the state-of-the-art VLA model, OpenVLA, in terms of speed and data efficiency, while delivering comparable or superior performance.

C2 · weakest assumption

That initializing the policy backbone with existing high-speed multimodal models plus a diffusion decoder during fine-tuning is sufficient to eliminate the pre-training stage while preserving or improving task performance and generalization.

C3 · one-line summary

TinyVLA achieves faster inference and higher data efficiency than OpenVLA on robotic manipulation tasks by initializing from high-speed multimodal models and adding a diffusion policy decoder, without any pre-training phase.

References

46 extracted · 46 resolved · 11 Pith anchors

[1] Roboagent: Generalization and efficiency in robot manipulation via semantic augmentations and action chunking, 2024
[2] Bridge data: Boosting generalization of robotic skills with cross-domain datasets, 2022
[3] Diffusion policy: Visuomotor policy learning via action diffusion, 2023
[4] 3d diffusion policy: Generalizable visuomotor policy learning via simple 3d representations, 2024
[5] Llama 2: Open Foundation and Fine-Tuned Chat Models, 2023 · arXiv:2307.09288

Formal links

1 machine-checked theorem link

Cited by

24 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:13.621854Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

fb5652d9c1a15f6b2a111f1b788267358946b746507846aac867f099aa1f0551
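
For the values on this record, the Pith Number agrees with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash bytes. The Python sketch below checks that relationship. It is an observation inferred from this one record, not a documented derivation rule, and the function name `pith_number_from_hash` and the `length` parameter are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "fb5652d9c1a15f6b2a111f1b788267358946b746507846aac867f099aa1f0551"
PITH_NUMBER = "7NLFFWOBUFPWWKQRD4NXRATHGW"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this truncation rule is inferred from the values shown on
    this record only; it is not taken from any published specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Compare the derived identifier with the one printed on the page.
print(pith_number_from_hash(CANONICAL_HASH) == PITH_NUMBER)
```

If the relationship holds in general, a mirror could recover the identifier from the hash alone, but that should be confirmed against the schema before relying on it.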

Aliases

arxiv: 2409.12514 · arxiv_version: 2409.12514v5 · doi: 10.48550/arxiv.2409.12514 · pith_short_12: 7NLFFWOBUFPW · pith_short_16: 7NLFFWOBUFPWWKQR · pith_short_8: 7NLFFWOB
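
The short aliases listed above are prefixes of the full Pith Number, and the arXiv aliases follow from the source id and version. A minimal Python sketch of that relationship is given here; the helper names `short_aliases` and `arxiv_aliases` are hypothetical, and the prefix lengths are simply the three that appear on this record.

```python
PITH_NUMBER = "7NLFFWOBUFPWWKQRD4NXRATHGW"


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build pith_short_N aliases by truncating the full identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


def arxiv_aliases(arxiv_id: str, version: int) -> dict:
    """Build the arXiv-related aliases in the form shown on this record."""
    return {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{version}",
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }


aliases = {**arxiv_aliases("2409.12514", 5), **short_aliases(PITH_NUMBER)}
for name, value in sorted(aliases.items()):
    print(f"{name}: {value}")
```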
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7NLFFWOBUFPWWKQRD4NXRATHGW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fb5652d9c1a15f6b2a111f1b788267358946b746507846aac867f099aa1f0551
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4b8dc1f54b57add6d3130db258de0181fe6b46834717ccd6fdea4538564fde06",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2024-09-19T07:10:18Z",
    "title_canon_sha256": "29fb64fb48e6a5eb3ad63b2ee904bee527a38ffc0b152de0baa1808b18c63dc6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.12514",
    "kind": "arxiv",
    "version": 5
  }
}
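
The same canonicalization used by the shell one-liner can be run offline against the record printed above. The Python sketch below serializes the record with sorted keys and compact separators, then hashes the UTF-8 bytes. The function names `canonical_bytes` and `canonical_sha256` are hypothetical, and the comparison with the published hash is only meaningful if the copy of the record embedded here is faithful to what the API returns.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" listing on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "4b8dc1f54b57add6d3130db258de0181fe6b46834717ccd6fdea4538564fde06",
    "cross_cats_sorted": ["cs.CV"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2024-09-19T07:10:18Z",
    "title_canon_sha256": "29fb64fb48e6a5eb3ad63b2ee904bee527a38ffc0b152de0baa1808b18c63dc6"
  },
  "schema_version": "1.0",
  "source": {"id": "2409.12514", "kind": "arxiv", "version": 5}
}
"""
EXPECTED = "fb5652d9c1a15f6b2a111f1b788267358946b746507846aac867f099aa1f0551"


def canonical_bytes(record) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the one-liner."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input JSON, which is what lets independent mirrors recompute the same value.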