Pith Number

pith:6PODWBQ6

pith:2025:6PODWBQ66NHPHH4P2YAGBWZZOJ
not attested · not anchored · not stored · refs resolved

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

Amir Zadeh, Chia-Yu Hung, Chuan Li, Navonil Majumder, Pengfei Hong, Qi Sun, Soujanya Poria, U-Xuan Tan

A 3B-parameter vision-language-action model outperforms larger ones on robotic tasks with far less computation.

arxiv:2504.19854 v1 · 2025-04-28 · cs.RO · cs.AI · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{6PODWBQ66NHPHH4P2YAGBWZZOJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Experimental results demonstrate that NORA outperforms existing large-scale VLA models, achieving better task performance with significantly reduced computational overhead, making it a more practical solution for real-time robotic autonomy.

C2 weakest assumption

The assumption that using Qwen-2.5-VL-3B as the backbone, together with the FAST+ tokenizer, will overcome the visual-encoding limitations that lead to failures in tasks such as object grasping, without the reduced model size introducing new issues.

C3 one-line summary

NORA is a compact 3B-parameter VLA model trained on 970k robot demonstrations that outperforms larger VLA models in embodied tasks while using significantly less computational resources.

References

16 extracted · 16 resolved · 14 Pith anchors

[1] Qwen2.5-VL Technical Report · arXiv:2502.13923
[2] $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control · arXiv:2410.24164
[3] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control · arXiv:2307.15818
[4] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion · arXiv:2303.04137
[5] PaLM-E: An Embodied Multimodal Language Model · arXiv:2303.03378

Formal links

2 machine-checked theorem links

Cited by

24 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:47.387149Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

f3dc3b061ef34ef39f8fd60060db39726382cb3b3c9ae9854e9305501446d3b6

Aliases

arxiv: 2504.19854 · arxiv_version: 2504.19854v1 · doi: 10.48550/arxiv.2504.19854 · pith_short_12: 6PODWBQ66NHP · pith_short_16: 6PODWBQ66NHPHH4P · pith_short_8: 6PODWBQ6
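
The short aliases above are prefixes of the full Pith Number. For this record, the number itself also matches the first 26 characters of the Base32 encoding (RFC 4648 alphabet) of the canonical hash. The page does not document that derivation, so the sketch below treats it as an observation about this one record, not a specification. The function name `pith_from_hash` is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "f3dc3b061ef34ef39f8fd60060db39726382cb3b3c9ae9854e9305501446d3b6"
PITH_NUMBER = "6PODWBQ66NHPHH4P2YAGBWZZOJ"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this rule is inferred from the record above, not taken
    from a published Pith specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_from_hash(CANONICAL_HASH)

# The short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full == PITH_NUMBER)
for name, value in aliases.items():
    print(name, value)
```

If the observation holds for other records, a client could check an identifier against its canonical hash offline, without a network call.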
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/6PODWBQ66NHPHH4P2YAGBWZZOJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f3dc3b061ef34ef39f8fd60060db39726382cb3b3c9ae9854e9305501446d3b6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "7716eaf2f5ee9b037088608ffdf6d906d146c65ade6c3f590ef9fba649d57556",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-04-28T14:47:34Z",
    "title_canon_sha256": "5a390a6e64ecfb8fabe23d1f69197cd49ca6c9544d82162d63a710c7e40e70a9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.19854",
    "kind": "arxiv",
    "version": 1
  }
}
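
The verification command above can also be run offline against the record as printed here. The following is a minimal sketch that applies the same canonicalization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to a copy of the JSON shown. The helper name `canonical_sha256` is hypothetical. The digest should equal the canonical hash only if this printed record is identical to the one the service hashes.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the curl one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Copy of the canonical record JSON printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "7716eaf2f5ee9b037088608ffdf6d906d146c65ade6c3f590ef9fba649d57556",
        "cross_cats_sorted": ["cs.AI", "cs.CV"],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2025-04-28T14:47:34Z",
        "title_canon_sha256": "5a390a6e64ecfb8fabe23d1f69197cd49ca6c9544d82162d63a710c7e40e70a9",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.19854", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, the order in which the fields are written in the dictionary does not change the digest. Only the values and the nesting do.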