Pith Number

pith:MVQLCYDA

pith:2025:MVQLCYDAV5XYQOSFXCJZJTZ4P5

not attested · not anchored · not stored · refs resolved

Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models

Adrian Li-Bell, Anna Walling, Brian Ichter, Chelsea Finn, Danny Driess, Haohuan Wang, James Tanner, Karl Pertsch, Lachy Groom, Liyiming Ke, Lucy Xiaoyang Shi, Michael Equi, Niccolo Fusai, Quan Vuong, Sergey Levine

A hierarchical vision-language model lets robots interpret complex instructions and real-time feedback to choose and perform next steps.

arxiv:2502.19417 v2 · 2025-02-26 · cs.RO · cs.AI · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{MVQLCYDAV5XYQOSFXCJZJTZ4P5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
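
The merge algorithm itself is not described on this page. As a minimal sketch of the general idea only, the Python below folds a set of events into a state in a way that does not depend on arrival order. The event shape (`id`, `ts`, `field`, `value`), the function name `merge_state`, and the last-writer-wins rule are all hypothetical and are not taken from the Pith schema.

```python
from typing import Iterable


def merge_state(events: Iterable[dict]) -> dict:
    """Fold events into a state dict independently of arrival order.

    Hypothetical event shape (an assumption, not the Pith schema):
    {"id": str, "ts": ISO-8601 str, "field": str, "value": object}
    """
    state: dict = {}
    # Sorting by (timestamp, id) gives every mirror the same total order,
    # so the fold below produces the same result everywhere.
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]  # last writer wins per field
    return state


# Illustrative events only; the values are made up for the sketch.
events = [
    {"id": "b", "ts": "2026-05-18T00:00:00Z", "field": "stored", "value": True},
    {"id": "a", "ts": "2026-05-17T00:00:00Z", "field": "stored", "value": False},
    {"id": "c", "ts": "2026-05-17T12:00:00Z", "field": "anchored", "value": False},
]

# Two mirrors that received the events in different orders agree on the state.
assert merge_state(events) == merge_state(list(reversed(events)))
print(merge_state(events))
```

The property that matters is that the result is a pure function of the event set, which is what lets any mirror recompute the same state from the bundle.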

Claims

C1 strongest claim

our system can reason through complex prompts and incorporate situated feedback during task execution ('that's not trash')

C2 weakest assumption

That the high-level VLM can reliably map open-ended natural language and visual feedback into correct next-step decisions without hallucinating or misinterpreting physical context.

C3 one line summary

A hierarchical VLA architecture lets robots follow complex instructions and situated feedback by separating high-level reasoning from low-level control.

References

51 extracted · 51 resolved · 15 Pith anchors

[1] RT-H: Action Hierarchies Using Language 2024 · arXiv:2403.01823

[2] PaliGemma: A versatile 3B VLM for transfer 2024 · arXiv:2407.07726

[3] $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control 2024 · arXiv:2410.24164

[4] RT-1: Robotics Transformer for Real-World Control at Scale 2022 · arXiv:2212.06817

[5] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control 2023 · arXiv:2307.15818

Formal links

1 machine-checked theorem link

Cited by

32 papers in Pith

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

Action with Visual Primitives

GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations

AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation

Receipt and verification

First computed	2026-05-17T23:38:49.872422Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

6560b16060af6f883a45b89394cf3c7f69d4dc0f491bc183b2b52a5082e3020b

Aliases

arxiv: 2502.19417 · arxiv_version: 2502.19417v2 · doi: 10.48550/arxiv.2502.19417 · pith_short_12: MVQLCYDAV5XY · pith_short_16: MVQLCYDAV5XYQOSF · pith_short_8: MVQLCYDA
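
The identifiers on this page appear to be related to the canonical hash: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the hash bytes, and the short aliases are prefixes of it. This relationship is inferred from the values shown here, not from a documented rule, and the helper name `pith_number_from_hash` is hypothetical. A minimal Python sketch of the check:

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "6560b16060af6f883a45b89394cf3c7f69d4dc0f491bc183b2b52a5082e3020b"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier is derived; it is inferred
    from the values on this page rather than from a specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
# The short aliases listed above are prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)  # MVQLCYDAV5XYQOSFXCJZJTZ4P5
print(aliases)
```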

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/MVQLCYDAV5XYQOSFXCJZJTZ4P5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6560b16060af6f883a45b89394cf3c7f69d4dc0f491bc183b2b52a5082e3020b

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "2b64350ff6a13afc04f9ab60c2db11011409494aafe124453dddad25a14cfe73",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-02-26T18:58:41Z",
    "title_canon_sha256": "3b1fdea721df6a4839273c19af265454d6171db77f88a82f2f1d4d419a24a8a0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.19417",
    "kind": "arxiv",
    "version": 2
  }
}
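
The hashing step of the shell pipeline above can also be run offline against the record as printed here. The sketch below serializes the record with the same `json.dumps` options as the one-liner (sorted keys, compact separators, `ensure_ascii=False`) and hashes the UTF-8 bytes. The function name `canonical_sha256` is hypothetical. Whether the digest equals the canonical hash depends on the record served by the resolver being exactly the one shown on this page.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "2b64350ff6a13afc04f9ab60c2db11011409494aafe124453dddad25a14cfe73",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2025-02-26T18:58:41Z",
        "title_canon_sha256": "3b1fdea721df6a4839273c19af265454d6171db77f88a82f2f1d4d419a24a8a0",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.19417", "kind": "arxiv", "version": 2},
}


def canonical_sha256(obj) -> str:
    """Hash the compact, key-sorted JSON serialization of obj."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the source JSON.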