Pith Number

pith:IWBYRN2A

pith:2023:IWBYRN2ASLXKP6H7V375PJ5X4V

not attested · not anchored · not stored · refs resolved

Vision-Language Foundation Models as Effective Robot Imitators

Chilam Cheang, Cunjun Yu, Hanbo Zhang, Hang Li, Hongtao Wu, Huaping Liu, Jie Xu, Minghuan Liu, Tao Kong, Weinan Zhang, Xinghang Li, Ya Jing

Simple fine-tuning adapts pre-trained vision-language models into robot policies that beat prior methods.

arxiv:2311.01378 v3 · 2023-11-02 · cs.RO · cs.AI · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{IWBYRN2ASLXKP6H7V375PJ5X4V}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
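The page does not specify the merge algorithm or the event format. As a rough illustration of what "deterministic merge" can mean, the sketch below assumes a hypothetical event shape (`ts`, `field`, `value`), sorts events by timestamp and content hash, drops duplicates, and applies last-writer-wins per field. It is not Pith's actual algorithm, and it skips Ed25519 signature checking entirely.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content address of an event: SHA-256 over its canonical JSON form."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def merge(record: dict, events: list) -> dict:
    """Fold events onto a copy of the record in one canonical order.

    Hypothetical event shape: {"ts": ISO-8601 UTC string, "field": str, "value": any}.
    Sorting by (ts, event_id) makes the result independent of arrival order,
    and skipping repeated ids makes re-delivered events harmless.
    Signature verification is deliberately left out of this sketch.
    """
    state = dict(record)  # never mutate the canonical record itself
    seen = set()
    for ev in sorted(events, key=lambda e: (e["ts"], event_id(e))):
        eid = event_id(ev)
        if eid in seen:
            continue  # duplicate delivery of the same event
        seen.add(eid)
        state[ev["field"]] = ev["value"]  # later events overwrite earlier ones
    return state


# Two mirrors receive the same hypothetical events in different orders.
record = {"id": "2311.01378", "citations": 0}
events = [
    {"ts": "2026-05-18T00:00:00Z", "field": "citations", "value": 1},
    {"ts": "2026-05-19T00:00:00Z", "field": "citations", "value": 2},
]
mirror_a = merge(record, events)
mirror_b = merge(record, list(reversed(events)) + [events[0]])
print(mirror_a == mirror_b, mirror_a["citations"])  # True 2
```

Sorting on a content hash as a tiebreaker is what keeps two mirrors in agreement even when events share a timestamp.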

Claims

C1 · strongest claim

By exceeding the state-of-the-art performance with a large margin on the tested benchmark, we show RoboFlamingo can be an effective and competitive alternative to adapt VLMs to robot control.

C2 · weakest assumption

That modest fine-tuning on existing language-conditioned manipulation datasets is sufficient to transfer the general vision-language understanding of pre-trained VLMs into reliable sequential robot policies without catastrophic forgetting or domain shift.

C3 · one-line summary

RoboFlamingo adapts open-source vision-language models for robot manipulation tasks via single-step comprehension plus an explicit policy head, outperforming prior methods on benchmarks with only light fine-tuning.
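The summary describes a split of responsibilities: the VLM interprets a single step of vision-language input, and a separate policy head carries the history across steps. The sketch below illustrates only that split. Random vectors stand in for the VLM's per-step features, a plain tanh RNN stands in for the policy head, and the 7-D action layout (6 end-effector deltas plus a gripper bit) and every name here are assumptions for illustration, not RoboFlamingo's code.

```python
import math
import random


def matvec(W, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init(rows, cols, rng, scale=0.1):
    """Small uniform random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentPolicyHead:
    """Hypothetical policy head: one fused VLM feature per step -> one action.

    A plain tanh RNN stands in for whatever recurrent head is actually used.
    The action is 6 bounded end-effector deltas plus a binary gripper command.
    """

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_x = init(hidden_dim, feat_dim, rng)    # feature -> hidden
        self.W_h = init(hidden_dim, hidden_dim, rng)  # hidden -> hidden (history)
        self.W_pose = init(6, hidden_dim, rng)        # hidden -> pose deltas
        self.w_grip = init(1, hidden_dim, rng)[0]     # hidden -> gripper logit

    def reset(self):
        return [0.0] * self.hidden_dim

    def step(self, feature, h):
        # The VLM only sees the current step; history lives in h.
        pre = [a + b for a, b in zip(matvec(self.W_x, feature), matvec(self.W_h, h))]
        h = [math.tanh(v) for v in pre]
        pose = [math.tanh(v) for v in matvec(self.W_pose, h)]  # each in (-1, 1)
        grip_logit = sum(w * hv for w, hv in zip(self.w_grip, h))
        gripper_open = grip_logit > 0.0  # same as sigmoid(logit) > 0.5
        return pose, gripper_open, h


def rollout(head, features):
    """Run the head over a sequence of per-step features."""
    h = head.reset()
    actions = []
    for f in features:
        pose, gripper_open, h = head.step(f, h)
        actions.append((pose, gripper_open))
    return actions


# Random vectors stand in for per-step VLM outputs (e.g. pooled vision-language tokens).
rng = random.Random(1)
features = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(5)]
actions = rollout(RecurrentPolicyHead(feat_dim=16, hidden_dim=32), features)
print(len(actions), len(actions[0][0]))  # 5 6
```

Keeping the sequence model outside the VLM means the VLM never has to ingest a stack of past frames; only the small head is stateful.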

References

25 extracted · 25 resolved · 13 Pith anchors

[1] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances · arXiv:2204.01691

[2] OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models · arXiv:2308.01390

[3] GPT-NeoX-20B: An Open-Source Autoregressive Language Model · arXiv:2204.06745

[4] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control · arXiv:2307.15818

[5] Language Models are Few-Shot Learners · arXiv:2005.14165

Formal links

2 machine-checked theorem links

Cited by

33 papers in Pith

A Survey on Vision-Language-Action Models for Embodied AI

VLAs are Confined yet Capable of Generalizing to Novel Instructions

Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation

Receipt and verification

First computed	2026-05-17T23:38:46.479380Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

458388b74092eea7f8ffaeffd7a7b7e55266a2aeb2963d1e0f1e15ceefe3d1fc

Aliases

arxiv: 2311.01378 · arxiv_version: 2311.01378v3 · doi: 10.48550/arxiv.2311.01378 · pith_short_12: IWBYRN2ASLXK · pith_short_16: IWBYRN2ASLXKP6H7 · pith_short_8: IWBYRN2A
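The short aliases are prefixes of the full Pith Number. For this record, the full number also equals the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash's raw bytes; the sketch below reproduces that. This was checked against this one record only, so treat it as an observed relationship rather than the documented derivation rule.

```python
import base64

CANONICAL_HASH = "458388b74092eea7f8ffaeffd7a7b7e55266a2aeb2963d1e0f1e15ceefe3d1fc"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """RFC 4648 Base32 of the raw digest bytes, padding stripped, truncated."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded.rstrip("=")[:length]


full = pith_from_hash(CANONICAL_HASH)
print(full)  # IWBYRN2ASLXKP6H7V375PJ5X4V
print({n: full[:n] for n in (8, 12, 16)})
```

Because Base32 packs 5 bits per character, 26 characters carry 130 bits of the 256-bit digest.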

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/IWBYRN2ASLXKP6H7V375PJ5X4V \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 458388b74092eea7f8ffaeffd7a7b7e55266a2aeb2963d1e0f1e15ceefe3d1fc
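The same check can be run without jq. This stdlib-only sketch mirrors the pipeline above: it requests the JSON-LD view, takes `canonical_record`, re-serializes it with sorted keys, compact separators and raw UTF-8, and compares the SHA-256 digest. It assumes the endpoint serves the same document to `urllib` as to curl; the function names are illustrative.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/IWBYRN2ASLXKP6H7V375PJ5X4V"
EXPECTED = "458388b74092eea7f8ffaeffd7a7b7e55266a2aeb2963d1e0f1e15ceefe3d1fc"


def canonical_bytes(obj) -> bytes:
    """Same canonical form as the shell pipeline: sorted keys, no spaces, raw UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def fetch_canonical_record(url: str = PITH_URL, timeout: float = 10.0) -> dict:
    """Fetch the JSON-LD view and return its canonical_record member."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["canonical_record"]


def verify(url: str = PITH_URL, expected: str = EXPECTED) -> bool:
    """Recompute the canonical hash and compare it with the published one."""
    digest = hashlib.sha256(canonical_bytes(fetch_canonical_record(url))).hexdigest()
    return digest == expected
```

Calling `verify()` makes the same network request as the curl command and should return `True` when the served record hashes to the published value.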

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "acf5b854e35077856f99fbbcc551f23d4efec5c2d3f5deccf242275df7a9dba8",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2023-11-02T16:34:33Z",
    "title_canon_sha256": "a46623a0809364e13041acd5187a238e16b787a3e7101eff9db09d46cb95d7f9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.01378",
    "kind": "arxiv",
    "version": 3
  }
}