pith:3TM73BSX
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Vision-language-action models can be unified under one framework: a chain of action tokens leading from inputs to executable actions.
arxiv:2507.01925 v1 · 2025-07-02 · cs.RO
Claims
Current VLA models can be unified under a single framework: vision and language inputs are processed by a series of VLA modules, producing a chain of action tokens that progressively encode more grounded and actionable information, ultimately generating executable actions.
The primary design choice distinguishing VLA models lies in how action tokens are formulated, which can be categorized into language description, code, affordance, trajectory, goal state, latent representation, raw action, and reasoning.
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
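The framing in the claims above can be made concrete with a small data-structure sketch. The eight category names come from the claims; everything else (`ActionTokenType`, `ActionToken`, `run_chain`, and the toy `plan` and `act` modules) is a hypothetical illustration, not an interface defined by the survey.

```python
from dataclasses import dataclass
from enum import Enum


class ActionTokenType(Enum):
    """The eight action token categories named in the claims."""
    LANGUAGE_DESCRIPTION = "language description"
    CODE = "code"
    AFFORDANCE = "affordance"
    TRAJECTORY = "trajectory"
    GOAL_STATE = "goal state"
    LATENT_REPRESENTATION = "latent representation"
    RAW_ACTION = "raw action"
    REASONING = "reasoning"


@dataclass(frozen=True)
class ActionToken:
    kind: ActionTokenType
    payload: object  # assumption: text, a code string, waypoints, a motor command, ...


def run_chain(observation, instruction, modules):
    """Pass the inputs through each module in order; each module appends one token."""
    chain = []
    for module in modules:
        chain.append(module(observation, instruction, chain))
    return chain


# Toy modules (hypothetical): a planner emitting text, then a policy emitting a raw action.
def plan(observation, instruction, chain):
    return ActionToken(ActionTokenType.LANGUAGE_DESCRIPTION, f"subtask for: {instruction}")


def act(observation, instruction, chain):
    return ActionToken(ActionTokenType.RAW_ACTION, [0.0, 0.0, 0.1])


chain = run_chain(None, "pick up the cup", [plan, act])
print([token.kind.value for token in chain])
```

Here the chain moves from a language description to a raw action, which is one possible ordering; the claims only state that later tokens are more grounded than earlier ones.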
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:13.882054Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
dcd9fd86571f0ff4777ddb5f53d2fe3b0b829472527471f8198f0dc3a6c6dc06
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3TM73BSXD4H7I5353NPVHUX6HM \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: dcd9fd86571f0ff4777ddb5f53d2fe3b0b829472527471f8198f0dc3a6c6dc06
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1684ec07c21257a1da9c84eae86ef835a4a06bedfdb53bc256ef53935533bb40",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-07-02T17:34:52Z",
    "title_canon_sha256": "55de44bf23e2520adab3a5805a62d82f2eff96566380ccc5cd38c9d8b684069c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2507.01925",
    "kind": "arxiv",
    "version": 1
  }
}
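The canonicalization step in the verification command (sorted keys, compact separators, no ASCII escaping, then SHA-256 over the UTF-8 bytes) can also be run offline. The following is a minimal sketch that embeds the record shown above instead of fetching it; the names `CANONICAL_RECORD`, `EXPECTED_HASH`, and `canonical_hash` are illustrative. Whether the digest matches depends on the record printed here being identical to what the endpoint returns, so the script reports the result rather than assuming it.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "1684ec07c21257a1da9c84eae86ef835a4a06bedfdb53bc256ef53935533bb40",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2025-07-02T17:34:52Z",
        "title_canon_sha256": "55de44bf23e2520adab3a5805a62d82f2eff96566380ccc5cd38c9d8b684069c",
    },
    "schema_version": "1.0",
    "source": {"id": "2507.01925", "kind": "arxiv", "version": 1},
}

# Hash listed under "Canonical hash" on this page.
EXPECTED_HASH = "dcd9fd86571f0ff4777ddb5f53d2fe3b0b829472527471f8198f0dc3a6c6dc06"


def canonical_hash(record: dict) -> str:
    """Serialize with the same json.dumps options as the one-liner, then SHA-256."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


digest = canonical_hash(CANONICAL_RECORD)
print(digest)
print("match" if digest == EXPECTED_HASH else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the key order of the input JSON, only on its keys and values.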