Mixed citations

VLA-Arena: An open-source framework for benchmarking vision-language-action models

Mixed citation behavior. Most common role is background (57%).

8 Pith papers cite it
Background: 57% of classified citations (4 of 7)

citation-role summary

background 4 · dataset 2 · baseline 1

fields

cs.RO 7 · cs.AI 1

years

2026 8

representative citing papers

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

State-of-the-art vision-language-action models fail catastrophically at dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks, as shown by the new BeTTER benchmark with real-world validation.
