vla-eval decouples VLA model inference from benchmark execution via WebSocket and Docker, supporting 14 benchmarks with up to 47x speedup and reproducing published scores across six codebases.
RoboCerebra: A large-scale benchmark for long-horizon robotic manipulation evalua- tion,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
ACCEPT 1representative citing papers
citing papers explorer
-
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
vla-eval decouples VLA model inference from benchmark execution via WebSocket and Docker, supporting 14 benchmarks with up to 47x speedup and reproducing published scores across six codebases.