Benchmarking vision, language, & action models on robotic learning tasks

Pranav Guruprasad, Harshvardhan Sikka, Jaewoo Song, Yangyue Wang, Paul Pu Liang · 2024 · arXiv 2411.05821

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

cs.AI · 2025-06-03 · unverdicted · novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of

LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

cs.CV · 2025-10-04

citing papers explorer

Showing 2 of 2 citing papers.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments cs.AI · 2025-06-03 · unverdicted · none · ref 25
VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of
LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization cs.CV · 2025-10-04 · unreviewed · ref 8

Benchmarking vision, language, & action models on robotic learning tasks

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer