VSI-Bench: Benchmarking visual spatial intelligence in vision-language models

Lixin Yang, Kailin Chen, Songyou Peng, et al. · 2024 · arXiv 2407.07890

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Unsteady Metrics and Benchmarking Cultures of AI Model Builders

cs.AI · 2026-05-13 · accept · novelty 8.0

AI model builders mostly highlight unique benchmarks that act as flexible narrative tools for market positioning rather than standardized scientific measurements.

Generalizing Verifiable Instruction Following

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

Introduces IFBench benchmark with 58 new constraints and demonstrates RLVR training improves generalization of language models to unseen verifiable output constraints.

Position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

cs.LG · 2025-07-30 · unverdicted · novelty 4.0

Human tests should not be applied to AI to measure traits like intelligence due to calibration, validity, contamination, and prompt sensitivity issues; develop AI-specific evaluation frameworks instead.

Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects?

cs.CV · 2026-05-19

citing papers explorer


Unsteady Metrics and Benchmarking Cultures of AI Model Builders

cs.AI · 2026-05-13 · accept · none · ref 20

AI model builders mostly highlight unique benchmarks that act as flexible narrative tools for market positioning rather than standardized scientific measurements.
