Grasp: A novel benchmark for evaluating language grounding and situated physics understanding in multimodal language models. arXiv preprint arXiv:2311.09048, 2023

Serwan Jassim, Mario Holubar, Annika Richter, Cornelius Wolff, Xenia Ohmer, Elia Bruni · 2023 · arXiv 2311.09048

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Do generative video models understand physical principles?

cs.CV · 2025-01-14 · unverdicted · novelty 8.0

Physics-IQ benchmark reveals that generative video models exhibit limited physical understanding unrelated to their visual quality.

LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

LiveK12Bench is a growing multi-disciplinary benchmark showing LMMs like GPT-5 drop from 79 to 53 under realistic exam constraints including process rigor and efficiency.

VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

cs.CV · 2026-02-09 · unverdicted · novelty 6.0

VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.

Video models are zero-shot learners and reasoners

cs.LG · 2025-09-24 · unverdicted · novelty 6.0

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

cs.CV · 2024-10-07 · unverdicted · novelty 6.0

PhyGenBench supplies 160 prompts across 27 physical laws and an automated LLM/VLM evaluation pipeline to measure physical commonsense compliance in current text-to-video models.

citing papers explorer

Do generative video models understand physical principles? · cs.CV · 2025-01-14 · unverdicted · none · ref 57
LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations? · cs.AI · 2026-05-26 · unverdicted · none · ref 26
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction · cs.CV · 2026-02-09 · unverdicted · none · ref 21
Video models are zero-shot learners and reasoners · cs.LG · 2025-09-24 · unverdicted · none · ref 41
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation · cs.CV · 2024-10-07 · unverdicted · none · ref 16
