GRASP: A novel benchmark for evaluating language grounding and situated physics understanding in multimodal language models

URL https: //arxiv · 2023 · arXiv 2311.09048

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Do generative video models understand physical principles?

cs.CV · 2025-01-14 · unverdicted · novelty 8.0

Physics-IQ benchmark reveals that generative video models exhibit limited physical understanding unrelated to their visual quality.

VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

cs.CV · 2026-02-09 · unverdicted · novelty 6.0

VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.

Video models are zero-shot learners and reasoners

cs.LG · 2025-09-24 · unverdicted · novelty 6.0

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

cs.CV · 2024-10-07 · unverdicted · novelty 6.0

PhyGenBench supplies 160 prompts across 27 physical laws and an automated LLM/VLM evaluation pipeline to measure physical commonsense compliance in current text-to-video models.

citing papers explorer

Showing 4 of 4 citing papers.

Do generative video models understand physical principles? cs.CV · 2025-01-14 · unverdicted · none · ref 57
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction cs.CV · 2026-02-09 · unverdicted · none · ref 21
Video models are zero-shot learners and reasoners cs.LG · 2025-09-24 · unverdicted · none · ref 41
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation cs.CV · 2024-10-07 · unverdicted · none · ref 16
