WorldMark is the first public benchmark that standardizes scenes, trajectories, and control interfaces across heterogeneous interactive image-to-video world models.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
GeoWorld-VLM distills geometric structure from camera-conditioned world models into VLMs by aligning visual features, improving spatial reasoning by about 4% on What'sUp and VSR benchmarks across two architectures while preserving language capabilities.
citing papers explorer
-
WorldMark: A Unified Benchmark Suite for Interactive Video World Models
WorldMark is the first public benchmark that standardizes scenes, trajectories, and control interfaces across heterogeneous interactive image-to-video world models.
-
GeoWorld-VLM: Geometry from World Models for Vision-Language Models
GeoWorld-VLM distills geometric structure from camera-conditioned world models into VLMs by aligning visual features, improving spatial reasoning by about 4% on What'sUp and VSR benchmarks across two architectures while preserving language capabilities.