Omni-worldbench: Towards a comprehensive interaction-centric evaluation for world models

2026 · arXiv 2603.22212

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

WorldRoamBench: An Open-World Benchmark for Long-Horizon Stability of Interactive World Models

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

WorldRoamBench is a new benchmark for interactive world models that evaluates four stability dimensions with custom metrics and finds no tested model performs reliably across all.

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

WBench is a benchmark with 289 test cases and 1,058 turns for evaluating interactive world models using 22 automated metrics validated against human judgments.

Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends

cs.CV · 2026-05-31 · unverdicted · novelty 2.0

This survey reviews trends, challenges, benchmarks, and future directions in action-conditioned interactive world modeling for video and 3D generation.

citing papers explorer

WorldRoamBench: An Open-World Benchmark for Long-Horizon Stability of Interactive World Models

cs.CV · 2026-06-30 · unverdicted · none · ref 43

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

cs.CV · 2026-05-30 · unverdicted · none · ref 81

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

cs.CV · 2026-05-25 · unverdicted · none · ref 25

Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends

cs.CV · 2026-05-31 · unverdicted · none · ref 141
