LoopNav: Benchmarking Spatial Consistency in World Models

· 2025 · cs.CV · arXiv 2505.22976

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

The ability to simulate the world in a spatially consistent manner is a crucial requirement for effective world models. Such a model enables high-quality visual generation, and also ensures the reliability of world models for downstream tasks such as simulation and planning. It must not only retain long-horizon observational information, but also enables the construction of explicit or implicit internal spatial representations. However, existing datasets do not explicitly enforce spatial consistency constraints, limiting both the ability to systematically evaluate this capability and to learn it through data-driven approaches. Furthermore, most existing benchmarks primarily emphasize visual coherence or generation quality, neglecting the requirement of long-range spatial consistency. To bridge this gap, we propose LoopNav, a dataset and corresponding benchmark centered on loop-based navigation for evaluating spatial consistency. The dataset comprises 250 hours (20 million frames) of loop-based navigation videos with actions, collected from diverse locations in the open-world environment of Minecraft. We further introduce a Scene Graph Consistency Score to quantify spatial consistency while remaining invariant to pixel-level variations. Dataset, benchmark, and code are open-sourced to support future research.

representative citing papers

DiLA: Disentangled Latent Action World Models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.

PROWL: Prioritized Regret-Driven Optimization for World Model Learning

cs.LG · 2026-05-11 · unverdicted · novelty 5.0

PROWL introduces a KL-constrained adversarial curriculum and prioritized adversarial trajectory buffer to actively discover and correct rare failure modes in action-conditioned video world models.

citing papers explorer

Showing 2 of 2 citing papers.

DiLA: Disentangled Latent Action World Models cs.CV · 2026-05-15 · unverdicted · none · ref 18 · internal anchor
DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.
PROWL: Prioritized Regret-Driven Optimization for World Model Learning cs.LG · 2026-05-11 · unverdicted · none · ref 51 · internal anchor
PROWL introduces a KL-constrained adversarial curriculum and prioritized adversarial trajectory buffer to actively discover and correct rare failure modes in action-conditioned video world models.

LoopNav: Benchmarking Spatial Consistency in World Models

fields

years

verdicts

representative citing papers

citing papers explorer