NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Yuxue Yang, Lue Fan, Ziqi Shi, Junran Peng, Feng Wang, Zhaoxiang Zhang · 2026 · arXiv 2601.00393

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · baseline: 1

citation-polarity summary

background: 2 · baseline: 1

representative citing papers

No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

NoPo4D is the first feed-forward system for dynamic 4D Gaussian splatting from unposed multi-view videos, using velocity decomposition supervised by optical flow and a bidirectional motion encoder.

Preserve, Reveal, Expand: Faithful 4D Video Editing with Region-Aware Conditioning

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

PREX decomposes target 4D video volumes into Preserve, Reveal, and Expand roles with a region-aware adapter on a frozen diffusion backbone, trained via proxy tasks, and introduces the PREBench benchmark to reduce region-structured editing failures.

World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

Distilling view-consistent future views and action-outcome supervision from a generative world model into a VLM via two-stage post-training improves dynamic spatial reasoning on SAT-Real, VSI-Bench and similar benchmarks while avoiding test-time world-model cost.

VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis

cs.RO · 2026-04-10 · unverdicted · novelty 6.0

VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for embodied data synthesis and policy pretraining.

ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

cs.CV · 2026-04-15 · unverdicted · novelty 4.0

HY-World 2.0 generates and reconstructs high-fidelity, navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules; the code is released, and the authors claim state-of-the-art performance among open-source methods.

Xiaomi Auto World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

cs.CV · 2026-05-18

citing papers explorer

No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos
cs.CV · 2026-05-21 · unverdicted · none · ref 66

Preserve, Reveal, Expand: Faithful 4D Video Editing with Region-Aware Conditioning
cs.CV · 2026-05-20 · unverdicted · none · ref 34

World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning
cs.CV · 2026-04-29 · unverdicted · none · ref 59

VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis
cs.RO · 2026-04-10 · unverdicted · none · ref 75

ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation
cs.CV · 2026-05-08 · unverdicted · none · ref 23

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
cs.CV · 2026-04-15 · unverdicted · none · ref 79

Xiaomi Auto World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving
cs.CV · 2026-05-18 · unreviewed · ref 14
