hub Mixed citations

Worldscore: A unified evaluation benchmark for world generation

Haoyi Duan, Hong-Xing Yu, Sirui Chen, Li Fei-Fei, Jiajun Wu · 2025 · arXiv 2504.00983

Mixed citation behavior. Most common role is background (67%).

22 Pith papers citing it

Background 67% of classified citations

read on arXiv browse 22 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 dataset 2 method 1

citation-polarity summary

background 4 use dataset 1 use method 1

representative citing papers

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.

HumanScore: Benchmarking Human Motions in Generated Videos

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

HumanScore defines six metrics for kinematic plausibility, temporal stability, and biomechanical consistency to benchmark human motions in videos from thirteen state-of-the-art generation models, revealing gaps between visual appeal and physical fidelity.

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.

Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets?

cs.CV · 2025-11-21 · unverdicted · novelty 7.0

Target-Bench shows the best off-the-shelf video world model scores only 0.341 on semantic target-approaching and directional consistency, with fine-tuning on a small robot dataset yielding measurable gains.

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

cs.RO · 2025-05-19 · unverdicted · novelty 7.0

DreamGen trains robot policies on synthetic trajectories from adapted video world models, enabling a humanoid robot to perform 22 new behaviors in seen and unseen environments from a single pick-and-place teleoperation dataset.

Latent Spatial Memory for Video World Models

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while reaching SOTA on WorldScore.

TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

TunerDiT adds event-partitioned masking and cross-event prompt fusion to diffusion transformers for training-free multi-event video generation, with gains scaling by event count on a new Meve benchmark.

Quantitative Video World Model Evaluation for Geometric-Consistency

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

PDI-Bench computes 3D projective residuals from segmented and tracked points to quantify geometric inconsistency in AI-generated videos.

Embody4D: A Generalist Data Engine for Embodied 4D World Modeling

cs.CV · 2026-05-03 · unverdicted · novelty 6.0 · 2 refs

Embody4D generates novel-view videos from monocular robot videos via a 3D-aware synthesis pipeline, confidence-aware expert modulation, and interaction-aware attention for embodied 4D world modeling.

PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models

cs.CV · 2025-12-01 · conditional · novelty 6.0

A new dataset and fine-tuned VLM detector/explainer called PhyDetEx shows that current T2V models still struggle to generate videos that obey physical laws, with open-source models performing worse.

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

cs.CV · 2025-07-10 · unverdicted · novelty 6.0

Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.

How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position

cs.LG · 2026-06-13 · unverdicted · novelty 5.0

The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.

CP4D: Compositional Physics-aware 4D Scene Generation

cs.CV · 2026-06-08 · unverdicted · novelty 5.0

CP4D generates physically consistent 4D scenes via compositional integration of pre-trained 3D models, hybrid simulator-diffusion motion synthesis, and automated scene composition.

OptiWorld: Optimal Control for Video World Generation under Physical Constraints

cs.CV · 2026-05-30 · unverdicted · novelty 5.0

OptiWorld inserts a classical optimal-control layer that extracts a world state, plans an optimal trajectory on a geometric manifold under physical constraints, and renders the video conditioned on that trajectory.

WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

cs.RO · 2026-05-18 · unverdicted · novelty 5.0

WorldArena 2.0 extends embodied world model benchmarks to visuotactile perception, interactive policy training, and diverse real and simulated robotic platforms under a unified protocol.

MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models

cs.CV · 2025-11-23 · unverdicted · novelty 5.0

MASS adds spatiotemporal motion signals and 3D grounding to VLMs and releases MASS-Bench, yielding physics-reasoning performance within 2% of Gemini-2.5-Flash after reinforcement fine-tuning.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

cs.CV · 2026-04-15 · unverdicted · novelty 4.0

HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.

World Simulation with Video Foundation Models for Physical AI

cs.CV · 2025-10-28 · unverdicted · novelty 4.0

Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.

Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends

cs.CV · 2026-05-31 · unverdicted · novelty 2.0

This survey reviews trends, challenges, benchmarks, and future directions in action-conditioned interactive world modeling for video and 3D generation.

MemoBench: Benchmarking World Modeling in Dynamically Changing Environments

cs.CV · 2026-06-25 · 3 refs

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

cs.CV · 2025-12-16

citing papers explorer

Showing 19 of 19 citing papers after filters.

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models cs.CV · 2026-05-30 · unverdicted · none · ref 17
MBench is a new benchmark that quantifies long-term memory in video world models via three hierarchical consistency dimensions evaluated on curated real videos.
HumanScore: Benchmarking Human Motions in Generated Videos cs.CV · 2026-04-22 · unverdicted · none · ref 14
HumanScore defines six metrics for kinematic plausibility, temporal stability, and biomechanical consistency to benchmark human motions in videos from thirteen state-of-the-art generation models, revealing gaps between visual appeal and physical fidelity.
MultiWorld: Scalable Multi-Agent Multi-View Video World Models cs.CV · 2026-04-20 · unverdicted · none · ref 8
MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.
Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets? cs.CV · 2025-11-21 · unverdicted · none · ref 8
Target-Bench shows the best off-the-shelf video world model scores only 0.341 on semantic target-approaching and directional consistency, with fine-tuning on a small robot dataset yielding measurable gains.
DreamGen: Unlocking Generalization in Robot Learning through Video World Models cs.RO · 2025-05-19 · unverdicted · none · ref 28
DreamGen trains robot policies on synthetic trajectories from adapted video world models, enabling a humanoid robot to perform 22 new behaviors in seen and unseen environments from a single pick-and-place teleoperation dataset.
Latent Spatial Memory for Video World Models cs.CV · 2026-06-08 · unverdicted · none · ref 28
Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while reaching SOTA on WorldScore.
TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation cs.CV · 2026-05-29 · unverdicted · none · ref 9
TunerDiT adds event-partitioned masking and cross-event prompt fusion to diffusion transformers for training-free multi-event video generation, with gains scaling by event count on a new Meve benchmark.
Quantitative Video World Model Evaluation for Geometric-Consistency cs.CV · 2026-05-14 · unverdicted · none · ref 9
PDI-Bench computes 3D projective residuals from segmented and tracked points to quantify geometric inconsistency in AI-generated videos.
Embody4D: A Generalist Data Engine for Embodied 4D World Modeling cs.CV · 2026-05-03 · unverdicted · none · ref 13 · 2 links
Embody4D generates novel-view videos from monocular robot videos via a 3D-aware synthesis pipeline, confidence-aware expert modulation, and interaction-aware attention for embodied 4D world modeling.
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling cs.CV · 2025-07-10 · unverdicted · none · ref 19
Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.
How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position cs.LG · 2026-06-13 · unverdicted · none · ref 13
The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.
CP4D: Compositional Physics-aware 4D Scene Generation cs.CV · 2026-06-08 · unverdicted · none · ref 31
CP4D generates physically consistent 4D scenes via compositional integration of pre-trained 3D models, hybrid simulator-diffusion motion synthesis, and automated scene composition.
OptiWorld: Optimal Control for Video World Generation under Physical Constraints cs.CV · 2026-05-30 · unverdicted · none · ref 60
OptiWorld inserts a classical optimal-control layer that extracts a world state, plans an optimal trajectory on a geometric manifold under physical constraints, and renders the video conditioned on that trajectory.
WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform cs.RO · 2026-05-18 · unverdicted · none · ref 5
WorldArena 2.0 extends embodied world model benchmarks to visuotactile perception, interactive policy training, and diverse real and simulated robotic platforms under a unified protocol.
MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models cs.CV · 2025-11-23 · unverdicted · none · ref 16
MASS adds spatiotemporal motion signals and 3D grounding to VLMs and releases MASS-Bench, yielding physics-reasoning performance within 2% of Gemini-2.5-Flash after reinforcement fine-tuning.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 217
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds cs.CV · 2026-04-15 · unverdicted · none · ref 15
HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.
World Simulation with Video Foundation Models for Physical AI cs.CV · 2025-10-28 · unverdicted · none · ref 19
Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.
Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends cs.CV · 2026-05-31 · unverdicted · none · ref 140
This survey reviews trends, challenges, benchmarks, and future directions in action-conditioned interactive world modeling for video and 3D generation.

Worldscore: A unified evaluation benchmark for world generation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer