Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer

Xuanchi Ren, Tianshi Cao, Amirmojtaba Sabour, Tianchang Shen, Yifan Lu, Ruiyuan Gao, Tobias Pfaff, Jay Zhangjie Wu, Seung Wook Kim, Shengyu Huang, Laura Leal-Taixé, Jun Gao, Huan Ling, Sanja Fidler · 2025 · arXiv 2506.09042

20 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

LongLive-RAG formulates long video generation as retrieval-augmented generation by treating self-generated latents as a dynamic searchable history and adding a Window Temporal Delta Loss for better retrieval.

Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

AutoScale is a closed-loop data engine using Graph-RAE for scene representation and Cluster-GA for importance-based retrieval to improve real-synthetic co-training for autonomous driving.

ScenarioControl: Vision-Language Controllable Vectorized Latent Scenario Generation

cs.CV · 2026-04-18 · unverdicted · novelty 7.0

ScenarioControl introduces the first vision-language controllable generator for realistic vectorized 3D driving scenarios with temporal consistency across actor views.

Off the Rails: Hijacking the Scoring Head in Generative End-to-End Driving Planners with Safety-Violating Adversarial Perturbations

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Derail adversarial perturbations hijack the scoring head in generative E2E driving planners, flipping safe to unsafe trajectory selection with 39-80% score drops and up to 50% collision rates.

Geometry-Aware Implicit Memory for Video World Models

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

GIM-World adds a camera-queryable geometry distillation head and pruning rule to implicit memory in video world models, claiming better long-horizon geometric consistency on the MIND benchmark than explicit and implicit baselines.

Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

A multi-agent video world model using simplex rotary agent encoding and sparse hub attention achieves better fidelity, controllability, and consistency than baselines while generalizing from 2 to 4 players.

E³C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

E³C is a video diffusion model that disentangles persistent 3D scene structure via point-cloud memory from human dynamics via ego-exo pose controls for improved egocentric video generation on the Nymeria dataset.

AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond

cs.RO · 2026-05-25 · unverdicted · novelty 6.0

AnyScene is an occupancy-centric framework using a Spatial-Temporal Occupancy Diffusion Transformer and Geometry-Grounded View Expansion to generate controllable driving scenes and videos from BEV layouts.

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

Sensor2Sensor uses 4D Gaussian Splatting to create synthetic training pairs and a diffusion model to convert monocular dashcam videos into high-fidelity multi-modal AV sensor data.

HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

HorizonDrive is a new anti-drifting autoregressive training and distillation method that enables minute-scale stable driving video rollouts by making the teacher model rollout-capable via scheduled rollout recovery and teacher rollout DMD.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms

eess.IV · 2026-03-30 · unverdicted · novelty 6.0

Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.

DriveLaW: Unifying Planning and Video Generation in a Latent Driving World

cs.CV · 2025-12-29 · unverdicted · novelty 6.0

DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

cs.RO · 2025-10-11 · unverdicted · novelty 6.0

A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

cs.CL · 2026-06-10 · unverdicted · novelty 5.0

This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.

ViPE: Video Pose Engine for 3D Geometric Perception

cs.CV · 2025-08-12 · unverdicted · novelty 5.0

ViPE estimates camera intrinsics, motion, and dense near-metric depth from uncalibrated videos, outperforming baselines on TUM and KITTI while releasing annotations for 96M frames across real and generated videos.

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

cs.CV · 2026-06-02 · unverdicted · novelty 4.0

OmniDreams is a real-time generative world model mid- and post-trained from the Cosmos diffusion model on 21k hours of driving data to autoregressively generate action-conditioned videos for closed-loop AV simulation.

Advancing Reliable Synthetic Video Detection: Insights from the SAFE Challenge

cs.CV · 2026-05-07 · unverdicted · novelty 4.0

The SAFE challenge shows measurable progress in detecting synthetic videos across different generators but persistent weaknesses against post-processing operations.

Advancing Open-source World Models

cs.CV · 2026-01-28 · unverdicted · novelty 4.0

LingBot-World is presented as an open-source world model that delivers high-fidelity simulation, minute-level contextual consistency, and real-time interactivity with latency under one second.

World Simulation with Video Foundation Models for Physical AI

cs.CV · 2025-10-28 · unverdicted · novelty 4.0

Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.
