Video generation models as world simulators

Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, Aditya Ramesh · 2024

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Detecting AI-Generated Videos with Spiking Neural Networks

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

MAST with spiking neural networks achieves 93.14% mean accuracy detecting AI-generated videos from 10 unseen generators by exploiting smoother pixel residuals and compact semantic trajectories.

LPM 1.0: Video-based Character Performance Model

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

LPM 1.0 generates infinite-length, identity-stable, real-time audio-visual conversational performances for single characters using a distilled causal diffusion transformer and a new benchmark.

Test-Time Training Done Right

cs.LG · 2025-05-29 · conditional · novelty 6.0

Large-chunk online updates during inference let test-time training scale state capacity to 40% of model size and handle contexts up to 1M tokens without custom kernels.

Improving Video Generation with Human Feedback

cs.CV · 2025-01-23 · unverdicted · novelty 6.0

A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.

GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

cs.RO · 2024-10-08 · unverdicted · novelty 6.0

GR-2 pre-trains on web-scale videos then fine-tunes on robot data to reach 97.7% average success across over 100 manipulation tasks with strong generalization to new scenes and objects.

VideoPhy: Evaluating Physical Commonsense for Video Generation

cs.CV · 2024-06-05 · conditional · novelty 6.0

VideoPhy benchmark shows state-of-the-art text-to-video models follow physical commonsense and text prompts in only 39.6% of cases for the best model.

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

cs.CV · 2024-06-04 · unverdicted · novelty 6.0

CamCo equips image-to-video generators with Plücker-coordinate camera inputs and epipolar attention to improve 3D consistency and camera controllability.

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

cs.CV · 2024-05-16 · conditional · novelty 6.0

A multi-view diffusion model generates consistent novel views from sparse images to enable fast 3D scene reconstruction.

InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model

cs.CV · 2026-03-12 · unverdicted · novelty 5.0

InSpatio-WorldFM is a frame-independent generative model that uses explicit 3D anchors and spatial memory to deliver real-time multi-view consistent spatial intelligence via a three-stage training pipeline from pretrained diffusion models.

From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience

cs.AI · 2026-04-13 · unverdicted · novelty 4.0

ReflectiChain uses latent trajectory rehearsal and retrospective agentic RL inside an LLM world model to raise average step rewards by 250% and restore supply-chain operability from 13.3% to 88.5% on the Semi-Sim benchmark under extreme shocks.

VRAG: Learning World Models for Interactive Video Generation

cs.CV · 2025-05-28

citing papers explorer

Showing 11 of 11 citing papers.

Detecting AI-Generated Videos with Spiking Neural Networks
cs.CV · 2026-05-07 · unverdicted · none · ref 4

LPM 1.0: Video-based Character Performance Model
cs.CV · 2026-04-09 · unverdicted · none · ref 15

Test-Time Training Done Right
cs.LG · 2025-05-29 · conditional · none · ref 52

Improving Video Generation with Human Feedback
cs.CV · 2025-01-23 · unverdicted · none · ref 5

GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
cs.RO · 2024-10-08 · unverdicted · none · ref 3

VideoPhy: Evaluating Physical Commonsense for Video Generation
cs.CV · 2024-06-05 · conditional · none · ref 17

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
cs.CV · 2024-06-04 · unverdicted · none · ref 5

CAT3D: Create Anything in 3D with Multi-View Diffusion Models
cs.CV · 2024-05-16 · conditional · none · ref 15

InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model
cs.CV · 2026-03-12 · unverdicted · none · ref 7

From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience
cs.AI · 2026-04-13 · unverdicted · none · ref 43

VRAG: Learning World Models for Interactive Video Generation
cs.CV · 2025-05-28 · unreviewed · ref 8
