CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation
CRePE supplies depth-aware positional distributions along curved rays for stable unified-camera control in frozen video DiT models.
arXiv preprint arXiv:2512.07237 (2025)
4 Pith papers cite this work. Polarity classification is still indexing.
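The one-line summary compresses the mechanism, so here is a minimal sketch of what depth-aware positional distributions along camera rays could look like: each token gets a ray, a distribution over candidate depths, and a sinusoidal embedding of the resulting expected 3D point. Everything below (the function name, the Fourier mapping, and the straight pinhole rays standing in for CRePE's curved rays) is an assumption, not the paper's code.

```python
# Minimal sketch (not CRePE's implementation): expected 3D positions
# under a per-token depth distribution, embedded as positional features.
import torch

def ray_expectation_pe(origins, directions, depth_logits, depth_bins, num_freqs=8):
    """origins, directions: (N, 3) per-token ray origins and unit directions.
    depth_logits: (N, D) unnormalized scores over D candidate depths.
    depth_bins: (D,) float candidate depth values along each ray.
    Returns (N, 6 * num_freqs) sinusoidal features of the expected point."""
    probs = depth_logits.softmax(dim=-1)                       # (N, D)
    # Points along each ray; straight rays here, curved in the paper.
    points = origins[:, None, :] + depth_bins[None, :, None] * directions[:, None, :]
    expected = (probs[..., None] * points).sum(dim=1)          # (N, 3)
    # Standard sinusoidal embedding of the expected 3D position.
    freqs = 2.0 ** torch.arange(num_freqs, dtype=expected.dtype)
    angles = expected[..., None] * freqs                       # (N, 3, F)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)
```

Encoding an expectation rather than a single hard-depth position lets uncertain depth estimates degrade gracefully instead of snapping tokens to wrong 3D locations; whether CRePE realizes this exactly this way is not stated in the summary.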
fields: cs.CV (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
citing papers explorer
-
Geometrically Consistent Multi-View Scene Generation from Freehand Sketches
A framework that generates consistent multi-view scenes from a single freehand sketch, combining a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and a Sparse Correspondence Supervision Loss; it outperforms baselines in realism and consistency.
-
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video
Warp-as-History enables zero-shot camera-trajectory following in frozen video models by supplying camera-warped pseudo-history, with single-video LoRA fine-tuning improving generalization to unseen videos (see the warping sketch after this list).
-
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
SANA-WM is an efficient 2.6B-parameter world model that synthesizes minute-scale 720p videos with 6-DoF camera control; it was trained on 213K public clips in 15 days on 64 H100s and runs on a single GPU at 36x higher throughput than prior open baselines.
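For the Warp-as-History entry, here is a minimal sketch of camera-warped pseudo-history under simplifying assumptions: a pinhole camera, a depth map in the target view, and backward warping of one source frame into the target pose; the warped frame would then be fed to the frozen model as conditioning history. The function name, the warping direction, and the depth source are assumptions, not the paper's method.

```python
# Minimal sketch (not Warp-as-History's code): backward-warp a source
# frame into a target camera pose using a target-view depth map.
import torch
import torch.nn.functional as F

def warp_to_pose(src, depth, K, R, t):
    """src: (1, C, H, W) source frame; depth: (1, 1, H, W) target-view depth.
    K: (3, 3) intrinsics; R: (3, 3) and t: (3,) target-to-source pose.
    Returns the source frame resampled into the target view."""
    _, _, H, W = src.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().reshape(3, -1)
    # Unproject target pixels to 3D, move them into the source camera,
    # and reproject to source-image coordinates.
    pts = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)     # (3, H*W)
    pts = R @ pts + t[:, None]
    proj = K @ pts
    uv = proj[:2] / proj[2:3].clamp(min=1e-6)                  # (2, H*W)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                        uv[1] / (H - 1) * 2 - 1], dim=-1)
    return F.grid_sample(src, grid.reshape(1, H, W, 2), align_corners=True)
```

Backward warping with grid_sample avoids the holes left by forward splatting but requires depth in the target view; the paper may warp in the other direction, which this sketch does not claim to reproduce.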