PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

Guofeng Zhang; Haomin Liu; Jialing Liu; Nan Wang; Xiaoyu Zhang; Xinyu Chen; Yipeng Chen; Zhenzhou Fang; Zhichao Ye

arxiv: 2511.17185 · v2 · pith:IUKOISYDnew · submitted 2025-11-21 · 💻 cs.CV

keywords: postcam, visual, alignment, consistency, control, cross-attention, detail, dynamic

We propose PostCam, a streamlined framework for novel-view video generation that achieves superior detail preservation and precise camera trajectory editing in dynamic scenes. Current methods often struggle with a trade-off between pose-based control, which lacks visual detail, and rendering-based guidance, which is overly sensitive to geometric accuracy. Despite recent hybrid attempts, achieving precise motion and visual consistency remains challenging due to the lack of effective cross-modal alignment. We argue that robust control stems from the deep alignment of multimodal signals rather than increased input complexity. Our core contribution is the Query-Shared Cross-Attention mechanism, which projects 6-DoF poses and rendered features into a unified latent space. This allows the model to spontaneously achieve intrinsic consistency between motion cues and pixel-level guidance during denoising. Experiments demonstrate that PostCam maintains high-fidelity visual details while outperforming state-of-the-art methods by 20% in trajectory precision, exhibiting superior robustness in complex dynamic scenes. Our project webpage is publicly available at: https://cccqaq.github.io/PostCam.github.io/
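The abstract gives no implementation details for Query-Shared Cross-Attention, so the following is only a minimal NumPy sketch of one plausible reading: a single query projection of the video latents is reused to attend over both camera-pose tokens and rendered-feature tokens, and the two results are added back to the latents. All names, shapes, the residual sum, and the random weights (`query_shared_cross_attention`, `w_q`, `w_k_pose`, and so on) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Standard scaled dot-product attention for 2-D (tokens, dim) arrays.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def query_shared_cross_attention(latents, poses, render_feats, params):
    # ASSUMPTION: "query-shared" is read here as one query projection
    # reused by both conditioning branches.
    q = latents @ params["w_q"]
    pose_out = attend(q, poses @ params["w_k_pose"], poses @ params["w_v_pose"])
    render_out = attend(
        q, render_feats @ params["w_k_render"], render_feats @ params["w_v_render"]
    )
    # ASSUMPTION: the branches are fused by a plain residual sum.
    return latents + pose_out + render_out

rng = np.random.default_rng(0)
d = 16                                   # hypothetical latent width
params = {
    "w_q": rng.normal(size=(d, d)) * 0.1,
    "w_k_pose": rng.normal(size=(6, d)) * 0.1,    # 6-DoF pose -> latent space
    "w_v_pose": rng.normal(size=(6, d)) * 0.1,
    "w_k_render": rng.normal(size=(d, d)) * 0.1,  # rendered features -> latent space
    "w_v_render": rng.normal(size=(d, d)) * 0.1,
}
latents = rng.normal(size=(8, d))        # 8 noisy video latent tokens
poses = rng.normal(size=(4, 6))          # 4 camera poses, 6 values each
render_feats = rng.normal(size=(8, d))   # 8 rendered-view feature tokens

out = query_shared_cross_attention(latents, poses, render_feats, params)
```

Because both branches are scored against the same `q`, the pose and rendering signals are compared in one shared space, which is the property the abstract emphasizes; how the actual model fuses the branches may differ.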

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
cs.CV · 2026-04 · unverdicted · novelty 6.0

INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos via STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching...