arXiv preprint arXiv:2412.07721 (2024)

Wang, Z · 2024 · arXiv 2412.07721

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

QWERTY: Training-Free Motion Control via Query-Warped Video Diffusion Transformers

cs.CV · 2026-07-02 · unverdicted · novelty 7.0

QWERTY enables training-free motion control in pretrained image-to-video DiTs by warping the frame-invariant semantic subspace of queries in 3D full attention and using the predicted noise as self-guidance for latent optimization.

Perceptual 3D Simulation With Physical World Modeling

cs.CV · 2026-06-25 · unverdicted · novelty 5.0

P3Sim integrates a probabilistic physical world model with geometric conditioning and persistent memory to simulate 3D scenes under partial observations and incomplete transforms.

Unified 3D Scene Understanding Through Physical World Modeling

cs.CV · 2026-05-23 · unverdicted · novelty 5.0

A probabilistic graphical model called 3WM unifies 3D vision tasks into one system that performs them zero-shot by selecting different inference pathways through multimodal scene nodes.

citing papers explorer

QWERTY: Training-Free Motion Control via Query-Warped Video Diffusion Transformers cs.CV · 2026-07-02 · unverdicted · none · ref 49
Perceptual 3D Simulation With Physical World Modeling cs.CV · 2026-06-25 · unverdicted · none · ref 31
Unified 3D Scene Understanding Through Physical World Modeling cs.CV · 2026-05-23 · unverdicted · none · ref 17
