pith. machine review for the scientific record. sign in

arxiv: 2601.23286 · v3 · submitted 2026-01-30 · 💻 cs.CV · cs.AI· cs.LG

Recognition: unknown

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

Authors on Pith no claims yet
classification 💻 cs.CV cs.AIcs.LG
keywords preferencegeometricvideovideogpacoherenceconsistencygeometryvdms
0
0 comments X
read the original abstract

While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, geometric plausibility, and motion coherence using minimal preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CreFlow: Corrective Reflow for Sparse-Reward Embodied Video Diffusion RL

    cs.CV 2026-05 conditional novelty 7.0

    CreFlow combines LTL compositional rewards with credit-aware NFT and corrective reflow losses in online RL to improve embodied video diffusion models, raising downstream task success by 23.8 percentage points on eight...

  2. Geometric 4D Stitching for Grounded 4D Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    Geometric 4D Stitching explicitly complements missing geometric regions in 4D generated scenes with grounded stitches to achieve consistent 4D representations in under 10 minutes on a single GPU.