
arxiv: 2510.21615 · v2 · pith:MNDTZI5Hnew · submitted 2025-10-24 · 💻 cs.CV

Epipolar Geometry Improves Video Generation Models

classification 💻 cs.CV
keywords geometric, models, epipolar, generation, video, constraints, diffusion, geometry

Video generation models have advanced significantly through latent diffusion transformers trained with rectified flow techniques. Yet these models still struggle with geometric inconsistencies, unstable motion, and visual artifacts that break the illusion of realistic 3D scenes. 3D-consistent video generation could significantly impact numerous downstream applications in generation and reconstruction tasks. We explore how epipolar geometry constraints improve modern video diffusion models. Despite being trained on massive datasets, these models fail to capture fundamental geometric principles. We align diffusion models using pairwise epipolar geometry constraints via preference-based optimization, directly addressing unstable trajectories and geometric artifacts through mathematically principled geometric enforcement. Because the constraints enter only through preferences, our approach enforces geometric principles efficiently without requiring end-to-end differentiability. Evaluation demonstrates that classical geometric constraints provide more stable optimization signals than modern learned metrics. Training on static scenes with dynamic cameras ensures metric quality, while the model generalizes to various dynamic scenes. By bridging data-driven learning with classical computer vision, we reduce epipolar error by 31% and improve human-rated consistency from 54% to 72% without compromising visual quality.
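The abstract does not spell out how the pairwise epipolar error is computed or how preference pairs are formed. As a rough illustration only, the sketch below uses the Sampson distance, a standard first-order approximation of epipolar error, and ranks two candidate videos by it. The function names `sampson_error` and `preference_pair` are hypothetical, the choice of Sampson distance is an assumption, and the fundamental matrix `F` is taken as given (in practice it would be estimated from matched keypoints, which is omitted here).

```python
import numpy as np


def sampson_error(F, x1, x2):
    """Mean Sampson distance of correspondences x1 <-> x2 under F.

    F:  (3, 3) fundamental matrix, with x2^T F x1 = 0 for perfect matches.
    x1: (N, 2) pixel coordinates in the first frame.
    x2: (N, 2) pixel coordinates in the second frame.
    Assumption: this metric stands in for the paper's epipolar error.
    """
    ones = np.ones((x1.shape[0], 1))
    p1 = np.hstack([x1, ones])   # (N, 3) homogeneous points, frame 1
    p2 = np.hstack([x2, ones])   # (N, 3) homogeneous points, frame 2

    Fp1 = p1 @ F.T               # row i is F @ p1[i]
    Ftp2 = p2 @ F                # row i is F^T @ p2[i]

    # Squared algebraic residual (x2^T F x1)^2 for every correspondence.
    num = np.sum(p2 * Fp1, axis=1) ** 2
    # First-order normalisation by the gradient magnitude.
    den = Fp1[:, 0] ** 2 + Fp1[:, 1] ** 2 + Ftp2[:, 0] ** 2 + Ftp2[:, 1] ** 2
    return float(np.mean(num / np.maximum(den, 1e-12)))


def preference_pair(err_a, err_b):
    """Hypothetical helper: the sample with lower epipolar error is preferred."""
    return ("a", "b") if err_a <= err_b else ("b", "a")


# Toy example: a camera translating along x (identity intrinsics and
# rotation), so matching points must lie on the same image row.
F_demo = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0]])
pts1 = np.array([[10.0, 20.0], [30.0, 40.0]])
consistent = np.array([[15.0, 20.0], [33.0, 40.0]])    # rows preserved
inconsistent = np.array([[15.0, 22.0], [33.0, 42.0]])  # rows drift by 2 px

err_a = sampson_error(F_demo, pts1, consistent)
err_b = sampson_error(F_demo, pts1, inconsistent)
preferred, rejected = preference_pair(err_a, err_b)
```

Because the score is used only to order samples, nothing in it needs to be differentiable, which is consistent with the abstract's remark about not requiring end-to-end differentiability.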



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

    cs.CV 2026-06 unverdicted novelty 7.0

    FLAT maps compressed video diffusion latents to explicit triangle splats via ray-centered rotation parameterization and a product window function, reporting better geometric accuracy than 3D Gaussian baselines under i...

  2. Geo-Align: Video Generation Alignment via Metric Geometry Reward

    cs.CV 2026-05 unverdicted novelty 7.0

    Geo-Align applies RL with a perceptual reward derived from 3D camera trajectory estimation to improve controllability and fidelity in video generation without paired training data.

  3. GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.

  4. VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

    cs.CV 2026-01 unverdicted novelty 6.0

    VideoGPA distills geometry priors via self-supervised DPO to enhance 3D consistency, temporal stability, and motion coherence in video diffusion models.

  5. Feed-Forward Gaussian Splatting from Sparse Aerial Views

    cs.CV 2026-05 unverdicted novelty 5.0

    AnyCity reconstructs coherent 3D Gaussian urban scenes from sparse aerial views in one feed-forward pass by anchoring observation-supported geometry and applying gated residual updates conditioned on an aerial-adapted...