Epipolar Geometry Improves Video Generation Models

Christian Rupprecht; Fabian Manhardt; Federico Tombari; Marta Tintore Gazulla; Orest Kupyn; Théo Uscidda

arxiv: 2510.21615 · v2 · pith:MNDTZI5Hnew · submitted 2025-10-24 · 💻 cs.CV

Orest Kupyn, Théo Uscidda, Marta Tintore Gazulla, Fabian Manhardt, Federico Tombari, Christian Rupprecht

classification 💻 cs.CV

keywords geometric, models, epipolar, generation, video, constraints, diffusion, geometry

Video generation models have advanced significantly through latent diffusion transformers trained with rectified flow techniques. Yet these models still struggle with geometric inconsistencies, unstable motion, and visual artifacts that break the illusion of realistic 3D scenes. 3D-consistent video generation could significantly impact numerous downstream applications in generation and reconstruction tasks. We explore how epipolar geometry constraints improve modern video diffusion models. Despite using massive training data, these models fail to capture fundamental geometric principles. We align diffusion models using pairwise epipolar geometry constraints via preference-based optimization, directly addressing unstable trajectories and geometric artifacts through mathematically principled geometric enforcement. Our approach efficiently enforces geometric principles without requiring end-to-end differentiability. Evaluation demonstrates that classical geometric constraints provide more stable optimization signals than modern learned metrics. Training on static scenes with dynamic cameras ensures metric quality while the model generalizes to various dynamic scenes. By bridging data-driven learning with classical computer vision, we reduce epipolar error by 31% and improve human-rated consistency from 54% to 72% without compromising visual quality.
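
The abstract refers to a pairwise epipolar error without defining it. As a rough illustration only, the sketch below computes one standard choice, the mean Sampson distance, for point correspondences between two frames under a fundamental matrix F satisfying x2^T F x1 = 0. The function names, the synthetic two-view setup, and the choice of Sampson distance are assumptions made for this example; the abstract does not say which error measure or matching pipeline the paper uses.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def sampson_error(F, x1, x2):
    """Mean Sampson distance for N correspondences x1 <-> x2 (N x 2 arrays),
    under the convention x2^T F x1 = 0. Hypothetical helper for illustration."""
    ones = np.ones((x1.shape[0], 1))
    p1 = np.hstack([x1, ones])           # homogeneous points in frame 1
    p2 = np.hstack([x2, ones])           # homogeneous points in frame 2
    Fp1 = p1 @ F.T                       # row i is F @ p1[i]
    Ftp2 = p2 @ F                        # row i is F.T @ p2[i]
    num = np.sum(p2 * Fp1, axis=1) ** 2  # squared algebraic residual
    den = Fp1[:, 0]**2 + Fp1[:, 1]**2 + Ftp2[:, 0]**2 + Ftp2[:, 1]**2
    return float(np.mean(num / den))

# Synthetic static scene: identity intrinsics, camera 2 shifted along x.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 5.0], size=(50, 3))
t = np.array([1.0, 0.0, 0.0])
F = skew(t)                              # E = [t]_x R with R = I, K = I

x1 = X[:, :2] / X[:, 2:3]                # projection into frame 1
X2 = X + t
x2 = X2[:, :2] / X2[:, 2:3]              # projection into frame 2

clean_err = sampson_error(F, x1, x2)     # geometrically consistent pair

# Shift frame-2 points off their (horizontal) epipolar lines by 0.05.
x2_bad = x2 + np.array([0.0, 0.05])
noisy_err = sampson_error(F, x1, x2_bad)

print(clean_err, noisy_err)
```

In this toy setup the consistent pair scores essentially zero while the perturbed pair scores higher. A scalar of this kind could in principle be used to rank two generated samples for preference-based training, since it only needs to be evaluated, not differentiated; how the paper actually builds its preference pairs is not stated in the abstract.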

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation
cs.CV 2026-06 unverdicted novelty 7.0

FLAT maps compressed video diffusion latents to explicit triangle splats via ray-centered rotation parameterization and a product window function, reporting better geometric accuracy than 3D Gaussian baselines under i...

Geo-Align: Video Generation Alignment via Metric Geometry Reward
cs.CV 2026-05 unverdicted novelty 7.0

Geo-Align applies RL with a perceptual reward derived from 3D camera trajectory estimation to improve controllability and fidelity in video generation without paired training data.

GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation
cs.CV 2026-05 unverdicted novelty 6.0

GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation
cs.CV 2026-01 unverdicted novelty 6.0

VideoGPA distills geometry priors via self-supervised DPO to enhance 3D consistency, temporal stability, and motion coherence in video diffusion models.

Feed-Forward Gaussian Splatting from Sparse Aerial Views
cs.CV 2026-05 unverdicted novelty 5.0

AnyCity reconstructs coherent 3D Gaussian urban scenes from sparse aerial views in one feed-forward pass by anchoring observation-supported geometry and applying gated residual updates conditioned on an aerial-adapted...