VBench: Comprehensive benchmark suite for video generative models
2 Pith papers cite this work.
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- GT-SVJ: Generative-Transformer-Based Self-Supervised Video Judge For Efficient Video Reward Modeling
  GT-SVJ turns video generative models into self-supervised reward judges via EBM reformulation and contrastive training on controlled synthetic degradations, claiming SOTA on GenAI-Bench and MonteBench with 30K annotations.
- FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs
  FlashLips delivers 100+ FPS mask-free lip-sync by reconstructing target frames in latent space from an audio-predicted lips-pose vector using a compact U-Net trained solely on reconstruction losses and self-supervised mask removal.