RF-Sampling enhances flow matching models by implicitly performing gradient ascent on text-image alignment scores via linear textual combinations and flow inversion.
Chronomagic-bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 4verdicts
UNVERDICTED 4representative citing papers
PhyGenBench supplies 160 prompts across 27 physical laws and an automated LLM/VLM evaluation pipeline to measure physical commonsense compliance in current text-to-video models.
CogVideoX generates coherent 10-second text-to-video outputs at high resolution using a 3D VAE, expert adaptive LayerNorm transformer, progressive training, and a custom data pipeline, claiming state-of-the-art results.
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.
citing papers explorer
-
Reflective Flow Sampling Enhancement
RF-Sampling enhances flow matching models by implicitly performing gradient ascent on text-image alignment scores via linear textual combinations and flow inversion.
-
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
PhyGenBench supplies 160 prompts across 27 physical laws and an automated LLM/VLM evaluation pipeline to measure physical commonsense compliance in current text-to-video models.
-
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
CogVideoX generates coherent 10-second text-to-video outputs at high resolution using a 3D VAE, expert adaptive LayerNorm transformer, progressive training, and a custom data pipeline, claiming state-of-the-art results.
-
Open-Sora Plan: Open-Source Large Video Generation Model
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.