VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.
hub
The unreasonable effectiveness of deep features as a perceptual metric
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
method 2polarities
use method 2representative citing papers
Presents the first large-scale benchmark for multi-frame geometric distortion removal in videos under severe refractive warping, using real and synthetic data across four distortion levels and evaluating classical and learning-based methods including a proposed diffusion-based V-cache.
DreamStereo uses GAPW, PBDP, and SASI to enable real-time stereo video inpainting at 25 FPS for HD videos by reducing over 70% redundant computation while maintaining quality.
EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.
MultiAnimate adds Identifier Assigner and Identifier Adapter modules to diffusion video models so they can handle multiple characters without identity mix-ups, generalizing from two-character training data to more characters.
MagicBokeh uses a single diffusion model with alternative training, focus-aware masked attention, and degradation-aware depth estimation to produce photorealistic bokeh on low-res zoomed images.
A selective regularization framework lets scale-ambiguous monocular depth priors improve Gaussian Splatting geometry and rendering by isolating and supervising only ill-posed regions.
MaskDiME uses adaptive masked diffusion to produce 30x faster, localized, and semantically consistent visual counterfactual explanations without training, matching or exceeding prior performance on five datasets.
ComMark embeds covert watermarks in models using frequency-domain compressed samples and simulated attacks, claiming state-of-the-art covertness and robustness across image, speech, text, and video tasks.
SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.
citing papers explorer
-
VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation
VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.
-
A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping
Presents the first large-scale benchmark for multi-frame geometric distortion removal in videos under severe refractive warping, using real and synthetic data across four distortion levels and evaluating classical and learning-based methods including a proposed diffusion-based V-cache.
-
DreamStereo: Towards Real-Time Stereo Inpainting for HD Videos
DreamStereo uses GAPW, PBDP, and SASI to enable real-time stereo video inpainting at 25 FPS for HD videos by reducing over 70% redundant computation while maintaining quality.
-
Event6D: Event-based Novel Object 6D Pose Tracking
EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.
-
MultiAnimate: Pose-Guided Image Animation Made Extensible
MultiAnimate adds Identifier Assigner and Identifier Adapter modules to diffusion video models so they can handle multiple characters without identity mix-ups, generalizing from two-character training data to more characters.
-
Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework
MagicBokeh uses a single diffusion model with alternative training, focus-aware masked attention, and degradation-aware depth estimation to produce photorealistic bokeh on low-res zoomed images.
-
In Depth We Trust: Reliable Monocular Depth Supervision for Gaussian Splatting
A selective regularization framework lets scale-ambiguous monocular depth priors improve Gaussian Splatting geometry and rendering by isolating and supervising only ill-posed regions.
-
MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
MaskDiME uses adaptive masked diffusion to produce 30x faster, localized, and semantically consistent visual counterfactual explanations without training, matching or exceeding prior performance on five datasets.
-
ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples
ComMark embeds covert watermarks in models using frequency-domain compressed samples and simulated attacks, claiming state-of-the-art covertness and robustness across image, speech, text, and video tasks.
-
SkyReels-Text: Fine-Grained Font-Controllable Text Editing for Poster Design
SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.
- TWINGS: Thin Plate Splines Warp-aligned Initialization for Sparse-View Gaussian Splatting