super hub Mixed citations

Denoising Diffusion Implicit Models

Chenlin Meng, Jiaming Song · 2020 · cs.LG · arXiv 2010.02502

Mixed citation behavior. Most common role is background (67%).

486 Pith papers citing it

Background 67% of classified citations

open full Pith review browse 486 citing papers more from Chenlin Meng arXiv PDF

abstract

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 58 method 23 baseline 2

citation-polarity summary

background 56 use method 23 baseline 2 support 1 unclear 1

claims ledger

abstract Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose revers

authors

and Stefano Ermon Chenlin Meng Jiaming Song

co-cited works

representative citing papers

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

cs.CV · 2026-04-04 · unverdicted · novelty 8.0

ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.

Language-Assisted Super-Resolution from Real-World Low-Resolution Patches

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

LA-SR redefines unpaired super-resolution in language space by projecting images into a semantically rich representation and applying vision-language model guided losses to handle real-world degradations extracted from depth variations.

MUSE: Unlocking Timestep as Native Task Steering for One-Step Dense Prediction

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.

ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the ASTAD task and training-free ASTModel framework for semantically consistent asymmetric style transfer using labeled synthetic content and unlabeled real references.

Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

SplatShot is a training-free method that inserts per-step 3DGS refitting and photometric feedback into diffusion denoising to enforce multi-view consistency for single-photo 3D face avatars.

Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

DRDD decouples diffusion into independent noise and residual stages to preserve domain harmonization and enable unified data-efficient I2I translation.

Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance

cs.RO · 2026-05-28 · unverdicted · novelty 7.0

CGPO integrates training-free critic guidance into diffusion denoising to produce high-Q actions as regression targets, yielding SOTA results on MuJoCo locomotion and successful Franka arm grasping.

Midpoint Generative Models

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Midpoint Generative Models define a midpoint divergence from flow matching symmetry and derive its variational form as a tractable objective for training competitive one-step generators.

Spectral Guidance for Flexible and Efficient Control of Diffusion Models

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

Spectral Guidance learns singular functions via self-supervised objective to project guidance signals onto diffusion sampling trajectories, enabling stable control without retraining or backpropagation and improving CIFAR-10 accuracy by 37 points with 4x faster sampling.

Towards Anatomically Plausible Human Image Generation via Synthetic Localized Preferences

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

ASAP generates over 10K synthetic anatomical preference pairs via targeted degradation of high-fidelity images and applies a localized margin-bounded DPO to reduce anatomical errors in text-to-image human generation, supported by the new HAP dataset and HAF-Bench.

DeltaCam: Differential Intrinsic Camera Modeling for Video Generation

cs.CV · 2026-05-24 · unverdicted · novelty 7.0

DeltaCam models relative changes in camera intrinsics via Δ-parameterized neural adaptors in video diffusion models trained on synthetic data to enable controllable generation and real-world transfer.

Loki: Representation over Architecture for Diffusion-Based Portrait Animation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

Loki replaces RGB conditioning stacks with identity-orthogonal parametric face encodings rasterized for diffusion, achieving efficient cross-ID portrait animation without cross-ID training data.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.

DrawMotion: Generating 3D Human Motions by Freehand Drawing

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

DrawMotion is a diffusion-based framework that fuses text and hand-drawn stickman conditions via a Multi-Condition Module and training-free guidance to generate 3D human motions.

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

cs.RO · 2026-05-20 · unverdicted · novelty 7.0

A hypernetwork generates complete task-specific visuomotor policy parameters from instructions alone to structurally eliminate observation leakage in language-conditioned robotic control.

FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models

cs.CV · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

FlowErase-RL applies GRPO to reformulate concept erasure in flow matching models as reward optimization using a dynamic dual-path mechanism for target suppression and non-target preservation.

BrepForge: Factorized B-rep Synthesis via Wireframe Composition and Boundary-Conditioned Surface Instantiation

cs.GR · 2026-05-19 · unverdicted · novelty 7.0

BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.

citing papers explorer

Showing 50 of 486 citing papers.

MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling cs.CV · 2026-04-19 · unverdicted · none · ref 9 · internal anchor
MetaEarth3D is the first generative foundation model for spatially consistent, unbounded 3D scene generation at planetary scale using optical Earth observation data.
Repurposing 3D Generative Model for Autoregressive Layout Generation cs.CV · 2026-04-17 · unverdicted · none · ref 71 · internal anchor
LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.
Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing cs.LG · 2026-04-17 · unverdicted · none · ref 42 · internal anchor
RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.
CLIMB: Controllable Longitudinal Brain Image Generation using Mamba-based Latent Diffusion Model and Gaussian-aligned Autoencoder cs.CV · 2026-04-17 · unverdicted · none · ref 31 · internal anchor
CLIMB generates controllable longitudinal brain MRI images from baseline scans using a Mamba-based latent diffusion model and Gaussian-aligned autoencoder, reporting SSIM 0.9433 on the ADNI dataset of 6306 scans.
Frequency-Aware Flow Matching for High-Quality Image Generation cs.CV · 2026-04-16 · unverdicted · none · ref 53 · internal anchor
FreqFlow introduces frequency-aware conditioning and a two-branch architecture to flow matching, reaching FID 1.38 on ImageNet-256 and outperforming DiT and SiT.
Towards Design Compositing cs.CV · 2026-04-16 · unverdicted · none · ref 43 · internal anchor
GIST is a training-free identity-preserving image compositor that improves visual harmony when integrating disparate elements into design pipelines.
Deepfake Detection Generalization with Diffusion Noise cs.CV · 2026-04-16 · unverdicted · none · ref 64 · internal anchor
ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model cs.CV · 2026-04-15 · unverdicted · none · ref 22 · internal anchor
M-GDM uses motion vectors and frame types to guide a diffusion model in blind recovery of bitstream-corrupted videos without manual masks.
DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer cs.CV · 2026-04-15 · unverdicted · none · ref 28 · internal anchor
RTR-DiT distills a bidirectional DiT teacher into an autoregressive few-step model using Self Forcing and Distribution Matching Distillation, plus a reference-preserving KV cache, to enable stable real-time text- and reference-guided video stylization.
FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving cs.RO · 2026-04-14 · unverdicted · none · ref 15 · internal anchor
FeaXDrive improves end-to-end autonomous driving by shifting diffusion planning to a trajectory-centric formulation with curvature-constrained training, drivable-area guidance, and GRPO post-training, yielding stronger closed-loop performance and feasibility on NAVSIM.
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment cs.CV · 2026-04-12 · unverdicted · none · ref 46 · internal anchor
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models cs.CV · 2026-04-12 · unverdicted · none · ref 36 · internal anchor
Rein3D generates photorealistic, globally consistent 3D indoor scenes by using a restore-and-refine process where radial panoramic videos are restored via diffusion models and then used to update a 3D Gaussian field.
Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation cs.CV · 2026-04-11 · conditional · none · ref 38 · internal anchor
Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.
Closed-Form Concept Erasure via Double Projections cs.LG · 2026-04-11 · unverdicted · none · ref 59 · internal anchor
A training-free double-projection linear transformation erases target concepts from generative models by computing a proxy projection then applying a constrained update in the left null space of known directions.
EGLOCE: Training-Free Energy-Guided Latent Optimization for Concept Erasure cs.CV · 2026-04-10 · unverdicted · none · ref 42 · internal anchor
EGLOCE erases target concepts in diffusion models at inference time by optimizing latents with dual energy guidance that repels unwanted concepts while retaining prompt alignment.
GroundingAnomaly: Spatially-Grounded Diffusion for Few-Shot Anomaly Synthesis cs.CV · 2026-04-09 · unverdicted · none · ref 32 · internal anchor
GroundingAnomaly uses a Spatial Conditioning Module and Gated Self-Attention in a frozen diffusion U-Net to synthesize spatially accurate few-shot anomalies, reaching SOTA on MVTec AD and VisA for detection, segmentation, and instance detection.
DiffVC: A Non-autoregressive Framework Based on Diffusion Model for Video Captioning cs.CV · 2026-04-09 · unverdicted · none · ref 21 · internal anchor
DiffVC applies diffusion models for non-autoregressive video captioning, outperforming prior non-AR methods and matching AR ones in quality with faster speed on standard benchmarks.
Monocular Depth Estimation From the Perspective of Feature Restoration: A Diffusion Enhanced Depth Restoration Approach cs.CV · 2026-04-09 · conditional · none · ref 46 · internal anchor
Monocular depth estimation is recast as indirect feature restoration via an invertible diffusion module plus auxiliary viewpoint enhancement, delivering 4-38% RMSE gains on KITTI over baselines.
Multimodal Large Language Models for Multi-Subject In-Context Image Generation cs.LG · 2026-04-08 · unverdicted · none · ref 6 · internal anchor
MUSIC is the first MLLM for multi-subject in-context image generation that uses an automatic data pipeline, vision chain-of-thought reasoning, and semantics-driven spatial layout planning to outperform prior methods on a new MSIC benchmark.
VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis cs.CV · 2026-04-08 · unverdicted · none · ref 33 · internal anchor
VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.
MuPPet: Multi-person 2D-to-3D Pose Lifting cs.CV · 2026-04-08 · unverdicted · none · ref 53 · internal anchor
MuPPet introduces person encoding, permutation augmentation, and dynamic multi-person attention to outperform prior single- and multi-person 2D-to-3D pose lifting methods on group interaction datasets while improving occlusion robustness.
VASR: Variance-Aware Systematic Resampling for Reward-Guided Diffusion cs.AI · 2026-04-08 · unverdicted · none · ref 35 · 2 links · internal anchor
VASR separates continuation and residual variance in reward-guided diffusion SMC, using optimal mass allocation and systematic resampling to achieve up to 26% better FID scores and faster runtimes than prior SMC and MCTS methods.
Towards Robust Content Watermarking Against Removal and Forgery Attacks cs.CV · 2026-04-08 · unverdicted · none · ref 51 · internal anchor
ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.
Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models cs.CV · 2026-04-06 · unverdicted · none · ref 30 · internal anchor
Unlearning methods that strongly erase concepts from text-to-image diffusion models consistently degrade performance on attribute binding, spatial reasoning, and counting tasks.
HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance cs.CV · 2026-04-06 · unverdicted · none · ref 46 · internal anchor
HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.
GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads cs.DC · 2026-04-06 · unverdicted · none · ref 33 · internal anchor
GENSERVE improves SLO attainment by up to 44% for co-serving heterogeneous T2I and T2V diffusion workloads via step-level preemption, elastic parallelism, and joint scheduling.
MRI-to-CT synthesis using drifting models eess.IV · 2026-03-30 · unverdicted · none · ref 40 · internal anchor
Drifting models outperform diffusion, CNN, VAE, and GAN baselines in MRI-to-CT synthesis on two pelvis datasets with higher SSIM/PSNR, lower RMSE, and millisecond one-step inference.
Tunable Domain Adaptation Using Unfolding cs.LG · 2026-03-27 · unverdicted · none · ref 40 · internal anchor
Two tunable domain adaptation methods using unrolled networks achieve improved or comparable performance to domain-specific models on compressed sensing regression tasks.
Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning cs.GR · 2026-03-25 · conditional · none · ref 34 · internal anchor
Realiz3D decouples visual domain from 3D controls in diffusion models via domain-aware residual adapters to enable photorealistic controllable generation.
One-to-More: High-Fidelity Training-Free Anomaly Generation with Attention Control cs.CV · 2026-03-18 · unverdicted · none · ref 34 · internal anchor
O2MAG generates high-fidelity text-guided anomalies from a single image without training by manipulating self-attention in diffusion models with anomaly masks and dual enhancements.
SegviGen: Repurposing 3D Generative Model for Part Segmentation cs.CV · 2026-03-17 · unverdicted · none · ref 54 · internal anchor
SegviGen shows pretrained 3D generative models can be repurposed for part segmentation via voxel colorization, beating prior methods by 40% interactively and 15% on full segmentation using only 0.32% of labeled data.
ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors cs.RO · 2026-03-16 · conditional · none · ref 49 · internal anchor
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
Intrinsic Concept Extraction Based on Compositional Interpretability cs.CV · 2026-03-12 · unverdicted · none · ref 40 · internal anchor
HyperExpress extracts composable intrinsic concepts from single images via hyperbolic concept learning and concept-wise optimization in diffusion-based models.
HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations cs.RO · 2026-03-03 · unverdicted · none · ref 28 · internal anchor
HoMMI learns whole-body mobile manipulation policies from robot-free human demonstrations by augmenting UMI with egocentric sensing and bridging the embodiment gap through an agnostic visual representation, relaxed head actions, and a whole-body controller.
Flow Map Language Models: One-step Language Modeling via Continuous Denoising cs.CL · 2026-02-18 · conditional · none · ref 95 · 2 links · internal anchor
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation cs.RO · 2026-02-18 · unverdicted · none · ref 25 · internal anchor
A unified parameter space and canonical URDF enable cross-embodiment dexterous grasping policies with 81.9% zero-shot success on unseen hands like the 3-finger LEAP Hand.
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion cs.CV · 2026-02-08 · unverdicted · none · ref 83 · internal anchor
Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.
SynthForensics: Benchmarking and Evaluating People-Centric Synthetic Video Deepfakes cs.CV · 2026-02-04 · unverdicted · none · ref 47 · internal anchor
SynthForensics is a people-centric benchmark where face-based detectors lose 13-55 AUC points on modern synthetic videos compared to legacy manipulation sets.
PixelGen: Improving Pixel Diffusion with Perceptual Supervision cs.CV · 2026-02-02 · accept · none · ref 18 · internal anchor
PixelGen augments pixel diffusion with gated perceptual supervision to reach FID 5.11 on ImageNet-256 and GenEval 0.79 in text-to-image, narrowing the gap to latent methods without VAEs.
Flexible Multitask Learning with Factorized Diffusion Policy cs.RO · 2025-12-26 · unverdicted · none · ref 37 · internal anchor
A factorized modular diffusion policy improves fitting of multimodal robot actions and enables flexible task adaptation without catastrophic forgetting.
Driving in Corner Case: A Real-World Adversarial Closed-Loop Evaluation Platform for End-to-End Autonomous Driving cs.CV · 2025-12-18 · unverdicted · none · ref 36 · internal anchor
A platform using flow matching for real-world image generation and an adversarial policy creates challenging corner cases to evaluate end-to-end autonomous driving models like UniAD and VAD, showing performance degradation.
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model cs.CV · 2025-11-30 · unverdicted · none · ref 43 · internal anchor
Lotus-2 is a two-stage deterministic adaptation of diffusion priors that achieves state-of-the-art monocular depth estimation with only 59K training samples.
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation cs.CV · 2025-11-24 · conditional · none · ref 51 · internal anchor
DeCo decouples high- and low-frequency generation in pixel diffusion via a DiT plus lightweight decoder and a frequency-aware flow-matching loss, reaching FID 1.62 at 256x256 and 2.22 at 512x512 on ImageNet while closing the gap to latent diffusion methods.
Eevee: Towards Close-up High-resolution Video-based Virtual Try-on cs.CV · 2025-11-24 · unverdicted · none · ref 50 · internal anchor
A new dataset with high-fidelity close-up garment images and full/close-up try-on videos plus the VGID metric enables better texture and structure preservation in high-resolution video virtual try-on.
Seeing What Matters: Visual Preference Policy Optimization for Visual Generation cs.CV · 2025-11-24 · unverdicted · none · ref 27 · internal anchor
ViPO enhances GRPO for visual generation by creating spatially and temporally aware advantage maps from pretrained vision models to focus optimization on perceptually important regions.
SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting cs.CV · 2025-11-21 · unverdicted · none · ref 23 · internal anchor
SPAGS reconstructs articulated objects from sparse single-state RGB images by constraining Gaussians to planar primitives, optimizing with depth and diffusion priors, and using a VLM for part segmentation and joint estimation.
Efficient Score Pre-computation for Diffusion Models via Cross-Matrix Krylov Projection cs.CV · 2025-11-19 · unverdicted · none · ref 5 · internal anchor
Cross-matrix Krylov projection reuses shared subspaces from seed matrices to accelerate score pre-computation in diffusion models, delivering 15.8-43.7% time savings and up to 115x speedup versus DDPM baselines.
UniSER: A Foundation Model for Unified Soft Effects Removal cs.CV · 2025-11-18 · unverdicted · none · ref 63 · internal anchor
UniSER is a unified diffusion transformer foundation model that removes diverse soft image degradations by training on a large curated dataset of semi-transparent occlusions with fine-grained controls.
Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling cs.LG · 2025-11-12 · unverdicted · none · ref 3 · internal anchor
HSO is a bi-level optimization method with Midpoint Error Proxy and Spacing-Penalized Fitness that finds robust timestep schedules for low-NFE diffusion sampling and reports SOTA FID scores such as 11.94 at NFE=5.
Emu3.5: Native Multimodal Models are World Learners cs.CV · 2025-10-30 · unverdicted · none · ref 83 · internal anchor
Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.

Denoising Diffusion Implicit Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer