super hub Mixed citations

Denoising Diffusion Implicit Models

Chenlin Meng, Jiaming Song · 2020 · cs.LG · arXiv 2010.02502

Mixed citation behavior. Most common role is background (67%).

478 Pith papers citing it

Background 67% of classified citations

open full Pith review browse 478 citing papers more from Chenlin Meng arXiv PDF

abstract

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 58 method 23 baseline 2

citation-polarity summary

background 56 use method 23 baseline 2 support 1 unclear 1

claims ledger

abstract Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose revers

authors

and Stefano Ermon Chenlin Meng Jiaming Song

co-cited works

representative citing papers

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

cs.CV · 2026-04-04 · unverdicted · novelty 8.0

ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

MUSE: Unlocking Timestep as Native Task Steering for One-Step Dense Prediction

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.

ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the ASTAD task and training-free ASTModel framework for semantically consistent asymmetric style transfer using labeled synthetic content and unlabeled real references.

Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

SplatShot is a training-free method that inserts per-step 3DGS refitting and photometric feedback into diffusion denoising to enforce multi-view consistency for single-photo 3D face avatars.

Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

DRDD decouples diffusion into independent noise and residual stages to preserve domain harmonization and enable unified data-efficient I2I translation.

Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance

cs.RO · 2026-05-28 · unverdicted · novelty 7.0

CGPO integrates training-free critic guidance into diffusion denoising to produce high-Q actions as regression targets, yielding SOTA results on MuJoCo locomotion and successful Franka arm grasping.

Midpoint Generative Models

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Midpoint Generative Models define a midpoint divergence from flow matching symmetry and derive its variational form as a tractable objective for training competitive one-step generators.

Spectral Guidance for Flexible and Efficient Control of Diffusion Models

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

Spectral Guidance learns singular functions via self-supervised objective to project guidance signals onto diffusion sampling trajectories, enabling stable control without retraining or backpropagation and improving CIFAR-10 accuracy by 37 points with 4x faster sampling.

Towards Anatomically Plausible Human Image Generation via Synthetic Localized Preferences

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

ASAP generates over 10K synthetic anatomical preference pairs via targeted degradation of high-fidelity images and applies a localized margin-bounded DPO to reduce anatomical errors in text-to-image human generation, supported by the new HAP dataset and HAF-Bench.

DeltaCam: Differential Intrinsic Camera Modeling for Video Generation

cs.CV · 2026-05-24 · unverdicted · novelty 7.0

DeltaCam models relative changes in camera intrinsics via Δ-parameterized neural adaptors in video diffusion models trained on synthetic data to enable controllable generation and real-world transfer.

Loki: Representation over Architecture for Diffusion-Based Portrait Animation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

Loki replaces RGB conditioning stacks with identity-orthogonal parametric face encodings rasterized for diffusion, achieving efficient cross-ID portrait animation without cross-ID training data.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.

DrawMotion: Generating 3D Human Motions by Freehand Drawing

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

DrawMotion is a diffusion-based framework that fuses text and hand-drawn stickman conditions via a Multi-Condition Module and training-free guidance to generate 3D human motions.

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

cs.RO · 2026-05-20 · unverdicted · novelty 7.0

A hypernetwork generates complete task-specific visuomotor policy parameters from instructions alone to structurally eliminate observation leakage in language-conditioned robotic control.

BrepForge: Factorized B-rep Synthesis via Wireframe Composition and Boundary-Conditioned Surface Instantiation

cs.GR · 2026-05-19 · unverdicted · novelty 7.0

BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.

Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

IPR improves valid solution rates on MNIST Sudoku from 55.8% to 75.0% by iteratively refining partial regions in sequential diffusion models without external verifiers or reward models.

PolycubeNet: A Dual-latent Diffusion Model for Polycube-Based Hexahedral Mesh Generation

cs.GR · 2026-05-19 · unverdicted · novelty 7.0

PolycubeNet applies a dual-latent diffusion architecture to generate polycube point clouds from input point clouds, enabling robust hexahedral mesh creation without surface segmentation or templates.

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

citing papers explorer

Showing 50 of 478 citing papers.

RDSplat: Robust Watermarking for 3D Gaussian Splatting Against 2D and 3D Diffusion Editing cs.CV · 2025-12-07 · conditional · none · ref 42 · internal anchor
RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.
Multimodal Diffusion Forcing for Forceful Manipulation cs.RO · 2025-11-06 · unverdicted · none · ref 46 · internal anchor
Multimodal Diffusion Forcing trains a diffusion model on partially masked multimodal robot trajectories to learn temporal and cross-modal dependencies for forceful manipulation.
Noise Aggregation Analysis Driven by Small-Noise Injection: Efficient Membership Inference for Diffusion Models cs.CV · 2025-10-18 · unverdicted · none · ref 34 · internal anchor
Introduces noise aggregation analysis with single-step small-noise injection to enable efficient and accurate membership inference attacks on diffusion models.
Exploring Cross-Modal Flows for Few-Shot Learning cs.CV · 2025-10-16 · unverdicted · none · ref 21 · internal anchor
FMA introduces flow matching for multi-step cross-modal feature alignment in few-shot learning, using fixed coupling, noise augmentation, and early-stopping to outperform one-step PEFT methods.
Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner cs.AI · 2025-10-03 · unverdicted · none · ref 37 · internal anchor
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies stat.ML · 2025-09-24 · unverdicted · none · ref 58 · internal anchor
Diffusion and flow processes forget dependencies to define valid copulas then learn to remember them for density estimation and sampling, outperforming prior copula methods on complex datasets.
DiffusionNFT: Online Diffusion Reinforcement with Forward Process cs.LG · 2025-09-19 · unverdicted · none · ref 20 · internal anchor
DiffusionNFT performs online RL for diffusion models on the forward process via flow matching and positive-negative contrasts, delivering up to 25x efficiency gains and rapid benchmark improvements over prior reverse-process methods.
FluentAvatar: Flicker-Free Talking-Head Animation via Phoneme-Guided Autoregressive Modeling cs.CV · 2025-09-15 · unverdicted · none · ref 20 · internal anchor
Phoneme-guided autoregressive framework for talking-head animation that reduces inter-frame flicker via causal keyframe generation and timestamp-aware interpolation, outperforming diffusion baselines on FVD and a new BG-Flicker metric.
Patient-Adaptive Echocardiography using Cognitive Ultrasound eess.SP · 2025-08-12 · unverdicted · none · ref 16 · internal anchor
A temporal diffusion model enables adaptive selection of focused ultrasound transmits, outperforming random subsampling and diverging waves on EchoNet-Dynamic and in-house echocardiogram datasets while supporting real-time operation.
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE cs.AI · 2025-07-29 · unverdicted · none · ref 35 · internal anchor
MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
MAGIC: Few-Shot Mask-Guided Anomaly Inpainting with Prompt Perturbation, Spatially Adaptive Guidance, and Context Awareness cs.CV · 2025-07-03 · unverdicted · none · ref 33 · internal anchor
MAGIC is a few-shot mask-guided anomaly inpainting framework using Gaussian prompt perturbation, spatially adaptive guidance, and context-aware mask alignment to produce high-fidelity, diverse anomalies that outperform prior methods on downstream detection tasks.
Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation cs.LG · 2025-06-26 · unverdicted · none · ref 22 · internal anchor
Doloris introduces dual conditional diffusion implicit bridges plus a sparsity masking strategy to model unpaired single-cell perturbation responses and reports state-of-the-art results on public datasets.
GenHSI: Controllable Generation of Human-Scene Interaction Videos cs.CV · 2025-06-24 · unverdicted · none · ref 77 · internal anchor
GenHSI is a training-free three-stage pipeline that turns a scene image, character image, and complex HSI prompt into long videos with plausible chained interactions by generating atomic actions, 3D keyframes via 2D inpainting plus optimization, and then feeding them to pre-trained video diffusion.
Beyond Blur: A Fluid Perspective on Generative Diffusion Models cs.GR · 2025-06-20 · unverdicted · none · ref 34 · internal anchor
Proposes an advection-diffusion PDE corruption process with stochastic velocity fields and Lattice Boltzmann solver for diffusion models, generalizing prior PDE methods.
Steering Your Diffusion Policy with Latent Space Reinforcement Learning cs.RO · 2025-06-18 · unverdicted · none · ref 15 · internal anchor
DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
FD-Bench: A Modular and Fair Benchmark for Data-driven Fluid Simulation physics.flu-dyn · 2025-05-25 · unverdicted · none · ref 96 · internal anchor
FD-Bench supplies the first modular, reproducible benchmark and leaderboard for comparing neural PDE solvers on fluid dynamics tasks with direct numerical solver baselines.
COCO-Inpaint: A Benchmark for Detecting and Localizing Inpainting-Based Image Manipulations cs.CV · 2025-04-25 · unverdicted · none · ref 53 · internal anchor
COCO-Inpaint supplies a large-scale dataset and evaluation protocol focused on inpainting-based image forgeries to benchmark existing detection methods.
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models cs.CV · 2025-04-17 · unverdicted · none · ref 58 · internal anchor
UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.
Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models cs.CV · 2025-02-28 · unverdicted · none · ref 5 · internal anchor
Gungnir shows that style-based triggers with RAN and STTR techniques can activate backdoors in diffusion models while evading detection and surviving fine-tuning.
History-Guided Video Diffusion cs.LG · 2025-02-10 · unverdicted · none · ref 53 · internal anchor
DFoT enables flexible history conditioning in video diffusion, with history guidance methods that boost temporal consistency and support long rollouts.
Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline cs.CV · 2024-12-27 · unverdicted · none · ref 34 · internal anchor
Introduces the MMTT dataset of 152k manipulated facial images with masks and text descriptions, plus the ForgeryTalker model that jointly outputs localization masks and explanatory text, reporting 59.3 CIDEr and 73.67 IoU.
Repurposing Image Diffusion Models for Training-Free Music Style Transfer on Mel-spectrograms cs.SD · 2024-11-24 · conditional · none · ref 21 · internal anchor
Stylus achieves training-free music style transfer on Mel-spectrograms by repurposing image diffusion models via style-key injection in self-attention plus phase-preserving reconstruction, outperforming baselines by 34.1% in content preservation and 25.7% in perceptual quality per 2,925 human raters
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation cs.CV · 2024-10-17 · unverdicted · none · ref 71 · internal anchor
Janus decouples visual encoding into task-specific pathways inside a single autoregressive transformer to unify multimodal understanding and generation while outperforming earlier unified models.
One Step Diffusion via Shortcut Models cs.LG · 2024-10-16 · conditional · none · ref 23 · internal anchor
Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.
Diffusion Models Are Real-Time Game Engines cs.LG · 2024-08-27 · conditional · none · ref 91 · internal anchor
A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation cs.CV · 2024-06-10 · conditional · none · ref 30 · internal anchor
Scaled vanilla autoregressive models based on Llama achieve 2.18 FID on ImageNet 256x256 image generation, beating popular diffusion models without visual inductive biases.
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models cs.RO · 2023-10-16 · conditional · none · ref 56 · internal anchor
SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.
Learning Interactive Real-World Simulators cs.AI · 2023-10-09 · conditional · none · ref 233 · internal anchor
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference cs.CV · 2023-10-06 · unverdicted · none · ref 83 · internal anchor
Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning cs.CV · 2023-07-10 · unverdicted · none · ref 21 · internal anchor
A single motion module trained on videos adds temporally coherent animation to any personalized text-to-image model derived from the same base without additional tuning.
Scalable Diffusion Models with Transformers cs.CV · 2022-12-19 · unverdicted · none · ref 55 · internal anchor
DiTs achieve SOTA FID of 2.27 on ImageNet 256x256 by scaling transformer-based latent diffusion models, with performance improving consistently as Gflops increase.
Imagen Video: High Definition Video Generation with Diffusion Models cs.CV · 2022-10-05 · unverdicted · none · ref 17 · internal anchor
Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.
DreamFusion: Text-to-3D using 2D Diffusion cs.CV · 2022-09-29 · accept · none · ref 147 · internal anchor
Optimizes a Neural Radiance Field via probability density distillation from a 2D diffusion model to produce text-conditioned 3D scenes viewable from any angle.
Human Motion Diffusion Model cs.CV · 2022-09-29 · unverdicted · none · ref 18 · internal anchor
MDM is a classifier-free diffusion model that generates expressive human motions by predicting clean samples rather than noise, supporting text and action conditioning and outperforming prior methods on standard benchmarks.
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning cs.LG · 2022-08-12 · unverdicted · none · ref 17 · internal anchor
Diffusion-QL uses conditional diffusion models as expressive policies in offline RL by coupling behavior cloning with Q-value maximization, achieving SOTA on most D4RL tasks.
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding cs.CV · 2022-05-23 · accept · none · ref 64 · internal anchor
Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.
Hierarchical Text-Conditional Image Generation with CLIP Latents cs.CV · 2022-04-13 · accept · none · ref 48 · internal anchor
A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models cs.CV · 2021-12-20 · accept · none · ref 26 · internal anchor
A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations cs.CV · 2021-08-02 · conditional · none · ref 10 · internal anchor
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
Diffusion Models Beat GANs on Image Synthesis cs.LG · 2021-05-11 · accept · none · ref 57 · internal anchor
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection cs.CV · 2026-05-06 · unverdicted · none · ref 39
LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.
Cast3: Translating numerical weather prediction principles into data-driven forecasting physics.ao-ph · 2026-05-02 · unverdicted · none · ref 39
Cast3 translates NWP principles into a data-driven model using cubed-sphere grids, super-ensembles, and generative nudging to achieve state-of-the-art ensemble predictions that outperform baselines.
Generative diffusion models for spatiotemporal influenza forecasting cs.LG · 2026-04-27 · unverdicted · none · ref 20
Influpaint uses generative diffusion models on image-encoded influenza data to produce realistic and diverse epidemic trajectories that match leading ensemble methods in accuracy.
3D Scene-Adaptive Trajectory-Controllable Human Image Animation with Camera Movement cs.CV · 2026-06-29 · unverdicted · none · ref 28 · internal anchor
Presents a scene-adaptive 3D human animation method using ground-adaptive motion retargeting and viewpoint-adaptive latent fusion to control human trajectories and camera views, reporting gains on two benchmarks.
EMOSH: Expressive Motion and Shape Disentanglement for Human Animation cs.CV · 2026-06-26 · unverdicted · none · ref 61 · internal anchor
EMOSH proposes an Expressive Human Model with disentangled parameters, coarse-to-fine motion injection, and spatially-aligned conditioning to generate high-fidelity expressive human videos without driving-subject shape leakage.
Latent Visual Diffusion Reasoning with Monte Carlo Tree Search cs.CV · 2026-06-26 · unverdicted · none · ref 18 · internal anchor
LVDR integrates keypoint-guided MCTS into a latent diffusion reasoning model to deliver competitive skill assessment accuracy alongside explicit visual reasoning trajectories on four sports and surgical datasets.
RS-Diffuser: Risk-Sensitive Diffusion Planning with Distributional Value Guidance cs.LG · 2026-06-26 · unverdicted · none · ref 24 · internal anchor
RS-Diffuser integrates diffusion planners, quantile regression critics, and CVaR-style guidance to produce risk-averse to risk-seeking behaviors from one model in offline RL.
PixelU: A U-Shaped Transformer for Efficient End-to-End Pixel Diffusion cs.CV · 2026-06-26 · unverdicted · none · ref 40 · internal anchor
PixelU is a minimalist U-shaped Diffusion Transformer for pixel-space diffusion that decouples frequencies with zero-cost skip connections and constant-channel downsampling, outperforming baselines like JiT-G at 1/3 the compute cost with FID 1.63 on ImageNet 256x256.
GeoFace: Consistent Multi-View Face Generation with Geometry-Constrained Diffusion cs.CV · 2026-06-26 · unverdicted · none · ref 52 · internal anchor
GeoFace generates consistent multi-view face images and 3D geometry from one input via a dual-stream diffusion framework with geometry-guided attention alignment.
Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and Vision-Language Models cs.CR · 2026-06-25 · unverdicted · none · ref 61 · internal anchor
A narrative survey that catalogs fifty papers on diffusion-based adversarial techniques across text, vision, and vision-language models, proposes a six-class taxonomy of diffusion roles plus a unified five-dimension evaluation framework, and releases a companion catalog.

Denoising Diffusion Implicit Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer