Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, Stefano Ermon · 2020 · cs.LG · arXiv 2010.02502

Mixed citation behavior. Most common role is background (67%).

489 Pith papers citing it

Background 67% of classified citations

abstract

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.
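
The accelerated sampling described in the abstract can be illustrated with a short sketch. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the names `ddim_step`, `ddim_sample`, `eps_model`, `alpha_bar`, and `eta` are hypothetical, the noise predictor is assumed to be an already trained DDPM-style network, and the evenly spaced timestep subsequence is only one possible choice. With `eta=0` the update is deterministic; larger `eta` adds noise back at each step.

```python
import numpy as np

def ddim_step(x_t, eps, ab_t, ab_prev, eta=0.0, rng=None):
    """One generalized DDIM update from noise level ab_t to ab_prev.

    ab_t and ab_prev are cumulative products of (1 - beta), often
    written alpha-bar. eta=0 gives the deterministic update.
    """
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
    # Per-step noise scale; it vanishes when eta == 0.
    sigma = eta * np.sqrt((1.0 - ab_prev) / (1.0 - ab_t)) * np.sqrt(1.0 - ab_t / ab_prev)
    # Component pointing back toward x_t, reusing the predicted noise.
    direction = np.sqrt(np.maximum(1.0 - ab_prev - sigma**2, 0.0)) * eps
    noise = 0.0
    if eta > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        noise = sigma * rng.standard_normal(x_t.shape)
    return np.sqrt(ab_prev) * x0_pred + direction + noise

def ddim_sample(eps_model, x_T, alpha_bar, num_steps, eta=0.0, rng=None):
    """Run the sampler on a short subsequence of the training timesteps.

    eps_model(x, t) is a hypothetical noise predictor. The evenly spaced
    schedule is an assumption of this sketch; num_steps should not
    exceed len(alpha_bar).
    """
    T = len(alpha_bar)
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        ab_t = alpha_bar[t]
        # After the last visited timestep, target the noise-free level.
        ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), ab_t, ab_prev, eta, rng)
    return x
```

Choosing `num_steps` much smaller than `len(alpha_bar)` (for example 20 to 100 steps against a 1000-step training schedule) is what reduces the number of network evaluations, which is the source of the speedup the abstract reports. The same trained noise predictor is reused without retraining.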

citation-role summary

background 58 · method 23 · baseline 2

citation-polarity summary

background 56 · use method 23 · baseline 2 · support 1 · unclear 1

authors

Jiaming Song, Chenlin Meng, Stefano Ermon

representative citing papers

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

cs.CV · 2026-04-04 · unverdicted · novelty 8.0

ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.

Language-Assisted Super-Resolution from Real-World Low-Resolution Patches

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

LA-SR redefines unpaired super-resolution in language space by projecting images into a semantically rich representation and applying vision-language model guided losses to handle real-world degradations extracted from depth variations.

MUSE: Unlocking Timestep as Native Task Steering for One-Step Dense Prediction

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.

ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the ASTAD task and training-free ASTModel framework for semantically consistent asymmetric style transfer using labeled synthetic content and unlabeled real references.

Diffusion Model Attribution via Spectral Coupling of Denoiser Responses

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

SDS extracts stable spectral signatures from diffusion model denoisers via frequency-controlled perturbations, achieving 99.9% attribution accuracy across eight models and 96.2% under prompt shift.

Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

SplatShot is a training-free method that inserts per-step 3DGS refitting and photometric feedback into diffusion denoising to enforce multi-view consistency for single-photo 3D face avatars.

Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

DRDD decouples diffusion into independent noise and residual stages to preserve domain harmonization and enable unified data-efficient I2I translation.

Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance

cs.RO · 2026-05-28 · unverdicted · novelty 7.0

CGPO integrates training-free critic guidance into diffusion denoising to produce high-Q actions as regression targets, yielding SOTA results on MuJoCo locomotion and successful Franka arm grasping.

Midpoint Generative Models

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Midpoint Generative Models define a midpoint divergence from flow matching symmetry and derive its variational form as a tractable objective for training competitive one-step generators.

Spectral Guidance for Flexible and Efficient Control of Diffusion Models

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

Spectral Guidance learns singular functions via self-supervised objective to project guidance signals onto diffusion sampling trajectories, enabling stable control without retraining or backpropagation and improving CIFAR-10 accuracy by 37 points with 4x faster sampling.

Towards Anatomically Plausible Human Image Generation via Synthetic Localized Preferences

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

ASAP generates over 10K synthetic anatomical preference pairs via targeted degradation of high-fidelity images and applies a localized margin-bounded DPO to reduce anatomical errors in text-to-image human generation, supported by the new HAP dataset and HAF-Bench.

DeltaCam: Differential Intrinsic Camera Modeling for Video Generation

cs.CV · 2026-05-24 · unverdicted · novelty 7.0

DeltaCam models relative changes in camera intrinsics via Δ-parameterized neural adaptors in video diffusion models trained on synthetic data to enable controllable generation and real-world transfer.

Loki: Representation over Architecture for Diffusion-Based Portrait Animation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

Loki replaces RGB conditioning stacks with identity-orthogonal parametric face encodings rasterized for diffusion, achieving efficient cross-ID portrait animation without cross-ID training data.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.

DrawMotion: Generating 3D Human Motions by Freehand Drawing

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

DrawMotion is a diffusion-based framework that fuses text and hand-drawn stickman conditions via a Multi-Condition Module and training-free guidance to generate 3D human motions.

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

cs.RO · 2026-05-20 · unverdicted · novelty 7.0

A hypernetwork generates complete task-specific visuomotor policy parameters from instructions alone to structurally eliminate observation leakage in language-conditioned robotic control.

FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models

cs.CV · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

FlowErase-RL applies GRPO to reformulate concept erasure in flow matching models as reward optimization using a dynamic dual-path mechanism for target suppression and non-target preservation.

citing papers explorer

Showing 50 of 489 citing papers.

Generative diffusion learning for parametric partial differential equations math.NA · 2023-05-24 · unverdicted · none · ref 23 · internal anchor
A conditional DDPM framework is introduced to approximate solution operators for parameter-dependent PDEs, achieving accuracy comparable to FNO while recovering noise levels and providing confidence intervals.
Shap-E: Generating Conditional 3D Implicit Functions cs.CV · 2023-05-03 · accept · none · ref 61 · internal anchor
Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.
Scaling Robot Learning with Semantically Imagined Experience cs.RO · 2023-02-22 · unverdicted · none · ref 47 · internal anchor
Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.
Latent Video Diffusion Models for High-Fidelity Long Video Generation cs.CV · 2022-11-23 · unverdicted · none · ref 32 · internal anchor
Latent-space hierarchical diffusion models with targeted error-correction techniques generate realistic videos exceeding 1000 frames while using less compute than prior pixel-space approaches.
Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed cs.LG · 2021-01-07 · unverdicted · none · ref 34 · internal anchor
Denoising Student distills the multi-step denoising process of score-based and diffusion models into a single forward pass, matching GAN sampling speed while producing comparable sample quality on CIFAR-10, CelebA, and 256x256 LSUN.
Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning cs.LG · 2026-05-06 · unverdicted · none · ref 40
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.
Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling cs.CV · 2026-05-06 · unverdicted · none · ref 49
The contact matrix approach in a diffusion model, paired with specialized VQ-VAE, enables more precise and realistic generation of interactive duet dance motions compared to prior methods.
No Prompt, No Leaks: A Robust Generative Steganography Framework via Prompt-Free Diffusion cs.CV · 2026-06-30 · unverdicted · none · ref 37 · internal anchor
Proposes a prompt-free latent diffusion steganography method using style semantics, CACM for reversible mapping, and predictor-corrector sampling to improve stego image quality and secret recovery.
Scenario-conditioned flow matching for probabilistic generation of three-component ground-motion waveforms physics.geo-ph · 2026-06-30 · unverdicted · none · ref 64 · internal anchor
WaveFlowGMM generates scenario-conditioned three-component ground-motion waveforms by using symbolic learning for PGA amplitude and AlphaFlow for normalized wavelet-packet waveforms that are later rescaled.
Diffusion-based 4D Trajectory Prediction and Distributed Control for UAV Swarms cs.RO · 2026-06-30 · unverdicted · none · ref 34 · internal anchor
A diffusion-based 4D trajectory predictor with dimension decoupling and residual refinement, integrated into DNMPC, reduces UAV swarm tracking error by 10-15% while running at 34 FPS on a new multi-scenario dataset.
OTCache: Optimal Transport for Geometry-Aware Caching in Diffusion Models cs.LG · 2026-06-30 · unverdicted · none · ref 38 · internal anchor
OTCache uses optimal transport to interpolate caching schedules between a graph-based reference and an Optuna-optimized anchor, delivering 3.66x-4.7x speedups on FLUX.1, Qwen-Image and HunyuanVideo with improved fidelity.
GPC: Large-Scale Generative Pretraining for Transferable Motor Control cs.CV · 2026-06-28 · unverdicted · none · ref 130 · internal anchor
GPC learns a motion vocabulary via Finite Scalar Quantization and end-to-end RL, then trains an autoregressive transformer for next-token control generation, achieving 99.98% motion reproduction success with emergent robustness.
pop-cosmos: Galaxy size evolution across structural and star-formation classifications in COSMOS-Web astro-ph.GA · 2026-06-26 · unverdicted · none · ref 128 · internal anchor
Galaxy size-mass relations exhibit double power-law breaks at different pivot masses for quiescent versus bulge-dominated samples, coinciding with AGN activity scales.
Scalable Multi-Task Data Generation via Reinforcement Learning for Language-Conditioned Bimanual Dexterous Manipulation cs.RO · 2026-06-21 · unverdicted · none · ref 36 · internal anchor
An RL data generation pipeline with generalizable rewards and language annotations produces diverse synthetic datasets that improve multi-task policy generalization on three bimanual manipulation tasks.
PSCT-Net: Geometry-Aware Pediatric Skull CT Reconstruction via Differentiable Back-Projection and Attention-Guided Refinement cs.CV · 2026-06-18 · unverdicted · none · ref 25 · internal anchor
PSCT-Net proposes a geometry-aware neural framework for 3D CT reconstruction from bi-planar X-rays in pediatric cases, using differentiable back-projection, attention-guided projection, and bidirectional Mamba modules, tested on a new private PedSkull-CT dataset.
DiffUNet^2: Bidirectional Prediction, Probabilistic Generation and Collaborative Visual Discovery for Scientific Data cs.HC · 2026-06-02 · unverdicted · none · ref 36 · internal anchor
DiffUNet^2 is a bidirectional conditional diffusion model integrated with visual tools for probabilistic exploration of scientific time series across five evaluated datasets.
Flicker-DDPM: Accelerating Denoising Diffusion via 1/f Colored Noise Injection cs.LG · 2026-06-02 · unverdicted · none · ref 35 · internal anchor
Flicker-DDPM accelerates DDPM sampling by injecting 1/f colored noise matched to image spectra, achieving similar quality with 3.33 times fewer steps on CIFAR-10.
Lagrangian Perturbation Diffusion Steering: Latent Reinforcement Learning for Generative Policies cs.LG · 2026-05-31 · unverdicted · none · ref 24 · internal anchor
LP-DS improves generative policies for imitation and RL by optimizing latent noise perturbations with a constrained Lagrangian objective, showing up to 25% better returns on manipulation and locomotion tasks.
Probabilistic Precipitation Nowcasting with Rectified Flow Transformers cs.CV · 2026-05-29 · unverdicted · none · ref 127 · internal anchor
FREUD applies rectified flow transformers with frame-wise encoding and a unified decoder to achieve state-of-the-art probabilistic precipitation nowcasting on the SEVIR benchmark.
SteerFace: Debiasing Synthetic Face Generation via Adaptive Residue Perturbation cs.CV · 2026-05-29 · unverdicted · none · ref 53 · internal anchor
SteerFace perturbs identity embeddings toward random orthogonal directions on the hypersphere with an adaptive strategy to mitigate visual tendency in synthetic faces and improve downstream recognition performance.
PhyDrawGen: Physically Grounded Diagram Generation from Natural Language cs.AI · 2026-05-28 · unverdicted · none · ref 30 · internal anchor
PhyDrawGen is a neuro-symbolic pipeline that extracts typed scene graphs via LLM, converts them to physically constrained PSLGs via deterministic solver, and refines via fine-tuned Qwen-VL, claiming superior performance over GPT-5-image and Gemini models on 1,449 physics problems.
Rethinking FID Through the Geometry of the Reference Dataset cs.CV · 2026-05-28 · unverdicted · none · ref 11 · internal anchor
FID improves with better samples only on concentrated reference datasets but can worsen on dispersed ones, as shown by density and effective rank in a controlled study across six datasets.
Sketch2Motion: Text-driven 2D Sketch to 3D Animation via Diffusion-guided Skeleton Optimization cs.CV · 2026-05-27 · unverdicted · none · ref 27 · internal anchor
Sketch2Motion is a diffusion-guided skeleton optimization framework that generates text-driven 3D animations from 2D sketches for biped, quadruped, and other articulated characters.
SANTS: A State-Adaptive Scheduler for World Action Models cs.RO · 2026-05-27 · unverdicted · none · ref 33 · internal anchor
SANTS adaptively chooses denoising depth in video-based robot action diffusion policies using a state-dependent stopping hazard and noise ratio, trained via downstream action reward to reduce latency.
Recursive Flow Matching cs.LG · 2026-05-26 · unverdicted · none · ref 19 · internal anchor
RecFM uses recursive self-consistency in flow matching to enable high-fidelity one- and few-step (2-4 step) generation of scientific dynamics, claiming 20x speedup over diffusion emulators and 15% lower MSE than vanilla flow matching.
Agent-Centric Social Trajectory Prediction: A Free Energy Principle Perspective cs.AI · 2026-05-25 · unverdicted · none · ref 46 · internal anchor
FEP-Diff uses a dual-branch spatiotemporal encoder, goal-conditioned belief learner optimized by free-energy objective with social consistency constraint, and residual diffusion generator to outperform prior methods on five benchmarks under restricted observability.
Adversarial Error Correction for Visual Autoregressive Generation cs.CV · 2026-05-24 · unverdicted · none · ref 44 · internal anchor
AID-VAR attaches an adversarial discriminator and lightweight guidance injector to frozen VAR backbones to diagnose and correct fidelity gaps across scales, reporting 16% FID gains with 3% added parameters.
Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models cs.LG · 2026-05-22 · unverdicted · none · ref 13 · internal anchor
Precise is a new SDE-consistent stochastic sampler that balances exploration and stability for RL post-training of flow-matching models via a novel posterior-mean approximation.
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models cs.CV · 2026-05-20 · unverdicted · none · ref 60 · internal anchor
Lens is a 3.8B-parameter text-to-image model that reaches competitive or superior performance to >6B-parameter systems using 19.3% of the training compute of Z-Image through a densely captioned 800M dataset, multi-resolution batching, semantic VAE, strong language encoder, RL fine-tuning, and 4-step
Efficient 3D Content Reconstruction and Generation cs.CV · 2026-05-18 · unverdicted · none · ref 232 · internal anchor
Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.
Temporal Aware Pruning for Efficient Diffusion-based Video Generation cs.CV · 2026-05-18 · unverdicted · none · ref 141 · 2 links · internal anchor
TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.
VAMP-Diff: VampPrior Latent Diffusion for Photoplethysmography Modeling eess.SP · 2026-05-17 · unverdicted · none · ref 30 · internal anchor
VAMP-Diff is a jointly trained variational diffusion model using VampPrior on pooled latents to generate realistic PPG waveforms with better reconstruction fidelity and physiological rate preservation than Gaussian-prior baselines on CapnoBase data.
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization cs.RO · 2026-05-17 · unverdicted · none · ref 91 · internal anchor
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
AnimeAdapter: A Modular Adapter for Appearance-Consistent Anime Character Generation cs.CV · 2026-05-17 · unverdicted · none · ref 6 · 2 links · internal anchor
AnimeAdapter is a modular adapter for Stable Diffusion that enables appearance-consistent anime character generation from a single reference image using semantic-selective local attention and pose-aware conditioning, plus a new Danbooru-derived dataset.
DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing cs.CV · 2026-05-16 · unverdicted · none · ref 40 · internal anchor
DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.
Edit-GRPO: A Locality-Preserving Policy Optimization Framework for Image Editing cs.CV · 2026-05-16 · unverdicted · none · ref 30 · internal anchor
Edit-GRPO decouples editing and preservation objectives via region-specific signals in a policy optimization framework to improve locality in image editing tasks.
HighSync: High-Quality Lip Synchronization via Latent Diffusion Models cs.CV · 2026-05-16 · unverdicted · none · ref 25 · internal anchor
HighSync is a diffusion-based lip synchronization system that operates natively at 512x512 resolution by eliminating data leakage to enforce genuine audio dependence and reports state-of-the-art results on quality and sync metrics.
HYVINT: Intensity-Driven Hypergraph Generation with Variational Representations stat.ML · 2026-05-16 · unverdicted · none · ref 32 · internal anchor
HYVINT introduces an intensity-driven incidence mechanism and tractable variational estimator for hypergraph generation, with error bounds and empirical gains in fidelity, novelty, and diversity.
StateXDiff: Cell State-Contextualized Multimodal Diffusion for Single-Cell Perturbation Prediction q-bio.GN · 2026-05-15 · unverdicted · none · ref 23 · internal anchor
StateXDiff integrates transcriptomic profiles with inferred protein features via a conditional diffusion model and mechanism-aware drug templates to predict single-cell drug perturbation responses under unseen cell lines, drugs, and combinatorial settings.
FactorizedHMR: A Hybrid Framework for Video Human Mesh Recovery cs.CV · 2026-05-14 · unverdicted · none · ref 53 · internal anchor
FactorizedHMR recovers 3D human meshes from video by deterministically anchoring the torso-root then probabilistically completing distal articulations via flow-matching with geometry-aware supervision and a synthetic data pipeline.
EponaV2: Driving World Model with Comprehensive Future Reasoning cs.CV · 2026-05-14 · unverdicted · none · ref 62 · internal anchor
EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.
Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations cs.LG · 2026-05-13 · unverdicted · none · ref 12 · internal anchor
Di-BiLPS combines a variational autoencoder, latent diffusion, and contrastive learning to achieve state-of-the-art accuracy on PDE problems with as little as 3% observations while supporting zero-shot super-resolution and lower computational cost.
DIVER: Diving Deeper into Distilled Data via Expressive Semantic Recovery cs.CV · 2026-05-12 · unverdicted · none · ref 19 · internal anchor
DIVER applies a pre-trained diffusion model in a dual-stage process of semantic inheritance, guidance, and fusion to improve semantic expression and cross-architecture generalization in dataset distillation.
Stable and Near-Reversible Diffusion ODE Solvers for Image Editing cs.CV · 2026-05-12 · unverdicted · none · ref 16 · internal anchor
Near-reversible Runge-Kutta ODE solvers combined with vector-field smoothing deliver more stable and higher-fidelity text-guided edits in diffusion models than exactly reversible schemes.
CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation physics.ins-det · 2026-05-12 · unverdicted · none · ref 26 · internal anchor
CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditional flow matching.
RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation cs.CV · 2026-05-12 · unverdicted · none · ref 31 · internal anchor
RealDiffusion uses heat diffusion as a dissipative prior and a region-aware stochastic process inside a training-free physics-informed attention mechanism to improve multi-character coherence while preserving narrative dynamism in sequential image generation.
diffGHOST: Diffusion based Generative Hedged Oblivious Synthetic Trajectories cs.AI · 2026-05-11 · unverdicted · none · ref 36 · internal anchor
diffGHOST is a conditional diffusion model that segments learned latent space to identify and mitigate memorization of critical trajectory samples, aiming to deliver privacy guarantees alongside data utility.
Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice cs.CV · 2026-05-11 · unverdicted · none · ref 65 · internal anchor
TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.
Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations cs.LG · 2026-05-07 · unverdicted · none · ref 42 · internal anchor
MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approximate-gradient methods.
Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models cs.CV · 2026-05-07 · unverdicted · none · ref 22 · internal anchor
FusionProxy is a distilled diffusion-based fusion module that adds thermal awareness to RGB vision systems in real time as an independent plug-and-play component.
