arxiv: 2207.12598 · v1 · submitted 2022-07-26 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links


Classifier-Free Diffusion Guidance

Jonathan Ho, Tim Salimans


Pith reviewed 2026-05-10 14:55 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords diffusion models, classifier-free guidance, conditional generation, score estimates, generative models, guidance scale, sample quality


The pith

Conditional diffusion models can guide their own sampling by combining conditional and unconditional score estimates without needing a separate classifier.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that guidance in diffusion models, which balances sample fidelity against diversity, works without training an external image classifier. The method jointly trains one diffusion model on both conditional and unconditional objectives, then mixes the two resulting score estimates during sampling with a tunable scale. This produces a quality-diversity trade-off comparable to the earlier classifier-guided approach. A reader would care because the technique removes an entire training step and its associated data and compute costs.

Core claim

We show that guidance can be indeed performed by a pure generative model without such a classifier: in what we call classifier-free guidance, we jointly train a conditional and an unconditional diffusion model, and we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.

What carries the argument

Classifier-free guidance: the conditional score estimate plus a scaled difference between the conditional and unconditional score estimates, used to steer the reverse diffusion process at sampling time.
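The paper uses the following notation. ε_θ is the noise-prediction network, z_λ is the noisy latent at log-SNR λ, c is the condition, and w ≥ 0 is the guidance strength. In that notation the guided prediction can be written as below. The second line restates the same rule in score form and assumes the standard relation ε_θ(z_λ) ≈ -σ_λ ∇ log p(z_λ). Its last equality holds only for exact scores, which is why the learned combination approximates classifier guidance rather than reproducing it exactly.

```latex
\begin{aligned}
\tilde{\epsilon}_\theta(z_\lambda, c)
  &= (1 + w)\,\epsilon_\theta(z_\lambda, c) - w\,\epsilon_\theta(z_\lambda)
   = \epsilon_\theta(z_\lambda, c) + w\,\bigl(\epsilon_\theta(z_\lambda, c) - \epsilon_\theta(z_\lambda)\bigr), \\
-\frac{\tilde{\epsilon}_\theta(z_\lambda, c)}{\sigma_\lambda}
  &\approx (1 + w)\,\nabla_{z_\lambda} \log p(z_\lambda \mid c) - w\,\nabla_{z_\lambda} \log p(z_\lambda)
   = \nabla_{z_\lambda} \log p(z_\lambda \mid c) + w\,\nabla_{z_\lambda} \log p(c \mid z_\lambda).
\end{aligned}
```

Setting w = 0 recovers the plain conditional model. Larger w pushes samples toward regions that the implicit classifier p(c | z_λ) rates as more likely to belong to c.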

If this is right

Only one model needs to be trained to enable both conditional generation and guidance.
The guidance scale remains adjustable after training, just as with classifier guidance.
No auxiliary classifier or its training data is required.
The same sampling procedure yields controllable fidelity-diversity trade-offs.
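The sampling side can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code:

- The data are a hypothetical 1-D toy with two Gaussian classes.
- Exact analytic noise predictions (`eps_cond`, `eps_uncond`) stand in for one trained network queried with and without its condition.
- The loop uses a deterministic DDIM-style update with a cosine schedule, which may differ from the sampler used in the paper.

Only the guided prediction `(1 + w) * eps_cond - w * eps_uncond` depends on the guidance scale.

```python
import numpy as np

# Hypothetical toy data: x | c ~ N(MU[c], DATA_STD**2) for two classes with equal priors.
MU = np.array([-2.0, 2.0])
DATA_STD = 0.5


def alpha_sigma(t):
    """Cosine variance-preserving schedule (assumption): alpha**2 + sigma**2 == 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def eps_cond(z, c, t):
    """Exact noise prediction for class c; stands in for a trained eps_theta(z, c)."""
    a, sig = alpha_sigma(t)
    var = a**2 * DATA_STD**2 + sig**2        # Var[z_t | c]
    return sig * (z - a * MU[c]) / var       # eps = -sigma * d/dz log p(z_t | c)


def eps_uncond(z, t):
    """Exact unconditional prediction: class predictions weighted by p(c | z_t)."""
    a, sig = alpha_sigma(t)
    var = a**2 * DATA_STD**2 + sig**2
    logits = np.stack([-((z - a * m) ** 2) / (2 * var) for m in MU])
    post = np.exp(logits - logits.max(axis=0))
    post /= post.sum(axis=0)                 # p(c | z_t), equal priors and variances
    return sum(post[k] * eps_cond(z, k, t) for k in range(len(MU)))


def sample_cfg(c, w, n=4000, steps=200, t_max=0.999, seed=0):
    """Deterministic DDIM-style sampler driven by the classifier-free guided prediction."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(t_max, 0.0, steps + 1)
    z = rng.standard_normal(n)               # start from N(0, 1), close to p(z_t) at t_max
    for t, t_next in zip(ts[:-1], ts[1:]):
        a, sig = alpha_sigma(t)
        a_next, sig_next = alpha_sigma(t_next)
        eps = (1 + w) * eps_cond(z, c, t) - w * eps_uncond(z, t)  # guided prediction
        x_hat = (z - sig * eps) / a          # implied estimate of the clean sample
        z = a_next * x_hat + sig_next * eps  # step to the next, less noisy level
    return z


for w in (0.0, 1.0, 3.0):
    x = sample_cfg(c=1, w=w)
    print(f"w={w}: mean={x.mean():.3f}, std={x.std():.3f}")
```

Start from the same noise. Raising `w` then moves the class-1 samples further from the competing class at -2. On real data the analogous effect is higher fidelity at the cost of diversity. Guidance enters only through the combined prediction, so `w` can be changed at sampling time without retraining.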

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Guidance emerges from having access to both conditional and unconditional distributions inside the same generative model rather than from external class supervision.
The method lowers the setup cost for high-fidelity conditional sampling in settings where reliable classifiers are expensive or unavailable.
It suggests that unconditional score estimates already encode information useful for directing conditional trajectories.

Load-bearing premise

That linearly combining the conditional and unconditional score estimates produces guidance behavior comparable to classifier gradients without introducing new artifacts or mode collapse.

What would settle it

Train the joint model on a standard dataset such as ImageNet. Then compare two sets of quality-diversity curves (for example FID against Inception Score): those from varying the classifier-free guidance scale, and those from classifier guidance with a separately trained classifier. A clear mismatch at high guidance scales would count against the claimed similarity.

Original abstract

Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. Classifier guidance combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classifier separate from the diffusion model. It also raises the question of whether guidance can be performed without a classifier. We show that guidance can be indeed performed by a pure generative model without such a classifier: in what we call classifier-free guidance, we jointly train a conditional and an unconditional diffusion model, and we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper shows that classifier-free guidance works. It jointly trains one diffusion model on conditional and unconditional inputs, then combines the two score estimates at sampling time, pushing the conditional score further away from the unconditional one. The ImageNet results hold up without major holes.


The punchline is that you can drop the separate classifier entirely and still get the same quality-diversity knob that classifier guidance provides. They train a single model that sometimes sees the condition and sometimes does not. At inference they add to the conditional score a scaled difference between the conditional and unconditional scores. The motivation is Bayes' rule applied to scores, ∇ log p(c|z) = ∇ log p(z|c) − ∇ log p(z). The difference of the two learned scores therefore acts as the gradient of an implicit classifier, approximating what classifier guidance supplies without ever training one.
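The identity behind this argument can be checked numerically. The sketch below assumes a hypothetical 1-D mixture of two unit-variance Gaussians with equal priors, where every density is known in closed form. It takes the gradient of the exact Bayes classifier log p(c | z) by finite differences and compares it with the conditional score minus the unconditional score; the two match. With learned networks both scores are only approximations, so the match is no longer exact.

```python
import numpy as np

# Hypothetical toy: two unit-variance Gaussian classes at -2 and +2 with equal priors.
MU = np.array([-2.0, 2.0])
LOG_PRIOR = np.log(0.5)


def log_normal(z, m):
    """log N(z; m, 1)."""
    return -0.5 * (z - m) ** 2 - 0.5 * np.log(2 * np.pi)


def log_p_z(z):
    """Unconditional log density of the mixture."""
    return np.logaddexp(log_normal(z, MU[0]), log_normal(z, MU[1])) + LOG_PRIOR


def log_classifier(c, z):
    """Exact Bayes classifier: log p(c | z) = log p(z | c) + log p(c) - log p(z)."""
    return log_normal(z, MU[c]) + LOG_PRIOR - log_p_z(z)


def cond_score(z, c):
    """Conditional score d/dz log p(z | c) of N(MU[c], 1)."""
    return -(z - MU[c])


def uncond_score(z):
    """Unconditional score: class scores weighted by the posterior p(c | z)."""
    logits = np.stack([log_normal(z, m) for m in MU])
    post = np.exp(logits - np.logaddexp(logits[0], logits[1]))
    return post[0] * cond_score(z, 0) + post[1] * cond_score(z, 1)


z = np.linspace(-4.0, 4.0, 9)
c, h = 1, 1e-5
# Gradient of the classifier, by central finite differences
classifier_grad = (log_classifier(c, z + h) - log_classifier(c, z - h)) / (2 * h)
# Difference of the two scores: the term classifier-free guidance scales by w
score_diff = cond_score(z, c) - uncond_score(z)
print(np.max(np.abs(classifier_grad - score_diff)))
```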

Referee Report

0 major / 3 minor

Summary. The paper claims that guidance in conditional diffusion models can be performed without a separate classifier by jointly training a single model on conditional and unconditional objectives (via random condition dropout) and linearly combining the resulting conditional and unconditional score estimates at sampling time, achieving a quality-diversity trade-off comparable to classifier guidance.
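The joint training step amounts to randomly replacing the condition with a null token. The sketch below illustrates this and is not the authors' code. `NULL_LABEL` and `make_cfg_batch` are hypothetical names, and the uniform time sampling and cosine noise schedule are assumptions chosen for brevity. `p_uncond` is the dropout probability, a small value such as 0.1 or 0.2 in this illustration. A single network eps_theta(z, cond, t) would then be trained on the returned tuple with the usual noise-prediction (MSE) loss.

```python
import numpy as np

NULL_LABEL = -1  # hypothetical id reserved for the "no condition" (null) token


def make_cfg_batch(x, labels, rng, p_uncond=0.1):
    """Noisy inputs, (possibly dropped) conditions, times, and noise targets for one batch.

    Each label is replaced by NULL_LABEL with probability p_uncond, so one network
    eps_theta(z, cond, t) learns both the conditional and the unconditional model.
    """
    n = x.shape[0]
    drop = rng.random(n) < p_uncond             # condition-dropout mask
    cond = np.where(drop, NULL_LABEL, labels)
    t = rng.random(n)                           # diffusion time in [0, 1) (assumption)
    alpha = np.cos(0.5 * np.pi * t)             # cosine schedule (assumption)
    sigma = np.sin(0.5 * np.pi * t)
    eps = rng.standard_normal(x.shape)
    shape = (n,) + (1,) * (x.ndim - 1)          # broadcast per-example scalars over data dims
    z = alpha.reshape(shape) * x + sigma.reshape(shape) * eps
    return z, cond, t, eps                      # regress eps_theta(z, cond, t) onto eps (MSE)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3, 4, 4))           # toy batch of eight 3x4x4 "images"
labels = rng.integers(0, 10, size=8)            # toy class labels
z, cond, t, eps = make_cfg_batch(x, labels, rng, p_uncond=0.2)
print(cond)  # entries equal to NULL_LABEL train the unconditional model
```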

Significance. If the empirical results hold, the work is significant for simplifying the training pipeline of conditional diffusion models, eliminating the need for an auxiliary classifier, and providing a practical inference-time control mechanism. The joint training procedure and the score-difference identity enable this, with ImageNet experiments demonstrating similar FID/IS trade-offs across guidance scales; this has become a foundational technique in the field.

minor comments (3)

§3.1, Eq. (5): the guidance formula is clearly derived, but the text should explicitly note that the linear combination is an approximation whose fidelity depends on the diffusion timestep and noise schedule, to better ground the 'similar trade-off' claim.
Table 1: the caption does not specify the exact number of generated samples used for FID computation or whether the same random seed protocol was used across guidance scales, which affects reproducibility of the reported trade-offs.
Figure 4: the y-axis scale for diversity metrics is not labeled consistently with the main text, making it difficult to directly compare the classifier-free and classifier-guided curves.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected


The paper introduces a new joint training procedure (condition dropout to learn both conditional and unconditional diffusion scores) and an inference-time linear combination rule. The guidance formula is derived directly from the Bayes identity relating score differences to classifier gradients, which is a mathematical fact independent of the present work. No step reduces a claimed prediction to a fitted input by construction, and no load-bearing uniqueness theorem is imported via self-citation. The central claim is tested empirically on ImageNet rather than asserted by redefinition or renaming. The derivation does not rely on its own conclusions, and its empirical claims are checked against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The claim rests on the domain assumption that joint conditional-unconditional training is feasible and that combining the two scores can substitute for classifier gradients. The guidance scale is a free parameter chosen at sampling time rather than fitted.

free parameters (1)

guidance scale
Hyperparameter that controls the strength of the combination between conditional and unconditional scores; chosen to achieve the desired quality-diversity trade-off.
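An idealized target shows how the scale trades diversity for fidelity. For exact scores at a single noise level, the guided score corresponds to the tilted density p(z | c) p(c | z)^w. The sketch below evaluates that density on a grid for a hypothetical toy of two unit-variance Gaussian classes centered at -2 and +2. It shows the direction of the effect only, since practical guided samplers do not draw exactly from this density.

```python
import numpy as np

# Hypothetical toy: classes N(-2, 1) and N(+2, 1) with equal priors; we guide toward c = +1.
z = np.linspace(-10.0, 12.0, 22001)
dz = z[1] - z[0]
log_cond = -0.5 * (z - 2.0) ** 2                 # log p(z | c=+1), up to a constant
log_classifier = -np.logaddexp(0.0, -4.0 * z)    # log p(c=+1 | z) = log sigmoid(4z)


def guided_moments(w):
    """Mean and std of the idealized guided density p(z | c) * p(c | z)**w."""
    log_dens = log_cond + w * log_classifier
    dens = np.exp(log_dens - log_dens.max())     # subtract max for numerical stability
    dens /= dens.sum() * dz                      # normalize on the grid
    mean = np.sum(z * dens) * dz
    std = np.sqrt(np.sum((z - mean) ** 2 * dens) * dz)
    return mean, std


for w in (0.0, 1.0, 4.0, 16.0):
    m, s = guided_moments(w)
    print(f"w={w:>4}: mean={m:.3f}, std={s:.3f}")
```

As w grows, the mean moves away from the competing class and the spread shrinks. This is a one-dimensional analogue of the quality-diversity trade-off the scale controls.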

axioms (1)

domain assumption
A diffusion model can be trained jointly on conditional and unconditional objectives without one degrading the other.
Invoked by the joint training step described in the abstract.

pith-pipeline@v0.9.0 · 5408 in / 1189 out tokens · 62087 ms · 2026-05-10T14:55:17.835716+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Cost.FunctionalEquation washburn_uniqueness_aczel unclear


we jointly train a conditional and an unconditional diffusion model, and we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance
IndisputableMonolith.Foundation.DAlembert.Inevitability bilinear_family_forced unclear


classifier guidance combines the score estimate of a diffusion model with the gradient of an image classifier

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds
cs.LG 2026-05 unverdicted novelty 8.0

Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.
Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion
cs.LG 2026-05 unverdicted novelty 8.0

Inference-time refinement of pre-trained tabular diffusion models via Bidirectional Chamfer Refinement achieves median 8.6% better downstream performance than real data across 15 benchmarks while preserving fidelity a...
A Priori Sampling of Transition States with Guided Diffusion
physics.chem-ph 2026-03 conditional novelty 8.0

ASTRA reframes transition-state search as guided diffusion inference that samples the isodensity surface between metastable basins and converges to first-order saddles via score differences and physical forces.
Large Language Diffusion Models
cs.CL 2025-02 unverdicted novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
Guided Diffusion Sampling for Precipitation Forecast Interventions
cs.LG 2026-05 unverdicted novelty 7.0

Gradient-guided diffusion sampling reduces extreme precipitation forecasts in data-driven weather models while producing more physically plausible changes than adversarial perturbations.
Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers
cs.CV 2026-05 unverdicted novelty 7.0

Text embeddings in MM-DiTs contain a detectable omission signal for missing concepts, and amplifying it via OSI reduces concept omission in generated images on FLUX.1-Dev and SD3.5-Medium.
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
cs.CV 2026-05 unverdicted novelty 7.0

R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation
cs.CV 2026-05 unverdicted novelty 7.0

HIR-ALIGN augments limited target data for hyperspectral restoration by creating proxy clean images, synthesizing aligned HSIs with blur-robust diffusion and warp-based transfer, then finetuning models to lower target...
Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation
cs.CV 2026-05 unverdicted novelty 7.0

A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.
Margin-calibrated Classifier Guidance for Property-driven Synthesis Planning
cs.LG 2026-05 unverdicted novelty 7.0

Margin-calibrated classifier guidance via Sequence Completion Ranking raises multi-step retrosynthesis solve rates from 16.8% to 95.3% on USPTO-190 and unlocks previously unsolvable targets.
Amortized Guidance for Image Inpainting with Pretrained Diffusion Models
cs.CV 2026-05 unverdicted novelty 7.0

AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport
cs.CV 2026-05 unverdicted novelty 7.0

DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generativ...
Generative Motion In-betweening by Diffusion over Continuous Implicit Representations
cs.GR 2026-05 unverdicted novelty 7.0

A latent diffusion model over continuous implicit neural representations samples INR parameters from sparse keyframes to reconstruct plausible, smooth, and diverse motions while preserving keyframe accuracy.
Constraint-Aware Flow Matching: Decision Aligned End-to-End Training for Constrained Sampling
cs.LG 2026-05 unverdicted novelty 7.0

Constraint-Aware Flow Matching integrates constraint projections into the flow matching training objective to align model dynamics with constrained sampling and reduce distributional shift.
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
cs.RO 2026-05 unverdicted novelty 7.0

MindVLA-U1 introduces a unified streaming VLA with shared backbone, framewise memory, and language-guided action diffusion that surpasses human drivers on WOD-E2E planning metrics.
Is Monotonic Sampling Necessary in Diffusion Models?
cs.LG 2026-05 unverdicted novelty 7.0

Non-monotonic sampling schedules never improve upon monotonic baselines in diffusion models, with performance gaps ranging from substantial to negligible depending on the denoiser.
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 conditional novelty 7.0

W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
ScaleMoGen: Autoregressive Next-Scale Prediction for Human Motion Generation
cs.CV 2026-05 unverdicted novelty 7.0

ScaleMoGen introduces a scale-wise autoregressive framework that quantizes motions into hierarchical discrete tokens and predicts next-scale maps to achieve SOTA FID 0.030 on HumanML3D and text-guided editing.
Single-Shot HDR Recovery via a Video Diffusion Prior
cs.CV 2026-05 unverdicted novelty 7.0

Single-shot HDR is achieved by conditioning a video diffusion model on an LDR input to generate an exposure bracket and fusing the bracket with per-pixel weights from a lightweight UNet.
Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations
cs.RO 2026-05 unverdicted novelty 7.0

CoDi decomposes the multi-agent diffusion score into pre-trained single-agent policies plus a gradient-free cost guidance term to generate coordinated behavior from single-agent data alone.
Efficient Adjoint Matching for Fine-tuning Diffusion Models
cs.LG 2026-05 unverdicted novelty 7.0

EAM speeds up adjoint matching for diffusion model reward fine-tuning by switching to linear base drift, allowing deterministic few-step solvers and closed-form adjoints with up to 4x faster convergence on text-to-ima...
Composing diffusion priors with explicit physical context via generative Gibbs sampling
cs.LG 2026-05 unverdicted novelty 7.0

GG-PA composes diffusion priors with physical context via a derived Gibbs sampler that is asymptotically exact as diffusion time approaches zero and exact at finite times for quadratic interactions.
Muninn: Your Trajectory Diffusion Model But Faster
cs.RO 2026-05 unverdicted novelty 7.0

Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
Inverse Design for Conditional Distribution Matching
cs.LG 2026-05 unverdicted novelty 7.0

Defines Conditional Distribution Matching (CDM) as finding inputs whose induced conditional distributions match a target distribution and proposes the MLGD-F inference-time algorithm using pretrained diffusion models ...
Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems
cs.SD 2026-05 unverdicted novelty 7.0

MixtureTT performs direct per-stem timbre transfer on polyphonic mixtures via a shared diffusion transformer, outperforming single-stem baselines on SATB choral data while eliminating cascaded separation errors.
TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment
cs.LG 2026-05 unverdicted novelty 7.0

TMPO uses Softmax Trajectory Balance to match policy probabilities over multiple trajectories to a Boltzmann reward distribution, improving diversity by 9.1% in diffusion alignment tasks.
TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment
cs.LG 2026-05 unverdicted novelty 7.0

TMPO replaces scalar reward maximization with trajectory-level matching to a Boltzmann distribution via Softmax-TB, improving generative diversity by 9.1% while keeping competitive reward performance.
TARO: Temporal Adversarial Rectification Optimization Using Diffusion Models as Purifiers
cs.LG 2026-05 unverdicted novelty 7.0

TARO builds a temporally guided score prior from high-noise and low-noise diffusion views to purify adversarial examples more robustly than uniform timestep methods.
Guidance Is Not a Hyperparameter: Learning Dynamic Control in Diffusion Language Models
cs.CL 2026-05 unverdicted novelty 7.0

Adaptive guidance trajectories learned via PPO outperform fixed-scale CFG on controllability-quality balance in three controlled NLP generation tasks with discrete diffusion models.
OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos
cs.CV 2026-05 unverdicted novelty 7.0

OphEdit enables text-guided editing of eye surgery videos without training by injecting preserved attention value tensors into the diffusion denoising process to maintain anatomical structure.
Test-Time Compositional Generalization in Diffusion Models via Concept Discovery
cs.LG 2026-05 unverdicted novelty 7.0

Diffusion models can extract reusable density-mode concepts from their time-indexed scores to enable compositional generation at test time on held-out benchmarks from ColorMNIST and CelebA.
DCR: Counterfactual Attractor Guidance for Rare Compositional Generation
cs.CV 2026-05 unverdicted novelty 7.0

DCR uses a counterfactual attractor and projection-based repulsion to suppress default completion bias in diffusion models, improving fidelity for rare compositional prompts while preserving quality.
A Flow Matching Algorithm for Many-Shot Adaptation to Unseen Distributions
cs.LG 2026-05 unverdicted novelty 7.0

FP-FM adapts flow matching models to unseen distributions via least-squares projection onto basis functions spanning training velocity fields, yielding improved precision and recall without inference-time training.
Autoregressive Visual Generation Needs a Prologue
cs.CV 2026-05 unverdicted novelty 7.0

Prologue introduces dedicated prologue tokens to decouple generation and reconstruction in AR visual models, significantly improving generation FID scores on ImageNet while maintaining reconstruction quality.
MaMi-HOI: Harmonizing Global Kinematics and Local Geometry for Human-Object Interaction Generation
cs.RO 2026-05 unverdicted novelty 7.0

MaMi-HOI counters geometric forgetting in diffusion models via a Geometry-Aware Proximity Adapter for precise contacts and a Kinematic Harmony Adapter for natural whole-body postures in human-object interactions.
Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models
cs.LG 2026-05 unverdicted novelty 7.0

Symmetry breaking and nonlocality phase transitions occur nearly simultaneously during diffusion model generation in modern transformers.
DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models
cs.CV 2026-05 unverdicted novelty 7.0

DMGD achieves better performance than fine-tuned SOTA methods in dataset distillation on ImageNet subsets by using semantic matching through conditional likelihood optimization and OT-based distribution matching in a ...
FluxFlow: Conservative Flow-Matching for Astronomical Image Super-Resolution
cs.CV 2026-05 unverdicted novelty 7.0

FluxFlow is a conservative pixel-space flow-matching framework for astronomical super-resolution that incorporates real atmospheric uncertainty and a training-free Wiener correction, outperforming baselines on a new 1...
Tempered Guided Diffusion
stat.ML 2026-05 unverdicted novelty 7.0

Tempered Guided Diffusion uses annealed SMC to produce consistent particle approximations to the posterior for training-free conditional diffusion sampling, outperforming independent guided trajectories in experiments.
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
cs.CV 2026-05 unverdicted novelty 7.0

AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional ani...
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
cs.CV 2026-05 unverdicted novelty 7.0

AniMatrix generates anime videos using a production knowledge taxonomy, dual-channel conditioning, style-motion curriculum, and deformation-aware preference optimization, outperforming baselines in animator evaluation...
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
cs.CV 2026-05 unverdicted novelty 7.0

AniMatrix generates anime videos using a structured taxonomy of artistic production variables, dual-channel conditioning, a style-motion curriculum, and deformation-aware optimization to prioritize art over physics.
DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing
cs.CV 2026-05 unverdicted novelty 7.0

DirectEdit achieves step-level accurate inversion for flow-based image editing by directly aligning forward paths, using attention feature injection and mask-guided noise blending to balance fidelity and editability w...
Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges
cs.LG 2026-05 unverdicted novelty 7.0

Structured diffusion bridges with alignment constraints achieve near fully-paired quality in modality translation while working effectively in unpaired and semi-paired regimes.
VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
cs.CV 2026-05 unverdicted novelty 7.0

VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.
Focus on the Core: Empowering Diffusion Large Language Models by Self-Contrast
cs.CL 2026-05 unverdicted novelty 7.0

FoCore uses self-contrast on early-converging high-density tokens to boost diffusion LLM quality on reasoning benchmarks while cutting decoding steps by over 2x.
ScribbleEdit: Synthetic Data for Image Editing with Scribbles and Text
cs.CV 2026-05 conditional novelty 7.0

ScribbleEdit is a synthetic dataset combining scribbles and text for training image editing models that produce spatially aligned and semantically consistent results.
Posterior Augmented Flow Matching
cs.CV 2026-05 unverdicted novelty 7.0

PAFM augments flow matching with an importance-sampled mixture over an approximate posterior of target completions, yielding an unbiased lower-variance estimator that improves FID by up to 3.4 on ImageNet and CC12M.
Watch Your Step: Information Injection in Diffusion Models via Shadow Timestep Embedding
cs.LG 2026-05 unverdicted novelty 7.0

Timestep embeddings in diffusion models function as a separable side channel that can carry dedicated information for adversarial injection or detection.
FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting
cs.GR 2026-04 unverdicted novelty 7.0

FieryGS integrates LLM-based material reasoning, volumetric combustion simulation, and a unified renderer with 3D Gaussian Splatting to generate physically plausible and user-controllable fire in in-the-wild scenes.
AMGenC: Generating Charge Balanced Amorphous Materials
cs.LG 2026-04 unverdicted novelty 7.0

AMGenC generates guaranteed charge-balanced amorphous materials using element noise initialization combined with per-step soft and final discrete projections in a generative model.
ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent
cs.CV 2026-04 unverdicted novelty 7.0

ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling
cs.CV 2026-04 unverdicted novelty 7.0

Talker-T2AV achieves better lip-sync accuracy, video quality, and audio quality than dual-branch baselines by separating high-level shared autoregressive modeling from modality-specific low-level diffusion refinement ...
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
cs.CV 2026-04 unverdicted novelty 7.0

Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models
cs.CV 2026-04 unverdicted novelty 7.0

Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a di...
CODA: Coordination via On-Policy Diffusion for Multi-Agent Offline Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 7.0

CODA augments offline multi-agent RL with on-policy diffusion trajectories that evolve with the joint policy to enable coordination.
DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion
cs.CV 2026-04 unverdicted novelty 7.0

DCMorph generates face morphs via decoupled cross-attention in identity-conditioned diffusion and DDIM spherical interpolation, achieving higher attack success rates on four face recognition systems than prior methods...
TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics
cs.AI 2026-04 conditional novelty 7.0

TacticGen generates realistic, adaptable football tactics via a multi-agent diffusion transformer trained on 3.3M events and 100M frames, supporting rule-, language-, or model-based guidance at inference time.
DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax
cs.CV 2026-04 unverdicted novelty 7.0

DanceCrafter generates high-fidelity, text-controlled dance sequences using a new Choreographic Syntax framework and a large fine-grained motion dataset.
ScenarioControl: Vision-Language Controllable Vectorized Latent Scenario Generation
cs.CV 2026-04 unverdicted novelty 7.0

ScenarioControl introduces the first vision-language controllable generator for realistic vectorized 3D driving scenarios with temporal consistency across actor views.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · cited by 222 Pith papers · 1 internal anchor

[1] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2019.

[2] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. WaveGrad: Estimating gradients for waveform generation. International Conference on Learning Representations, 2021.

[3] Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis. arXiv preprint arXiv:2105.05233, 2021.

[4] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Proceedings of the 17th International Conference on Neural Information Processing Systems, pp. 529-536, 2004.

[5] Peter Grünwald and John Langford. Suboptimal behavior of Bayes and MDL in classification under misspecification. Machine Learning, 66(2-3):119-149, 2007.

[6] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626-6637, 2017.

[7] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, pp. 6840-6851, 2020.

[8] Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. arXiv preprint arXiv:2106.15282, 2021.

[9] Aapo Hyvärinen and Peter Dayan. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4), 2005.

[10] Diederik P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215-10224, 2018.

[11] Diederik P Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. arXiv preprint arXiv:2107.00630, 2021.

[12] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. DiffWave: A Versatile Diffusion Model for Audio Synthesis. International Conference on Learning Representations, 2021.

[13] Alex Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. International Conference on Machine Learning, 2021.

[14] Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. In Advances in Neural Information Processing Systems, pp. 14837-14847, 2019.

[15] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.

[16] Tim Salimans and Jonathan Ho. Should EBMs model the energy or the score? In Energy Based Models Workshop-ICLR 2021, 2021.

[17] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234-2242, 2016.

[18] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256-2265, 2015.

[19] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, pp. 11895-11907, 2019.

[20] Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. Maximum likelihood training of score-based diffusion models. arXiv e-prints, 2021a.

[21] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations, 2021b.

[22] Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661-1674, 2011.

[23] Yan Wu, Jeff Donahue, David Balduzzi, Karen Simonyan, and Timothy Lillicrap. LOGAN: Latent optimisation for generative adversarial networks. arXiv preprint arXiv:1912.00953, 2019.