
Fine-tuning of continuous-time diffusion models as entropy-regularized control

Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, Sergey Levine · 2024 · arXiv 2402.15194

Canonical reference. 80% of classified citations from citing Pith papers cite this work as background.

17 Pith papers citing it


citation-role summary

background 4 · method 1

citation-polarity summary

background 4 · use method 1

representative citing papers

Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages

cs.LG · 2026-03-13 · unverdicted · novelty 8.0

Derives an exact unbiased policy gradient for RL post-training of diffusion LLMs via entropy-guided step selection and one-step denoising rewards, achieving state-of-the-art results on coding and logical reasoning benchmarks.

Supervised Guidance Training for Infinite-Dimensional Diffusion Models

cs.LG · 2026-01-28 · conditional · novelty 8.0

Supervised Guidance Training enables conditioning of infinite-dimensional diffusion models via an extended Doob h-transform so that fine-tuned models accurately sample from posteriors in function space.

Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

A new adjoint matching framework formulates flow model alignment as optimal control, enabling direct regression training and terminal-trajectory truncation for efficiency gains on models like SiT-XL and FLUX.

Personalizing Text-to-Image Generation to Individual Taste

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

PAMELA provides a multi-user rating dataset and personalized reward model that predicts individual image preferences more accurately than prior population-level aesthetic models.

Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement

cs.LG · 2025-07-11 · conditional · novelty 7.0

PG-DLM applies particle Gibbs sampling over full trajectories in diffusion language models to enable iterative refinement, yielding higher accuracy on reward-guided generation with theoretical convergence guarantees.

Posterior Inference in Latent Space for Scalable Constrained Black-box Optimization

cs.LG · 2025-07-01 · unverdicted · novelty 7.0

Reformulates constrained black-box optimization as posterior inference in the latent space of flow-based models, amortized by outsourced diffusion models, claiming superior performance on synthetic and real tasks.

Hierarchical Variational Policies for Reward-Guided Diffusion

cs.LG · 2026-05-20 · conditional · novelty 6.0

A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.

Gradient-Free Noise Optimization for Reward Alignment in Generative Models

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

ZeNO frames noise optimization as a path-integral control problem solvable from zeroth-order reward evaluations, connecting to implicit Langevin dynamics for reward-tilted distributions.

Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Derives RAM, a reward-adjusted consistency loss extending diffusion pretraining regression to efficient KL-regularized RL post-training, achieving peak rewards up to 50x faster than Flow-GRPO on Stable Diffusion 3.5M.

A unified perspective on fine-tuning and sampling with diffusion and flow models

stat.ML · 2026-04-30 · unverdicted · novelty 6.0

Presents a unified framework for exponential tilting in diffusion and flow models, including bias-variance decompositions that show finite gradient variance for some methods, norm bounds on adjoint ODEs, and adapted losses with new Crooks and Jarzynski identities.

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

cs.LG · 2026-04-29 · unverdicted · novelty 6.0 · 2 refs

FMRG is a training-free single-trajectory guidance framework for flow-based models that matches or exceeds baselines on reward-guided tasks and inverse problems using as few as 3 NFEs.

Robust mean field control: stochastic maximum principle and variational mean field games

math.OC · 2026-04-23 · unverdicted · novelty 6.0

A new min-max robust formulation for mean field control and variational mean field games is introduced, with existence, uniqueness, and a stochastic maximum principle established under convexity-concavity assumptions.

Reward Score Matching: Unifying Reward-based Fine-tuning for Flow and Diffusion Models

cs.LG · 2026-04-19 · unverdicted · novelty 6.0

Reward Score Matching unifies reward-based fine-tuning for flow and diffusion models by recasting alignment as score matching to a value-guided target.

Adjoint Matching through the Lens of the Stochastic Maximum Principle in Optimal Control

math.OC · 2026-03-28 · unverdicted · novelty 6.0

Adjoint matching objectives derived from the Stochastic Maximum Principle have critical points satisfying HJB stationarity conditions for SOC problems with control-dependent drift and diffusion.

ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment

cs.LG · 2026-01-29 · unverdicted · novelty 6.0 · 2 refs

ETS performs training-free RL alignment for language models by energy-guided test-time scaling with Monte Carlo energy estimation and importance sampling acceleration.

Stein Diffusion Guidance: Training-Free Posterior Correction for Sampling Beyond High-Density Regions

cs.LG · 2025-07-07 · unverdicted · novelty 6.0

Stein Diffusion Guidance corrects approximate posteriors in diffusion sampling via a Stein variational mechanism and surrogate SOC objective to enable effective guidance beyond high-density regimes.

DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control

cs.RO · 2025-02-09 · unverdicted · novelty 6.0

DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.

citing papers explorer

Showing 17 of 17 citing papers.

Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages cs.LG · 2026-03-13 · unverdicted · none · ref 13
Supervised Guidance Training for Infinite-Dimensional Diffusion Models cs.LG · 2026-01-28 · conditional · none · ref 6
Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline cs.AI · 2026-05-07 · unverdicted · none · ref 41
Personalizing Text-to-Image Generation to Individual Taste cs.CV · 2026-04-08 · unverdicted · none · ref 53
Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement cs.LG · 2025-07-11 · conditional · none · ref 51
Posterior Inference in Latent Space for Scalable Constrained Black-box Optimization cs.LG · 2025-07-01 · unverdicted · none · ref 20
Hierarchical Variational Policies for Reward-Guided Diffusion cs.LG · 2026-05-20 · conditional · none · ref 46
Gradient-Free Noise Optimization for Reward Alignment in Generative Models cs.LG · 2026-05-12 · unverdicted · none · ref 30 · 2 links
Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models cs.LG · 2026-05-11 · unverdicted · none · ref 7 · 2 links
A unified perspective on fine-tuning and sampling with diffusion and flow models stat.ML · 2026-04-30 · unverdicted · none · ref 4
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance cs.LG · 2026-04-29 · unverdicted · none · ref 19 · 2 links
Robust mean field control: stochastic maximum principle and variational mean field games math.OC · 2026-04-23 · unverdicted · none · ref 96
Reward Score Matching: Unifying Reward-based Fine-tuning for Flow and Diffusion Models cs.LG · 2026-04-19 · unverdicted · none · ref 44
Adjoint Matching through the Lens of the Stochastic Maximum Principle in Optimal Control math.OC · 2026-03-28 · unverdicted · none · ref 13
ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment cs.LG · 2026-01-29 · unverdicted · none · ref 33 · 2 links
Stein Diffusion Guidance: Training-Free Posterior Correction for Sampling Beyond High-Density Regions cs.LG · 2025-07-07 · unverdicted · none · ref 44
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control cs.RO · 2025-02-09 · unverdicted · none · ref 54
