Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.
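To make the claim about non-conservative vector fields concrete, here is a minimal Python sketch of one well-known way such a field can keep a target distribution stationary. It is not the Flux Matching method itself, whose objective is not given here. The sketch adds an antisymmetric rotation J to the score, so the drift (I + J)∇log p has nonzero curl, yet Langevin dynamics dx = (I + J)∇log p dt + √2 dW still leaves p invariant. The 2-D standard Gaussian target, the constant matrix J, and the names `GAMMA`, `field` and `simulate` are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup: target p = N(0, I) in 2-D, plus a constant
# antisymmetric matrix J that adds a rotational (non-conservative) component.
GAMMA = 1.0
J = np.array([[0.0, GAMMA], [-GAMMA, 0.0]])


def score(x):
    """Score of the toy target: grad log p(x) = -x for p = N(0, I)."""
    return -x


def field(x, rot=J):
    """Vector field (I + J) grad log p, applied row-wise to x of shape (n, 2).

    Its curl is 2*GAMMA, so it is not the gradient of any potential when
    GAMMA != 0, yet p remains its stationary distribution under Langevin noise.
    """
    s = score(x)
    return s + s @ rot.T


def simulate(n=20_000, steps=2_000, dt=0.01, seed=0):
    """Euler-Maruyama for dx = field(x) dt + sqrt(2) dW, started far from p."""
    rng = np.random.default_rng(seed)
    x = np.tile([3.0, -3.0], (n, 1))
    for _ in range(steps):
        x = x + field(x) * dt + np.sqrt(2 * dt) * rng.standard_normal((n, 2))
    return x
```

For constant antisymmetric J, the extra probability flux p J∇log p = J∇p is divergence-free, so it does not change the stationary solution of the Fokker-Planck equation. Consequently, the empirical covariance of `simulate()` should come out close to the identity, up to a small Euler-Maruyama bias.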
Score-Based Generative Modeling through Stochastic Differential Equations
Canonical reference. 76% of citing Pith papers cite this work as background.
abstract
Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
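The reverse-time SDE, the predictor-corrector scheme and the equivalent ODE described in the abstract can be sketched in a few lines of Python. The sketch assumes a toy 1-D Gaussian data distribution, so the score of every perturbed marginal is available in closed form; in practice a neural network trained by score matching would replace the exact `score` function. The constant-β variance-preserving SDE, the step counts and the `snr`-scaled corrector step size are simplifying assumptions for illustration, not the paper's exact settings. All function names are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: 1-D Gaussian "data" N(MU, S**2), so every perturbed
# marginal p_t is Gaussian and its score is known in closed form.
MU, S = 2.0, 0.5
BETA, T = 1.0, 10.0  # constant-beta VP SDE: dx = -0.5*BETA*x dt + sqrt(BETA) dw


def marginal(t):
    """Mean and variance of p_t for the VP SDE started from N(MU, S**2)."""
    a = np.exp(-0.5 * BETA * t)
    return MU * a, S**2 * a**2 + (1.0 - a**2)


def score(x, t):
    """Exact score grad_x log p_t(x); a trained network would replace this."""
    m, v = marginal(t)
    return -(x - m) / v


def drift(x):
    """Forward drift f(x, t) of the VP SDE; the diffusion is g = sqrt(BETA)."""
    return -0.5 * BETA * x


def reverse_sde_pc(n=20_000, steps=1_000, corrector_steps=1, snr=0.01, seed=0):
    """Euler-Maruyama predictor on the reverse-time SDE plus a Langevin corrector."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(n)  # prior samples, approximately p_T
    for i in range(steps, 0, -1):
        t = i * dt
        # Predictor: one step backward in time, dx = [f - g^2 * score] dt + g dw_bar.
        noise = rng.standard_normal(n)
        x = x - (drift(x) - BETA * score(x, t)) * dt + np.sqrt(BETA * dt) * noise
        # Corrector: Langevin MCMC targeting p_{t - dt}; the step size is scaled
        # by the marginal variance (a simplification of the paper's SNR rule).
        t_next = t - dt
        eps = snr * marginal(t_next)[1]
        for _ in range(corrector_steps):
            x = x + eps * score(x, t_next) + np.sqrt(2 * eps) * rng.standard_normal(n)
    return x


def probability_flow_ode(n=20_000, steps=1_000, seed=0):
    """Deterministic sampler: dx/dt = f - 0.5 * g^2 * score, integrated from T to 0."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(n)
    for i in range(steps, 0, -1):
        t = i * dt
        x = x - (drift(x) - 0.5 * BETA * score(x, t)) * dt
    return x
```

Both samplers start from N(0, 1) samples and should return values whose mean and standard deviation are close to 2.0 and 0.5. The SDE sampler injects fresh noise at every step. The ODE sampler is a deterministic map from prior samples to data samples, which is what makes exact likelihood computation possible in the full method.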
representative citing papers
A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.
Quotient-space diffusion models generate correct symmetric distributions by removing redundancy on the quotient space, simplifying learning and improving results on small molecules and proteins under SE(3) symmetry.
The García-Pintos feedback Hamiltonian equals the score function of the quantum trajectory distribution, linking quantum feedback to diffusion-model reversal.
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
ASTRA reframes transition-state search as guided diffusion inference that samples the isodensity surface between metastable basins and converges to first-order saddles via score differences and physical forces.
MF-PID turns independent diffusion samples into mean-field interacting agents, proving that quadratic interactions yield exact linear mean interpolation and delivering 19-24% energy savings in demand-response control.
Föllmer processes are variationally optimal among generative diffusions because they minimize the impact of drift estimation error on path-space KL divergence, rendering different interpolation schedules statistically equivalent.
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
Hybrid simulation and non-Euclidean elasticity theory demonstrate that clathrin coats develop adaptive rigidity and memory during growth, producing flat, stalled, or closed outcomes through two energy-landscape gates and matching experiments without fitted parameters.
Introduces structured DRO for learned inverse problem reconstructions with ambiguity sets aligned to the forward operator, yielding explicit dual representations and a worst-case bound that induces Tikhonov regularization on the operator Lipschitz constant.
CORDEX-ML-Bench benchmarks 40 ML models for climate downscaling and finds generative models outperform deterministic ones on precipitation while historically trained models underestimate future climate signals.
QMC applied to Euler-Maruyama yields faster sampling-error decay than Monte Carlo, and the new MSTG method based on exact simulation achieves super-exponential truncation-error decay that sharply reduces integration dimension.
STREAM decouples text and music conditioning in a diffusion transformer via AdaLN for structure and BEAM for beats, plus new Motorica++ dataset and editability metrics, claiming SOTA music alignment with preserved semantics.
Direct fixed-weight solver for free-support Wasserstein medians relocates atoms using OT barycentric projections and inverse-distance weights, achieving monotone descent on smoothed objectives with fewer subproblems than nested Weiszfeld baselines.
Chameleon proposes the first large-scale cross-domain compositing dataset and a disentangled encoder plus gated diffusion transformer that outperforms prior in-domain and cross-domain methods on plausibility and fidelity.
YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
CGPO integrates training-free critic guidance into diffusion denoising to produce high-Q actions as regression targets, yielding SOTA results on MuJoCo locomotion and successful Franka arm grasping.
Spectral Guidance learns singular functions via self-supervised objective to project guidance signals onto diffusion sampling trajectories, enabling stable control without retraining or backpropagation and improving CIFAR-10 accuracy by 37 points with 4x faster sampling.
Presents a controlled vector field framework for continuous generative modeling where velocity is formed from fixed bracket-generating fields modulated by scalar controls, with an expressivity principle under controllability assumptions.
citing papers explorer
-
Efficient Score Pre-computation for Diffusion Models via Cross-Matrix Krylov Projection
Cross-matrix Krylov projection reuses shared subspaces from seed matrices to accelerate score pre-computation in diffusion models, delivering 15.8-43.7% time savings and up to 115x speedup versus DDPM baselines.
-
Forecasting implied volatility surface with generative diffusion models
A conditioned diffusion model with SNR-weighted arbitrage penalty generates one-day-ahead arbitrage-free implied volatility surfaces and outperforms baselines on market data.
-
Discrete Bayesian Sample Inference for Graph Generation
GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.
-
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
RAPO++ is a three-stage prompt optimization framework combining retrieval-augmented refinement, closed-loop test-time scaling, and LLM fine-tuning to enhance text-to-video generation quality.
-
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
UniWorld-V2 applies policy optimization via DiffusionNFT and MLLM logit feedback with group filtering to reach state-of-the-art scores of 4.49 on ImgEdit and 7.83 on GEdit-Bench while remaining model-agnostic.
-
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
The work introduces rCM, a score-regularized continuous-time consistency model that matches DMD2 quality on large models up to 14B parameters while improving diversity and enabling 1-4 step sampling.
-
Flow Matching for Measure Transport and Feedback Stabilization of Control-Affine Systems
Introduces flow matching for measure transport in control-affine systems and a complementary noising-time-reversal method for stabilization, with numerical examples on linear and nonlinear cases.
-
EnScale: Temporally-consistent multivariate generative downscaling via proper scoring rules
EnScale emulates high-resolution regional climate model outputs from global circulation models for multiple variables using a two-step generative process with sparse local stochastic layers and energy score optimization, including a temporally consistent variant.
-
ReNF: Rethinking the Design of Neural Long-Term Time Series Forecasters
ReNF proposes Boosted Direct Output (BDO) and parameter smoothing so a basic temporal MLP outperforms complex state-of-the-art models on long-term time series forecasting benchmarks by implicitly combining forecasts to reduce uncertainty.
-
Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction for Sparse-View CT
CDPIR integrates cross-distribution diffusion priors from a Scalable Interpolant Transformer trained with classifier-free guidance into model-based iterative reconstruction to improve sparse-view CT under out-of-distribution conditions.
-
Physics-constrained generative machine learning-based high-resolution downscaling of Greenland's surface mass balance and surface temperature
A physics-constrained consistency model downscales Greenland SMB and surface temperature by a factor of 32 while preserving coarse-scale sums and outperforming interpolation on test metrics.
-
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching
ZipVoice-Dialog is a flow-matching non-autoregressive model for zero-shot spoken dialogue generation that uses curriculum learning and speaker-turn embeddings, paired with a new 6.8k-hour OpenDialog dataset, and reports better speed and quality than autoregressive baselines.
-
Stein Diffusion Guidance: Training-Free Posterior Correction for Sampling Beyond High-Density Regions
Stein Diffusion Guidance corrects approximate posteriors in diffusion sampling via a Stein variational mechanism and surrogate SOC objective to enable effective guidance beyond high-density regimes.
-
2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching
2ndMatch finetunes pruned diffusion models via second-order Jacobian matching inspired by Finite-Time Lyapunov Exponents to reduce the quality gap with dense models on image generation tasks.
-
Latent Stochastic Interpolants
Latent Stochastic Interpolants jointly optimize encoder-decoder and a latent-space stochastic interpolant using a continuous-time ELBO to transform arbitrary priors into aggregated posteriors.
-
Test-Time Training Done Right
Large-chunk online updates during inference let test-time training scale state capacity to 40% of model size and handle contexts up to 1M tokens without custom kernels.
-
Fast Kernel-Space Diffusion for Remote Sensing Pansharpening
KSDiff generates convolutional kernels in kernel space using low-rank core tensor and factor generators with multi-head attention for fast, high-quality pansharpening.
-
DreamPolicy: A Unified World-model Policy for Scalable Humanoid Locomotion
DreamPolicy integrates an autoregressive diffusion world model with policy learning to produce a single scalable policy that generalizes to unseen composite terrains for humanoid locomotion.
-
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction tasks while using a non-autoregressive architecture.
-
Flow-based Generative Modeling of Potential Outcomes and Counterfactuals
PO-Flow uses continuous normalizing flows trained via flow matching to jointly model potential outcome distributions and enable factual-conditioned counterfactual prediction for causal inference tasks including CATE estimation.
-
MAGI-1: Autoregressive Video Generation at Scale
MAGI-1 is a 24B-parameter autoregressive video world model that predicts denoised frame chunks sequentially with increasing noise to enable causal, scalable, streaming generation up to 4M token contexts.
-
DanceGRPO: Unleashing GRPO on Visual Generation
DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.
-
Sampling-Aware Quantization for Diffusion Models
A quantization technique for diffusion models that aligns sampling trajectories to preserve high-order sampler performance under quantization noise.
-
Art3D: Training-Free 3D Generation from Flat-Colored Illustration
Art3D enhances flat-colored 2D illustrations with 3D illusion using pre-trained 2D model features and VLM realism evaluation, then generates 3D, while introducing the Flat-2D benchmark dataset.
-
Color Conditional Generation with Sliced Wasserstein Guidance
A training-free method modifies diffusion model sampling with differentiable Sliced 1-Wasserstein distance for color-conditional image generation.
-
Characterizing higher-order representations through generative diffusion models explains human decoded neurofeedback performance
NERD uses RL-trained diffusion models on fMRI data to model higher-order uncertainty representations, outperforming controls and linking individual differences to neurofeedback success.
-
Unified Video Action Model
UVA learns a joint video-action latent representation with decoupled diffusion decoding heads, enabling a single model to perform accurate fast policy learning, forward/inverse dynamics, and video generation without performance loss versus task-specific methods.
-
Distributional Autoencoders Know the Score
DPA provides a closed-form relation from level-set geometry to the data score and proves that extra latent components are conditionally independent, revealing the intrinsic dimension.
-
Improving Video Generation with Human Feedback
A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.
-
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
VisionReward learns multi-dimensional human preferences for image and video generation via hierarchical assessment and linear weighting, outperforming VideoScore by 17.2% in prediction accuracy and yielding 31.6% higher win rates in text-to-video models.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
-
Autoregressive Video Generation without Vector Quantization
NOVA reformulates video generation as non-quantized autoregressive frame-by-frame temporal prediction combined with set-by-set spatial prediction, outperforming prior AR video models and some diffusion models in efficiency and quality.
-
Regional climate risk assessment from climate models using probabilistic machine learning
GenFocal uses probabilistic ML to downscale coarse climate projections to fine-scale weather events without paired training data and samples rare high-impact events more accurately than prior methods.
-
Gravitational-Wave Parameter Estimation in non-Gaussian noise using Score-Based Likelihood Characterization
Score-based diffusion models learn the empirical distribution of real LIGO noise to enable unbiased gravitational-wave parameter estimation under only an additivity assumption.
-
Diffusion Policy Policy Optimization
DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.
-
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.
-
Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration
Diff-PCR uses a diffusion model to learn denoising directions for refining doubly stochastic correspondence matrices, improving point cloud registration over one-shot normalization methods.
-
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.
-
Improved DDIM Sampling with Moment Matching Gaussian Mixtures
Moment-matched GMM kernels in DDIM yield lower FID and higher IS than Gaussian kernels at small sampling steps on CelebA-HQ, FFHQ, ImageNet, and Stable Diffusion tasks.
-
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.
-
On Diffusion Modeling for Anomaly Detection
Diffusion models via DDPM work for anomaly detection but are slow; the proposed DTE method estimates diffusion time distribution analytically and with a neural net to deliver faster inference while outperforming DDPM on ADBench for unsupervised and semi-supervised settings.
-
Generative diffusion learning for parametric partial differential equations
A conditional DDPM framework is introduced to approximate solution operators for parameter-dependent PDEs, achieving accuracy comparable to FNO while recovering noise levels and providing confidence intervals.
-
Shap-E: Generating Conditional 3D Implicit Functions
Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.
-
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
IDQL generalizes IQL into an actor-critic framework and uses diffusion policies for robust policy extraction, outperforming prior offline RL methods.
-
Scaling Robot Learning with Semantically Imagined Experience
Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.
-
Latent Video Diffusion Models for High-Fidelity Long Video Generation
Latent-space hierarchical diffusion models with targeted error-correction techniques generate realistic videos exceeding 1000 frames while using less compute than prior pixel-space approaches.
-
Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed
Denoising Student distills the multi-step denoising process of score-based and diffusion models into a single forward pass, matching GAN sampling speed while producing comparable sample quality on CIFAR-10, CelebA, and 256x256 LSUN.
-
Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.
-
Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling
The contact matrix approach in a diffusion model, paired with specialized VQ-VAE, enables more precise and realistic generation of interactive duet dance motions compared to prior methods.