Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.
Score-Based Generative Modeling through Stochastic Differential Equations
Canonical reference. 76% of citing Pith papers cite this work as background.
abstract
Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
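The reverse-time sampling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy 1-D Gaussian data distribution, a constant-rate variance-preserving forward SDE dx = -0.5*BETA*x dt + sqrt(BETA) dw, and a closed-form score in place of a trained neural network. All names (`BETA`, `T`, `marginal`, `score`, `reverse_sde_sample`, `sample_many`) are hypothetical and chosen for illustration only. The solver is plain Euler-Maruyama applied to the reverse-time SDE dx = [f(x, t) - g(t)^2 * score(x, t)] dt + g(t) dw, integrated from t = T down to t = 0.

```python
import math
import random
import statistics

# Illustrative assumptions, not values from the paper.
BETA = 1.0                        # constant noise rate of the forward SDE
T = 8.0                           # final time; p_T is then close to N(0, 1)
DATA_MEAN, DATA_STD = 2.0, 0.5    # toy 1-D Gaussian "data" distribution


def marginal(t):
    """Mean and variance of the perturbed distribution p_t for Gaussian data."""
    decay = math.exp(-BETA * t)
    mean = DATA_MEAN * math.sqrt(decay)
    var = DATA_STD ** 2 * decay + (1.0 - decay)
    return mean, var


def score(x, t):
    """Closed-form score d/dx log p_t(x); stands in for a learned score model."""
    mean, var = marginal(t)
    return -(x - mean) / var


def reverse_sde_sample(rng, n_steps=400):
    """Euler-Maruyama integration of the reverse-time SDE from t = T to t = 0."""
    dt = T / n_steps
    x = rng.gauss(0.0, 1.0)       # draw from the prior N(0, 1)
    for i in range(n_steps):
        t = T - i * dt
        # Reverse-time drift: f(x, t) - g(t)^2 * score(x, t)
        drift = -0.5 * BETA * x - BETA * score(x, t)
        # Step backward in time, injecting fresh Gaussian noise.
        x = x - drift * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
    return x


def sample_many(n_samples=2000, seed=0):
    """Draw independent samples with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [reverse_sde_sample(rng) for _ in range(n_samples)]


if __name__ == "__main__":
    samples = sample_many()
    print("sample mean:", statistics.mean(samples))
    print("sample std: ", statistics.stdev(samples))
```

With these settings the sample mean and standard deviation should land near `DATA_MEAN` and `DATA_STD`, up to Monte Carlo and discretization error. In the setting the abstract describes, the closed-form `score` would be replaced by a neural network estimate.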
representative citing papers
A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.
Quotient-space diffusion models generate correct symmetric distributions by removing redundancy on the quotient space, simplifying learning and improving results on small molecules and proteins under SE(3) symmetry.
The García-Pintos feedback Hamiltonian equals the score function of the quantum trajectory distribution, linking quantum feedback to diffusion-model reversal.
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
ASTRA reframes transition-state search as guided diffusion inference that samples the isodensity surface between metastable basins and converges to first-order saddles via score differences and physical forces.
MF-PID turns independent diffusion samples into mean-field interacting agents, proving that quadratic interactions yield exact linear mean interpolation and delivering 19-24% energy savings in demand-response control.
Föllmer processes are variationally optimal among generative diffusions because they minimize the impact of drift estimation error on path-space KL divergence, rendering different interpolation schedules statistically equivalent.
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
Hybrid simulation and non-Euclidean elasticity theory demonstrate that clathrin coats develop adaptive rigidity and memory during growth, producing flat, stalled, or closed outcomes through two energy-landscape gates and matching experiments without fitted parameters.
Introduces structured DRO for learned inverse problem reconstructions with ambiguity sets aligned to the forward operator, yielding explicit dual representations and a worst-case bound that induces Tikhonov regularization on the operator Lipschitz constant.
CORDEX-ML-Bench benchmarks 40 ML models for climate downscaling and finds generative models outperform deterministic ones on precipitation while historically trained models underestimate future climate signals.
QMC applied to Euler-Maruyama yields faster sampling-error decay than Monte Carlo, and the new MSTG method based on exact simulation achieves super-exponential truncation-error decay that sharply reduces integration dimension.
STREAM decouples text and music conditioning in a diffusion transformer via AdaLN for structure and BEAM for beats, plus new Motorica++ dataset and editability metrics, claiming SOTA music alignment with preserved semantics.
Direct fixed-weight solver for free-support Wasserstein medians relocates atoms using OT barycentric projections and inverse-distance weights, achieving monotone descent on smoothed objectives with fewer subproblems than nested Weiszfeld baselines.
Chameleon proposes the first large-scale cross-domain compositing dataset and a disentangled encoder plus gated diffusion transformer that outperforms prior in-domain and cross-domain methods on plausibility and fidelity.
YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
CGPO integrates training-free critic guidance into diffusion denoising to produce high-Q actions as regression targets, yielding SOTA results on MuJoCo locomotion and successful Franka arm grasping.
Spectral Guidance learns singular functions via self-supervised objective to project guidance signals onto diffusion sampling trajectories, enabling stable control without retraining or backpropagation and improving CIFAR-10 accuracy by 37 points with 4x faster sampling.
Presents a controlled vector field framework for continuous generative modeling where velocity is formed from fixed bracket-generating fields modulated by scalar controls, with an expressivity principle under controllability assumptions.
citing papers explorer
-
Structured State-Space Regularization for Generation-Friendly Image Tokenization
Structured state-space regularization induces spectral structure in image tokenizer latent spaces via an SSM-derived objective, improving generative performance with minimal reconstruction loss.
-
FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
FineEdit adds multi-level bounding box injection to diffusion image editing, releases a 1.2M-pair dataset with box annotations, and shows better instruction following and background consistency than prior open models on new and existing benchmarks.
-
NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models: Datasets, Methods and Results
The NTIRE 2026 challenge releases the KwaiVIR benchmark for short-form UGC video restoration and reports strong results from 12 teams using generative models on both subjective and objective tracks.
-
Rethinking the Diffusion Model from a Langevin Perspective
Diffusion models are reorganized under a Langevin perspective that unifies ODE and SDE formulations and shows flow matching is equivalent to denoising under maximum likelihood.
-
NTIRE 2026 Challenge on Single Image Reflection Removal in the Wild: Datasets, Results, and Methods
NTIRE 2026 challenge introduces OpenRR-5k real-world dataset for single-image reflection removal and reports that top participant methods advance the state of the art.
-
Not all tokens contribute equally to diffusion learning
DARE mitigates neglect of important tokens in conditional diffusion models via distribution-rectified guidance and spatial attention alignment.
-
TIGFlow-GRPO: Trajectory Forecasting via Interaction-Aware Flow Matching and Reward-Guided Optimization
TIGFlow-GRPO uses a Trajectory-Interaction-Graph in conditional flow matching plus Flow-GRPO optimization to produce more accurate, socially compliant, and physically feasible trajectory forecasts on ETH/UCY and SDD datasets.
-
Uncertainty-Aware Distribution-to-Distribution Flow Matching for Scientific Imaging
SFM improves generalization under distribution shift for scientific imaging tasks while AVUQ supplies sample-efficient epistemic and aleatoric uncertainty estimates plus anomaly scores.
-
EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection
EmDT combines UMAP clustering with a Transformer-based diffusion process to create synthetic fraud samples that improve XGBoost classification on credit card fraud data while preserving correlations and privacy.
-
C$^2$FG: Control Classifier-Free Guidance via Score Discrepancy Analysis
C²FG provides a time-dependent guidance controller for diffusion models derived from score discrepancy upper bounds, implemented as an exponential decay function without retraining.
-
TPGDiff: Hierarchical Triple-Prior Guided Diffusion for Image Restoration
TPGDiff introduces hierarchical triple-prior guidance in a diffusion network, placing degradation priors throughout, structural priors in shallow layers, and semantic priors in deep layers for improved all-in-one image restoration.
-
Energy Scaling Laws for Diffusion Models: Quantifying Compute in Image Generation
An adapted scaling law predicts GPU energy consumption for diffusion model inference with R² > 0.9 within architectures and strong cross-architecture generalization.
-
D2 Actor Critic: Diffusion Actor Meets Distributional Critic
D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.
-
Incomplete Data, Complete Dynamics: A Diffusion Approach
A conditional diffusion model trained on partitioned incomplete samples for physical dynamics achieves asymptotic convergence to the true generative process under mild conditions and outperforms baselines in imputation.
-
Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics
Introduces higher-order Langevin dynamics with auxiliary variables as a defense that mixes randomness early to reduce membership inference success on diffusion models, measured via AUROC and FID on toy and speech data.
-
MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation
MedShift applies flow matching and Schrödinger bridges for class-conditional unpaired translation between synthetic and real skull X-rays, benchmarked on the new X-DigiSkull dataset.
-
Aleatoric Uncertainty Medical Image Segmentation Estimation via Flow Matching
Conditional flow matching produces segmentation samples whose pixel-wise variance quantifies aleatoric uncertainty in medical images by learning an exact density rather than relying on stochastic diffusion sampling.
-
The Serial Scaling Hypothesis
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
-
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.
-
Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
VADD augments masked diffusion models with an auxiliary recognition model and variational inference to implicitly model inter-dimensional correlations, yielding higher-quality samples than standard MDMs at low denoising step counts on toy data, images, and text.
-
Dual Ascent Diffusion for Inverse Problems
A dual ascent optimization framework is introduced for MAP estimation with diffusion priors, claimed to outperform prior methods on image restoration in quality, noise robustness, speed, and data fidelity.
-
ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation
ConsDreamer refines conditional and unconditional terms in score distillation via view disentanglement and geometric consistency loss to reduce the Janus problem in zero-shot text-to-3D.
-
Exploring the flavor structure of leptons via diffusion models
Applies diffusion models to generate 10,000 neutrino mass matrices consistent with oscillation parameters in a seesaw model, revealing non-trivial distributions in CP phases and 0νββ effective mass.
-
Diffusion Models are Secretly Zero-Shot 3DGS Harmonizers
D3DR optimizes inserted 3DGS objects with a DDS-inspired diffusion objective plus a new personalization step to match scene lighting, reporting 2 dB PSNR gain over prior methods.
-
RectifiedHR: Enable Efficient High-Resolution Synthesis via Energy Rectification
RectifiedHR is a training-free method that uses noise refresh and latent energy analysis to enable efficient high-resolution synthesis in diffusion models.
-
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
DepthMaster proposes a single-step diffusion model with Feature Alignment and Fourier Enhancement modules in a two-stage training process to improve generalization and detail preservation in monocular depth estimation over prior diffusion methods.
-
SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation
SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.
-
Ergodicity of Langevin Dynamics and its Discretizations for Non-smooth Potentials
Subgradient Langevin dynamics and certain discretizations are shown to be ergodic for strongly convex non-smooth potentials, with the discrete versions also satisfying the law of large numbers.
-
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.
-
A Survey on Diffusion Models for Inverse Problems
A survey that introduces taxonomies for categorizing pre-trained diffusion model methods applied to inverse problems and analyzes their connections and challenges.
-
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
I2VGen-XL applies cascaded diffusion models with a base stage for semantic preservation via hierarchical encoders and a refinement stage for detail and resolution, trained on 35 million text-video and 6 billion text-image pairs.
-
The Score-Difference Flow for Implicit Generative Modeling
Score-difference flow reduces KL divergence between distributions and is formally equivalent to denoising diffusion models and a hidden subproblem in optimal GAN training under stated conditions.
-
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.
-
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance
ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.
-
Human2Any: Human-to-Robot Transfer via Constraint-Aware Compositional Planning
Human2Any transfers human video demonstrations to robots by representing tasks as object-object interactions and composing learned priors with robot-side planning.
-
Flow Matching for Convective-Scale Precipitation Downscaling
Flow matching produces better spatial structure than diffusion models for convective precipitation downscaling but underestimates heavy rainfall amounts.
-
DebFilter: Eradicating Biases Stashed in Value
DebFilter mitigates biases in text-to-image diffusion models by applying a fixed offset to the guidance embedding slice in cross-attention during inference.
-
Stabilizing, Scaling & Enhancing MeanFlow for Large-scale Diffusion Distillation
Stabilizes MeanFlow for large-scale diffusion distillation via discrete warm-up and trajectory alignment, reporting better results on FLUX.1-dev and HunyuanImage 3.0.
-
Accelerating Redshift-Conditioned Galaxy Image Synthesis with One-step Generative Modeling
One-step pixel-MeanFlow models recover key galaxy morphology statistics at orders-of-magnitude lower computational cost than standard DDPM sampling while remaining weaker on fine-grained structure.
-
Noise scheduling and linear dynamics in diffusion models on Lie groups
A specific noise schedule in Lie-group diffusion models yields linear decay of the Wilson action expectation value versus diffusion time, emerging naturally without an added drift term.
-
Harnessing AI for Inverse Partial Differential Equation Problems: Past, Present, and Prospects
A survey organizing AI methods for inverse PDE problems into inverse problems, inverse design, and control categories, covering applications and future challenges like physics-informed models and uncertainty quantification.
-
Privacy Evaluation of Generative Models for Trajectory Generation
Generative models for trajectory data do not inherently preserve privacy, as membership inference attacks can identify training data points in representative models.
-
Elucidating Representation Degradation Problem in Diffusion Model Training
Diffusion models suffer representation degradation at high noise due to recoverability mismatch; ERD mitigates this by dynamic optimization reallocation, accelerating convergence across backbones.
-
A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models
Diffusion, score-based, and flow matching models are unified as instances of learning time-dependent vector fields inducing marginal distributions governed by continuity and Fokker-Planck equations.
-
Technical Note on Relating Scores of Tilted Distributions
Extends score relations for tilted distributions to constant negative diagonal tilts by linking denoisers via Tweedie's formula, yielding location and time shifts in the score operator.
-
OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL
OmniVLA-RL uses a mix-of-transformers architecture and flow-matching reformulated as SDE with group segmented policy optimization to surpass prior VLA models on LIBERO benchmarks.
-
Discrete Meanflow Training Curriculum
A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.
-
Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey
A literature survey that organizes diffusion model alignment methods along five axes (feedback source, reward form, optimization mechanism, distribution shift handling, and explicit safety constraints) and identifies open challenges for reliable deployment.
-
Towards a Universal Foundation Model for Protein Dynamics: A Multi-Chain Tree-Structured Framework with Transformer Propagators
Proposes TSCG hierarchical representation and Transformer propagator for universal coarse-grained protein MD with claimed 10k-20k times acceleration over all-atom MD while preserving statistical properties.
-
ModelScope Text-to-Video Technical Report
ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.