Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.
Score-Based Generative Modeling through Stochastic Differential Equations
Canonical reference. 76% of citing Pith papers cite this work as background.
abstract
Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
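To make the reverse-time SDE in the abstract concrete, here is a minimal sketch on a one-dimensional toy problem. It is not the paper's implementation: the data distribution is assumed to be a Gaussian so that the score of the perturbed distribution is known in closed form and no neural network is needed, the forward SDE is assumed to be plain Brownian noise injection with a constant diffusion coefficient, and the solver is a basic Euler-Maruyama loop. The names `MU`, `S`, `G`, `T`, `score`, and `reverse_sde_sample` are all hypothetical and chosen for illustration.

```python
import math
import random
import statistics

# Toy setup (illustrative assumptions, not values from the paper):
# data ~ N(MU, S^2); forward SDE dx = G dw, so p_t = N(MU, S^2 + G^2 * t).
MU, S, G, T = 2.0, 0.5, 1.0, 25.0


def score(x, t):
    """Analytic score d/dx log p_t(x) of the perturbed Gaussian."""
    return -(x - MU) / (S**2 + G**2 * t)


def reverse_sde_sample(rng, n_steps=500):
    """Euler-Maruyama on the reverse-time SDE, integrated from t = T down to 0."""
    dt = T / n_steps
    # Known prior N(0, G^2 * T): approximates p_T when G^2 * T dominates S^2 and MU^2.
    x = rng.gauss(0.0, G * math.sqrt(T))
    for i in range(n_steps):
        t = T - i * dt
        # Reverse drift is -G^2 * score; stepping backward in time flips its sign.
        x += G**2 * score(x, t) * dt + G * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
samples = [reverse_sde_sample(rng) for _ in range(2000)]
sample_mean = statistics.fmean(samples)
sample_std = statistics.stdev(samples)
print(f"mean={sample_mean:.3f}  std={sample_std:.3f}")
```

Under these assumptions the sample mean and standard deviation should land close to `MU` and `S`, up to discretization and Monte Carlo error. In the setting the abstract describes, `score` would instead be a trained, time-dependent neural network, and the simple loop could be swapped for a predictor-corrector sampler or the equivalent ODE.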
representative citing papers
A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.
Quotient-space diffusion models generate correct symmetric distributions by removing redundancy on the quotient space, simplifying learning and improving results on small molecules and proteins under SE(3) symmetry.
The García-Pintos feedback Hamiltonian equals the score function of the quantum trajectory distribution, linking quantum feedback to diffusion-model reversal.
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
ASTRA reframes transition-state search as guided diffusion inference that samples the isodensity surface between metastable basins and converges to first-order saddles via score differences and physical forces.
MF-PID turns independent diffusion samples into mean-field interacting agents, proving that quadratic interactions yield exact linear mean interpolation and delivering 19-24% energy savings in demand-response control.
Föllmer processes are variationally optimal among generative diffusions because they minimize the impact of drift estimation error on path-space KL divergence, rendering different interpolation schedules statistically equivalent.
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
Hybrid simulation and non-Euclidean elasticity theory demonstrate that clathrin coats develop adaptive rigidity and memory during growth, producing flat, stalled, or closed outcomes through two energy-landscape gates and matching experiments without fitted parameters.
Introduces structured DRO for learned inverse problem reconstructions with ambiguity sets aligned to the forward operator, yielding explicit dual representations and a worst-case bound that induces Tikhonov regularization on the operator Lipschitz constant.
CORDEX-ML-Bench benchmarks 40 ML models for climate downscaling and finds generative models outperform deterministic ones on precipitation while historically trained models underestimate future climate signals.
QMC applied to Euler-Maruyama yields faster sampling-error decay than Monte Carlo, and the new MSTG method based on exact simulation achieves super-exponential truncation-error decay that sharply reduces integration dimension.
STREAM decouples text and music conditioning in a diffusion transformer via AdaLN for structure and BEAM for beats, plus new Motorica++ dataset and editability metrics, claiming SOTA music alignment with preserved semantics.
Direct fixed-weight solver for free-support Wasserstein medians relocates atoms using OT barycentric projections and inverse-distance weights, achieving monotone descent on smoothed objectives with fewer subproblems than nested Weiszfeld baselines.
Chameleon proposes the first large-scale cross-domain compositing dataset and a disentangled encoder plus gated diffusion transformer that outperforms prior in-domain and cross-domain methods on plausibility and fidelity.
YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
CGPO integrates training-free critic guidance into diffusion denoising to produce high-Q actions as regression targets, yielding SOTA results on MuJoCo locomotion and successful Franka arm grasping.
Spectral Guidance learns singular functions via self-supervised objective to project guidance signals onto diffusion sampling trajectories, enabling stable control without retraining or backpropagation and improving CIFAR-10 accuracy by 37 points with 4x faster sampling.
Presents a controlled vector field framework for continuous generative modeling where velocity is formed from fixed bracket-generating fields modulated by scalar controls, with an expressivity principle under controllability assumptions.
citing papers explorer
-
Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed
Denoising Student distills the multi-step denoising process of score-based and diffusion models into a single forward pass, matching GAN sampling speed while producing comparable sample quality on CIFAR-10, CelebA, and 256x256 LSUN.
-
Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.
-
Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling
The contact matrix approach in a diffusion model, paired with specialized VQ-VAE, enables more precise and realistic generation of interactive duet dance motions compared to prior methods.
-
FlowAWR: Online Adaptive Flow Reinforcement via Advantage-Weighted Rectification
FlowAWR derives an advantage-weighted rectification for optimal velocity fields in flow models, claiming 2-5x faster convergence than DiffusionNFT on SD3.5-Medium.
-
Your Data Manifold is Secretly a Reward Model: Shell-LCC for Text-to-Video Generation
Shell-LCC models the high-quality data manifold as an isotropic shell to derive cost-free reward signals that improve realism and high-frequency details in text-to-video generation.
-
Stochastic Optimal Control Sampling for Diffusion Inverse Problems
SOCS derives per-step closed-form control signals from stochastic optimal control to steer diffusion sampling trajectories toward measurements while preserving the generative prior.
-
Learning Climate Variability from Scarce Data with Diffusion Models: A Test Case for ENSO
Diffusion models recover known ENSO variability structure from synthetic LIM data when given enough samples, but require pre-training on CMIP6 plus fine-tuning to match observations with the ~700 samples available in ERSSTv5.
-
Multiscale reconstruction of protein conformations from cryo-EM images
A multiscale optimization method using explicit protein backbone geometry reconstructs atomic models from cryo-EM data, showing improved RMSD and TM scores on three simulated datasets.
-
My Chemical Harness: Evolutionary Molecular Design over Synthetic Pathways with Large Language Model Agents
My Chemical Harness performs evolutionary molecular design by searching over validated synthetic routes with LLMs restricted to high-level preferences, outperforming baselines on an sEH proxy task across multiple metrics.
-
Multiscale Fourier Neural Operator for Inverse Wave Scattering in Highly Oscillatory Media
MscaleFNO learns mappings from oscillatory media to wavefields for Helmholtz inverse problems and pairs it with diffusion regularization for partial-aperture 2D reconstructions.
-
DiffUNet^2: Bidirectional Prediction, Probabilistic Generation and Collaborative Visual Discovery for Scientific Data
DiffUNet^2 is a bidirectional conditional diffusion model integrated with visual tools for probabilistic exploration of scientific time series across five evaluated datasets.
-
Flicker-DDPM: Accelerating Denoising Diffusion via 1/f Colored Noise Injection
Flicker-DDPM accelerates DDPM sampling by injecting 1/f colored noise matched to image spectra, achieving similar quality with 3.33 times fewer steps on CIFAR-10.
-
Density Evolution: A Multiscale View of Density Estimation
A review reframing density estimation as 'density evolution' across scales, linking kernel smoothing to heat flow, mixtures to compression, and topology to level sets, while stating three structural results on modes, Gaussian semigroups, and log-concavity.
-
APE: Agentic Prompt Enhancer for Image Generation and Editing
APE post-trains small language models as single-agent or multi-agent prompt enhancers that improve visual alignment on image generation and editing benchmarks without altering the downstream visual model.
-
MARS Policy: Multimodality Only When It Matters
MARS policy adaptively activates multimodal generation only when beneficial in robotic tasks, claiming 16.67% higher success and 83.20% lower inference latency than baselines in real-world tests.
-
Latent Diffusion for Missing Data
A VAE-based latent diffusion model trained on incomplete data maintains sample quality and imputation performance up to 50% missingness while pixel-space diffusion degrades.
-
High-Quality Synthetic Financial Time-Series using a GAN-Diffusion Framework
Hybrid CoMeTS-GAN plus diffusion model generates multivariate financial time series claimed to better reproduce stylized facts and inter-asset correlations than prior generative methods.
-
Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization
MBDPO reformulates policy optimization as a diffusion process over searched trajectories in latent world models to reduce misalignment between search and value learning.
-
Agent-Centric Social Trajectory Prediction: A Free Energy Principle Perspective
FEP-Diff uses a dual-branch spatiotemporal encoder, goal-conditioned belief learner optimized by free-energy objective with social consistency constraint, and residual diffusion generator to outperform prior methods on five benchmarks under restricted observability.
-
Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning
A semi-supervised MOL framework for diffusion models with generalization bounds depending only on specialist model complexity, extended to diffusion policies for sequential decisions.
-
Three-Step Conditional Diffusion 3D Reconstruction for Light-Field Microscopy
Proposes TCD, a three-step conditional diffusion model with ICD module, claiming superior fidelity and generalization for LFM 3D reconstruction.
-
Precipitation diffusion downscaling and application to out-of-distribution simulations with and without stratospheric aerosol injection
Diffusion downscaling trained on MESACLIP data applied to CESM2 indicates SAI nearly halves the CONUS-average increase in yearly maximum precipitation.
-
Diffusion-based Denoising Beats Vanilla Score Matching in Parameter Estimation: A Theoretical Explanation
Diffusion-based denoising score matching avoids the mode-separation degradation that affects vanilla score matching error bounds, via suitable hyperparameter choice.
-
Physics-Informed Generative Solver: Bridging Data-Driven Priors and Conservation Laws for Stable Spatiotemporal Field Reconstruction
A generative solver separates data-driven prior learning from inference-time enforcement of conservation laws using martingale-regularized score matching and physics-informed sampling for stable field reconstruction.
-
One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration
Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.
-
Divergence-Suppressing Couplings for Rectified Flow
Divergence-suppressing couplings attenuate the divergent part of the velocity field when generating training couplings for Rectified Flow, yielding straighter paths and better generation quality at no extra inference cost.
-
Edit-GRPO: A Locality-Preserving Policy Optimization Framework for Image Editing
Edit-GRPO decouples editing and preservation objectives via region-specific signals in a policy optimization framework to improve locality in image editing tasks.
-
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
Causal Forcing++ applies causal consistency distillation to enable scalable frame-wise 1-2 step autoregressive video generation, outperforming prior 4-step chunk-wise methods on quality metrics while halving first-frame latency.
-
On the Limits of Latent Reuse in Diffusion Models
Reusing source latent spaces in diffusion models under distribution shift produces target score error set by principal-angle misalignment and diffusion-time-amplified ambient noise.
-
Lossless Anti-Distillation Sampling
LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.
-
Stable and Near-Reversible Diffusion ODE Solvers for Image Editing
Near-reversible Runge-Kutta diffusion ODE solvers with vector-field smoothing improve stability and edit fidelity for large changes in text-guided image editing compared to exactly reversible alternatives.
-
CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation
CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditional flow matching.
-
RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation
RealDiffusion uses heat diffusion as a dissipative prior and a region-aware stochastic process inside a training-free physics-informed attention mechanism to improve multi-character coherence while preserving narrative dynamism in sequential image generation.
-
Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.
-
Predicting 3D structure by latent posterior sampling
A two-stage method trains NeRF latents then a diffusion prior to sample posteriors for 3D reconstruction from varied observations including single-view, multi-view, noisy, sparse pixels, and sparse depth.
-
A Stability Benchmark of Generative Regularizers for Inverse Problems
Numerical benchmarks indicate generative regularizers deliver strong reconstructions in some imaging inverse problem settings but can be unstable or problematic under imperfect conditions compared to variational methods.
-
Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning
A generative framework using geometric diffusion for brain networks and tabular diffusion for other organs integrates ICD-coded SDoH proxies to improve disease reasoning on UK Biobank data.
-
Learning Unified Representations of Normalcy for Time Series Anomaly Detection
U²AD learns unified normal data representations via score-based generative modeling and a novel time-dependent score network to outperform prior methods in accuracy and early anomaly detection for multivariate time series.
-
Fundamental Trade-Offs in Multi-Bit Watermarking of Stochastic Processes
Derives matched converse and achievability bounds that characterize optimal trade-offs among false-alarm probability, detection error probability, distortion, and information rate for multi-bit watermarking of stationary ergodic stochastic processes.
-
Exploring and Exploiting Stability in Latent Flow Matching
LFM models exhibit stability to data reduction and capacity shrinkage that is tied to the flow matching objective, enabling reduced-data training and coarse-to-fine inference with over 2x speedup.
-
Texture Independently Drives Liking in AI-Generated Alternative Protein Burgers
Resilience is the strongest mechanical predictor of meatiness and texture liking in alternative protein burgers, with texture driving overall liking separately from flavor.
-
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
-
Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations
MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approximate-gradient methods.
-
FluxFlow: Conservative Flow-Matching for Astronomical Image Super-Resolution
FluxFlow uses conservative pixel-space flow-matching with uncertainty weights and Wiener test-time correction to outperform baselines on photometric and scientific accuracy for ground-to-space super-resolution, validated on a new real 19,500-pair DESI-HST dataset.
-
Unifying Deep Stochastic Processes for Image Enhancement
Stochastic image enhancement methods are shown to be variants of a shared SDE differing in drift, diffusion, terminal distributions and boundary conditions, with controlled experiments revealing no single dominant family and a new modular library released.
-
Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models
Target-based prompting lets users define fairness distributions for skin tones in generative AI, shifting outputs closer to chosen targets across 36 tested prompts for occupations and contexts.
-
The Amazing Stability of Flow Matching
Flow matching generative models preserve sample quality, diversity, and latent representations despite pruning 50% of the CelebA-HQ dataset or altering architecture and training configurations.
-
SubFlow: Sub-mode Conditioned Flow Matching for Diverse One-Step Generation
SubFlow restores full mode coverage in one-step flow matching by conditioning on sub-modes from semantic clustering, yielding higher diversity on ImageNet-256 while preserving FID.
-
Structured State-Space Regularization for Generation-Friendly Image Tokenization
Structured state-space regularization induces spectral structure in image tokenizer latent spaces via an SSM-derived objective, improving generative performance with minimal reconstruction loss.
-
FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
FineEdit adds multi-level bounding box injection to diffusion image editing, releases a 1.2M-pair dataset with box annotations, and shows better instruction following and background consistency than prior open models on new and existing benchmarks.