AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.
Flow Matching for Generative Modeling
Mixed citation behavior. Most common role is method (47%).
abstract
We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.
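As a rough illustration of the objective the abstract describes, here is a minimal NumPy sketch of the conditional Flow Matching regression target for the OT displacement path. The abstract gives no formulas. The path x_t = (1 - (1 - sigma_min) t) x_0 + t x_1 and its target velocity u_t = x_1 - (1 - sigma_min) x_0 are the commonly used OT form. The function names, the `sigma_min` default, and the fixed-step Euler sampler (standing in for an off-the-shelf ODE solver) are assumptions made for illustration, not the authors' code.

```python
import numpy as np


def ot_conditional_path(x0, x1, t, sigma_min=1e-4):
    """Point x_t on the OT conditional path and its target velocity u_t.

    x0: noise samples, shape (n, d); x1: data samples, shape (n, d);
    t: times in [0, 1], shape (n,). sigma_min is an assumed small constant.
    """
    t = t.reshape(-1, 1)
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1.0 - sigma_min) * x0  # constant in t along each path
    return x_t, u_t


def cfm_loss(velocity_fn, x1, rng, sigma_min=1e-4):
    """Monte Carlo estimate of the simulation-free regression loss.

    velocity_fn(x, t) is a hypothetical model returning an (n, d) array.
    No ODE is solved here: we only sample (t, x0) and regress onto u_t.
    """
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform(size=x1.shape[0])
    x_t, u_t = ot_conditional_path(x0, x1, t, sigma_min)
    pred = velocity_fn(x_t, t)
    return float(np.mean(np.sum((pred - u_t) ** 2, axis=1)))


def euler_sample(velocity_fn, x0, n_steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

In practice `velocity_fn` would be a neural network trained by minimizing `cfm_loss` over batches of data, and `euler_sample` would typically be replaced by an adaptive ODE solver.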
representative citing papers
TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.
Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.
Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.
A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.
Derives closed-form posterior covariance for flow matching from divergence of velocity field, enabling post-hoc uncertainty on pre-trained models including one-step generators.
FMRG reformulates guidance as deterministic optimal control, deriving a single-trajectory method using the flow map that matches or exceeds baselines on reward-guided generation and inverse problems with 3 NFEs at text-to-image scale.
ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
FlowHijack is the first dynamics-aware backdoor attack on flow-matching VLAs that achieves high success rates with stealthy triggers while preserving benign performance and making malicious actions kinematically indistinguishable from normal ones.
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.
DVG-WM disentangles dynamics learning and visual synthesis in video world models using flow matching and latent degradation, achieving up to 3.97x faster inference with improved quality on LIBERO and real-world robotic platforms.
Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.
OOPSIEVERSE is a new damage-aware simulation benchmark for household robot manipulation that converts contact, thermal, and fluid signals into task-agnostic damage metrics and demonstrates uses in safer policy learning and benchmarking.
FlexiSLM is the first spoken language model supporting dynamic and controllable frame rates on speech input and output, outperforming fixed-rate 7B models at high quality and enabling faster inference at lower rates like 6.25 Hz.
MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.
Introduces structured DRO for learned inverse problem reconstructions with ambiguity sets aligned to the forward operator, yielding explicit dual representations and a worst-case bound that induces Tikhonov regularization on the operator Lipschitz constant.
SurgVLA-Bench supplies a hierarchical task taxonomy and multi-dimensional evaluation framework for VLA models in laparoscopic robotics simulation, showing autoregressive models excel at semantics while flow-matching models achieve higher precision but all fall short due to endoscopic view constraint
CORDEX-ML-Bench benchmarks 40 ML models for climate downscaling and finds generative models outperform deterministic ones on precipitation while historically trained models underestimate future climate signals.
Flow models reach 99.2% Sudoku accuracy in 7 passes and 96.1% on out-of-distribution Sudoku-Extreme by selecting dynamically stable candidates and training with self-conditioning plus DPO to avoid failed outputs.
Masked diffusion LMs can use continuous x-prediction flow with token-wise asynchronous updates and an RL policy network to reach 97% performance on HumanEval using only 25% of the usual decoding budget.
citing papers explorer
- What Time Is It? How Data Geometry Makes Time Conditioning Optional for Flow Matching
  Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.
- How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
  FMRG reformulates guidance as deterministic optimal control, deriving a single-trajectory method using the flow map that matches or exceeds baselines on reward-guided generation and inverse problems with 3 NFEs at text-to-image scale.
- Generative Actor-Critic with Soft Bridge Policies
  SoftGAC defines a stochastic bridge from base to action latent that converts the MaxEnt objective into a tractable relative-entropy term reducible to control energy, achieving competitive returns with one-pass sampling.
- Arbitrarily Conditioned Hierarchical Flows for Spatiotemporal Events
  ARCH is a hierarchical flow-based generative model that enables tractable conditional intensity computation and arbitrary conditioning for spatiotemporal event distributions.
- DiffusionNFT: Online Diffusion Reinforcement with Forward Process
  DiffusionNFT performs online RL for diffusion models on the forward process via flow matching and positive-negative contrasts, delivering up to 25x efficiency gains and rapid benchmark improvements over prior reverse-process methods.
- Slowly Annealed Langevin Dynamics: Theory and Applications to Training-Free Guided Generation
  Slowly Annealed Langevin Dynamics provides non-asymptotic KL-based convergence guarantees for tracking moving targets and enables training-free guided generation via a velocity-aware correction that accounts for pretrained marginals.
- Toward Better Geometric Representations for Molecule Generative Models
  LENSEs improves representation-conditioned molecule generation by jointly training a multi-level representation head, perceptual loss, and REPA alignment on pretrained encoders, yielding 97.28% validity and 98.51% stability on GEOM-DRUG.
- Excluding the Target Domain Improves Extrapolation: Deconfounded Hierarchical Physics Constraints
  Deconfounded Hierarchical Gate with counterfactual estimation and hierarchical constraints achieves 46% better RMSE on out-of-distribution battery temperature extrapolation, with excluding target data from pretraining outperforming inclusion.
- Path-Coupled Bellman Flows for Distributional Reinforcement Learning
  PCBF learns return distributions via source-consistent Bellman-coupled paths with shared noise and λ-parameterized control variates, reporting improved fidelity and stability on MRPs, OGBench, and D4RL.
- BRICKS: Compositional Neural Markov Kernels for Zero-Shot Radiation-Matter Simulation
  BRICKS creates compositional neural Markov kernels via hybrid transformers and Riemannian Flow Matching on product manifolds to enable zero-shot simulation of radiation-matter interactions across arbitrary material distributions.
- Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
  FAN simplifies expressive flow policies and distributional critics in offline RL via single-iteration behavior regularization and single-sample noise conditioning to claim SOTA performance with lower training and inference time.
- Fisher Decorator: Refining Flow Policy via a Local Transport Map
  Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
- $\pi^{*}_{0.6}$: a VLA That Learns From Experience
  RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.
- $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control
  π₀ is a vision-language-action flow model trained on diverse multi-platform robot data that supports zero-shot task performance, language instruction following, and efficient fine-tuning for dexterous tasks.
- Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning
  Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.
- Elucidating Representation Degradation Problem in Diffusion Model Training
  Diffusion models suffer representation degradation at high noise due to recoverability mismatch; ERD mitigates this by dynamic optimization reallocation, accelerating convergence across backbones.
- Flow Matching Guide and Code
  Flow Matching is a generative modeling framework with mathematical foundations, design choices, extensions, and open-source PyTorch code for applications like image and text generation.