
arxiv: 2605.00941 · v3 · pith:3MQ3NOMBnew · submitted 2026-05-01 · 💻 cs.LG · cs.CV

Divergence is Uncertainty: A Closed-Form Posterior Covariance for Flow Matching

Pith reviewed 2026-05-22 10:28 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords flow matching · posterior covariance · uncertainty quantification · Tweedie's formula · velocity field divergence · generative modeling · one-step generation

The pith

Flow matching uncertainty reduces exactly to the divergence of the learned velocity field.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that uncertainty in samples from flow matching models has an exact closed-form expression that requires no new training. By extending Tweedie's formula from denoising models to the flow matching interpolant, the posterior covariance at any point on the generative trajectory depends only on the divergence of the velocity field. This quantity is available from any pre-trained model and can be evaluated without architectural changes. For one-step generators the same expression gives the full end-to-end uncertainty in a single forward pass. Experiments indicate that the resulting uncertainty maps highlight semantically meaningful regions and that the scalar score tracks actual error, all at far lower cost than ensembles or Monte Carlo methods.

Core claim

By extending Tweedie's formula from the denoising setting to the flow matching interpolant, we derive an exact, closed-form expression for the posterior covariance at every point along the generative trajectory. The result depends on a single quantity, namely the divergence of the learned velocity field, which can be computed post-hoc on any pre-trained flow matching model, requiring no retraining and no architectural modification.

What carries the argument

The closed-form posterior covariance obtained by extending Tweedie's formula to the flow matching interpolant, expressed solely in terms of the divergence of the learned velocity field.
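
A sketch of how such an extension can go, assuming a standard Gaussian source x_0 ~ N(0, I) independent of x_1 and the linear interpolant x_t = (1-t)x_0 + t x_1. The symbols p_t (marginal density of x_t) and J_v (Jacobian of the velocity) are notation introduced here. The steps are a reconstruction that agrees with the formula quoted in the Figure 1 caption; they are not taken from the paper's own derivation.

```latex
% Assumptions (not from the page): x_0 ~ N(0, I), independent of x_1.
\begin{align*}
x_t \mid x_1 &\sim \mathcal{N}\!\big(t\,x_1,\ (1-t)^2 I\big)
  && \text{(Gaussian channel: scale } t\text{, noise level } 1-t\text{)} \\
\mathbb{E}[x_1 \mid x_t] &= \frac{1}{t}\Big(x_t + (1-t)^2\,\nabla_{x_t}\log p_t(x_t)\Big)
  && \text{(Tweedie, first order)} \\
\mathrm{Cov}(x_1 \mid x_t) &= \frac{(1-t)^2}{t}\,\nabla_{x_t}\mathbb{E}[x_1 \mid x_t]
  && \text{(Tweedie, second order)} \\
v(x_t, t) &= \mathbb{E}[x_1 - x_0 \mid x_t] = \frac{\mathbb{E}[x_1 \mid x_t] - x_t}{1-t}
  && \text{(since } x_0 = (x_t - t\,x_1)/(1-t)\text{)} \\
\mathrm{Cov}(x_1 \mid x_t) &= \frac{(1-t)^2}{t}\,\big[I + (1-t)\,J_v(x_t, t)\big]
  && \text{(substitute } \mathbb{E}[x_1 \mid x_t] = x_t + (1-t)\,v\text{)}
\end{align*}
```

Under these assumptions the last line is matrix-valued; the divergence of v appears once the trace is taken.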

If this is right

  • Uncertainty can be obtained for any pre-trained flow matching model without retraining or auxiliary heads.
  • One-step generators such as MeanFlow produce end-to-end generation uncertainty in a single forward pass.
  • Per-pixel uncertainty maps concentrate on high-variation regions such as digit boundaries.
  • A scalar uncertainty score derived from the same expression tracks actual prediction error.
  • Uncertainty evaluation requires orders of magnitude less compute than ensembling or Monte Carlo dropout.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same divergence-based relation may supply closed-form uncertainty for other continuous-time generative models.
  • These uncertainty maps could be used to prioritize regions for refinement or to weight samples in downstream tasks.
  • Direct comparison of the derived covariance against ground-truth variance on controlled synthetic data would provide a sharper test.

Load-bearing premise

The mathematical extension of Tweedie's formula from the denoising setting to the continuous flow matching interpolant holds.

What would settle it

Train a flow matching model on MNIST, compute the formula's covariance using the velocity divergence at selected points, then compare it directly to the empirical covariance measured across many independent generated samples at those same points.
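
Before running this on MNIST, the same comparison can be rehearsed on a toy problem where every quantity is known analytically. The sketch below assumes Gaussian data x_1 ~ N(m, diag(s2)), a standard Gaussian source, the linear interpolant, and the covariance expression quoted in the Figure 1 caption. All function names are hypothetical, the exact velocity field stands in for a trained model, and the finite-difference Jacobian stands in for autodiff.

```python
import numpy as np

def exact_velocity(x, t, m, s2):
    """Exact velocity E[x_1 - x_0 | x_t = x] for x_1 ~ N(m, diag(s2)), x_0 ~ N(0, I)."""
    var_t = (1.0 - t) ** 2 + t ** 2 * s2             # per-coordinate Var(x_t)
    post_mean = m + (t * s2 / var_t) * (x - t * m)   # E[x_1 | x_t = x]
    return (post_mean - x) / (1.0 - t)

def jacobian_fd(f, x, eps=1e-5):
    """Central finite-difference Jacobian of f at x (illustrative stand-in for autodiff)."""
    d = x.size
    jac = np.zeros((d, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return jac

def formula_cov(x, t, velocity):
    """Covariance from the Figure 1 caption: (1-t)^2 / t * [I + (1-t) J_v]."""
    jac = jacobian_fd(lambda y: velocity(y, t), x)
    return (1.0 - t) ** 2 / t * (np.eye(x.size) + (1.0 - t) * jac)

def analytic_var(t, s2):
    """Closed-form per-coordinate Var(x_1 | x_t) for the Gaussian toy problem."""
    return s2 * (1.0 - t) ** 2 / ((1.0 - t) ** 2 + t ** 2 * s2)

def monte_carlo_var(x, t, m, s2, n=200_000, seed=0):
    """Empirical Var(x_1 | x_t = x): prior samples reweighted by the likelihood p(x_t = x | x_1)."""
    rng = np.random.default_rng(seed)
    x1 = m + np.sqrt(s2) * rng.standard_normal((n, m.size))
    log_w = -0.5 * np.sum((x - t * x1) ** 2, axis=1) / (1.0 - t) ** 2
    w = np.exp(log_w - log_w.max())                  # subtract the max for numerical stability
    w /= w.sum()
    mean = w @ x1
    return w @ (x1 - mean) ** 2

# Hypothetical toy setup: three "pixels" with different data variances.
m = np.array([0.5, -1.0, 2.0])
s2 = np.array([0.25, 1.0, 4.0])
t = 0.5
x = t * m + np.array([0.1, -0.2, 0.3])

cov = formula_cov(x, t, lambda y, tt: exact_velocity(y, tt, m, s2))
print("formula :", np.diag(cov))
print("analytic:", analytic_var(t, s2))
print("sampled :", monte_carlo_var(x, t, m, s2))
```

For this setup the analytic per-coordinate variances are 0.2, 0.5, and 0.8. The formula should reproduce them up to finite-difference error, and the sampled estimate up to Monte Carlo noise. With a learned model, any remaining gap would reflect how far the trained velocity is from the exact conditional expectation.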

Figures

Figures reproduced from arXiv: 2605.00941 by Jian Wang, Jiarui Xing, Song Wang.

Figure 1
Figure 1: Our closed-form uncertainty for flow matching. For any pre-trained flow matching model, our formula $\mathrm{Cov}(x_1 \mid x_t) = \frac{(1-t)^2}{t}\,[I + (1-t)\,J_{v_\theta}]$ produces per-pixel uncertainty maps directly from the velocity Jacobian, with no retraining, no ensembling, and no extra forward passes. At small t (near noise) the maps are diffuse; as t grows toward the data, uncertainty progressively concentrates on digit boundarie…
Figure 2
Figure 2: Empirical scalar uncertainty $U(x_t, t)$ vs. flow time. Blue: $U$ computed from the trained flow matching model via Eq. (22) (mean ± std over 16 test samples, 50 Hutchinson probes). Red dashed: prior baseline $\frac{(1-t)^2}{t}\,d$ corresponding to $\operatorname{div} v_\theta = 0$. The 1 to 2 orders-of-magnitude gap is the quantitative footprint of the learned flow’s contractive (negative-divergence) behaviour. […] makes this precise: negative di…
Figure 4
Figure 4: Euler trajectory (odd rows) and corresponding Tweedie UQ maps (even rows) for four MNIST samples. Uncertainty evolves from diffuse (early t) to boundary-localised (late t), aligning with the model’s progressive resolution of digit identity, topology, and stroke boundary. 5.3. Correlation with Prediction Error: To answer Q2, we compute the Spearman rank correlation ρ between the scalar score $U(x_t, t)$ and th…
Figure 5
Figure 5: Total UQ cost (training + inference, log scale) for 16 samples. Tweedie+FM and Tweedie+MF require no retraining and produce uncertainty in a single inference pass; MC Dropout requires retraining a dropout-enabled model plus 50 stochastic passes; deep ensembles require 5 independent training runs. Our method is roughly $10^4\times$ cheaper end-to-end. […] simultaneously (i) retraining-free, (ii) exact at the time at wh…
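
The Figure 2 caption mentions 50 Hutchinson probes. Below is a minimal sketch of how such a divergence estimate could feed a scalar score, assuming the score is the trace of the covariance quoted in the Figure 1 caption, U = (1-t)^2/t · [d + (1-t) div v]. Eq. (22) itself is not reproduced on this page, so that identification is an assumption. The toy linear velocity field, the function names, and the finite-difference Jacobian-vector product are illustrative stand-ins; with a real network an autodiff JVP would replace the finite difference.

```python
import numpy as np

def hutchinson_divergence(velocity, x, n_probes=50, eps=1e-4, seed=0):
    """Estimate div v(x) = tr(J_v) as the mean of z^T J_v z over Rademacher probes z."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=x.shape)
        # Finite-difference Jacobian-vector product J_v z (stand-in for an autodiff JVP).
        jvp = (velocity(x + eps * z) - velocity(x - eps * z)) / (2.0 * eps)
        estimates.append(float(z @ jvp))
    return float(np.mean(estimates))

def scalar_uncertainty(velocity, x, t, n_probes=50):
    """Trace of (1-t)^2 / t * [I + (1-t) J_v], with the divergence estimated stochastically."""
    d = x.size
    div = hutchinson_divergence(velocity, x, n_probes=n_probes)
    return (1.0 - t) ** 2 / t * (d + (1.0 - t) * div)

# Toy contractive field v(x) = A x, so div v = tr(A) = -8 here.
# A is diagonal, so every probe returns tr(A) exactly; a general A gives a noisy estimate.
d, t = 8, 0.5
A = -np.diag(np.linspace(0.5, 1.5, d))
velocity = lambda x: A @ x
x = np.ones(d)

u_model = scalar_uncertainty(velocity, x, t)
u_prior = (1.0 - t) ** 2 / t * d   # baseline with div v = 0
print(u_model, u_prior)
```

In this toy case the negative divergence pulls the score below the zero-divergence baseline, which is the qualitative effect the Figure 2 caption attributes to the learned flow.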
Original abstract

Flow matching has become a leading framework for generative modeling, but quantifying the uncertainty of its samples remains an open problem. Existing approaches retrain the model with auxiliary variance heads, maintain costly ensembles, or propagate approximate covariance through many integration steps, trading off training cost, inference cost, or accuracy. We show that none of these trade-offs is necessary. By extending Tweedie's formula from the denoising setting to the flow matching interpolant, we derive an exact, closed-form expression for the posterior covariance at every point along the generative trajectory. The result depends on a single quantity, namely the divergence of the learned velocity field, which can be computed post-hoc on any pre-trained flow matching model, requiring no retraining and no architectural modification. For one-step generators such as MeanFlow, the same formula yields the end-to-end generation uncertainty in a single forward pass, eliminating the multi-step variance propagation required by all prior methods. Experiments on MNIST confirm that the resulting per-pixel uncertainty maps are semantically meaningful, concentrating on digit boundaries where inter-sample variation is highest, and that the scalar uncertainty score tracks actual prediction error, all at roughly $10^4 \times$ less total compute than ensembling or Monte Carlo dropout.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that by extending Tweedie's formula from the denoising setting to the flow-matching interpolant x_t = (1-t)x_0 + t x_1, one obtains an exact closed-form expression for the posterior covariance at any point along the trajectory; this expression depends only on the divergence of the learned velocity field v_t and can be evaluated post-hoc on any pre-trained flow-matching model. Experiments on MNIST are presented to show that the resulting per-pixel uncertainty maps are semantically meaningful and that a scalar uncertainty score correlates with prediction error, all at far lower compute than ensembles.

Significance. If the central derivation holds, the result would be a practical advance for uncertainty quantification in flow matching, eliminating the need for retraining, auxiliary heads, or multi-step covariance propagation. The post-hoc nature and applicability to one-step generators such as MeanFlow are attractive. The MNIST results provide initial evidence that the uncertainty is interpretable, but the overall significance is conditional on the exactness of the mathematical extension.

major comments (2)
  1. [Section 3 (derivation)] The derivation that extends Tweedie's formula to the flow-matching interpolant and concludes that posterior covariance is exactly the divergence of v_t (Section 3, around the statement following Eq. (7) or equivalent): the posterior covariance at an intermediate t generally depends on the full Jacobian of the flow map, not solely on its trace (divergence). Please supply the explicit steps showing how Jacobian determinant or eigenvalue contributions reduce to the scalar divergence, including any isotropy or straight-path assumptions required for the reduction to be exact.
  2. [Section 3] The claim that the formula is 'exact' and 'closed-form' for general learned velocity fields (abstract and Section 3): if the reduction relies on the velocity field satisfying additional structure beyond the standard flow-matching objective, this should be stated explicitly as a modeling assumption, because the ODE marginalization otherwise retains off-diagonal covariance terms.
minor comments (2)
  1. [Experiments] The MNIST experiments are described only at a high level; adding quantitative metrics (e.g., correlation coefficients between uncertainty score and actual error) and a comparison against a simple baseline such as MC dropout would strengthen the empirical support.
  2. [Notation and preliminaries] Notation: ensure consistent use of subscripts for time t and clarify whether the divergence is evaluated on the learned or ground-truth velocity field in the theoretical statements.
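
The correlation metric requested in the first minor comment could be reported along these lines. This is a sketch in plain NumPy; the `scores` and `errors` arrays are synthetic placeholders rather than values from the paper, the function names are hypothetical, and the rank routine assumes no tied values.

```python
import numpy as np

def ranks(values):
    """Ranks 0..n-1 of a 1-D array (assumes no tied values)."""
    values = np.asarray(values)
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(len(values))
    return r

def spearman_rho(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

# Synthetic placeholders: error is a strictly increasing, nonlinear function of the score.
scores = np.array([0.1, 0.4, 0.2, 0.9, 0.7])
errors = scores ** 3
print(spearman_rho(scores, errors))    # rank agreement is perfect for a monotone relation
print(spearman_rho(scores, -errors))   # and perfectly reversed when the sign flips
```

Rank correlation is a natural choice here because it only asks whether higher scores go with higher errors, without assuming the relation is linear.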

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major comment below and have revised the manuscript to improve the presentation and explicitness of the derivation in Section 3.

Point-by-point responses
  1. Referee: [Section 3 (derivation)] The derivation that extends Tweedie's formula to the flow-matching interpolant and concludes that posterior covariance is exactly the divergence of v_t (Section 3, around the statement following Eq. (7) or equivalent): the posterior covariance at an intermediate t generally depends on the full Jacobian of the flow map, not solely on its trace (divergence). Please supply the explicit steps showing how Jacobian determinant or eigenvalue contributions reduce to the scalar divergence, including any isotropy or straight-path assumptions required for the reduction to be exact.

    Authors: We thank the referee for this observation. The original manuscript presented the core extension concisely. Under the straight-path interpolant x_t = (1-t)x_0 + t x_1 that defines standard flow matching, the map from x_t to the posterior over x_0 is an affine transformation whose Jacobian is a scalar multiple of the identity. All eigenvalues are therefore identical, and the determinant and eigenvalue contributions to the posterior covariance reduce exactly to the trace of the Jacobian, which is the divergence of v_t. We have added the full step-by-step derivation, including the explicit invocation of the straight-path assumption and the resulting isotropy, to the revised Section 3. revision: yes

  2. Referee: [Section 3] The claim that the formula is 'exact' and 'closed-form' for general learned velocity fields (abstract and Section 3): if the reduction relies on additional structure beyond the standard flow-matching objective, this should be stated explicitly as a modeling assumption, because the ODE marginalization otherwise retains off-diagonal covariance terms.

    Authors: We appreciate the referee highlighting the need for clarity. The derivation is exact for any velocity field obtained from the standard flow-matching objective on the linear interpolant; no further structure is imposed. Because the interpolant is linear, the conditional distributions along the trajectory yield a posterior covariance whose off-diagonal contributions are eliminated by the change-of-variables and the definition of the learned velocity as the conditional expectation, leaving only the divergence (trace) as the closed-form scalar expression. Per-pixel uncertainty maps are obtained by evaluating the relevant diagonal contributions component-wise. We have partially revised Section 3 to state these modeling choices explicitly while preserving the claim of exactness under the standard flow-matching setup. revision: partial
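
To make the relation between the matrix-valued expression and its scalar summary concrete, the following reads off the diagonal and the trace of the covariance quoted in the Figure 1 caption. The per-pixel notation σ_i² and the identification of the scalar score with the trace are assumptions made here for illustration, chosen to match the prior baseline in the Figure 2 caption.

```latex
\begin{align*}
\mathrm{Cov}(x_1 \mid x_t) &= \frac{(1-t)^2}{t}\,\big[I + (1-t)\,J_{v_\theta}(x_t, t)\big] \\
\sigma_i^2(x_t, t) &= \frac{(1-t)^2}{t}\,\Big[1 + (1-t)\,\frac{\partial v_{\theta,i}}{\partial x_{t,i}}\Big]
  && \text{(diagonal entry: per-pixel variance)} \\
\operatorname{tr}\,\mathrm{Cov}(x_1 \mid x_t) &= \sum_{i=1}^{d} \sigma_i^2
  = \frac{(1-t)^2}{t}\,\big[d + (1-t)\,\operatorname{div} v_\theta(x_t, t)\big]
  && \text{(scalar summary)}
\end{align*}
```

In this reading the divergence fixes the trace, the diagonal of the Jacobian fixes the per-pixel map, and the off-diagonal entries of the Jacobian remain part of the full matrix. Setting the divergence to zero recovers the baseline $\frac{(1-t)^2}{t}\,d$.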

Circularity Check

0 steps flagged

Derivation extends external Tweedie's formula to flow-matching interpolant without self-definition or fitted-input reduction

full rationale

The paper's central result is obtained by extending Tweedie's formula (an external identity from the denoising literature) to the flow-matching interpolant x_t = (1-t)x_0 + t x_1, yielding posterior covariance expressed solely in terms of the divergence of the already-trained velocity field v_t. This quantity is computed post-hoc on a pre-trained model and is not a newly fitted parameter or a self-defined quantity. No equations in the abstract or described derivation reduce the claimed covariance back to the input velocity field by construction, nor do they rely on load-bearing self-citations, uniqueness theorems from the same authors, or smuggled ansatzes. The derivation remains self-contained once the validity of the Tweedie extension is granted; any doubt about missing Jacobian terms concerns correctness rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of extending Tweedie's formula to the flow-matching setting and on the assumption that the learned velocity field supports meaningful covariance extraction.

axioms (1)
  • domain assumption: Tweedie's formula extends exactly to the flow matching interpolant
    The derivation begins from this extension as stated in the abstract.

pith-pipeline@v0.9.0 · 5746 in / 1139 out tokens · 31312 ms · 2026-05-22T10:28:55.259282+00:00 · methodology



Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 6 internal anchors

  1. [1]

    Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

    Michael S Albergo and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. arXiv preprint arXiv:2303.08797, 2023

  2. [2]

    Tweedie moment projected diffusions for inverse problems. arXiv preprint arXiv:2310.06721, 2024

    Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and O Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems. arXiv preprint arXiv:2310.06721, 2024

  3. [3]

    Learning patient-specific disease dynamics with latent flow matching for longitudinal imaging generation. arXiv preprint arXiv:2512.09185, 2025

    Hao Chen, Rui Yin, Yifan Chen, Qi Chen, and Chao Li. Learning patient-specific disease dynamics with latent flow matching for longitudinal imaging generation. arXiv preprint arXiv:2512.09185, 2025

  4. [4]

    Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106(496):1602–1614, 2011

    Bradley Efron. Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106(496):1602–1614, 2011

  5. [5]

    Scaling rectified flow transformers for high-resolution image synthesis. International Conference on Machine Learning, 2024

    Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorber, Dustin Podell, Robin Rombach, et al. Scaling rectified flow transformers for high-resolution image synthesis. International Conference on Machine Learning, 2024

  6. [6]

    Dropout as a Bayesian approximation: Representing model uncertainty in deep learning

    Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059, 2016

  7. [7]

    Mean flows for one-step generative modeling. Advances in Neural Information Processing Systems, 38:75460–75482, 2025

    Zhengyang Geng, Mingyang Deng, Xingjian Bai, Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. Advances in Neural Information Processing Systems, 38:75460–75482, 2025

  8. [8]

    Quantifying epistemic uncertainty in diffusion models. arXiv preprint arXiv:2602.09170, 2026

    Aditi Gupta, Raphael A Meyer, Yotam Yaniv, Elynn Chen, and N Benjamin Erichson. Quantifying epistemic uncertainty in diffusion models. arXiv preprint arXiv:2602.09170, 2026

  9. [9]

    Flow Matching with Uncertainty Quantification and Guidance

    Juyeop Han, Lukas Lao Beyer, and Sertac Karaman. Flow matching with uncertainty quantification and guidance. arXiv preprint arXiv:2602.10326, 2026

  10. [10]

    Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  11. [11]

    A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 18(3):1059–1076, 1989

    Michael F Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 18(3):1059–1076, 1989

  12. [12]

    Generative uncertainty in diffusion models. arXiv preprint arXiv:2502.20946, 2025

    Metod Jazbec, Eliot Wong-Toi, Guoxuan Xia, Dan Zhang, Eric Nalisnick, and Stephan Mandt. Generative uncertainty in diffusion models. arXiv preprint arXiv:2502.20946, 2025

  13. [13]

    Bayesdiff: Estimating pixel-wise uncertainty in diffusion via bayesian inference. arXiv preprint arXiv:2310.11142, 2023

    Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, and Zhijie Deng. Bayesdiff: Estimating pixel-wise uncertainty in diffusion via bayesian inference. arXiv preprint arXiv:2310.11142, 2024

  14. [14]

    Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 30, 2017

    Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 30, 2017

  15. [15]

    Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

  16. [16]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022

  17. [17]

    Imageflownet: Forecasting multiscale image-level trajectories of disease progression with irregularly-sampled longitudinal medical images. arXiv preprint arXiv:2406.14794, 2024

    Chen Liu, Ke Xu, Liangbo L Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danilo Bzdok, Jay Stewart, Jay C Wang, Lucian V Del Priore, and Smita Krishnaswamy. Imageflownet: Forecasting multiscale image-level trajectories of disease progression with irregularly-sampled longitudinal medical images. arXiv preprint arXiv:2406.14794, 2024

  18. [18]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2023

  19. [19]

    On the posterior distribution in denoising: Application to uncertainty quantification

    Hila Manor and Tomer Michaeli. On the posterior distribution in denoising: Application to uncertainty quantification. International Conference on Learning Representations, 2024

  20. [20]

    Movie Gen: A Cast of Media Foundation Models

    Adam Polyak et al. Movie Gen: A cast of media foundation models. arXiv preprint arXiv:2410.13720, 2024

  21. [21]

    Free hunch: Denoiser covariance estimation for diffusion models without extra costs. arXiv preprint arXiv:2410.11149, 2024

    Severi Rissanen, Markus Heinonen, and Arno Solin. Free hunch: Denoiser covariance estimation for diffusion models without extra costs. arXiv preprint arXiv:2410.11149, 2024

  22. [22]

    An empirical Bayes approach to statistics

    Herbert E Robbins. An empirical Bayes approach to statistics. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1956

  23. [23]

    Progressive distillation for fast sampling of diffusion models. International Conference on Learning Representations, 2022

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. International Conference on Learning Representations, 2022

  24. [24]

    Eigenscore: OOD detection using covariance in diffusion models. arXiv preprint arXiv:2510.07206, 2025

    Shirin Shoushtari, Yi Wang, Xiao Shi, M Salman Asif, and Ulugbek S Kamilov. Eigenscore: OOD detection using posterior covariance in diffusion models. arXiv preprint arXiv:2510.07206, 2025

  25. [25]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Ab- hishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations, 2021

  26. [26]

    Consistency models. International Conference on Machine Learning, 2023

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. International Conference on Machine Learning, 2023

  27. [27]

    A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011

    Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011

  28. [28]

    Uncertainty-Aware Distribution-to-Distribution Flow Matching for Scientific Imaging

    Dongxia Wu, Yuhui Zhang, Serena Yeung-Levy, Emma Lundberg, and Emily B Fox. Uncertainty quantification for distribution-to-distribution flow matching in scientific imaging. arXiv preprint arXiv:2603.21717, 2026