Flowing with Confidence

Dario Coscia; Erik Bekkers; Friso de Kruiff; Max Welling

arxiv: 2605.18472 · v1 · pith:JXRH5IKCnew · submitted 2026-05-18 · 📊 stat.ML · cs.AI · cs.LG

Pith reviewed 2026-05-20 08:19 UTC · model grok-4.3

classification: stat.ML, cs.AI, cs.LG

keywords: flow matching, generative models, uncertainty estimation, confidence scores, ODE integration, variance propagation, continuous normalizing flows

The pith

Flow Matching with Confidence yields per-sample scores by injecting input-dependent multiplicative noise and propagating its variance in closed form along the ODE.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generative models produce outputs of uneven reliability, yet existing ways to measure that reliability multiply compute by running ensembles or repeated trajectories. The paper introduces Flow Matching with Confidence, which adds multiplicative noise at chosen layers, tracks how its variance spreads through the network exactly, and integrates the result along the sampling ODE. This produces a usable per-sample confidence number at ordinary sampling cost. If correct, the score lets users filter unreliable outputs for higher quality, rewind and redirect uncertain trajectories, and focus extra steps only where the flow is ambiguous. A sympathetic reader would care because it turns opaque generation into something whose trustworthiness can be assessed without slowing the process.

Core claim

Flow Matching with Confidence (FMwC) injects input-dependent multiplicative noise at selected layers, propagates its variance through the network in closed form, and integrates it along the ODE trajectory, yielding a per-sample confidence score at standard sampling cost.

What carries the argument

Closed-form propagation of variance from injected multiplicative noise, integrated along the continuous flow ODE.
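
A minimal sketch of this idea, under loudly stated assumptions: the velocity network is reduced to a single linear layer, the multiplicative noise is Gaussian and independent per input unit, and `alpha_fn` is a hypothetical stand-in for whatever input-dependent noise scale the paper learns. None of these names come from the paper. For y = W(x ⊙ (1 + √α ε)) + b with ε ~ N(0, I), the output variance is available without sampling, and it can be accumulated along an Euler discretisation of the ODE.

```python
def noisy_linear_moments(W, b, x, alpha):
    """Exact mean and per-output variance of y = W (x * (1 + sqrt(alpha) * eps)) + b,
    with eps ~ N(0, I) independent per input unit (an illustrative assumption)."""
    mean = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    var = [sum(w * w * a * xi * xi for w, a, xi in zip(row, alpha, x)) for row in W]
    return mean, var


def sample_with_score(x0, W, b, alpha_fn, n_steps=50):
    """Euler-integrate the mean velocity from t=0 to t=1 and accumulate the
    summed output variance along the way as a toy uncertainty score."""
    x, score, dt = list(x0), 0.0, 1.0 / n_steps
    for _ in range(n_steps):
        mean, var = noisy_linear_moments(W, b, x, alpha_fn(x))
        x = [xi + dt * m for xi, m in zip(x, mean)]
        score += dt * sum(var)  # larger accumulated variance = lower confidence
    return x, score
```

The point of the sketch is the cost profile: each step needs one pass that returns both the mean velocity and its variance, so no ensemble or repeated trajectory is required. How the paper handles nonlinear layers and how it aggregates variance over time are not specified on this page.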

If this is right

Filtering low-confidence samples raises image quality and improves thermodynamic stability of generated crystals.
Trajectories can be edited by rewinding to high-uncertainty points and redirecting the flow.
Adaptive ODE stepping concentrates computation where the velocity field is most ambiguous.
The confidence score correlates with the magnitude of the divergence of the learned velocity field.
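
The filtering use listed above can be sketched in a few lines. This assumes a convention in which a higher score means more uncertainty (the paper's sign convention may differ); `filter_by_confidence` and `keep_fraction` are hypothetical names chosen for illustration.

```python
def filter_by_confidence(samples, scores, keep_fraction=0.5):
    """Keep the keep_fraction of samples with the lowest uncertainty score."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(keep_fraction * len(samples)))
    # Sort sample indices from most confident (lowest score) to least confident.
    order = sorted(range(len(samples)), key=lambda i: scores[i])
    return [samples[i] for i in order[:n_keep]]
```

Because the score is produced during ordinary sampling, filtering of this kind adds only a sort over the generated batch.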

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The variance-propagation technique could be adapted to other continuous normalizing flow or diffusion models that evolve via ODEs.
The observed correlation with velocity divergence might motivate new regularization terms during training to produce more stable flows.
Surgical guidance methods could be developed that intervene only at the moments the score flags as uncertain.
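
Intervening only at flagged moments, like the rewinding use described earlier, starts from locating where the tracked variance peaks along a trajectory. A minimal sketch, assuming the sampler records one scalar variance per step; both function names and the threshold are hypothetical.

```python
def rewind_point(times, variances):
    """Return (index, time) of the step with the largest tracked variance."""
    if len(times) != len(variances) or not times:
        raise ValueError("times and variances must be non-empty and equal length")
    idx = max(range(len(variances)), key=lambda i: variances[i])
    return idx, times[idx]


def flagged_steps(variances, threshold):
    """Indices of steps whose tracked variance exceeds the threshold."""
    return [i for i, v in enumerate(variances) if v > threshold]
```

A guidance or editing routine would then restart integration from the state stored at the returned index, or apply its intervention only on the flagged steps.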

Load-bearing premise

The method assumes that the tracked variance of the injected noise accurately reflects the model's true uncertainty about its outputs rather than merely recording an auxiliary signal.

What would settle it

Measure whether the computed scores predict actual per-sample error rates on data with ground truth, or test whether filtering generations by the score measurably raises average quality metrics; failure on either test would falsify the central claim.
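
One way to run the first test is a rank correlation between the score and a per-sample error measured against ground truth. The sketch below implements Spearman correlation in plain Python; the choice of Spearman (rather than a calibration metric) is an assumption made for illustration, and the inputs are whatever score and error lists an experiment provides.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Undefined (division by zero) if either input is constant."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)
```

A correlation near zero between score and measured error on held-out data would count against the central claim; a strong positive correlation (under the higher-score-is-less-confident convention) would support it.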

Figures

Figures reproduced from arXiv: 2605.18472 by Dario Coscia, Erik Bekkers, Friso de Kruiff, Max Welling.

**Figure 1.** Flow-Matching model with Confidence. (Left) Variance is propagated alongside the sample as it is generated along the path $p_0 \to p_1$: well-placed samples (green) contract into a tight $\pm\sigma$ band, while misplaced ones (red) stay diffuse. (Right) Sorting by the resulting confidence score recovers quality across modalities: low-energy crystals and clean digits at high confidence, implausible structures and malformed d…

**Figure 2.** $\sigma_t^2$ recovers the divergence structure of the learned velocity field at bifurcation time. Spatial fields at $t=0.6$ on the Checkerboard, where $|\nabla \cdot v_{t,\theta}|$ (right) is available analytically. Quantitative correlation across modalities in Sec. 4.

3.4 Where and When Does Uncertainty Arise? The construction above is mechanically motivated, but the reason the resulting $\sigma_t^2$ is informative is geometric. The optimal …

**Figure 3.** FMwC's per-channel $t^\star$ targets a specific bifurcation; FM at random $t$ misses it. (a) Seed DyErNi$_4$ (on the hull) under chem-swap (atoms-only at $t^\star_{\text{atom}}$) and polymorph (coords-only at $t^\star_{\text{coord}}$): FMwC's edits stay near the hull, FM's drift off. (b–d) Per-mode success rate over 50 seeds × 20 replicates; FMwC ($t=t^\star$, solid) vs FM (uniform random $t$, hatched), at matched channel mask and $\sigma$. Success requires …

**Figure 4.** A single per-sample adaptive-stepping signal, $\sigma_t^2(t)$, generalises across modalities. Each panel plots the application's headline quality metric against integrator step budget $N$, oriented so down-and-right is better (Crystal y-axis inverted). Methods: FM Uniform (grey), FMwC Uniform (blue), and FMwC Online Adaptive (green), with the controller chosen per modality. Adaptive stepping reallocates compute to…

**Figure 5.** On the Crystals model, one forward pass of FMwC matches Hutchinson's divergence. Spearman and Pearson correlation against the high-$K$ Hutchinson reference vs. measured FLOPs per query, on the FlowMM backbone; sweep over $K \in \{1, \dots, 500\}$.

Second, the correspondence is universal and geometry-graded
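
For readers unfamiliar with the reference in Figure 5, Hutchinson's estimator approximates the divergence (the trace of the velocity Jacobian) by averaging ε·(Jε) over random probes ε. The sketch below is a generic pure-Python version, not the paper's implementation: it assumes Rademacher probes and replaces the Jacobian-vector product with a forward difference, and `n_probes` plays the role of $K$ in the caption.

```python
import random


def hutchinson_divergence(v, x, n_probes=8, h=1e-4, rng=None):
    """Estimate div v(x) = tr(dv/dx) as the average of eps . (J eps) over
    Rademacher probes eps, with J eps approximated by a forward difference."""
    rng = rng or random.Random(0)
    vx = v(x)
    total = 0.0
    for _ in range(n_probes):
        eps = [rng.choice((-1.0, 1.0)) for _ in x]
        shifted = v([xi + h * e for xi, e in zip(x, eps)])
        jvp = [(s - base) / h for s, base in zip(shifted, vx)]  # approx. J @ eps
        total += sum(e * j for e, j in zip(eps, jvp))
    return total / n_probes
```

Each probe costs an extra velocity evaluation, which is why a high-$K$ reference is expensive and why a single-pass proxy for the same quantity is of interest.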

Original abstract

Generative models can produce nonsensical text, unrealistic images, and unstable materials faster than simulation or human review can absorb; without per-sample confidence, trust erodes. Existing fixes run $k$ ensembles or stochastic trajectories at $k\times$ compute, measuring variability between models, not model confidence. We propose Flow Matching with Confidence (FMwC). FMwC injects input-dependent multiplicative noise at selected layers, propagates its variance through the network in closed form, and integrates it along the ODE trajectory, yielding a per-sample confidence score at standard sampling cost. The score supports multiple uses: filtering improves image quality and thermodynamic stability of crystals; editing rewinds trajectories to the points where the model commits and redirects them; and adaptive stepping concentrates ODE compute where the flow is ambiguous. We find that the confidence score correlates with the magnitude of the divergence of the learned velocity field, which gives us a window to understand the generative process, opening up surgical forms of guidance that target the moments that matter, new sampling algorithms and interpretability of generative models.
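
The adaptive-stepping use mentioned in the abstract can be illustrated with a toy controller. This is a sketch under assumptions: the step-size rule `base_dt / (1 + gain * var)` is invented here for illustration (the Figure 4 caption says only that the controller is chosen per modality), and `velocity_and_var` is a hypothetical callable returning the velocity and a scalar variance.

```python
def adaptive_euler(x0, velocity_and_var, base_dt=0.1, min_dt=0.01, gain=1.0):
    """Euler integration from t=0 to t=1 whose step shrinks when the tracked
    variance is large. velocity_and_var(x, t) returns (velocity, scalar variance)."""
    x, t, steps = list(x0), 0.0, []
    while t < 1.0:
        v, var = velocity_and_var(x, t)
        dt = max(min_dt, base_dt / (1.0 + gain * var))
        dt = min(dt, 1.0 - t)  # do not overshoot t = 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
        steps.append(dt)
    return x, steps
```

With a variance signal that spikes in a narrow time window, the controller spends most of its steps inside that window and takes coarse steps elsewhere.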

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FMwC gives a low-cost per-sample confidence score for flow models by closed-form propagation of injected multiplicative noise variance along the ODE, but the evidence tying it to actual uncertainty is still indirect.

The main thing to know is that this paper proposes Flow Matching with Confidence: the authors add input-dependent multiplicative noise at selected layers, propagate its variance analytically through the network, and integrate the result along the flow-matching ODE to produce a per-sample score at normal sampling cost. That mechanism looks new compared to the ensemble or stochastic-trajectory baselines they cite, and the practical angle is clear: the score is meant to support filtering, trajectory editing, and adaptive stepping without extra compute.

They also report a correlation between the score and the magnitude of the velocity-field divergence, which at least gives a window into where the model is locally sensitive. That part is useful for thinking about interpretability in generative flows. The experiments they sketch on image quality and crystal stability suggest the score can be applied downstream, which is the kind of concrete use case that matters for practitioners.

The soft spot is the validation step. Correlation with divergence is presented as supporting evidence, but it does not directly test whether the propagated variance predicts actual sample errors or epistemic uncertainty better than simpler proxies. Without direct comparisons to ensemble variance or held-out reconstruction error, it is possible the score mainly tracks sensitivity to the auxiliary noise rather than true model doubt. The choice of injection layers and the noise scale are free parameters, so robustness checks on those would strengthen the claim. The derivation is described as closed-form, which is promising if it holds for realistic architectures, but the abstract leaves the details thin.

This paper is aimed at researchers working on flow-based and diffusion generative models who need cheaper uncertainty estimates for filtering or guidance. A reader focused on sampling algorithms or model interpretability would find the ideas worth trying. I would send it for peer review because the core technique is distinct enough and the applications are practical, even if the current validation needs tightening with more direct tests.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Flow Matching with Confidence (FMwC) for generative flow models. It injects input-dependent multiplicative noise at selected layers, derives a closed-form propagation of the resulting variance through the network, and integrates this quantity along the ODE trajectory to obtain a per-sample confidence score at standard sampling cost. The score is applied to sample filtering (improving image quality and crystal stability), trajectory editing (rewinding to commitment points), and adaptive ODE stepping. The authors additionally report a correlation between the derived confidence and the magnitude of the divergence of the learned velocity field.

Significance. If the propagated variance can be shown to track genuine model uncertainty rather than an auxiliary quantity, the method would supply an efficient, single-trajectory uncertainty measure for flow-matching models. This could enable practical improvements in generation reliability and open new directions for interpretability and adaptive sampling without ensemble overhead.

major comments (2)

[Abstract and §3] Abstract and §3 (method description): the central claim that closed-form variance propagation of the injected multiplicative noise yields a meaningful per-sample confidence score rests on the untested modeling assumption that this auxiliary quantity corresponds to true epistemic uncertainty or predictive error. The reported correlation with velocity-field divergence magnitude is presented as validation, yet no direct comparisons to ground-truth measures (ensemble variance, reconstruction error on held-out trajectories, or calibration metrics) are described, leaving the mapping from propagated variance to 'confidence' as an open assumption rather than an established result.
[§4] §4 (experimental validation): the free parameters (choice of layers for noise injection and the noise scale) are listed in the axiom ledger but their selection procedure and sensitivity are not quantified. If these choices are post-hoc tuned on the same data used to demonstrate correlation with divergence, the reported utility for filtering and editing may not generalize.

minor comments (2)

[§3] Notation for the variance propagation step should be introduced with an explicit equation (e.g., Eq. (X)) rather than described only in prose, to allow readers to verify the closed-form claim for standard residual or attention blocks.
[Figures 3-4] Figure captions for the editing and adaptive-stepping examples should include quantitative metrics (e.g., FID improvement or wall-clock savings) alongside qualitative visuals.
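
To make the first minor comment concrete, the following shows the kind of explicit equation it asks for. This is an assumed form for a single linear layer with independent, zero-mean, unit-variance multiplicative noise, followed by a first-order (delta-method) step through an elementwise activation $\phi$; it is not taken from the paper, and the symbols $W$, $b$, $\alpha_i(x)$ and $\phi$ are illustrative.

```latex
% Assumed layer: h = W ( x \odot (1 + \sqrt{\alpha(x)} \odot \epsilon) ) + b,
% with E[\epsilon] = 0 and Cov[\epsilon] = I.
\mathbb{E}[h_j] = \sum_i W_{ji}\, x_i + b_j ,
\qquad
\operatorname{Var}[h_j] = \sum_i W_{ji}^2 \, \alpha_i(x)\, x_i^2 .

% First-order propagation through an elementwise activation \phi:
\operatorname{Var}[\phi(h_j)] \approx \phi'\!\left(\mathbb{E}[h_j]\right)^2 \operatorname{Var}[h_j] .
```

The linear-layer expressions are exact under the stated noise model, whereas the activation step is an approximation. An explicit equation in the manuscript would let readers see where exactness ends for residual or attention blocks.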

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of validation and experimental details that we will address in revision. We respond to each major comment below.

Referee: [Abstract and §3] Abstract and §3 (method description): the central claim that closed-form variance propagation of the injected multiplicative noise yields a meaningful per-sample confidence score rests on the untested modeling assumption that this auxiliary quantity corresponds to true epistemic uncertainty or predictive error. The reported correlation with velocity-field divergence magnitude is presented as validation, yet no direct comparisons to ground-truth measures (ensemble variance, reconstruction error on held-out trajectories, or calibration metrics) are described, leaving the mapping from propagated variance to 'confidence' as an open assumption rather than an established result.

Authors: We agree that the correspondence between the propagated variance and true epistemic uncertainty is an assumption that would benefit from stronger empirical support. The closed-form propagation is derived directly from the network architecture and the input-dependent noise injection, ensuring it consistently tracks the effect of input perturbations through the ODE. The correlation with velocity-field divergence is offered as supporting evidence, since divergence magnitude in flow models often reflects regions of higher generative sensitivity. We did not perform direct comparisons to ensemble variance or calibration metrics in the submitted version in order to emphasize the single-trajectory efficiency. We will revise the manuscript to explicitly discuss this modeling assumption as a limitation and to add preliminary comparisons against ensemble-based uncertainty estimates.
revision: yes

Referee: [§4] §4 (experimental validation): the free parameters (choice of layers for noise injection and the noise scale) are listed in the axiom ledger but their selection procedure and sensitivity are not quantified. If these choices are post-hoc tuned on the same data used to demonstrate correlation with divergence, the reported utility for filtering and editing may not generalize.

Authors: The referee correctly identifies that the selection procedure and sensitivity of the noise-injection layers and scale require clearer documentation. These hyperparameters were chosen to target layers operating at multiple feature scales while keeping the injected variance small enough to act as a perturbation rather than dominate the signal; details appear in the supplementary material. To address potential concerns about post-hoc tuning and generalization, we will revise §4 to include an explicit description of the selection criteria and a sensitivity analysis demonstrating that filtering and editing performance remains stable across reasonable variations of these parameters.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in the FMwC confidence score derivation

The paper constructs the per-sample confidence score directly from the closed-form propagation of input-dependent multiplicative noise variance through the network layers and its integration along the ODE trajectory. This is presented as the definition of the score rather than a prediction of an independent quantity. The correlation with the magnitude of the divergence of the learned velocity field is reported as a post-hoc empirical observation, not as a definitional identity or fitted result. No load-bearing self-citations or uniqueness theorems from prior work by the authors are invoked to justify the core method. The derivation chain is self-contained and does not reduce to tautology by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions of flow matching (continuous ODE trajectories, differentiable velocity fields) and neural-network forward passes; no new physical entities are postulated. Layer selection and noise scale are implicit free parameters whose values are not reported in the abstract.

free parameters (2)

selected layers for noise injection: choice of which layers receive the multiplicative noise; affects both computational overhead and the resulting confidence signal.
noise scale parameter: magnitude of the injected input-dependent noise; must be chosen to produce a useful confidence range.

axioms (2)

[standard math] The network is differentiable and the velocity field admits a well-defined ODE trajectory. Invoked when integrating variance along the sampling path.
[domain assumption] Variance propagation remains accurate under the chosen network architecture and activation functions. Required for the closed-form claim to hold without simulation.
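
The domain assumption can be probed numerically. The sketch below compares a first-order (delta-method) variance through `tanh` against a Monte Carlo reference. It assumes a Gaussian pre-activation and uses `tanh` purely as an example activation; the paper's own propagation rules are not given on this page.

```python
import math
import random


def first_order_tanh_variance(mu, sigma2):
    """Delta-method variance of tanh(h) for h with mean mu and variance sigma2."""
    slope = 1.0 - math.tanh(mu) ** 2  # derivative of tanh at the mean
    return slope * slope * sigma2


def monte_carlo_tanh_variance(mu, sigma2, n=20000, seed=0):
    """Sampling-based reference, assuming h ~ N(mu, sigma2)."""
    rng = random.Random(seed)
    std = math.sqrt(sigma2)
    ys = [math.tanh(rng.gauss(mu, std)) for _ in range(n)]
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / (n - 1)
```

For small input variance the two agree closely. For large input variance the first-order value can exceed 1, which is impossible for a quantity bounded in (-1, 1), so the accuracy of any such propagation depends on the injected noise staying small relative to the curvature of the activation.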

pith-pipeline@v0.9.0 · 5713 in / 1539 out tokens · 35863 ms · 2026-05-20T08:19:34.620753+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: FMwC injects input-dependent multiplicative noise at selected layers, propagates its variance through the network in closed form, and integrates it along the ODE trajectory

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: the confidence score correlates with the magnitude of the divergence of the learned velocity field

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

