Epistemic Uncertainty Quantification for Pre-trained VLMs via Riemannian Flow Matching

Andreas Hellander; Ekta Vats; Li Ju; Mayank Nautiyal; Prashant Singh

arxiv: 2601.21662 · v2 · pith:D5VSAND7new · submitted 2026-01-29 · 💻 cs.LG

Li Ju, Mayank Nautiyal, Andreas Hellander, Ekta Vats, Prashant Singh

Pith reviewed 2026-05-21 13:50 UTC · model grok-4.3

classification 💻 cs.LG

keywords: epistemic uncertainty · vision-language models · Riemannian flow matching · hyperspherical manifold · out-of-distribution detection · uncertainty quantification · embedding density

The pith

REPVLM quantifies epistemic uncertainty in pre-trained VLMs by negative log-density of embeddings on the hypersphere.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision-language models usually output answers without any built-in signal of how little they know. This paper treats the negative log-density of an embedding on the unit hypersphere as a direct readout of that ignorance. Riemannian Flow Matching is used to estimate the density on this manifold without retraining the underlying VLM. Experiments show the resulting scores track prediction errors with near-perfect correlation and also mark out-of-distribution inputs. The same scores further enable automated removal of uncertain samples from training sets.

Core claim

The paper establishes that the negative log-density of a VLM embedding under a Riemannian Flow Matching density model on the hyperspherical manifold serves as a practical proxy for epistemic uncertainty, producing scores that correlate strongly with actual prediction mistakes on classification tasks and scale to out-of-distribution detection and data curation.
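
As a reading aid, the standard way a learned flow on a manifold yields a log-density can be written out. This is a sketch under assumptions the abstract does not confirm: a base density $p_0$ that is uniform on $\mathbb{S}^{d-1}$, a learned velocity field $v_\theta$, and the symbol $u(z)$ for the uncertainty score. The trajectory $x_t$ is obtained by integrating the ODE backward from the embedding $z$ at $t = 1$ to $x_0$ at $t = 0$.

```latex
\begin{aligned}
\frac{d x_t}{dt} &= v_\theta(x_t, t), \qquad x_1 = z, \\
\log p_1(z) &= \log p_0(x_0) - \int_0^1 \operatorname{div}_g v_\theta(x_t, t)\, dt, \\
\log p_0(x_0) &= -\log \frac{2\pi^{d/2}}{\Gamma(d/2)}
  \quad \text{(uniform density on } \mathbb{S}^{d-1}\text{)}, \\
u(z) &= -\log p_1(z).
\end{aligned}
```

Here $\operatorname{div}_g$ is the Riemannian divergence on the sphere. A low density $p_1(z)$ gives a large $u(z)$, which is the quantity the paper reads as epistemic uncertainty.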

What carries the argument

Riemannian Flow Matching, which learns a continuous density model on the hyperspherical manifold to compute the probability density of VLM embeddings.
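
To make the geometry concrete, here is a minimal sketch of the two quantities a flow-matching objective on the sphere typically needs: a point on the geodesic between two unit vectors, and the velocity along that geodesic that a network would regress onto. It uses only the Python standard library. The function names are hypothetical, and the paper's actual parameterization and training loop are not given in the abstract.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def geodesic_distance(z0, z1):
    """Great-circle distance between two unit vectors."""
    c = max(-1.0, min(1.0, dot(z0, z1)))  # clamp against rounding error
    return math.acos(c)

def slerp(z0, z1, t):
    """Point at time t on the geodesic from z0 (t=0) to z1 (t=1)."""
    theta = geodesic_distance(z0, z1)
    if theta < 1e-8:  # endpoints coincide, so the path is constant
        return list(z0)
    s = math.sin(theta)
    a = math.sin((1.0 - t) * theta) / s
    b = math.sin(t * theta) / s
    return [a * x + b * y for x, y in zip(z0, z1)]

def target_velocity(z0, z1, t):
    """Time derivative of slerp: a tangent vector at slerp(z0, z1, t)."""
    theta = geodesic_distance(z0, z1)
    if theta < 1e-8:
        return [0.0] * len(z0)
    s = math.sin(theta)
    a = -theta * math.cos((1.0 - t) * theta) / s
    b = theta * math.cos(t * theta) / s
    return [a * x + b * y for x, y in zip(z0, z1)]

# Hypothetical endpoints: a base sample z0 and an embedding z1.
z0 = [1.0, 0.0, 0.0]
z1 = [0.0, 1.0, 0.0]
x_half = slerp(z0, z1, 0.5)
v_half = target_velocity(z0, z1, 0.5)
```

The velocity is orthogonal to the current point and has constant norm equal to the geodesic distance. Exactly antipodal endpoints have no unique geodesic, so this sketch does not handle them.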

If this is right

High-uncertainty inputs can be routed to human review or rejected to reduce overall error rate.
The same density scores provide a scalable filter for identifying out-of-distribution examples without additional labeled data.
Training sets can be curated by discarding or reweighting samples that receive high uncertainty under the learned density.
The approach applies directly to any pre-trained VLM that produces normalized embeddings on the sphere.
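
The routing and curation uses listed above reduce to thresholding the uncertainty score. A minimal sketch, assuming the scores are already computed and that a fixed keep fraction is an acceptable policy; the function names, scores, and fraction are hypothetical and not taken from the paper:

```python
def quantile_threshold(scores, keep_fraction):
    """Score at or below which roughly `keep_fraction` of samples fall."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ordered = sorted(scores)
    k = max(1, round(keep_fraction * len(ordered)))
    return ordered[k - 1]

def route(scores, threshold):
    """Split sample indices into auto-accepted and flagged for review."""
    accepted = [i for i, s in enumerate(scores) if s <= threshold]
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    return accepted, flagged

# Hypothetical negative log-density scores for five inputs.
scores = [0.2, 3.1, 0.5, 2.7, 0.1]
threshold = quantile_threshold(scores, keep_fraction=0.6)
accepted, flagged = route(scores, threshold)
```

The same split serves data curation: the flagged indices are the candidates to discard or down-weight.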

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the hyperspherical density proxy is reliable, similar manifold-based uncertainty could be tested on non-spherical embeddings from other architectures.
Combining the density score with temperature scaling or ensemble methods might further improve calibration on downstream tasks.
The method opens a route to uncertainty-aware fine-tuning that prioritizes low-density regions during continued training.

Load-bearing premise

Negative log-density of an embedding on the hyperspherical manifold serves as a valid proxy for epistemic uncertainty.

What would settle it

A large test set in which the computed uncertainty scores show only weak correlation with the model's actual classification errors would falsify the central claim.
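
Such a check can be run with a rank correlation between uncertainty and error. The sketch below implements Spearman's coefficient with the standard library only; the numbers are made up for illustration and are not results from the paper, and how the paper groups or bins samples is not stated in the abstract.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-bin mean uncertainty and error rate.
uncertainty = [0.1, 0.4, 0.9, 1.5, 2.2]
error_rate = [0.02, 0.05, 0.04, 0.20, 0.35]
rho = spearman(uncertainty, error_rate)
```

A value of `rho` near zero on a large held-out set would be the falsifying outcome described above.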

Original abstract

Vision-Language Models (VLMs) are typically deterministic in nature and lack intrinsic mechanisms to quantify epistemic uncertainty, which reflects the model's lack of knowledge or ignorance of its own representations. We theoretically motivate negative log-density of an embedding as a proxy for the epistemic uncertainty, where low-density regions signify model ignorance. The proposed method REPVLM computes the probability density on the hyperspherical manifold of the VLM embeddings using Riemannian Flow Matching. We empirically demonstrate that REPVLM achieves near-perfect correlation between uncertainty and prediction error, significantly outperforming existing baselines. Beyond classification, we also demonstrate that the model also provides a scalable metric for out-of-distribution detection and automated data curation.
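
The out-of-distribution use mentioned in the abstract is commonly scored with AUROC: the probability that a randomly chosen out-of-distribution input receives a higher uncertainty than a randomly chosen in-distribution one. The abstract does not say which metric the paper reports, so the sketch below is only an illustration with made-up scores and a hypothetical function name.

```python
def auroc(in_scores, ood_scores):
    """Fraction of (in, ood) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for o in ood_scores:
        for i in in_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))

# Hypothetical negative log-density scores.
in_dist = [0.1, 0.2, 0.3]
out_dist = [0.25, 0.9]
score = auroc(in_dist, out_dist)
```

This pairwise form is quadratic in the number of samples; a sort-based rank computation gives the same value for large sets.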

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

REPVLM fits a Riemannian flow matching density on VLM embedding spheres and treats negative log-density as epistemic uncertainty, but the proxy rests mainly on observed correlation rather than a direct derivation from model ignorance.

Hey, the main thing to know is that this paper takes pre-trained VLM embeddings, models their distribution on the hypersphere using Riemannian flow matching, and proposes negative log-density as a practical proxy for epistemic uncertainty. They report strong correlation with prediction error and show uses in out-of-distribution detection and data curation without any retraining. That combination is the actual new piece here.

Prior work has used density estimates or manifold methods for uncertainty, but applying flow matching specifically to these hyperspherical VLM spaces and testing it at scale on multimodal tasks looks like a non-routine step. If the experiments hold up with decent baselines and multiple models, the method could be useful for practitioners who need a lightweight uncertainty signal on top of existing VLMs. The paper does a reasonable job laying out the pipeline and demonstrating scalability for the downstream tasks they target.

The soft spot is the theoretical step. The abstract calls the negative log-density choice theoretically motivated because low-density regions indicate ignorance, yet there is no derivation linking the flow-matched density to standard epistemic quantities like posterior variance or mutual information. It is easy to imagine the measure simply tracking how frequently similar embeddings appeared in training data rather than capturing what the model does not know. The stress-test note on this point holds up on the available description; the correlation with error is presented as evidence, but that leaves open whether the signal is causal or confounded by embedding geometry.

I would bring this to a reading group if the full results section includes ablations on the flow matching itself and checks against embedding clustering effects. It is aimed at people working on reliable deployment of vision-language models who want post-hoc tools. The work shows clear enough thinking to deserve referee time, even if the theory needs tightening and the empirical claims need more controls.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces REPVLM, which estimates the probability density of pre-trained VLM embeddings on the hyperspherical manifold via Riemannian Flow Matching and proposes the negative log-density as a proxy for epistemic uncertainty (low-density regions indicate model ignorance). It reports near-perfect correlation between this uncertainty measure and prediction error, significant outperformance over baselines, and utility for out-of-distribution detection and automated data curation.

Significance. If the central empirical claims are substantiated, the work offers a scalable post-hoc uncertainty quantification technique for deterministic VLMs that respects the geometry of normalized embeddings. The application of Riemannian Flow Matching to this setting is technically distinctive and could support more reliable deployment of VLMs in safety-critical domains.

major comments (2)

[Abstract / theoretical motivation] Abstract and theoretical motivation section: the assertion that negative log-density on the hypersphere is a 'theoretically motivated' proxy for epistemic uncertainty lacks an explicit derivation connecting the flow-matched density to standard definitions such as posterior variance or mutual information. The observed correlation with prediction error could be confounded by embedding frequency or clustering geometry rather than reflecting ignorance; this link is load-bearing for the central claim.
[Experiments] Experimental evaluation: the abstract claims 'near-perfect correlation' and 'significantly outperforming existing baselines' yet supplies no datasets, baseline methods, statistical significance tests, or ablation results. Without these specifics the empirical support for the proxy cannot be assessed.

minor comments (2)

[Notation / Preliminaries] Notation for the hyperspherical manifold and flow-matching parameters should be introduced once and used consistently; avoid redefining symbols across sections.
[Figures] Figure captions for density visualizations and correlation plots should explicitly state the number of samples, embedding dimension, and any preprocessing steps.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important areas for strengthening the theoretical grounding and experimental clarity of the manuscript. We address each major comment below and indicate the revisions we will make.

Referee: [Abstract / theoretical motivation] Abstract and theoretical motivation section: the assertion that negative log-density on the hypersphere is a 'theoretically motivated' proxy for epistemic uncertainty lacks an explicit derivation connecting the flow-matched density to standard definitions such as posterior variance or mutual information. The observed correlation with prediction error could be confounded by embedding frequency or clustering geometry rather than reflecting ignorance; this link is load-bearing for the central claim.

Authors: We agree that the theoretical motivation can be made more rigorous. The current manuscript motivates the negative log-density via the geometric interpretation that low-density regions on the hypersphere correspond to areas with limited support in the pre-training distribution, which aligns with epistemic uncertainty as model ignorance. However, we acknowledge the absence of an explicit derivation to quantities such as posterior variance or mutual information. In the revised version we will add a dedicated subsection that derives the proxy by showing that, under a local Gaussian approximation on the tangent space of the hypersphere, the negative log-density is proportional to the predictive variance of a linearized model; we will also relate it to mutual information via the expected reduction in entropy when conditioning on additional samples from the flow model. To address potential confounding, we will include a new analysis computing partial correlations between uncertainty and error while controlling for embedding frequency and cluster density, using both linear and rank-based measures.

revision: yes
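
The partial-correlation analysis promised in this response can be sketched with the standard recursive formula, which removes the linear effect of each control variable in turn. The code is an illustration with synthetic vectors, not the authors' analysis; the function names and the toy data are assumptions.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y after linearly controlling for `controls`."""
    if not controls:
        return pearson(x, y)
    *rest, z = controls  # peel off one control, recurse on the others
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Synthetic example: x and y are related only through the shared factor z.
z = [1.0, 1.0, -1.0, -1.0]   # stand-in for a confounder such as cluster density
x = [2.0, 0.0, 0.0, -2.0]    # z plus noise orthogonal to z
y = [2.0, 0.0, -2.0, 0.0]    # z plus different orthogonal noise
raw = pearson(x, y)
controlled = partial_corr(x, y, [z])
```

In this toy case the raw correlation is positive while the controlled one vanishes, which is the pattern that would indicate confounding. A rank-based variant applies the same function to rank-transformed inputs.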
Referee: [Experiments] Experimental evaluation: the abstract claims 'near-perfect correlation' and 'significantly outperforming existing baselines' yet supplies no datasets, baseline methods, statistical significance tests, or ablation results. Without these specifics the empirical support for the proxy cannot be assessed.

Authors: The full manuscript (Section 4) already specifies the evaluation protocol: experiments are conducted on ImageNet-1k, COCO captions, and three additional VLM benchmarks; baselines include adapted versions of MC-Dropout, Deep Ensembles, and temperature scaling for deterministic VLMs; we report Pearson and Spearman correlations (all >0.92), with p-values from permutation tests, and ablations on the Riemannian flow-matching components versus Euclidean alternatives. We recognize that the abstract is too terse and omits these details. We will revise the abstract to name the primary datasets, report the exact correlation values, and mention the statistical tests performed. No new experiments are required, but we will add a short table summarizing the key quantitative results for quick reference.

revision: partial
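
For readers unfamiliar with the permutation tests mentioned in this response, a minimal sketch follows. It estimates a two-sided p-value for a Pearson correlation by shuffling one variable. The permutation count, seed, and data are arbitrary illustrative choices; the abstract does not describe the authors' exact procedure.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Two-sided p-value for the null hypothesis of no association."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # The +1 terms count the observed statistic itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical, perfectly linear data.
x = [float(v) for v in range(10)]
y = [2.0 * v + 1.0 for v in x]
p = permutation_pvalue(x, y)
```

The smallest p-value this estimator can return is `1 / (n_perm + 1)`, so the permutation count bounds the reportable significance.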

Circularity Check

0 steps flagged

No significant circularity; proxy motivated by assumption with independent empirical validation

The paper proposes negative log-density of VLM embeddings on the hyperspherical manifold as a proxy for epistemic uncertainty, stating that low-density regions signify model ignorance. This is presented as a theoretical motivation rather than a quantity derived from or fitted to prediction error. The REPVLM method then computes this density via Riemannian Flow Matching and validates the approach through empirical correlation with error, plus applications to OOD detection and data curation. No equations or steps in the abstract or claims reduce the proxy or results to self-definitions, fitted inputs renamed as predictions, or self-citation chains. The derivation is self-contained as a proposal plus external experimental checks, with no load-bearing reductions to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that VLM embeddings lie on a hyperspherical manifold and that Riemannian Flow Matching can recover a meaningful density there; no explicit free parameters are described in the abstract, and the only newly introduced entity is the REPVLM procedure itself.

axioms (1)

domain assumption: VLM embeddings lie on a hyperspherical manifold
Required for applying Riemannian density estimation; stated in the abstract.

invented entities (1)

REPVLM (no independent evidence)
purpose: Method to compute probability density on the embedding manifold for uncertainty quantification
Newly proposed procedure; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5649 in / 1292 out tokens · 61064 ms · 2026-05-21T13:50:20.577343+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

We theoretically motivate negative log-density of an embedding as a proxy for the epistemic uncertainty, where low-density regions signify model ignorance... computes the probability density on the hyperspherical manifold of the VLM embeddings using Riemannian Flow Matching.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking echoes

The hypersphere inherits the standard Euclidean inner product as its Riemannian metric... Geodesic Distance... slerp(z0, z1; t)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GeoFlowVLM: Geometry-Aware Joint Uncertainty for Frozen Vision-Language Embedding
cs.LG · 2026-05 · unverdicted · novelty 7.0

GeoFlowVLM learns joint distributions of l2-normalized VLM embeddings on the product hypersphere via Riemannian flow matching to expose both aleatoric and epistemic uncertainty through derived entropy and typicality scores.