{"paper":{"title":"GeoFlowVLM: Geometry-Aware Joint Uncertainty for Frozen Vision-Language Embedding","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A single masked velocity field on paired hyperspherical embeddings yields valid joint and conditional Riemannian flows for uncertainty in frozen vision-language models.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Andreas Hellander, Ekta Vats, Li Ju, Mayank Nautiyal, Prashant Singh","submitted_at":"2026-05-13T11:12:18Z","abstract_excerpt":"Standard dual-encoder vision-language models that map images and text to deterministic points on a shared unit hypersphere through $\\ell_2$ normalization typically expose neither \\emph{aleatoric} uncertainty (cross-modal ambiguity) nor \\emph{epistemic} uncertainty (lack of training-distribution support). Existing post-hoc methods either recover at most one of the two uncertainty components, or ignore the hyperspherical geometry of these models' embeddings. We propose \\textbf{GeoFlowVLM} as a post-hoc adapter that learns the joint distribution of paired $\\ell_2$-normalised dual-encoder VLM embe"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"A consistency result shows that, in the population limit, the trained network exposes the joint flow and both cross-modal conditional flows as valid Riemannian flow-matching velocity fields on their respective domains.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a single masked velocity field trained via Riemannian flow matching on paired hyperspherical embeddings will yield practically useful conditional and marginal distributions whose derived entropy and typicality scores remain calibrated on real benchmarks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"GeoFlowVLM learns joint 
distributions of l2-normalized VLM embeddings on the product hypersphere via Riemannian flow matching to expose both aleatoric and epistemic uncertainty through derived entropy and typicality scores.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A single masked velocity field on paired hyperspherical embeddings yields valid joint and conditional Riemannian flows for uncertainty in frozen vision-language models.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e0725ac57e419337e2573ef8c43a94e634e9f343c396a4db2788f4aa952fc3e0"},"source":{"id":"2605.13352","kind":"arxiv","version":1},"verdict":{"id":"e53b1abf-85d4-4cf7-921b-ecdc37dd8e2a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:40:39.196016Z","strongest_claim":"A consistency result shows that, in the population limit, the trained network exposes the joint flow and both cross-modal conditional flows as valid Riemannian flow-matching velocity fields on their respective domains.","one_line_summary":"GeoFlowVLM learns joint distributions of l2-normalized VLM embeddings on the product hypersphere via Riemannian flow matching to expose both aleatoric and epistemic uncertainty through derived entropy and typicality scores.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a single masked velocity field trained via Riemannian flow matching on paired hyperspherical embeddings will yield practically useful conditional and marginal distributions whose derived entropy and typicality scores remain calibrated on real benchmarks.","pith_extraction_headline":"A single masked velocity field on paired hyperspherical embeddings yields valid joint and conditional Riemannian flows for uncertainty in frozen vision-language models."},"references":{"count":44,"sample":[{"doi":"","year":2021,"title":"Learning transferable visual 
models from natural language supervision","work_id":"7f9713a7-9534-42a1-8f0d-22e3e2d5df4f","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Sigmoid loss for language image pre-training","work_id":"539235e9-9cb6-482a-8ba4-6335aef79d6d","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Probabilistic embeddings for cross-modal retrieval","work_id":"790d36ac-ab8f-49a3-a188-9d3b633b6e42","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"ProbVLM: Probabilistic adapter for frozen vision-language models","work_id":"385db3eb-5648-4cc6-8903-2b70f5646fb4","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Probabilistic embeddings for frozen vision-language models: uncertainty quantification with Gaussian process latent variable models","work_id":"f60ff7f5-4e3c-4a24-884e-97464280f7de","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":44,"snapshot_sha256":"961b6f24325e60a962c2fb888e320a1c7f2c8b5e322241532a452f0603001c12","internal_anchors":2},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}