Global Geometry Is Not Enough for Vision Representations

Jiwan Chung; Seon Joo Kim

arxiv: 2602.03282 · v2 · pith:KH7XXOTLnew · submitted 2026-02-03 · 💻 cs.CV · cs.AI

keywords: geometry, global, binding, competence, compositional, embedding, functional, input-output

Abstract

A common assumption in representation learning is that globally well-distributed embeddings support robust and generalizable representations. This focus has shaped both training objectives and evaluation protocols, implicitly treating global geometry as a proxy for representational competence. While global geometry effectively encodes which elements are present, it is often insensitive to how they are composed. We investigate this limitation by testing the ability of geometric metrics to predict compositional binding across a diverse suite of vision encoders. We find that standard geometry-based statistics exhibit near-zero correlation with compositional binding. In contrast, functional sensitivity, as measured by the input-output Jacobian, reliably tracks this capability. We further provide an analytic account showing that this disparity arises from objective design, as existing losses explicitly constrain embedding geometry but leave the local input-output mapping unconstrained. These results suggest that global embedding geometry captures only a partial view of representational competence and establish functional sensitivity as a critical complementary axis for modeling composite structure.
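The contrast between global geometry and functional sensitivity can be illustrated with a toy sketch. This is not the authors' method: the two encoders, the integer sample grid, and the helper names (`uniformity`, `jacobian_fro`, `encoder_a`, `encoder_b`) are hypothetical. It assumes a Gaussian-potential "uniformity" statistic of the kind used in contrastive learning as a stand-in for a global geometry metric, and a finite-difference estimate of the Frobenius norm of the input-output Jacobian as a stand-in for functional sensitivity. The two maps agree (up to floating-point rounding) on every sample input, so any statistic computed only from those embeddings cannot tell them apart, while their local Jacobians differ by a factor of two.

```python
import math
import random


def uniformity(embs, t=2.0):
    """Gaussian-potential uniformity: log of the mean exp(-t * ||u - v||^2)
    over all distinct pairs. Lower values mean the points are more spread out."""
    total, count = 0.0, 0
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            sq = sum((a - b) ** 2 for a, b in zip(embs[i], embs[j]))
            total += math.exp(-t * sq)
            count += 1
    return math.log(total / count)


def jacobian_fro(f, x, eps=1e-5):
    """Frobenius norm of the input-output Jacobian of f at x,
    estimated with central finite differences (one column per input dim)."""
    sq_sum = 0.0
    for k in range(len(x)):
        xp = list(x)
        xp[k] += eps
        xm = list(x)
        xm[k] -= eps
        for a, b in zip(f(xp), f(xm)):
            sq_sum += ((a - b) / (2 * eps)) ** 2
    return math.sqrt(sq_sum)


def encoder_a(x):
    # Hypothetical encoder A: the identity map (Jacobian = I).
    return list(x)


def encoder_b(x):
    # Hypothetical encoder B: identity plus a ripple that vanishes at integer
    # inputs but doubles the local slope there (derivative 1 + cos(2*pi*v) = 2).
    return [v + math.sin(2 * math.pi * v) / (2 * math.pi) for v in x]


# Integer-valued sample inputs: both encoders produce the same embeddings here.
random.seed(0)
samples = [[random.randint(-3, 3) for _ in range(4)] for _ in range(20)]

geo_a = uniformity([encoder_a(x) for x in samples])
geo_b = uniformity([encoder_b(x) for x in samples])
sens_a = sum(jacobian_fro(encoder_a, x) for x in samples) / len(samples)
sens_b = sum(jacobian_fro(encoder_b, x) for x in samples) / len(samples)

print(f"uniformity:       A={geo_a:.6f}  B={geo_b:.6f}")
print(f"mean Jacobian |J|_F: A={sens_a:.4f}  B={sens_b:.4f}")
```

Running it should report matching uniformity values for both encoders but a mean Jacobian norm of about 2 for `encoder_a` versus about 4 for `encoder_b`. Finite differences keep the sketch dependency-free; with an autodiff framework the Jacobian would normally be computed directly.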

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Geometric Analysis of Self-Supervised Vision Representations for Semantic Image Retrieval
cs.IR · 2026-04 · unverdicted · novelty 5.0

Anisotropic self-supervised vision representations degrade approximate nearest-neighbor retrieval performance while more isotropic ones with local purity improve it.
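The citing paper's summary turns on how anisotropic an embedding set is. One common proxy, assumed here and not necessarily the measure that paper uses, is the mean pairwise cosine similarity: values near 1 mean the vectors crowd into a narrow cone, while values near 0 or below mean their directions are spread more evenly. The helper names and toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine(embs):
    """Average cosine similarity over all distinct pairs (an anisotropy proxy)."""
    sims = [cosine(embs[i], embs[j])
            for i in range(len(embs)) for j in range(i + 1, len(embs))]
    return sum(sims) / len(sims)


# Hypothetical toy embedding sets, for illustration only.
narrow_cone = [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [1.0, -0.1, 0.0], [1.0, 0.0, -0.1]]
spread_out = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]

print(round(mean_pairwise_cosine(narrow_cone), 3))
print(round(mean_pairwise_cosine(spread_out), 3))
```

For these toy sets the narrow cone scores about 0.99, while the four axis-aligned vectors score -1/3.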