Visual symbolic mechanisms: Emergent symbol processing in vision language models
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
fields: cs.CV 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
citing papers explorer
- Uncovering and Shaping the Latent Representation of 3D Scene Topology in Vision-Language Models
  VLMs possess a latent 3D scene topology subspace corresponding to Laplacian eigenmaps that can be causally shaped via Dirichlet energy regularization to improve spatial task performance by up to 12.1%.
- I Walk the Line: Examining the Role of Gestalt Continuity in Object Binding for Vision Transformers
  Pretrained vision transformers use specific attention heads sensitive to Gestalt continuity for object binding, shown via probes on synthetic datasets and ablation experiments.