Can Machines Really See Objects in Images? A Study Based on Syntactic Distance and Visual Self-Referential Instances

2026 · cs.CV · arXiv 2606.29416

1 Pith paper cites this work.

abstract

Can a vision model truly see an object, or does it only fit surface-level visual cues? Following Wittgenstein's view that the limits of language are the limits of the world, we view a model's recognition ability as bounded by the descriptive system it has learned. In current vision models, this system is often realized through learned feature representations that exploit local statistical cues. We therefore ask whether a model can still classify correctly when such local cues provide no stable basis for distinction. We formalize this question with syntactic distance, which measures class separability through the symmetry of the operations mapping one class to the other: positive distance exposes exploitable local features, whereas zero distance requires global semantics rather than local rules. We construct a visual self-referential task in maximum-variance binary noise: positive samples contain a closed square, while negative samples contain an otherwise identical square with one flipped boundary pixel. The two classes differ in global semantics but have zero syntactic distance, making local statistical shortcuts unreliable. Experiments on ResNets and Vision Transformers reveal a consistent phase-transition phenomenon: once the image scale crosses a critical point, accuracy collapses to random guessing and does not recover within the tested range. Larger training sets and models only delay this collapse, while globally attentive ViTs reach it earlier. These results reveal a structural capability boundary of current architectures on global-concept tasks, suggesting that general intelligence may require creating a new language rather than reusing an existing one.
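The positive/negative construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: "maximum-variance binary noise" is read as i.i.d. Bernoulli(0.5) pixels (the Bernoulli variance p(1 - p) peaks at p = 0.5), the square is assumed to be a one-pixel outline drawn with value 1, and the helper names `square_boundary`, `make_pair`, and `is_closed` are hypothetical. The abstract does not specify square size, placement, or pixel polarity.

```python
import numpy as np


def square_boundary(top, left, side):
    """Return the (row, col) coordinates of a side x side square outline."""
    assert side >= 2, "a 1-pixel square has no distinct edges"
    coords = []
    for c in range(left, left + side):
        coords.append((top, c))               # top edge
        coords.append((top + side - 1, c))    # bottom edge
    for r in range(top + 1, top + side - 1):
        coords.append((r, left))              # left edge (corners already added)
        coords.append((r, left + side - 1))   # right edge
    return coords


def make_pair(image_size, side, rng):
    """Build one hypothetical (positive, negative) pair on a shared noise background.

    Assumptions: background pixels are i.i.d. Bernoulli(0.5); the closed square
    is an outline of 1-pixels; the negative flips exactly one outline pixel to 0.
    """
    noise = rng.integers(0, 2, size=(image_size, image_size), dtype=np.uint8)
    top = int(rng.integers(0, image_size - side + 1))
    left = int(rng.integers(0, image_size - side + 1))
    boundary = square_boundary(top, left, side)

    positive = noise.copy()
    for r, c in boundary:
        positive[r, c] = 1                    # draw the closed square

    negative = positive.copy()
    r, c = boundary[int(rng.integers(len(boundary)))]
    negative[r, c] = 0                        # one flipped boundary pixel opens the square
    return positive, negative, (top, left, side)


def is_closed(img, top, left, side):
    """Ground-truth label for a known square location (not visible to a model)."""
    return all(img[r, c] == 1 for r, c in square_boundary(top, left, side))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos, neg, box = make_pair(32, 8, rng)
    print(is_closed(pos, *box), is_closed(neg, *box))  # True False
    print(int((pos != neg).sum()))                     # 1
```

The two images differ in exactly one pixel, and that pixel sits inside Bernoulli(0.5) noise, so any small window around it looks like ordinary background; only the global question "is this outline closed?" separates the classes. In this sketch the label is computed from the known square location, which a classifier never sees.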

representative citing papers

Self-Referential K-SAT and the Finite Analogue of Gödel's Incompleteness Theorem

cs.CC · 2026-07-02 · unverdicted · novelty 5.0

Claims a finite analogue of Gödel incompleteness in K-SAT by building locally indistinguishable SAT/UNSAT pairs in a log-width ensemble, implying that sublinear deductive systems require wide clauses and exponential proof sizes, and reframing SETH as a Gödel projection.