arxiv: 2604.23030 · v1 · submitted 2026-04-24 · 🧬 q-bio.NC

Vision as looking and seeing through a bottleneck

Li Zhaoping

Pith reviewed 2026-05-08 08:39 UTC · model grok-4.3

classification 🧬 q-bio.NC

keywords: vision, bottleneck, primary visual cortex, saliency map, saccades, looking and seeing, top-down feedback, gaze

The pith

Vision is better understood as looking and seeing through a bottleneck that starts in primary visual cortex.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Because only a tiny fraction of retinal input ever gets recognized, the paper proposes treating vision as two distinct but linked stages. Looking uses the peripheral field to pick out what matters and shifts gaze to center it at the fovea. Seeing then recognizes the selected central content. V1 begins this bottleneck by producing a bottom-up saliency map that drives where the eyes move next, while later top-down signals sharpen the recognition stage. This framing accounts for slower progress on higher vision and calls for experiments that let eyes move freely rather than forcing fixation.

Core claim

To a first approximation, vision is better formulated as looking and seeing through a bottleneck. Looking, mainly by the peripheral visual field, selects visual information to enter this bottleneck, largely via gaze shifts that center selected contents at fovea. Seeing, mainly by the central visual field, recognizes this content. V1 initiates the bottleneck and contributes to looking by generating a bottom-up saliency map that guides saccades exogenously, and top-down feedback along the visual pathway, targeting mainly the representation of the central visual field, refines seeing.

What carries the argument

The looking-seeing bottleneck initiated by V1's bottom-up saliency map, which separates selection of information via peripheral vision and gaze shifts from recognition of centered content.

If this is right

Theories of vision must explicitly connect neural activity in V1 to observable gaze behavior rather than treating fixation as a default state.
Experimental designs that force fixation will miss the main function of the early bottleneck and slow progress on higher visual areas.
Top-down feedback should be studied mainly for its effects on central-field representations that support seeing after selection has occurred.
Recognition models will need to incorporate the prior selection step performed by peripheral saliency and saccades.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This split predicts that peripheral vision should excel at guiding gaze but remain poor at fine recognition even when attention is directed there.
Disrupting the V1 saliency map should alter where eyes land more than it alters what is ultimately recognized once gaze is centered.
The framework suggests testable links to attention research by treating bottom-up saliency as the entry gate and top-down signals as post-entry refinement.
It implies that computational models of object recognition should first simulate gaze selection before applying recognition algorithms to the selected patch.
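
The last extension above (simulate gaze selection first, then recognize only the selected patch) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's model: the grid of bars, the neighbor-contrast saliency rule, and every function name (`saliency_map`, `select_gaze_target`, `foveate`, `recognize`) are hypothetical and chosen only for this example.

```python
"""Toy 'looking then seeing' pipeline on a small grid of oriented bars.

Illustrative only: the scene, the contrast-based saliency rule, and the
function names are assumptions, not taken from the paper.
"""


def saliency_map(image):
    """Looking, step 1: score each location by local feature contrast.

    Saliency is the fraction of adjacent cells (up to 8) whose feature
    differs from the center's, a crude stand-in for a bottom-up map.
    """
    rows, cols = len(image), len(image[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            differing = sum(1 for v in neighbors if v != image[r][c])
            sal[r][c] = differing / len(neighbors)
    return sal


def select_gaze_target(sal):
    """Looking, step 2: take the most salient location as the saccade target."""
    locations = ((r, c) for r in range(len(sal)) for c in range(len(sal[0])))
    return max(locations, key=lambda rc: sal[rc[0]][rc[1]])


def foveate(image, target, radius=1):
    """Bottleneck: keep only the patch around the gaze target."""
    r0, c0 = target
    rows, cols = len(image), len(image[0])
    return [
        [image[r][c] for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))]
        for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
    ]


def recognize(patch):
    """Seeing: placeholder recognizer that labels the middle item of the patch.

    Assumes the patch was not clipped by the image border, so that its
    middle item is the gaze target.
    """
    center = patch[len(patch) // 2][len(patch[0]) // 2]
    return "tilted bar" if center == 1 else "vertical bar"


# A 5x5 field of vertical bars (0) with one orientation singleton (1).
scene = [[0] * 5 for _ in range(5)]
scene[3][1] = 1

target = select_gaze_target(saliency_map(scene))  # looking
patch = foveate(scene, target)                    # bottleneck
label = recognize(patch)                          # seeing
```

In this sketch the recognizer never touches the full scene: it receives only the 3x3 patch that the selection step centered, which is the ordering the framework argues for. A real model would replace each placeholder with something far richer.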

Load-bearing premise

That only a tiny fraction of retinal input is recognized and that this constraint has been largely overlooked, making the looking/seeing split the central organizing principle rather than one of many constraints.

What would settle it

An experiment that measures V1 activity while allowing free gaze and shows that the saliency map does not predict the locations of exogenous saccades, or that recognition accuracy remains unchanged when gaze shifts are prevented from centering selected content.
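
One way the first half of such a test might be scored is with a rank-based statistic: compare the saliency values at locations where saccades actually landed with the values at control locations. The paper does not specify a metric; the AUC-style score below, the function name `saliency_auc`, and the numbers in the example map are all assumptions made for illustration, not data or results.

```python
def saliency_auc(sal, fixated, control):
    """Probability that a fixated location is more salient than a control one.

    A value of 0.5 means the map says nothing about where saccades land;
    values near 1.0 mean saccades land on high-saliency locations.
    Ties count as half a win.
    """
    wins = 0.0
    for fr, fc in fixated:
        for cr, cc in control:
            if sal[fr][fc] > sal[cr][cc]:
                wins += 1.0
            elif sal[fr][fc] == sal[cr][cc]:
                wins += 0.5
    return wins / (len(fixated) * len(control))


# Hypothetical saliency map and landing positions (made-up values).
example_map = [
    [0.1, 0.2, 0.1],
    [0.9, 0.2, 0.1],
    [0.1, 0.8, 0.3],
]
landings = [(1, 0), (2, 1)]          # where exogenous saccades landed
controls = [(0, 0), (0, 2), (1, 2)]  # locations that were not fixated

predictive_score = saliency_auc(example_map, landings, controls)

# A flat map carries no information, so every comparison is a tie.
flat_map = [[0.5] * 3 for _ in range(3)]
chance_score = saliency_auc(flat_map, landings, controls)
```

Under this scoring, a result that stays near 0.5 across free-viewing sessions would count against the claim that the map guides exogenous saccades. How control locations are sampled matters a great deal in practice and is left open here.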

Figures

Figures reproduced from arXiv: 2604.23030 by Li Zhaoping.

**Figure 1.** A new framework to formulate vision as mainly looking and seeing through a bottleneck.

**Figure 2.** Demonstration of looking before (A) and without (B) seeing in visual search. In both…

**Figure 3.** V1’s roles in vision and the central-peripheral dichotomy (CPD) in seeing. A: schematic of V1 functions. V1 creates a saliency map of the visual field to guide gaze shift exogenously (reflexively), initiates the bottleneck in information flow, and supplies additional information queried by top-down feedback from downstream stages to support ongoing perceptual processing under this bottleneck. The feedforwar…

**Figure 4.** Generalizing vision as looking and seeing to multisensory sensing as orienting (selec…

Original abstract

Progress in vision research has been slower downstream than upstream of primary visual cortex (V1). Traditional frameworks have largely overlooked a central constraint: only a tiny fraction of retinal input is recognized. Thus, to a first approximation, vision is better formulated as looking and seeing through a bottleneck. Looking, mainly by the peripheral visual field, selects visual information to enter this bottleneck, largely via gaze shifts that center selected contents at fovea. Seeing, mainly by the central visual field, recognizes this content. Converging evidence suggests that V1 initiates the bottleneck and contributes to looking by generating a bottom-up saliency map that guides saccades exogenously, and that top-down feedback along the visual pathway, targeting mainly the representation of the central visual field, refines seeing. Progress will accelerate through falsifiable theories that explicitly link behavior with neural substrates, and by experimental designs that avoid forced fixation and precisely track gaze.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes that vision research has progressed more slowly downstream of V1 because traditional frameworks have largely overlooked the central constraint that only a tiny fraction of retinal input is recognized. It reframes vision to a first approximation as 'looking and seeing through a bottleneck,' with looking (primarily peripheral) selecting information via bottom-up saliency maps in V1 that guide exogenous saccades, and seeing (primarily central) performing recognition refined by top-down feedback along the visual pathway. The paper synthesizes converging evidence for V1's role in initiating the bottleneck and calls for falsifiable theories linking behavior to neural substrates plus experimental designs that avoid forced fixation while tracking gaze.

Significance. If the reframing holds, it could usefully shift emphasis toward active, gaze-contingent paradigms and explicit integration of peripheral selection with central recognition, potentially accelerating downstream progress. The manuscript gives credit to existing work on saliency, saccades, and feedback while highlighting the value of avoiding passive viewing. However, as a conceptual synthesis without new data, quantitative models, or falsification tests, its significance hinges on whether the 'overlooked constraint' premise stimulates targeted research rather than restating known capacity limits in active vision.

major comments (2)

[Abstract and Introduction] The central motivation (that traditional frameworks have 'largely overlooked' the tiny fraction of retinal input that is recognized, making the looking/seeing split the primary organizing principle) is asserted without specific citations to active-vision or attention models that purportedly ignore foveation, saliency, or capacity limits. This premise is load-bearing for the reframing claim but risks circularity if standard models (e.g., those incorporating exogenous saccades and peripheral selection) already treat it as a core constraint.
[V1 and the bottleneck] Converging-evidence discussion: The claim that V1 initiates the bottleneck via a bottom-up saliency map guiding exogenous saccades is presented as a synthesis of existing findings, yet no new quantitative predictions, model comparisons, or falsification criteria are derived. Without explicit tests (e.g., predicted effects of V1 disruption on looking vs. seeing), the proposal remains a restatement rather than an advance that would accelerate progress as asserted.

minor comments (2)

[Abstract] The terms 'looking' and 'seeing' are introduced without immediate contrast to their usage in prior eye-movement literature, which could reduce clarity for readers familiar with active-vision terminology.
[Discussion] The call for experiments that 'precisely track gaze' would benefit from one or two concrete examples of how such designs would distinguish the proposed bottleneck from existing saliency-based accounts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us clarify the manuscript's contributions. We address each major point below and have made targeted revisions to strengthen the arguments with additional citations and explicit predictions while preserving the conceptual nature of the work.

Point-by-point responses

Referee: [Abstract and Introduction] The central motivation—that traditional frameworks have 'largely overlooked' the tiny fraction of retinal input that is recognized, making the looking/seeing split the primary organizing principle—is asserted without specific citations to active-vision or attention models that purportedly ignore foveation, saliency, or capacity limits. This premise is load-bearing for the reframing claim but risks circularity if standard models (e.g., those incorporating exogenous saccades and peripheral selection) already treat it as a core constraint.

Authors: We agree that explicit citations strengthen the claim and reduce any risk of circularity. In the revised manuscript, we have added references to key active-vision and attention models (e.g., Itti & Koch saliency frameworks, foveated vision models by Geisler, and capacity-limit studies in attention) and clarified how our framing differs: while these models incorporate selection and recognition components, they do not treat the severe bottleneck (only a tiny fraction of input reaching recognition) as the primary organizing principle explaining slower progress downstream of V1. This positions the looking/seeing distinction as a unifying lens rather than a restatement of isolated constraints.

revision: yes

Referee: [V1 and the bottleneck] The claim that V1 initiates the bottleneck via a bottom-up saliency map guiding exogenous saccades is presented as synthesis of existing findings, yet no new quantitative predictions, model comparisons, or falsification criteria are derived. Without explicit tests (e.g., predicted effects of V1 disruption on looking vs. seeing), the proposal remains a restatement rather than an advance that would accelerate progress as asserted.

Authors: As a conceptual synthesis without new data or quantitative models, we cannot provide original empirical tests or model fits. However, the revised discussion now explicitly derives falsifiable predictions from the framework, including differential impacts of V1 disruption on exogenous saccade guidance (looking) versus central recognition accuracy (seeing), and the selective role of top-down feedback on central-field representations. These are presented as testable hypotheses to guide future experiments, distinguishing the proposal from prior syntheses by emphasizing the bottleneck as the core constraint that can accelerate downstream research through active, gaze-contingent designs.

revision: partial

Circularity Check

0 steps flagged

No significant circularity in conceptual reframing

Full rationale

The paper advances a perspective that vision should be formulated as looking (peripheral selection via saliency-guided saccades) and seeing (central recognition) through a bottleneck initiated at V1, motivated by the constraint that only a tiny fraction of retinal input is recognized. This is presented as an organizing principle that traditional frameworks have overlooked. No equations, fitted parameters, or explicit self-citations appear in the text that reduce the central claim to its own inputs by construction. The argument is a synthesis of known visual constraints (foveation, capacity limits, bottom-up saliency) rather than a tautological derivation or renaming that forces the outcome. The derivation chain is self-contained and does not rely on load-bearing steps that collapse into prior results from the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard domain assumptions in visual neuroscience about retinal input volume, V1 saliency computation, and top-down feedback targeting; no free parameters, new entities, or ad-hoc axioms are introduced beyond these.

axioms (2)

domain assumption: Only a tiny fraction of retinal input is recognized. Invoked in the opening to justify the bottleneck as the central constraint.
domain assumption: V1 generates a bottom-up saliency map that guides exogenous saccades. Stated as converging evidence for V1's role in looking.
