Policy-based Foveated Imaging and Perception

Boyang Deng; Gordon Wetzstein; Howard Xiao; Jan Ackermann

arxiv: 2606.02565 · v1 · pith:VOTK7ZKOnew · submitted 2026-06-01 · 💻 cs.CV

Pith reviewed 2026-06-28 15:21 UTC · model grok-4.3

keywords: foveated imaging, policy learning, dual-stream sensor, task-aware acquisition, bandwidth efficiency, high-resolution perception, real-time imaging

The pith

A learned policy uses prior low-resolution frames to direct a dual-stream sensor to capture high-resolution pixels only in task-relevant regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that foveated imaging at acquisition time can be solved by casting region selection as a policy-learning problem that closes the loop between perception and sensor control. A sympathetic reader would care because ultra-high-resolution sensors generate data volumes that exceed practical bandwidth and latency limits, and standard downsampling discards information before its task value can be assessed. The method maintains a low-resolution global view while allocating the remaining pixel budget dynamically, and it reports higher task accuracy than baselines at identical total pixel counts. Validation includes both simulation across perception tasks and real capture on a 200-megapixel dual-stream sensor under realistic constraints.

Core claim

The paper claims that foveated acquisition can be formulated as a sensor attention policy-learning problem: past low-resolution observations guide actions that select the high-resolution regions for the next measurement. When this policy is learned and executed on dual-stream hardware, the resulting system achieves high task performance under strict pixel budgets, significantly outperforms relevant baselines at the same bandwidth, and operates in real time on a 200-megapixel sensor capturing real-world video.

What carries the argument

The sensor attention policy, a learned mapping from previous low-resolution frames to high-resolution region-of-interest selections that determines the next acquisition.
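
To make the closed loop concrete, here is a minimal sketch in Python. It is not the paper's learned policy: `select_roi` is a hypothetical stand-in that simply centres the region of interest on the brightest low-resolution pixel, and the sensor is simulated by cropping a full-resolution array. The function names, the downsampling factor, and the ROI size are all assumptions chosen for illustration. What the sketch does preserve is the timing structure described above: the ROI read out at frame k is chosen only from the low-resolution view of frame k-1.

```python
import numpy as np

def downsample(frame, factor):
    """Average-pool a 2D frame by an integer factor (simulated low-resolution stream)."""
    h, w = frame.shape
    h2, w2 = h // factor, w // factor
    trimmed = frame[:h2 * factor, :w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def select_roi(low_res, factor, roi_size, full_shape):
    """Hypothetical stand-in policy: centre the ROI on the brightest low-res pixel.

    Returns the (top, left) corner in full-resolution coordinates,
    clamped so the ROI stays inside the sensor area.
    """
    r, c = np.unravel_index(np.argmax(low_res), low_res.shape)
    cy = r * factor + factor // 2
    cx = c * factor + factor // 2
    top = int(np.clip(cy - roi_size // 2, 0, full_shape[0] - roi_size))
    left = int(np.clip(cx - roi_size // 2, 0, full_shape[1] - roi_size))
    return top, left

def run_loop(frames, factor=8, roi_size=32):
    """Closed loop: the ROI for frame k is selected from the low-res view of frame k-1."""
    rois = []
    prev_low = None
    for frame in frames:
        if prev_low is not None:
            top, left = select_roi(prev_low, factor, roi_size, frame.shape)
            # Simulated high-resolution readout of the selected window only.
            crop = frame[top:top + roi_size, left:left + roi_size]
            rois.append(((top, left), crop))
        # The low-resolution context is always captured for the next decision.
        prev_low = downsample(frame, factor)
    return rois
```

Because the first frame has no predecessor, the loop produces one fewer ROI than frames. A learned policy would replace `select_roi` while keeping the same interface.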

If this is right

Task performance remains high even when the total number of pixels acquired per frame is severely restricted.
The same policy-driven allocation outperforms standard spatial or temporal downsampling baselines across multiple perception tasks.
The approach runs on existing 200-megapixel dual-stream hardware while respecting realistic bandwidth and latency limits.
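
The phrase "pixel budget" can be illustrated with a small piece of arithmetic. The sketch below splits a fixed per-frame budget between the always-on low-resolution context stream and the high-resolution ROI. All numbers in the usage note are hypothetical: the paper states only that the sensor has 200 megapixels, so the sensor dimensions, the context downsampling factor, and the total budget are assumptions picked to give round figures.

```python
import math

def roi_budget(total_budget_px, sensor_w, sensor_h, context_factor):
    """Split a per-frame pixel budget into context pixels and remaining ROI pixels.

    The context stream is assumed to be the full field of view downsampled
    by `context_factor` along each axis.
    """
    context_px = (sensor_w // context_factor) * (sensor_h // context_factor)
    remaining_px = total_budget_px - context_px
    if remaining_px < 0:
        raise ValueError("budget is smaller than the context stream alone")
    return context_px, remaining_px

def max_square_roi_side(remaining_px, n_rois=1):
    """Largest square ROI side (in pixels) that fits when the budget is shared by n_rois."""
    return math.isqrt(remaining_px // n_rois)
```

For example, with an assumed 20000 x 10000 sensor (200 MP), a 10x context downsample, and a 3-megapixel total budget, `roi_budget(3_000_000, 20000, 10000, 10)` gives 2,000,000 context pixels and 1,000,000 ROI pixels, which `max_square_roi_side` turns into a single 1000 x 1000 window. Under these assumed numbers the system reads 1.5% of the sensor per frame.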

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same policy formulation could be adapted to other selective-readout sensor designs if they permit frame-by-frame region specification.
Joint training of the policy with the downstream task network might further reduce the pixel budget needed for a given accuracy level.
Power and heat savings would follow for mobile or embedded perception systems that avoid processing irrelevant high-resolution pixels.

Load-bearing premise

The dual-stream sensor hardware can be controlled at acquisition time to read arbitrary high-resolution regions based on a policy output computed from the previous low-resolution frame, with negligible added latency.

What would settle it

Running the full system on the 200-megapixel dual-stream sensor and finding that task accuracy under the learned policy is no higher than under uniform or fixed-pattern sampling at the same total pixel count, or that the added control latency exceeds real-time requirements, would falsify the claimed advantage.
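
The latency half of this test reduces to a comparison between the summed per-stage delays of the control loop and the frame period. The sketch below shows that check. The stage names and every timing value in the usage note are hypothetical placeholders, since the page reports no measured latencies; real numbers would have to come from the hardware.

```python
def frame_period_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

def loop_fits_budget(stage_latencies_ms, fps):
    """Check whether the closed-loop stages finish within one frame period.

    `stage_latencies_ms` maps a stage name to its measured latency in ms.
    Returns (total latency, frame budget, whether the loop fits).
    """
    total = sum(stage_latencies_ms.values())
    budget = frame_period_ms(fps)
    return total, budget, total <= budget
```

With assumed stage timings of `{"policy": 5.0, "sensor_command": 2.0, "roi_readout": 10.0}`, the loop totals 17 ms. That fits a 30 fps budget of about 33.3 ms but not a 60 fps budget of about 16.7 ms, which is the kind of outcome that would count against the real-time claim at the higher rate.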

Figures

Figures reproduced from arXiv: 2606.02565 by Boyang Deng, Gordon Wetzstein, Howard Xiao, Jan Ackermann.

**Figure 1.** Emerging sensors, such as our prototype based on Samsung’s ISOCELL platform (left), provide hundreds of megapixels of resolution, which then fuel …

**Figure 2.** Figure 2 illustrates this issue for three different downstream applications. Once lost during acquisition, this information cannot be recovered by subsequent processing, often resulting in degraded performance on detail-critical tasks. Emerging dual-stream sensors with hundreds of millions of pixels support multiple streams of video data to be read out simultaneously, including low-resolution full-field-of-view fr…

**Figure 3.** Policy-based foveated perception pipeline. A captured low-resolution frame provides the full-field-of-view global context and is processed to determine salient candidate regions (left). Past observations then guide a per-candidate motion predictor (center left). Our sensor attention policy selects the ROI, which is then read out at full sensor resolution. Both low-resolution context frame and high-resoluti…

**Figure 4.** Policy-based foveated imaging and perception for simulated video tasks. Row (a): Our foveated imaging approach correctly allocates higher resolution for pursuing objects of interest in an object tracking task. ROIs from our foveated imaging framework provide fine spatial details required to distinguish similar objects and provide fine temporal details for motion continuity, significantly improving downstre…

**Figure 5.** Overview of key components of our foveated imaging framework. Left: At frame $k-1$, multiple task-relevant object locations are proposed by the saliency detector using only low-resolution context. Middle: We associate each object detection over the past $T_o$ frames and predict object motion for the future $T_p$ frames using a constant-velocity Kalman Filter. Right: Our scanpath selection policy uses high-resolu…

**Figure 6.** Policy-based foveated imaging in real-world captures. Under realistic bandwidth and acquisition latency constraints, our proposed method runs in real time on our 200 MP-resolution foveated imaging prototype. We demonstrate expected smooth-pursuit scanpaths for (a) object tracking and saccading scanpaths for (b) scene text recognition across diverse scenes and lighting conditions. This predictive loop ensur…

**Figure 7.** Additional qualitative results for simulated video tasks. We show additional examples of our foveated imaging framework across diverse scenes for object tracking, scene text recognition, and robotic manipulation, demonstrating consistent performance improvements over task-agnostic baselines.
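
The Figure 5 caption mentions predicting object motion over the next few frames with a constant-velocity Kalman filter. The sketch below is a generic textbook version of that filter in Python, not the authors' implementation: the class name, the noise variances, and the diffuse initial covariance are assumptions, and the data-association step that links detections across frames is omitted. The `forecast` method corresponds to rolling the motion model forward for the prediction horizon without consuming new measurements.

```python
import numpy as np

class ConstantVelocityKF:
    """Kalman filter with state [px, py, vx, vy] and position-only measurements."""

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        # Constant-velocity transition: position advances by velocity * dt.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Only the position components are observed.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = process_var * np.eye(4)   # assumed process noise
        self.R = meas_var * np.eye(2)      # assumed measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4) * 1e3           # diffuse prior: state is initially unknown

    def predict(self):
        """Advance the state estimate by one frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z):
        """Correct the estimate with a detected position z = (px, py)."""
        z = np.asarray(z, dtype=float)
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def forecast(self, steps):
        """Predicted positions for the next `steps` frames; filter state is unchanged."""
        x = self.x.copy()
        out = []
        for _ in range(steps):
            x = self.F @ x
            out.append(x[:2].copy())
        return np.array(out)

def track_and_forecast(detections, horizon):
    """Filter a sequence of per-frame detections, then forecast `horizon` frames ahead."""
    kf = ConstantVelocityKF()
    for i, z in enumerate(detections):
        if i > 0:
            kf.predict()
        kf.update(z)
    return kf.forecast(horizon)
```

Feeding in detections of an object moving two pixels right and one pixel down per frame yields forecasts that continue along the same line, which is the information a scanpath policy would need to place the next ROI ahead of the object.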

Abstract

Ultra-high-resolution image sensors offer the potential to capture fine spatial details critical for many visual perception tasks, but acquiring and processing all pixels at full resolution is often infeasible under realistic bandwidth, latency, and power constraints. Existing approaches address this challenge through acquisition strategies such as spatial or temporal downsampling, which irrevocably discard information before task relevance can be assessed. In this work, we introduce a real-time, predictive, and task-aware foveated imaging system that operates directly at image acquisition time. Leveraging emerging dual-stream sensor architectures, our method dynamically allocates limited pixel bandwidth to task-relevant regions of interest while maintaining a low-resolution global context. We formulate foveated acquisition as a sensor attention policy-learning problem, in which past observations guide actions that determine future measurements, closing the perception-acquisition loop. Through extensive simulation across multiple perception tasks, we demonstrate that our approach achieves high task performance under strict pixel budgets and significantly outperforms relevant baselines operating at the same bandwidth. We further validate our system on a 200-megapixel dual-stream sensor, capturing real-world videos under realistic bandwidth and latency constraints, demonstrating the practical feasibility of task-driven, acquisition-time foveated imaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames foveated acquisition as a closed-loop policy-learning problem on dual-stream sensors and shows simulation gains plus one hardware run, but the real-time control claim lacks latency data or interface details.

The new piece here is treating sensor attention as a learned policy that picks high-res regions from the prior low-res frame, closing the loop at acquisition time rather than post-capture. That formulation plus the 200 MP dual-stream demo is the concrete step beyond earlier static or open-loop foveation work.

Simulations across tasks look useful: the method keeps task accuracy while staying inside tight pixel budgets and beats the listed baselines. The hardware run on real video under bandwidth limits is the part that matters for embedded use.

The soft spot is exactly the one the stress-test flags. The hardware claim needs the policy output to trigger the next high-res readout with negligible added latency, yet the text gives no end-to-end timing numbers, no sensor API description, and no frame-rate budget comparison. Without those, the practical feasibility stays unverified. Training details and exact policy architecture are also missing from what is visible, so the performance edge cannot be reproduced or stress-tested yet.

This is for groups building high-resolution perception on power- or bandwidth-limited platforms. A reader who needs a working policy-learning baseline for foveated sensors would get value from the simulations and the sensor choice. The work is coherent on its own terms and shows honest engagement with the constraints, so it deserves a serious referee to check the missing measurements and code.

Referee Report

1 major / 1 minor

Summary. The paper introduces a real-time policy-based foveated imaging system for dual-stream sensors that learns a sensor attention policy to allocate limited high-resolution pixels to task-relevant regions based on prior low-resolution observations. It reports that simulations across multiple perception tasks achieve high performance under strict pixel budgets and outperform relevant baselines at equivalent bandwidth, and further claims validation via real-world video capture on a 200-megapixel dual-stream sensor under realistic constraints.

Significance. If the closed-loop hardware control and simulation results hold with full details, the work would demonstrate a practical advance in task-aware acquisition-time foveation for bandwidth-constrained perception, with potential impact on real-time vision systems. The multi-task simulation scope and attempt at hardware validation are noted strengths.

major comments (1)

[Hardware validation] Hardware validation section: the claim of capturing real-world videos on the 200-megapixel dual-stream sensor under realistic bandwidth and latency constraints rests on an unverified assumption of real-time closed-loop control, but the manuscript provides no quantitative end-to-end latency measurements, no description of the sensor control API or timing guarantees, and no comparison to frame-rate budgets.

minor comments (1)

[Simulation results] Simulation results section: training details for the policy, exact policy architecture, and precise baseline implementations should be provided to allow assessment of the reported performance gains.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comments. We address the single major comment below.

Referee: [Hardware validation] Hardware validation section: the claim of capturing real-world videos on the 200-megapixel dual-stream sensor under realistic bandwidth and latency constraints rests on an unverified assumption of real-time closed-loop control, but the manuscript provides no quantitative end-to-end latency measurements, no description of the sensor control API or timing guarantees, and no comparison to frame-rate budgets.

Authors: We agree that the hardware validation section would be strengthened by explicit quantitative support for the real-time claims. In the revised manuscript we will add measured end-to-end latency values from the 200 MP dual-stream sensor experiments, a description of the sensor control API and timing guarantees used, and a direct comparison of achieved latency against the frame-rate budgets required by the target perception tasks. These additions will substantiate the closed-loop feasibility under realistic constraints.

Revision: yes

Circularity Check

0 steps flagged

No circularity detected; empirical policy learning validated externally

The paper formulates foveated acquisition as a policy-learning problem and reports performance via simulation across perception tasks plus hardware validation on a 200 MP dual-stream sensor. No derivation, equation, or prediction reduces to its own fitted inputs by construction; no self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing premises. All claims rest on independent task metrics and external hardware benchmarks, making the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The central claim implicitly rests on the existence and controllability of dual-stream sensor hardware and on the transferability of simulation-trained policies to real sensors.

pith-pipeline@v0.9.1-grok · 5736 in / 1138 out tokens · 17537 ms · 2026-06-28T15:21:34.402289+00:00 · methodology


[1] [1]

Proceedings of the Int’l Image Sensor Workshop (IISW), Crieff, UK , pages=

World smallest 200Mp CMOS image sensor with 0.56 m pixel equipped with novel deep trench isolation structure for better sensitivity and higher CG , author=. Proceedings of the Int’l Image Sensor Workshop (IISW), Crieff, UK , pages=

[2] [2]

Canon develops CMOS sensor with 410 megapixels, the largest number of pixels ever achieved in a 35 mm full-frame sensor , howpublished =

[3] [3]

, title =

Samsung Electronics Co., Ltd. , title =. 2025 , url =

2025

[4] [4]

2025 , url =

Xiaomi Corporation , title =. 2025 , url =

2025

[5] [5]

, title =

vivo Communications Technology Co., Ltd. , title =. 2025 , url =

2025

[6] [6]

ISOCELL HP2 | Mobile Image Sensor , year =

[7] [7]

ISOCELL Zoom Anyplace , year =

[8] [8]

Proceedings of the 1990 symposium on interactive 3d graphics , pages=

Gaze-directed volume rendering , author=. Proceedings of the 1990 symposium on interactive 3d graphics , pages=

1990

[9] [9]

Human vision and electronic imaging III , volume=

Real-time foveated multiresolution system for low-bandwidth video communication , author=. Human vision and electronic imaging III , volume=. 1998 , organization=

1998

[10] [10]

Vision research , volume=

Chart demonstrating variations in acuity with retinal position , author=. Vision research , volume=. 1974 , publisher=

1974

[11] [11]

Experimental brain research , volume=

Human express saccades: extremely short reaction times of goal directed eye movements , author=. Experimental brain research , volume=. 1984 , publisher=

1984

[12] [12]

Progress in brain research , volume=

Neural control of saccades , author=. Progress in brain research , volume=. 2002 , publisher=

2002

[13] [13]

IEEE Transactions on Multimedia , volume=

A gated peripheral-foveal convolutional neural network for unified image aesthetic prediction , author=. IEEE Transactions on Multimedia , volume=. 2019 , publisher=

2019

[14] [14]

arXiv preprint arXiv:2105.14173 , year=

Foveater: Foveated transformer for image classification , author=. arXiv preprint arXiv:2105.14173 , year=

arXiv

[15] [15]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

LF-ViT: Reducing spatial redundancy in vision transformer for efficient image recognition , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[16] Foveated 3D graphics. ACM Transactions on Graphics (TOG), 2012.

[17] Towards foveated rendering for gaze-tracked virtual reality. ACM Transactions on Graphics (TOG), 2016.

[18] Motion Guided Token Compression for Efficient Masked Video Modeling. arXiv preprint arXiv:2402.18577.

[19] DeepFovea: Neural reconstruction for foveated rendering and video compression using learned statistics of natural videos. ACM Transactions on Graphics (TOG), 2019.

[20] Recurrent models of visual attention. Advances in Neural Information Processing Systems.

[21] Recurrent attention models for depth-based person identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[22] Adafocus v2: End-to-end training of spatial dynamic networks for video recognition. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[23] A 128 x 128 120 dB 30 mW asynchronous vision sensor that responds to relative intensity change. 2006 IEEE International Solid State Circuits Conference-Digest of Technical Papers, 2006.

[24] Generalized event cameras. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25] A camera that CNNs: Towards embedded neural networks on pixel processor arrays. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[26] Neural sensors: Learning pixel exposures for HDR imaging and video compressive sensing with programmable sensors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.

[27] PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors. CVPR, 2024.

[28] Large-scale video classification with convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[29] Human eye inspired log-polar pre-processing for neural networks. 2020 International SAUPEC/RobMech/PRASA Conference, 2020.

[30] Towards real time data reduction and feature abstraction for robotics vision. Robot Vision, 2010.

[31] A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.

[32] Foveated rendering: A state-of-the-art survey. Computational Visual Media, 2023.

[33] DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563.

[34] DeepGaze III: Modeling free-viewing human scanpaths with deep learning. Journal of Vision, 2022.

[35] Level of detail for 3D graphics. 2002.

[36] Adaptive image-space sampling for gaze-contingent real-time rendering. Computer Graphics Forum, 2016.

[37] Foveated real-time ray tracing for virtual reality headset. Light Transport Entertainment Research.

[38] Foveated path tracing with fast reconstruction and efficient sample distribution. 2020.

[39] Voronoi-Based Foveated Volume Rendering. EuroVis (Short Papers).

[40] Top-down neural attention by excitation backprop. International Journal of Computer Vision, 2018.

[41] Learning to zoom: a saliency-based sampling layer for neural networks. Proceedings of the European Conference on Computer Vision (ECCV).

[42] Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 2014.

[43] Spatial transformer networks. Advances in Neural Information Processing Systems.

[44] Polar transformer networks. arXiv preprint arXiv:1709.01889.

[45] Object detection through search with a foveated visual system. PLoS Computational Biology, 2017.

[46] Foveation in the era of deep learning. arXiv preprint arXiv:2312.01450.

[47] Dynamic neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

[48] Distractor-aware siamese networks for visual object tracking. Proceedings of the European Conference on Computer Vision (ECCV).

[49] Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation. arXiv preprint arXiv:2603.23491.

[50] Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Image Classification with Foveated Neural Networks. 2025.

[52] Adaptive focus for efficient video recognition. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[53] Adafocusv3: On unified spatial-temporal dynamic video recognition. European Conference on Computer Vision, 2022.

[54] Uni-adafocus: spatial-temporal dynamic computation for video recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55] The representation of the visual field on the cerebral cortex in monkeys. The Journal of Physiology.

[56] Spatial mapping in the primate sensory projection: analytic structure and relevance to perception. Biological Cybernetics, 1977.

[57] Anatomical and physiological correlates of visual computation from striate to infero-temporal cortex. IEEE Transactions on Systems, Man, and Cybernetics, 1984.

[58] A New Foveal Cartesian Geometry Approach used for Object Tracking. SPPRA.

[59] Biologically inspired deep learning model for efficient foveal-peripheral vision. Frontiers in Computational Neuroscience, 2021.

[60] Rate-scalable foveated image and video communications. 2001.

[61] DCT domain video foveation and transcoding for heterogeneous video communication. 2002.

[62] Foveated multipoint videoconferencing at low bit rates. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.

[63] Real-time foveation techniques for low bit rate video coding. Real-Time Imaging, 2003.

[64] Image classification with foveated neural networks. 2025.

[65] Towards an active foveated approach to computer vision. Computaci, 2022.

[66] Multi-agent reinforcement learning based frame sampling for effective untrimmed video recognition. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[67] A dynamic frame selection framework for fast video recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.

[68] Scsampler: Sampling salient clips from video for efficient action recognition. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[69] Ar-net: Adaptive frame resolution for efficient action recognition. European Conference on Computer Vision, 2020.

[70] End-to-end learning of action detection from frame glimpses in videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[71] Frameexit: Conditional early exiting for efficient video recognition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[72] Nsnet: Non-saliency suppression sampler for efficient video recognition. European Conference on Computer Vision, 2022.

[73] Temporal saliency query network for efficient video recognition. European Conference on Computer Vision, 2022.

[74] Dynamic network quantization for efficient video inference. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[75] Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 2023.

[76] Probability summation and regional variation in contrast sensitivity across the visual field. Vision Research, 1981.

[77] The contrast sensitivity gradient across the human visual field: With emphasis on the low spatial frequency range. Vision Research, 1989.

[78] Color appearance models. 2013.

[79] Human photoreceptor topography. Journal of Comparative Neurology, 1990.

[80] Topography of ganglion cells in human retina. Journal of Comparative Neurology, 1990.