A Systematic Survey on Event Camera Representation Learning

Hongwei Ren; Tuopusen Huang; Xiangqian Wu; Youxin Jiang

arxiv: 2606.23078 · v1 · pith:NN3NEEQ7new · submitted 2026-06-22 · 📡 eess.IV

Pith reviewed 2026-06-26 06:31 UTC · model grok-4.3

classification 📡 eess.IV

keywords: event cameras, representation learning, dense representations, sparse representations, event-based vision, perception tasks

The pith

Event camera representation learning splits into dense grid conversions and sparse discrete structures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews methods for turning raw asynchronous event streams from event cameras into forms usable by neural networks. It groups the methods into dense-based representations that reshape events into regular grid structures and sparse-based representations that keep the original discrete event points. The split shows how the choice of representation trades off structural regularity for standard networks, fidelity to event timing, retention of sparsity, and ease of fusion with other modalities. The paper reviews design principles within each group, lists benchmarks for vision tasks, and points to open problems in building more efficient, scalable, and robust systems.
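To make the two groups concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the `Event` tuple, the function names, and the toy sensor size are all hypothetical, and a two-channel count frame is only one simple example of a dense grid. The point it illustrates is that the dense form collapses two events at the same pixel into a single count, while the sparse form keeps each timestamp.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    """One event; the field layout is an assumption made for this sketch."""
    x: int    # pixel column
    y: int    # pixel row
    t: float  # timestamp in seconds
    p: int    # polarity: +1 (brighter) or -1 (darker)


def to_count_frame(events: List[Event], width: int, height: int):
    """Dense form: per-pixel event counts, channel 0 = positive, 1 = negative."""
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for e in events:
        channel = 0 if e.p > 0 else 1
        frame[channel][e.y][e.x] += 1
    return frame


def to_point_set(events: List[Event]) -> List[Tuple[int, int, float, int]]:
    """Sparse form: keep every event as a point, with time shifted to start at 0."""
    if not events:
        return []
    t0 = min(e.t for e in events)
    return [(e.x, e.y, e.t - t0, e.p) for e in events]


events = [Event(1, 0, 0.001, 1), Event(1, 0, 0.004, 1), Event(2, 1, 0.006, -1)]
frame = to_count_frame(events, width=4, height=3)
points = to_point_set(events)

# The two positive events at pixel (x=1, y=0) become one count in the frame,
# but remain two separate timestamped points in the point set.
print(frame[0][0][1], len(points))
```

The frame can be passed to any image-style network, which is the compatibility advantage claimed for dense methods. The point set needs a network that accepts unordered points but loses no timing information.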

Core claim

Existing methods for event camera representation learning can be organized into two main categories—dense-based representations that transform raw event streams into regular grid-like structures and sparse-based representations that retain events as discrete spatio-temporal structures—clarifying balances among structural regularity, temporal fidelity, sparsity preservation, and architectural compatibility.

What carries the argument

Two-category taxonomy separating dense-based grid transformations from sparse-based discrete event retention.

If this is right

Dense representations support direct use of existing RGB network backbones and multimodal fusion methods.
Sparse representations maintain the original high temporal resolution and low data density of event streams.
The split organizes implications for both high-level perception tasks and low-level vision tasks.
Common benchmarks and evaluation protocols are collected for representative tasks.
Open problems point toward future work on efficiency, scalability, and robustness.
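The "low data density" point above can be checked with simple counting. The sketch below is an illustration rather than a measurement from the paper: the 640×480 resolution, the three toy events, and all function names are assumptions, and real systems would use more compact encodings than a raw value count.

```python
def dense_cell_count(width, height, channels=2):
    """Number of cells a dense grid stores, whether or not events occurred there."""
    return width * height * channels


def sparse_value_count(events, fields_per_event=4):
    """Number of values stored when each event is kept as (x, y, t, p)."""
    return len(events) * fields_per_event


def pixel_occupancy(events, width, height):
    """Fraction of pixels that received at least one event."""
    active = {(x, y) for x, y, _t, _p in events}
    return len(active) / (width * height)


# Hypothetical toy stream: (x, y, timestamp in microseconds, polarity).
events = [(1, 0, 0, 1), (1, 0, 4000, 1), (2, 1, 10000, -1)]
WIDTH, HEIGHT = 640, 480  # assumed sensor resolution, for illustration only

dense_cells = dense_cell_count(WIDTH, HEIGHT)
sparse_values = sparse_value_count(events)
occupancy = pixel_occupancy(events, WIDTH, HEIGHT)
print(dense_cells, sparse_values, occupancy)
```

When occupancy is this low, a dense grid is almost entirely zeros. That wasted storage is the cost paid for grid regularity, and avoiding it is the motivation for sparse methods.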

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid representations mixing grid and discrete elements could be designed to capture advantages from both sides.
The same dense-versus-sparse distinction may apply to other asynchronous sensors in robotics or sensing.
New methods that cross category boundaries would require updating or expanding the taxonomy.
The open problems section could motivate benchmarks that measure computational cost alongside accuracy.

Load-bearing premise

All recent methods fit cleanly into either the dense or sparse category with no major omissions or overlaps.

What would settle it

A published event representation method that cannot be placed in either the dense-based or sparse-based category.

Figures

Figures reproduced from arXiv: 2606.23078 by Hongwei Ren, Tuopusen Huang, Xiangqian Wu, Youxin Jiang.

**Figure 2.** Overview of diverse event camera representation paradigms using a “STOP” sign scene. (a), (b), (c), (e) show different frame-based mappings; (d) …

**Figure 3.** Taxonomy of event camera representation learning. Existing methods are grouped into dense-based and sparse-based paradigms according to their …

**Figure 4.** Overview of representative map-based representations. Raw events are …

**Figure 5.** Overview of voxel-grid event representation. Raw events are dis…

**Figure 6.** Illustration of learned dense event representation. Raw events are …

**Figure 7.** Illustration of token-based event representation. Raw events are …

**Figure 8.** Illustration of graph-based event representation. Raw events are …

**Figure 9.** Overview of point-based event representations. Point-based methods …

Original abstract

Event cameras offer distinctive advantages, including microsecond-level latency and high dynamic range, rendering them promising for challenging perception tasks. Inspired by biological vision, they output asynchronous and sparse event streams rather than dense image frames, creating a fundamental mismatch with mainstream neural networks. This survey reviews recent advances in event camera representation learning from the perspective of converting raw event streams into learnable representations. We organize existing methods into two main categories: (1) dense-based representations, which transform raw event streams into regular grid-like structures to leverage mature RGB backbones and multimodal fusion pipelines, and (2) sparse-based representations, which retain events as discrete spatio-temporal structures to preserve fine-grained temporal dynamics and data sparsity. This representation-centric organization clarifies how different representations balance structural regularity, temporal fidelity, sparsity preservation, and architectural compatibility. For each category, we examine the underlying design choices, modeling principles, and task-level implications. We further summarize standard benchmarks and evaluation settings across representative high-level perception and low-level vision tasks. Finally, we discuss open problems and outline future research directions toward more efficient, scalable, and robust event-based perception systems.
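One way to picture a "regular grid-like structure" that still keeps some timing is a voxel grid, the representation named in the Figure 5 caption. The following is a minimal sketch under stated assumptions, not the survey's formulation: the function name, the integer-microsecond timestamps, and the nearest-bin assignment of signed polarity are choices made here for simplicity.

```python
def to_voxel_grid(events, width, height, num_bins):
    """Accumulate signed polarity into a [num_bins][height][width] grid.

    Each event is (x, y, t_us, p). Its timestamp picks one temporal bin
    (nearest-bin assignment, a simplifying assumption for this sketch).
    """
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t_start = min(t for _x, _y, t, _p in events)
    t_end = max(t for _x, _y, t, _p in events)
    duration = t_end - t_start
    for x, y, t, p in events:
        if duration > 0:
            # Scale time into [0, num_bins] and clamp the last event into range.
            b = min(int((t - t_start) / duration * num_bins), num_bins - 1)
        else:
            b = 0  # all events share one timestamp
        grid[b][y][x] += p
    return grid


# Hypothetical toy stream: (x, y, timestamp in microseconds, polarity).
events = [(1, 0, 0, 1), (1, 0, 4000, 1), (2, 1, 10000, -1)]
grid = to_voxel_grid(events, width=4, height=3, num_bins=4)

# The two events at pixel (x=1, y=0) now land in different temporal bins.
print(grid[0][0][1], grid[1][0][1], grid[3][1][2])
```

Raising `num_bins` recovers more timing detail at the cost of a larger and emptier tensor. This is a small-scale version of the balance between temporal fidelity and sparsity preservation that the abstract describes.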

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A survey that organizes event camera methods into dense grid vs sparse event buckets, which is a reasonable synthesis but brings no new results or verified completeness.

This survey groups event camera representation learning into dense-based methods that turn event streams into regular grids and sparse-based ones that keep the discrete events. The split is presented as clarifying trade-offs in regularity, temporal detail, sparsity, and network compatibility.

The paper does a solid job walking through the design principles in each category and linking them to task outcomes. It also collects standard benchmarks and evaluation settings for perception and low-level vision tasks, which gives a practical map of how people test these representations.

The dense-sparse distinction itself is not new; earlier work has used similar language. The main addition is the organized layout plus the section on open problems like efficiency and robustness.

The soft spot is coverage. The abstract gives no selection criteria or search method, so it is hard to judge whether the cited papers are representative or if the two buckets leave out hybrid approaches that mix grid conversion with explicit sparse handling. If the full text shows clean placement for most recent papers, that concern shrinks; otherwise the taxonomy loses some clarifying power.

This is mainly for newcomers to event cameras who want a quick overview of representation choices before picking a method. Experts already working in the area will find less that is fresh.

It deserves peer review. A well-executed survey on an active topic can reduce duplicated effort even without new experiments, provided the citations hold up and the categories are shown to be reasonably exhaustive.

Referee Report

2 major / 1 minor

Summary. This survey reviews advances in event camera representation learning, organizing methods into two categories—dense-based (converting event streams to regular grid-like structures for compatibility with RGB backbones) and sparse-based (retaining discrete spatio-temporal event structures)—while discussing design choices, task implications, benchmarks for high- and low-level vision tasks, and future directions.

Significance. If the taxonomy proves exhaustive and non-overlapping, the survey would provide a useful organizing framework that highlights trade-offs in structural regularity, temporal fidelity, sparsity, and architectural compatibility, potentially guiding efficient event-based perception research.

major comments (2)

[Abstract] The central claim that the two-category taxonomy (dense-based vs. sparse-based) comprehensively and non-overlappingly covers recent literature is load-bearing for the survey's clarifying value, yet the manuscript provides no explicit selection criteria, inclusion/exclusion rules, or mapping of all cited works to categories; this leaves open the possibility of hybrid methods or unclassified papers that would undermine the partition's utility.
[Abstract] Abstract and taxonomy discussion sections: No analysis is given of boundary cases (e.g., methods that perform both a dense conversion step and retain explicit sparse structures), which directly affects whether the claimed organization clarifies balances among the four properties without circularity or omission.

minor comments (1)

[Abstract] The abstract states that standard benchmarks are summarized, but without a dedicated table or section reference listing the exact datasets, tasks, and metrics used across the surveyed papers, readers cannot easily verify coverage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. The concerns regarding the taxonomy's documentation and boundary handling are valid, and we will revise the manuscript to address them directly.

Referee: [Abstract] The central claim that the two-category taxonomy (dense-based vs. sparse-based) comprehensively and non-overlappingly covers recent literature is load-bearing for the survey's clarifying value, yet the manuscript provides no explicit selection criteria, inclusion/exclusion rules, or mapping of all cited works to categories; this leaves open the possibility of hybrid methods or unclassified papers that would undermine the partition's utility.

Authors: We agree that the current abstract and taxonomy sections lack explicit documentation of literature selection. In the revision we will add a dedicated 'Literature Review Methodology' subsection that specifies search databases and keywords, time range, inclusion/exclusion criteria, and the total number of papers screened versus retained. We will also include a supplementary table that maps every cited method to its assigned category, with a column noting any hybrid characteristics and the rationale for the final assignment. This will make the partition's scope and any edge cases transparent.

revision: yes

Referee: [Abstract] Abstract and taxonomy discussion sections: No analysis is given of boundary cases (e.g., methods that perform both a dense conversion step and retain explicit sparse structures), which directly affects whether the claimed organization clarifies balances among the four properties without circularity or omission.

Authors: We will expand the taxonomy discussion section with a new paragraph on boundary cases. It will explicitly identify representative hybrid methods, describe the decision rule used to assign them (primary representation fed into the learning pipeline), and analyze how the four properties (structural regularity, temporal fidelity, sparsity preservation, architectural compatibility) are traded off in those cases. This addition will demonstrate that the taxonomy is applied consistently rather than circularly.

revision: yes

Circularity Check

0 steps flagged

No circularity: literature survey with no derivations or load-bearing self-references

This paper is a systematic survey that reviews and organizes existing event-camera representation methods into two categories (dense-based grid-like and sparse-based discrete). No equations, fitted parameters, predictions, or derivation chains appear in the provided text or abstract. The taxonomy is presented as an organizational lens drawn from the literature rather than derived from or reducing to any self-citation, ansatz, or input by construction. The central claim does not invoke uniqueness theorems or prior author work as load-bearing justification; it simply classifies published methods. This matches the default expectation for non-circular survey papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Survey paper introduces no new free parameters, axioms, or invented entities.

Reference graph

Works this paper leans on

176 extracted references · 2 canonical work pages

[1]

Recent event camera innovations: A survey,

B. Chakravarthi, A. A. Verma, K. Daniilidis, C. Fermuller, and Y . Yang, “Recent event camera innovations: A survey,” inComputer Vision – ECCV 2024 Workshops, A. Del Bue, C. Canton, J. Pont-Tuset, and T. Tommasi, Eds. Cham: Springer Nature Switzerland, 2025, pp. 342–376

2024
[2]

IEEE Transactions on Pattern Analysis and Machine Intelligence44, 154–180 (2022) https://doi.org/ 10.1109/TPAMI.2020.3008413

G. Gallego, T. Delbruck, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis, and D. Scaramuzza, “ Event-Based Vision: A Survey ,”IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 44, no. 01, pp. 154–180, Jan. 2022. [Online]. Available: https: //doi.ieeecomputersociety.org/10.1109/TPAMI.2...

work page doi:10.1109/tpami.2020.3008413 2022
[3]

A 128×128 120 db 15µs latency asynchronous temporal contrast vision sensor,

P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 db 15µs latency asynchronous temporal contrast vision sensor,”IEEE Journal of Solid-State Circuits, vol. 43, no. 2, pp. 566–576, 2008

2008
[4]

End-to- end learning of representations for asynchronous event-based data,

D. Gehrig, A. Loquercio, K. Derpanis, and D. Scaramuzza, “End-to- end learning of representations for asynchronous event-based data,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 5632–5642

2019
[5]

Mapping from frame- driven to frame-free event-driven vision systems by low-rate rate coding and coincidence processing–application to feedforward con- vnets,

J. A. P ´erez-Carrasco, B. Zhao, C. Serrano, B. Acha, T. Serrano- Gotarredona, S. Chen, and B. Linares-Barranco, “Mapping from frame- driven to frame-free event-driven vision systems by low-rate rate coding and coincidence processing–application to feedforward con- vnets,”IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 11, pp....

2013
[6]

A low power, fully event-based gesture recognition system,

A. Amir, B. Taba, D. Berg, T. Melano, J. McKinstry, C. Di Nolfo, T. Nayak, A. Andreopoulos, G. Garreau, M. Mendozaet al., “A low power, fully event-based gesture recognition system,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7243–7252

2017
[7]

Slayer: Spike layer error reassignment in time,

S. B. Shrestha and G. Orchard, “Slayer: Spike layer error reassignment in time,” inAdvances in Neural Information Processing Systems, vol. 31, 2018

2018
[8]

Temporal-wise attention spiking neural networks for event streams classification,

M. Yao, H. Gao, G. Zhao, D. Wang, Y . Lin, Z. Yang, and G. Li, “Temporal-wise attention spiking neural networks for event streams classification,” inProceedings of the IEEE/CVF International Confer- ence on Computer Vision, 2021, pp. 10 221–10 230

2021
[9]

Stsc-snn: Spatio- temporal synaptic connection with temporal convolution and attention for spiking neural networks,

Q. Xu, Y . Li, J. Shen, J. K. Liu, H. Tang, and G. Pan, “Stsc-snn: Spatio- temporal synaptic connection with temporal convolution and attention for spiking neural networks,”Frontiers in Neuroscience, vol. 16, p. 1079357, 2023

2023
[10]

Training deep spiking neural networks using backpropagation,

J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,”Frontiers in neuroscience, vol. 10, p. 508, 2016

2016
[11]

Mambaevt: Event stream based visual object tracking using state space model,

X. Wang, C. Wang, S. Wang, X. Wang, Z. Zhao, L. Zhu, and B. Jiang, “Mambaevt: Event stream based visual object tracking using state space model,”IEEE Transactions on Circuits and Systems for Video Technology, 2025

2025
[12]

Event stream-based visual object tracking: Hdetrack v2 and a high-definition benchmark,

S. Wang, X. Wang, C. Wang, L. Jin, L. Zhu, B. Jiang, Y . Tian, and J. Tang, “Event stream-based visual object tracking: Hdetrack v2 and a high-definition benchmark,”arXiv preprint arXiv:2502.05574, 2025

arXiv 2025
[13]

Crsot: Cross-resolution object tracking using unaligned frame and event cameras,

Y . Zhu, X. Wang, C. Li, B. Jiang, L. Zhu, Z. Huang, Y . Tian, and J. Tang, “Crsot: Cross-resolution object tracking using unaligned frame and event cameras,”IEEE Transactions on Multimedia, 2025

2025
[14]

Visevent: Reliable object tracking via collaboration of frame and event flows,

X. Wang, J. Li, L. Zhu, Z. Zhang, Z. Chen, X. Li, Y . Wang, Y . Tian, and F. Wu, “Visevent: Reliable object tracking via collaboration of frame and event flows,”IEEE transactions on cybernetics, vol. 54, no. 3, pp. 1997–2010, 2023

1997
[15]

Event-based asynchronous sparse convolutional networks,

N. Messikommer, D. Gehrig, A. Loquercio, and D. Scaramuzza, “Event-based asynchronous sparse convolutional networks,” inEuro- pean Conference on Computer Vision, 2020

2020
[16]

EV-SegNet: Semantic segmentation for event-based cameras,

I. Alonso and A. C. Murillo, “EV-SegNet: Semantic segmentation for event-based cameras,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019

2019
[17]

Depth any event stream: Enhancing event-based monocular depth estimation via dense-to-sparse distillation,

J. Zhu, T. Pan, Z. Cao, Y . Liu, J. T. Kwok, and H. Xiong, “Depth any event stream: Enhancing event-based monocular depth estimation via dense-to-sparse distillation,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 5146–5155

2025
[18]

Spatio-temporal recurrent networks for event-based optical flow esti- mation,

Z. Ding, R. Zhao, J. Zhang, T. Gao, R. Xiong, Z. Yu, and T. Huang, “Spatio-temporal recurrent networks for event-based optical flow esti- mation,” inProceedings of the AAAI Conference on Artificial Intelli- gence, vol. 36, no. 1, 2022, pp. 525–533

2022
[19]

Is space-time attention all you need for video understanding?

G. Bertasius, H. Wang, and L. Torresani, “Is space-time attention all you need for video understanding?” inIcml, vol. 2, no. 3, 2021, p. 4

2021
[20]

Tam: Temporal adaptive module for video recognition,

Z. Liu, L. Wang, W. Wu, C. Qian, and T. Lu, “Tam: Temporal adaptive module for video recognition,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 13 708–13 718

2021
[21]

Transformer-based domain adap- tation for event data classification,

J. Zhao, S. Zhang, and T. Huang, “Transformer-based domain adap- tation for event data classification,” inICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 4673–4677

2022
[22]

Combining events and frames using recurrent asynchronous multimodal networks for monocular depth prediction,

D. Gehrig, M. R ¨uegg, M. Gehrig, J. Hidalgo-Carri ´o, and D. Scara- muzza, “Combining events and frames using recurrent asynchronous multimodal networks for monocular depth prediction,”IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2822–2829, 2021

2021
[23]

CMX: Cross-modal fusion for rgb-x semantic segmentation with transform- ers,

J. Zhang, H. Liu, K. Yang, X. Hu, R. Liu, and R. Stiefelhagen, “CMX: Cross-modal fusion for rgb-x semantic segmentation with transform- ers,”IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 12, pp. 14 679–14 694, 2023

2023
[24]

RGB-event fusion for moving object detection in autonomous driv- ing,

Z. Zhou, Z. Wu, R. Boutteau, F. Yang, C. Demonceaux, and D. Ginhac, “RGB-event fusion for moving object detection in autonomous driv- ing,” inProceedings of the IEEE International Conference on Robotics and Automation. IEEE, 2023

2023
[25]

Squeeze-and-excitation networks,

J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141

2018
[26]

Multimodal fusion for sensorimotor control in steering angle prediction,

F. Munir, S. Azam, K.-C. Yow, B.-G. Lee, and M. Jeon, “Multimodal fusion for sensorimotor control in steering angle prediction,”Engineer- ing Applications of Artificial Intelligence, vol. 126, p. 107087, 2023

2023
[27]

Calibrated RGB-D salient object detection,

W. Ji, J. Li, S. Yu, M. Zhang, Y . Piao, S. Yao, Q. Bi, K. Ma, Y . Zheng, H. Lu, and L. Cheng, “Calibrated RGB-D salient object detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9471–9481

2021
[28]

Embracing events and frames with hierarchical feature refinement network for object detection,

H. Cao, Z. Zhang, Y . Xia, X. Li, J. Xia, G. Chen, and A. Knoll, “Embracing events and frames with hierarchical feature refinement network for object detection,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 1–17

2024
[29]

HALSIE: Hybrid approach to learning segmentation by simultane- ously exploiting image and event modalities,

S. Das Biswas, A. Kosta, C. Liyanagedera, M. Apolinario, and K. Roy, “HALSIE: Hybrid approach to learning segmentation by simultane- ously exploiting image and event modalities,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 5964–5974

2024
[30]

De- livering arbitrary-modal semantic segmentation,

J. Zhang, H. Liu, K. Yang, X. Hu, R. Liu, and R. Stiefelhagen, “De- livering arbitrary-modal semantic segmentation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1136–1147

2023
[31]

Ev-flownet: Self- supervised optical flow estimation for event-based cameras,

A. Z. Zhu, L. Yuan, K. Chaney, and K. Daniilidis, “Ev-flownet: Self- supervised optical flow estimation for event-based cameras,”arXiv preprint arXiv:1802.06898, 2018

Pith/arXiv arXiv 2018
[32]

Multi-cue event information fusion for pedestrian detection with neuromorphic vision sensors,

G. Chen, H. Cao, C. Ye, Z. Zhang, X. Liu, X. Mo, Z. Qu, J. Conradt, F. R ¨ohrbein, and A. Knoll, “Multi-cue event information fusion for pedestrian detection with neuromorphic vision sensors,”Frontiers in Neurorobotics, vol. 13, p. 10, 2019

2019
[33]

Event-based visual flow,

R. Benosman, C. Clercq, X. Lagorce, S.-H. Ieng, and C. Bartolozzi, “Event-based visual flow,”IEEE transactions on neural networks and learning systems, vol. 25, no. 2, pp. 407–417, 2013

2013
[34]

An event-driven categorization model for aer image sensors using multispike encoding and learning,

R. Xiao, H. Tang, Y . Ma, R. Yan, and G. Orchard, “An event-driven categorization model for aer image sensors using multispike encoding and learning,”IEEE Transactions on Neural Networks and Learning Systems, 2019

2019
[35]

Effective aer object classification using segmented probability-maximization learning in spiking neural networks,

Q. Liu, H. Ruan, D. Xing, H. Tang, and G. Pan, “Effective aer object classification using segmented probability-maximization learning in spiking neural networks,” inProceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 1308–1315

2020
[36]

Hats: Histograms of averaged time surfaces for robust event-based ob- ject classification,

A. Sironi, M. Brambilla, N. Bourdis, X. Lagorce, and R. Benosman, “Hats: Histograms of averaged time surfaces for robust event-based ob- ject classification,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1731–1740

2018
[37]

Ecsnet: Spatio- temporal feature learning for event camera,

Z. Chen, J. Wu, J. Hou, L. Li, W. Dong, and G. Shi, “Ecsnet: Spatio- temporal feature learning for event camera,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 2, pp. 701– 712, 2022

2022
[38]

Fast event-based corner detection,

E. Mueggler, C. Bartolozzi, and D. Scaramuzza, “Fast event-based corner detection,” 2017

2017
[39]

Bina-rep event frames: A simple and effective representation for event-based cameras,

S. Barchid, J. Mennesson, and C. Dj ´eraba, “Bina-rep event frames: A simple and effective representation for event-based cameras,” in2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022, pp. 3998–4002

2022
[40]

Temporal binary representation for event-based action recognition,

S. U. Innocenti, F. Becattini, F. Pernici, and A. Del Bimbo, “Temporal binary representation for event-based action recognition,” in2020 25th JOURNAL OF LATEX CLASS FILES, VOL. 18, NO. 9, SEPTEMBER 2020 17 International Conference on Pattern Recognition (ICPR), 2021, pp. 10 426–10 432

2020
[41]

Mambapupil: Bidirectional selective recurrent model for event- based eye tracking,

Z. Wang, Z. Wan, H. Han, B. Liao, Y . Wu, W. Zhai, Y . Cao, and Z.-J. Zha, “Mambapupil: Bidirectional selective recurrent model for event- based eye tracking,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 5762–5770

2024
[42]

Exploring temporal dynamics in event-based eye tracker,

H. Huang, X. Lin, H. Ren, Y . Zhou, and B. Cheng, “Exploring temporal dynamics in event-based eye tracker,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5145–5154

2025
[43]

V oxel-based multi-scale transformer network for event stream processing,

D. Liu, T. Wang, and C. Sun, “V oxel-based multi-scale transformer network for event stream processing,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 2112–2124, 2023

2023
[44]

Event voxel set trans- former for spatiotemporal representation learning on event streams,

B. Xie, Y . Deng, Z. Shao, Q. Xu, and Y . Li, “Event voxel set trans- former for spatiotemporal representation learning on event streams,” IEEE Transactions on Circuits and Systems for Video Technology, 2024

2024
[45]

Fast classification and action recognition with event-based imaging,

C. Liu, X. Qi, E. Y . Lam, and N. Wong, “Fast classification and action recognition with event-based imaging,”IEEE access, vol. 10, pp. 55 638–55 649, 2022

2022
[46]

Compressed event sensing (ces) volumes for event cameras,

S. Lin, Y . Ma, J. Chen, and B. Wen, “Compressed event sensing (ces) volumes for event cameras,”International Journal of Computer Vision, vol. 133, no. 1, pp. 435–455, 2025

2025
[47]

High speed and high dynamic range video with an event camera,

H. Rebecq, R. Ranftl, V . Koltun, and D. Scaramuzza, “High speed and high dynamic range video with an event camera,”IEEE transactions on pattern analysis and machine intelligence, vol. 43, no. 6, pp. 1964– 1980, 2019

1964
[48]

Sparse-e2vid: A sparse convolutional model for event-based video reconstruction trained with real event noise,

P. R. G. Cadena, Y . Qian, C. Wang, and M. Yang, “Sparse-e2vid: A sparse convolutional model for event-based video reconstruction trained with real event noise,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4150–4158

2023
[49]

Back to event basics: Self-supervised learning of image reconstruction for event cameras via photometric constancy,

F. Paredes-Valles and G. C. H. E. de Croon, “Back to event basics: Self-supervised learning of image reconstruction for event cameras via photometric constancy,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3446–3455

2021
[50]

Revisiting color-event based tracking: A unified network, dataset, and metric,

C. Tang, X. Wang, J. Huang, B. Jiang, L. Zhu, S. Chen, J. Zhang, Y . Wang, and Y . Tian, “Revisiting color-event based tracking: A unified network, dataset, and metric,”Pattern Recognition, p. 112718, 2025

2025
[51]

Event-based vision enhanced: A joint detection framework in autonomous driving,

J. Li, S. Dong, Z. Yu, Y . Tian, and T. Huang, “Event-based vision enhanced: A joint detection framework in autonomous driving,” in Proceedings of the IEEE International Conference on Multimedia and Expo. IEEE, 2019

2019
[52]

Better and faster: Adaptive event conversion for event-based object detection,

Y . Peng, Y . Zhang, Z. Xiong, X. Sun, and F. Wu, “Better and faster: Adaptive event conversion for event-based object detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 2, 2023, pp. 2056–2064

2023
[53]

Srfnet: Monocular depth estimation with fine-grained structure via spatial reliability-oriented fusion of frames and events,

T. Pan, Z. Cao, and L. Wang, “Srfnet: Monocular depth estimation with fine-grained structure via spatial reliability-oriented fusion of frames and events,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 10 695–10 702

2024
[54]

Secrets of event-based optical flow,

S. Shiba, Y . Aoki, and G. Gallego, “Secrets of event-based optical flow,” inEuropean Conference on Computer Vision, 2022, pp. 628–645

2022
[55]

Motion and appearance decoupling representation for event cameras,

N. Chen, B. Li, Y . Wang, X. Ying, L. Wang, C. Zhang, Y . Guo, M. Li, and W. An, “Motion and appearance decoupling representation for event cameras,”IEEE Transactions on Image Processing, vol. 34, pp. 5964–5977, 2025

2025
[56]

Y . Peng, H. Li, Y . Zhang, X. Sun, and F. Wu, “c,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 16 794–16 804

2024
[57]

Timereplayer: Unlocking the potential of event cameras for video interpolation,

W. He, K. You, Z. Qiao, X. Jia, Z. Zhang, W. Wang, H. Lu, Y . Wang, and J. Liao, “Timereplayer: Unlocking the potential of event cameras for video interpolation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17 783–17 792

2022
[58]

Time lens: Event-based video frame interpolation,

S. Tulyakov, D. Gehrig, S. Georgoulis, J. Erbach, M. Gehrig, Y . Li, and D. Scaramuzza, “Time lens: Event-based video frame interpolation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16 155–16 164

2021
[59]

Video interpolation by event-driven anisotropic adjustment of optical flow,

S. Wu, K. You, W. He, C. Yang, Y . Tian, Y . Wang, Z. Zhang, and J. Liao, “Video interpolation by event-driven anisotropic adjustment of optical flow,” inEuropean Conference on Computer Vision, 2022, pp. 267–283

2022
[60]

Event-based video frame interpolation with cross-modal asymmetric bidirectional motion fields,

T. Kim, Y. Chae, H.-K. Jang, and K.-J. Yoon, “Event-based video frame interpolation with cross-modal asymmetric bidirectional motion fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18 032–18 042

2023
[61]

Video frame interpolation via direct synthesis with the event-based reference,

Y. Liu, Y. Deng, H. Chen, and Z. Yang, “Video frame interpolation via direct synthesis with the event-based reference,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 8477–8487

2024
[62]

Recurrent vision transformers for object detection with event cameras,

M. Gehrig and D. Scaramuzza, “Recurrent vision transformers for object detection with event cameras,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 884–13 893

2023
[63]

SMamba: Sparse mamba for event-based object detection,

N. Yang, Y. Wang, Z. Liu, M. Li, Y. An, and X. Zhao, “SMamba: Sparse mamba for event-based object detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 9, 2025, pp. 9229–9237

2025
[64]

Automotive object detection via learning sparse events by spiking neurons,

H. Zhang, Y. Li, L. Leng, K. Che, Q. Liu, Q. Guo, J. Liao, and R. Cheng, “Automotive object detection via learning sparse events by spiking neurons,” 2024

2024
[65]

A multi-scale recurrent framework for motion segmentation with event camera,

S. Zhang, L. Sun, and K. Wang, “A multi-scale recurrent framework for motion segmentation with event camera,” IEEE Access, vol. 11, pp. 80 105–80 114, 2023

2023
[66]

Continuous-time object segmentation using high temporal resolution event camera,

L. Zhu, X. Chen, L. Wang, X. Wang, Y. Tian, and H. Huang, “Continuous-time object segmentation using high temporal resolution event camera,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

2024
[67]

ESS: Learning event-based semantic segmentation from still images,

Z. Sun, N. Messikommer, D. Gehrig, and D. Scaramuzza, “ESS: Learning event-based semantic segmentation from still images,” in European Conference on Computer Vision. Springer, 2022, pp. 341–357

2022
[68]

Exploring event-driven dynamic context for accident scene segmentation,

J. Zhang, X. Yang, and R. Stiefelhagen, “Exploring event-driven dynamic context for accident scene segmentation,” IEEE Transactions on Intelligent Transportation Systems, 2021

2021
[69]

Efficient event-based semantic segmentation via exploiting frame-event fusion: A hybrid neural network approach,

K. Li, Y. Zhao, G. Lyu, and Y. Deng, “Efficient event-based semantic segmentation via exploiting frame-event fusion: A hybrid neural network approach,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2025

2025
[70]

MambaSeg: Harnessing mamba for accurate and efficient image-event semantic segmentation,

F. Gu, Y. Li, X. Long, K. Ji, C. Chen, Q. Gu, and Z. Ni, “MambaSeg: Harnessing mamba for accurate and efficient image-event semantic segmentation,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2026

2026
[71]

Learning monocular dense depth from events,

J. Hidalgo-Carrió, D. Gehrig, and D. Scaramuzza, “Learning monocular dense depth from events,” in 2020 International Conference on 3D Vision (3DV). IEEE, 2020, pp. 534–542

2020
[72]

Distil-e2d: Distilling image-to-depth priors for event-based monocular depth estimation,

J. L. Lee and G. H. Lee, “Distil-e2d: Distilling image-to-depth priors for event-based monocular depth estimation,” in Advances in Neural Information Processing Systems, 2025

2025
[73]

Unsupervised event-based learning of optical flow, depth, and egomotion,

A. Z. Zhu, L. Yuan, K. Chaney, and K. Daniilidis, “Unsupervised event-based learning of optical flow, depth, and egomotion,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 989–997

2019
[74]

Self-supervised event-based monocular depth estimation using cross-modal consistency,

J. Zhu, L. Liu, B. Jiang, F. Wen, H. Zhang, W. Li, and Y. Liu, “Self-supervised event-based monocular depth estimation using cross-modal consistency,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 7704–7710

2023
[75]

DCR-EFlow: Dynamic correlation recurrent architecture for optical flow estimation based on event cameras,

F. Sun, C. Su, B. Xiong, and Y. Wang, “DCR-EFlow: Dynamic correlation recurrent architecture for optical flow estimation based on event cameras,” Intelligent Computing, vol. 4, p. 0243, 2025

2025
[76]

Learning dense and continuous optical flow from an event camera,

Z. Wan, Y. Dai, and Y. Mao, “Learning dense and continuous optical flow from an event camera,” IEEE Transactions on Image Processing, 2022

2022
[77]

E-RAFT: Dense optical flow from event cameras,

M. Gehrig, M. Millhäusler, D. Gehrig, and D. Scaramuzza, “E-RAFT: Dense optical flow from event cameras,” in International Conference on 3D Vision, 2021, pp. 197–206

2021
[78]

TMA: Temporal motion aggregation for event-based optical flow,

H. Liu, G. Chen, S. Qu, Y. Zhang, Z. Li, A. Knoll, and C. Jiang, “TMA: Temporal motion aggregation for event-based optical flow,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 9685–9694

2023
[79]

Towards anytime optical flow estimation with event cameras,

Y. Ye, H. Shi, K. Yang, Z. Wang, X. Yin, Y. Lin, M. Liu, Y. Wang, and K. Wang, “Towards anytime optical flow estimation with event cameras,” Sensors, vol. 25, no. 10, p. 3158, 2025

2025
[80]

Learning optical flow from event camera with rendered dataset,

X. Luo, K. Luo, A. Luo, Z. Wang, P. Tan, and S. Liu, “Learning optical flow from event camera with rendered dataset,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 9847–9857

2023

Showing first 80 references.
