arxiv: 2606.19835 · v1 · pith:ELCEJNH3new · submitted 2026-06-18 · 💻 cs.CV

Neural Events: Discrete Asynchronous Autoencoders for Event-Based Vision

Pith reviewed 2026-06-26 18:20 UTC · model grok-4.3

classification 💻 cs.CV
keywords event-based vision · neural events · asynchronous autoencoders · event compression · object detection · event cameras · discrete codes · spatio-temporal encoding

The pith

Discrete asynchronous autoencoders turn raw event camera streams into a small set of neural events that preserve performance on detection and classification tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces neural events to compress the continuous high-volume stream produced by event cameras. Each neural event encodes a local spatio-temporal context window using a discrete learnable code that only triggers when the code changes. Networks trained directly on these neural events match or exceed the accuracy of existing methods on object detection and classification while cutting the overall event rate by a factor of two. This approach tackles the core problem of integrating many low-information events without overwhelming downstream processing. A sympathetic reader cares because it offers a route to efficient, high-temporal-resolution vision systems that do not sacrifice semantic content.

Core claim

Re-tokenizing event streams into neural events, where each represents a local spatio-temporal context window encoded by a discrete learnable code from an asynchronous autoencoder, produces a highly compressed data stream. Every code flip triggers a neural event. Networks trained on these neural events perform on par with or surpass state-of-the-art approaches on object detection and classification while reducing the event rate by a factor of 2.0.

What carries the argument

An asynchronous autoencoder outputs discrete learnable codes, and a neural event is emitted each time one of these codes flips.
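
The flip-triggered emission can be illustrated with a minimal sketch. It assumes one current code index per pixel location and takes the per-event codes as given rather than computing them with the autoencoder. The names `NeuralEvent` and `emit_on_flip` are hypothetical and do not come from the paper. Emitting an event for the first code seen at a location is also an assumption.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class NeuralEvent(NamedTuple):
    """Hypothetical container: timestamp, location, and the new code index."""
    t: float
    x: int
    y: int
    code: int


def emit_on_flip(
    coded_stream: Iterable[Tuple[float, int, int, int]]
) -> List[NeuralEvent]:
    """Emit a neural event only when the code at a location changes.

    `coded_stream` yields (t, x, y, code) for every raw event, where `code`
    stands in for the discrete code an encoder would assign at that moment.
    """
    last_code: Dict[Tuple[int, int], int] = {}
    out: List[NeuralEvent] = []
    for t, x, y, code in coded_stream:
        # Assumption: the first code seen at a location also counts as a flip.
        if last_code.get((x, y)) != code:
            out.append(NeuralEvent(t, x, y, code))
            last_code[(x, y)] = code
    return out


# Five coded raw events; only code changes per location survive.
raw = [
    (0.0, 1, 1, 3),
    (0.1, 1, 1, 3),  # same code at (1, 1): suppressed
    (0.2, 1, 1, 5),  # code flips at (1, 1): emitted
    (0.3, 2, 0, 5),  # first code at (2, 0): emitted
    (0.4, 1, 1, 5),  # same code at (1, 1): suppressed
]
neural = emit_on_flip(raw)
```

In this toy stream the output is shorter than the input because repeated codes at a location are suppressed. How much shorter it is on real data depends on how often the learned codes actually flip.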

If this is right

  • Object detection and classification pipelines can operate on half the data volume while retaining or improving accuracy.
  • Downstream algorithms receive a more semantically dense input that balances fine temporal detail with manageable throughput.
  • Event-based systems can avoid being overwhelmed by torrents of minimal-information brightness-change events.
  • The same neural-event representation supports both classification and detection without separate retraining.
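
The "half the data volume" reading of a factor-of-2.0 rate reduction can be checked with simple arithmetic. The sketch below uses illustrative numbers only, not measurements from the paper. The helper names and the per-event payload sizes are assumptions.

```python
def event_rate(num_events: int, duration_s: float) -> float:
    """Mean event rate in events per second over a recording."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return num_events / duration_s


def reduction_factor(raw_rate: float, neural_rate: float) -> float:
    """How many times lower the neural-event rate is than the raw rate."""
    if neural_rate <= 0:
        raise ValueError("neural_rate must be positive")
    return raw_rate / neural_rate


def bandwidth(rate: float, bytes_per_event: int) -> float:
    """Throughput in bytes per second for a fixed per-event payload size."""
    return rate * bytes_per_event


# Illustrative numbers only, not measurements from the paper.
raw_rate = event_rate(num_events=2_000_000, duration_s=1.0)
neural_rate = event_rate(num_events=1_000_000, duration_s=1.0)
factor = reduction_factor(raw_rate, neural_rate)

# Hypothetical payload sizes: the volume ratio equals the rate ratio
# only when both event types occupy the same number of bytes.
volume_ratio = bandwidth(raw_rate, 8) / bandwidth(neural_rate, 8)
```

If a neural event needed more bytes than a raw event, for example to carry a code index, the saving in data volume would be smaller than the saving in event rate.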

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same compression principle could extend to other high-rate asynchronous sensors such as neuromorphic audio or tactile arrays.
  • Lower event rates would reduce bandwidth and power demands in embedded robotics or autonomous driving setups.
  • If the codes prove task-agnostic, the autoencoder could serve as a general front-end for multiple simultaneous vision tasks.

Load-bearing premise

The discrete learnable codes preserve enough spatio-temporal information for downstream tasks without task-specific retraining or significant quantization loss.

What would settle it

Evaluation on a held-out event-camera dataset in which models trained on neural events drop more than 5 percent in mean average precision or accuracy relative to raw-event baselines at the same compression level.
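
The criterion above can be written as a small check. This sketch reads "5 percent" as a relative drop against the raw-event baseline, which is an assumption because the text does not say whether the threshold is relative or absolute. The function names and example scores are hypothetical.

```python
def relative_drop_percent(baseline: float, candidate: float) -> float:
    """Relative drop of `candidate` below `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline


def claim_fails(
    baseline: float, candidate: float, threshold_percent: float = 5.0
) -> bool:
    """True if the neural-event model falls short by more than the threshold.

    Assumption: both scores (mAP or accuracy) come from the same held-out
    dataset and the same compression level.
    """
    return relative_drop_percent(baseline, candidate) > threshold_percent


# Hypothetical scores, not results from the paper.
small_gap = claim_fails(baseline=0.40, candidate=0.39)  # 2.5 percent drop
large_gap = claim_fails(baseline=0.40, candidate=0.37)  # 7.5 percent drop
```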

Figures

Figures reproduced from arXiv: 2606.19835 by Daniel Gehrig, Davide Scaramuzza, Roberto Pellerito, Shintaro Shiba.

Figure 1: Method overview. Our architecture compresses raw events into a low-bandwidth stream of neural events via four stages: (i) Raw events e_i are projected into a continuous vector space using spatial and temporal embeddings. (ii) A linear-attention sequence model (RWKV-7) updates a localized memory state to output a continuous logit vector o_i per event. (iii) A Gumbel Softmax operator maps the continuous logits…
Figure 2: Pre-training. Temporally averaged and stacked codes H are decoded and supervised with time surface [57]. To mitigate frequent code flipping (chatter), i.e., frequent crossing of the decision boundaries (dashed line) in the probability simplex, a rate alignment and latent straightening loss are applied to the logit sequence o_i. This yields smooth trajectories and reduces flipping. RWKV-7 mixes temporal fea…
Figure 3: Empirical log probabilities of transitioning from code…
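
The Figure 1 caption names a Gumbel Softmax operator applied to the continuous logits, but the caption is truncated before it describes the output. The sketch below shows the standard Gumbel-softmax relaxation followed by an argmax to obtain a discrete code index. The temperature `tau`, the argmax step, and the function names are assumptions for illustration, not the paper's configuration.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0); rng.random() is in [0, 1)
            u = rng.random()
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)).
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / tau)
    # Numerically stable softmax over the perturbed, scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_code(probs: List[float]) -> int:
    """Index of the largest probability, used here as the discrete code."""
    return max(range(len(probs)), key=probs.__getitem__)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
probs = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=rng)
code = hard_code(probs)
```

Lower values of `tau` push `probs` closer to a one-hot vector. The index returned by `hard_code` changes when the perturbed logits cross a decision boundary, which is the kind of flipping the Figure 2 caption calls chatter.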
Original abstract

Event cameras capture dynamic scenes with exceptional temporal fidelity by representing them as a continuous stream of microsecond-resolution events. Each individual event, however, only carries minimal semantic value, merely signaling a localized brightness change. To derive meaningful signals, downstream algorithms need to quickly integrate cues from a potentially massive torrent of low-information events. Current architectures, however, are easily overwhelmed, struggling to balance capturing fine-grained temporal dynamics and maintaining a manageable data throughput. This paper proposes a framework to re-tokenize event streams into a small set of highly informative neural events, each representing a local spatio-temporal context window with a discrete learnable code. Every time this code flips, a neural event is triggered, yielding a highly compressed data stream. We demonstrate that, across object detection and classification, networks trained on neural events are on par with or surpass the performance of state-of-the-art approaches while reducing the event rate by a factor of 2.0.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes neural events as a compressed representation of event-camera data streams, obtained by re-tokenizing raw events via a discrete asynchronous autoencoder whose learnable codes trigger an output event only when they flip. The central empirical claim is that models trained on these neural events achieve object-detection and classification performance on par with or exceeding state-of-the-art event-based methods while reducing the raw event rate by a factor of two.

Significance. If the preservation of task-relevant spatio-temporal information by the discrete codes can be demonstrated, the approach would address a core bottleneck in event-based vision—balancing temporal fidelity against data volume—potentially enabling more efficient downstream pipelines. The use of learnable discrete codes in an asynchronous setting is a conceptually interesting direction, but the abstract supplies no quantitative support for the performance or compression claims.

major comments (2)
  1. [Abstract] The headline result (parity or improvement over SOTA while halving event rate) is asserted without any numerical results, dataset names, baseline methods, codebook size, reconstruction metrics, or ablation on quantization loss. Because the central claim rests on the unshown premise that the discrete codes retain sufficient spatio-temporal information, this absence is load-bearing and prevents evaluation of the data-to-claim link.
  2. [Abstract] No description is given of the autoencoder architecture, training procedure, loss functions, or how the discrete codes are generated and triggered, making it impossible to assess whether fine-grained timing or polarity cues exploited by existing SOTA methods are preserved or discarded.
minor comments (1)
  1. [Abstract] The phrase 'across object detection and classification' is used without naming the specific tasks, datasets, or evaluation protocols that would allow readers to contextualize the claimed factor-of-2.0 reduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on the abstract. We agree that the abstract would benefit from greater specificity to better support the central claims and will revise it accordingly while preserving its length constraints.

Point-by-point responses
  1. Referee: [Abstract] The headline result (parity or improvement over SOTA while halving event rate) is asserted without any numerical results, dataset names, baseline methods, codebook size, reconstruction metrics, or ablation on quantization loss. Because the central claim rests on the unshown premise that the discrete codes retain sufficient spatio-temporal information, this absence is load-bearing and prevents evaluation of the data-to-claim link.

    Authors: The experimental sections of the manuscript report the supporting quantitative results, including task performance metrics, comparisons against listed baselines, the factor-of-2.0 event-rate reduction, and codebook size. We acknowledge that these details are not summarized in the abstract itself. We will revise the abstract to include concise numerical highlights (e.g., the observed event-rate reduction and performance parity/improvement on the evaluated detection and classification tasks) to strengthen the data-to-claim link.
    Revision: yes

  2. Referee: [Abstract] No description is given of the autoencoder architecture, training procedure, loss functions, or how the discrete codes are generated and triggered, making it impossible to assess whether fine-grained timing or polarity cues exploited by existing SOTA methods are preserved or discarded.

    Authors: The methods section provides the full description of the discrete asynchronous autoencoder, its training procedure, the composite loss (including quantization), and the flip-based triggering mechanism for neural events. The design intentionally retains polarity and local spatio-temporal context. To address the abstract-level concern, we will add a brief clause describing the core mechanism (e.g., “re-tokenizing via a discrete asynchronous autoencoder whose learnable codes trigger output events on flips”).
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical performance claims are independent of inputs.

Full rationale

The paper proposes an asynchronous autoencoder that produces discrete learnable codes to generate compressed neural events from raw event streams, then reports experimental results on object detection and classification tasks. The central claim (performance on par with or exceeding SOTA at 2x lower event rate) is an empirical outcome measured after training and evaluation, not a quantity derived by construction from the method definition or from fitted parameters renamed as predictions. No equations, uniqueness theorems, or self-citations are shown that would reduce the reported metrics to tautological equivalence with the input event data or the autoencoder loss. The derivation chain is self-contained against external benchmarks because success is defined by downstream task metrics that can falsify the preservation assumption.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract provides insufficient technical detail to enumerate free parameters or background axioms; the central claim rests on the unstated premise that the autoencoder training objective aligns with downstream task objectives.

invented entities (1)
  • neural events (no independent evidence)
    purpose: highly compressed discrete tokens representing local spatio-temporal context
    Core output of the proposed framework; no independent evidence supplied in abstract.

pith-pipeline@v0.9.1-grok · 5704 in / 1024 out tokens · 22023 ms · 2026-06-26T18:20:48.264643+00:00 · methodology


Reference graph

Works this paper leans on

62 extracted references · 4 canonical work pages

  1. Banerjee, S., Wang, Z.W., Chopp, H.H., Cossairt, O., Katsaggelos, A.K.: Lossy event compression based on image-derived quad trees and poisson disk sampling. In: 2021 IEEE International Conference on Image Processing (ICIP). pp. 2154–2158. IEEE (2021)
  2. Cannici, M., Ciccone, M., Romanoni, A., Matteucci, M.: Asynchronous convolutional networks for object detection in neuromorphic cameras. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. pp. 0–0 (2019)
  3. Cannici, M., Ciccone, M., Romanoni, A., Matteucci, M.: A differentiable recurrent surface for asynchronous event-based data. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX 16. pp. 136–152. Springer (2020)
  4. Chakrabartty, S., Thakur, C.S., et al.: Neuromorphic computing with aer using time-to-event-margin propagation. arXiv preprint arXiv:2304.13918 (2023)
  5. Chen, N.F.: Pseudo-labels for supervised learning on dynamic vision sensor data, applied to object detection under ego-motion. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. pp. 644–653 (2018)
  6. Cordone, L., Miramond, B., Thierion, P.: Object detection with spiking neural networks on automotive event data. In: 2022 International Joint Conference on Neural Networks (IJCNN). pp. 1–8. IEEE (2022)
  7. Dampfhoffer, M., Mesquida, T., Joubert, D., Dalgaty, T., Vivet, P., Posch, C.: Graph neural network combining event stream and periodic aggregation for low-latency event-based vision. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 6909–6918 (2025)
  8. De Tournemire, P., Nitti, D., Perot, E., Migliore, D., Sironi, A.: A large scale event-based detection dataset for automotive. arXiv preprint arXiv:2001.08499 (2020)
  9. Delbruck, T.: Frame-free dynamic digital vision. In: Proc. Int. Symp. Secure-Life Electron. pp. 21–26 (2008). https://doi.org/10.5167/uzh-17620
  10. Fan, R., Hao, W., Guan, J., Rui, L., Gu, L., Wu, T., Zeng, F., Zhu, Z.: Eventpillars: Pillar-based efficient representations for event data. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 2861–2869 (2025)
  11. Fang, W., Panda, P.: Event2vec: Processing neuromorphic events directly by representations in vector space. arXiv preprint arXiv:2504.15371 (2025)
  12. Finateu, T., Niwa, A., Matolin, D., Tsuchimoto, K., Mascheroni, A., Reynaud, E., Mostafalu, P., Brady, F., Chotard, L., LeGoff, F., et al.: 5.10 a 1280×720 back-illuminated stacked temporal contrast event-based vision sensor with 4.86 µm pixels, 1.066 geps readout, programmable event-rate controller and compressive data-formatting pipeline. In: 2020 IEEE...
  13. Gallego, G., Delbrück, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., Leutenegger, S., Davison, A.J., Conradt, J., Daniilidis, K., et al.: Event-based vision: A survey. IEEE transactions on pattern analysis and machine intelligence 44(1), 154–180 (2020)
  14. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021)
  15. Gehrig, D., Loquercio, A., Derpanis, K.G., Scaramuzza, D.: End-to-end learning of representations for asynchronous event-based data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5633–5643 (2019)
  16. Gehrig, D., Scaramuzza, D.: Are high-resolution event cameras really needed? In: arxiv:2203.14672 (2022)
  17. Gehrig, D., Scaramuzza, D.: Low-latency automotive vision with event cameras. Nature 629(8014), 1034–1040 (2024)
  18. Gehrig, M., Aarents, W., Gehrig, D., Scaramuzza, D.: Dsec: A stereo event camera dataset for driving scenarios. IEEE Robotics and Automation Letters 6(3), 4947–4954 (2021)
  19. Gehrig, M., Scaramuzza, D.: Recurrent vision transformers for object detection with event cameras. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13884–13893 (2023)
  20. Graham, B., Engelcke, M., Van Der Maaten, L.: 3d semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 9224–9232 (2018)
  21. Gu, A., Dao, T.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023)
  22. Guo, S., Kang, Z., Wang, L., Li, S., Xu, W.: Hashheat: An O(C) complexity hashing-based filter for dynamic vision sensor. In: 25th Asia and South Pacific Design Automation Conference (ASP-DAC). pp. 452–457 (2020)
  23. Hamaguchi, R., Furukawa, Y., Onishi, M., Sakurada, K.: Hierarchical neural memory network for low latency event processing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 22867–22876 (2023)
  24. Hao, H., Sui, Z., Zou, R., Dai, Z., Zubić, N., Scaramuzza, D., Wang, W.: Low-latency event-based object detection with spatially-sparse linear attention. arXiv preprint arXiv:2603.06228 (2026)
  25. Hao, H., Zubić, N., He, W., Sui, Z., Scaramuzza, D., Wang, W.: Maximizing asynchronicity in event-based neural networks. arXiv preprint arXiv:2505.11165 (2025)
  26. Iacono, M., Weber, S., Glover, A., Bartolozzi, C.: Towards event-driven object detection with off-the-shelf deep learning. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1–9. IEEE (2018)
  27. Jiang, Z., Xia, P., Huang, K., Stechele, W., Chen, G., Bing, Z., Knoll, A.: Mixed frame-/event-driven fast pedestrian detection. In: 2019 International Conference on Robotics and Automation (ICRA). pp. 8332–8338. IEEE (2019)
  28. Jouppi, N.P., Yoon, D.H., Ashcraft, M., Gottscho, M., Jablin, T.B., Kurian, G., Laudon, J., Li, S., Ma, P., Ma, X., et al.: Ten lessons from three generations shaped google’s tpuv4i: Industrial product. In: 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). pp. 1–14. IEEE (2021)
  29. Kamal, U., Dash, S., Mukhopadhyay, S.: Associative memory augmented asynchronous spatiotemporal representation learning for event-based perception. In: The Eleventh International Conference on Learning Representations (2023)
  30. Khan, N., Iqbal, K., Martini, M.G.: Lossless compression of data from static and mobile dynamic vision sensors-performance and trade-offs. IEEE Access 8, 103149–103163 (2020)
  31. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  32. Lagorce, X., Orchard, G., Galluppi, F., Shi, B.E., Benosman, R.B.: Hots: a hierarchy of event-based time-surfaces for pattern recognition. IEEE transactions on pattern analysis and machine intelligence 39(7), 1346–1359 (2016)
  33. Lee, C., Kosta, A.K., Zhu, A.Z., Chaney, K., Daniilidis, K., Roy, K.: Spike-flownet: event-based optical flow estimation with energy-efficient hybrid neural networks. In: European conference on computer vision. pp. 366–382. Springer (2020)
  34. Li, J., Li, J., Zhu, L., Xiang, X., Huang, T., Tian, Y.: Asynchronous spatio-temporal memory network for continuous event-based object detection. IEEE Transactions on Image Processing 31, 2975–2987 (2022)
  35. Li, Y., Zhou, H., Yang, B., Zhang, Y., Cui, Z., Bao, H., Zhang, G.: Graph-based asynchronous event processing for rapid object recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 934–943 (2021)
  36. Li, Z., Han, J., Li, Q., et al.: Approximation and optimization theory for linear continuous-time recurrent neural networks. Journal of Machine Learning Research 23(42), 1–85 (2022)
  37. Lichtsteiner, P., Posch, C., Delbruck, T.: A 128×128 120 dB 15 µs latency asynchronous temporal contrast vision sensor. IEEE J. Solid-State Circuits 43(2), 566–576 (2008). https://doi.org/10.1109/JSSC.2007.914337
  38. Liu, H., Brandli, C., Li, C., Liu, S.C., Delbruck, T.: Design of a spatiotemporal correlation filter for event-based sensors. In: IEEE Int. Symp. Circuits Syst. (ISCAS). pp. 722–725 (2015). https://doi.org/10.1109/ISCAS.2015.7168735
  39. Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin transformer v2: Scaling up capacity and resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12009–12019 (2022)
  40. Martin, E., Cundy, C.: Parallelizing linear recurrent neural nets over sequence length. In: International Conference on Learning Representations (2018)
  41. Messikommer, N., Gehrig, D., Loquercio, A., Scaramuzza, D.: Event-based asynchronous sparse convolutional networks. In: European Conference on Computer Vision. pp. 415–431. Springer (2020)
  42. Øhrstrøm, C.K., Güldenring, R., Nalpantidis, L.: Spiking patches: Asynchronous, sparse, and efficient tokens for event cameras. arXiv preprint arXiv:2510.26614 (2025)
  43. Orchard, G., Jayawant, A., Cohen, G.K., Thakor, N.: Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in neuroscience 9, 437 (2015)
  44. Peng, B., Alcaide, E., Anthony, Q., Albalak, A., Arcadinho, S., Biderman, S., Cao, H., Cheng, X., Chung, M., Grella, M., et al.: Rwkv: Reinventing rnns for the transformer era. arXiv preprint arXiv:2305.13048 (2023)
  45. Peng, B., Goldstein, D., Anthony, Q., Albalak, A., Alcaide, E., Biderman, S., Cheah, E., Ferdinan, T., Hou, H., Kazienko, P., et al.: Eagle and finch: Rwkv with matrix-valued states and dynamic recurrence. arXiv preprint arXiv:2404.05892 (2024)
  46. Peng, B., Zhang, R., Goldstein, D., Alcaide, E., Du, X., Hou, H., Lin, J., Liu, J., Lu, J., Merrill, W., et al.: Rwkv-7 "goose" with expressive dynamic state evolution. In: Second Conference on Language Modeling (2025)
  47. Peng, Y., Zhang, Y., Xiong, Z., Sun, X., Wu, F.: Get: Group event transformer for event-based vision. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 6038–6048 (2023)
  48. Perot, E., De Tournemire, P., Nitti, D., Masci, J., Sironi, A.: Learning to detect objects with a 1 megapixel event camera. Advances in Neural Information Processing Systems 33, 16639–16652 (2020)
  49. Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 8821–8831. PMLR (18–24 Jul 2021), https://proceedings.mlr.press/v139/r...
  50. Rebecq, H., Ranftl, R., Koltun, V., Scaramuzza, D.: High speed and high dynamic range video with an event camera. IEEE transactions on pattern analysis and machine intelligence 43(6), 1964–1980 (2019)
  51. Santambrogio, R., Cannici, M., Matteucci, M.: Farse-cnn: Fully asynchronous, recurrent and sparse event-based cnn. In: European Conference on Computer Vision. pp. 1–18. Springer (2024)
  52. Schaefer, S., Gehrig, D., Scaramuzza, D.: Aegnn: Asynchronous event-based graph neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12371–12381 (2022)
  53. Sekikawa, Y., Hara, K., Saito, H.: Eventnet: Asynchronous recursive event processing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 3887–3896 (2019)
  54. Sezavar, A., Brites, C., Ascenso, J., Ebrahimi, T.: A learning-based lossless event data compression for computer vision applications. In: Applications of Digital Image Processing XLVIII. vol. 13605, pp. 230–236. SPIE (2025)
  55. Shiba, S., Aoki, Y., Gallego, G.: Simultaneous motion and noise estimation with event cameras. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 6959–6969 (2025)
  56. Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3693–3702 (2017)
  57. Sironi, A., Brambilla, M., Bourdis, N., Lagorce, X., Benosman, R.: Hats: Histograms of averaged time surfaces for robust event-based object classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1731–1740 (2018)
  58. Vladymyrov, M., Von Oswald, J., Sandler, M., Ge, R.: Linear transformers are versatile in-context learners. Advances in Neural Information Processing Systems 37, 48784–48809 (2024)
  59. Wang, S., Xue, B.: State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory. Advances in Neural Information Processing Systems 36, 74021–74038 (2023)
  60. Wang, Y., Bounou, O., Zhou, G., Balestriero, R., Rudner, T.G., LeCun, Y., Ren, M.: Temporal straightening for latent planning. arXiv preprint arXiv:2603.12231 (2026)
  61. Zhu, A.Z., Yuan, L., Chaney, K., Daniilidis, K.: Unsupervised event-based learning of optical flow, depth, and egomotion. In: IEEE Conf. Comput. Vis. Pattern Recog. (CVPR). pp. 989–997 (2019). https://doi.org/10.1109/CVPR.2019.00108
  62. Zubic, N., Gehrig, M., Scaramuzza, D.: State space models for event cameras. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5819–5828 (2024)