pith. machine review for the scientific record.

arxiv: 2605.04355 · v1 · submitted 2026-05-05 · 💻 cs.CV

Recognition: unknown

InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 16:57 UTC · model grok-4.3

classification 💻 cs.CV
keywords autonomous driving · sensor fusion · event cameras · DVS · transformer · reinforcement learning · CARLA · multimodal perception

The pith

Adding event cameras to InterFuser's sensor fusion raises route completion to 100 percent on CARLA benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the InterFuser architecture by adding Dynamic Vision Sensors as a third input alongside RGB cameras and LiDAR. It introduces a token-based fusion method that feeds accumulated event frames directly into the transformer backbone. The goal is to overcome motion blur and latency problems that plague standard sensors in high-speed or high-contrast driving scenes. A sympathetic reader would care because more reliable multimodal perception could translate into fewer failures when lighting changes rapidly or vehicles move quickly. Evaluation on CARLA Leaderboard tasks shows the extended model reaches a Driving Score of 77.2 while completing routes at 100 percent.

Core claim

The authors claim that integrating DVS data through a novel token-based fusion strategy into the transformer backbone of InterFuser produces a more robust driving policy. This policy leverages the complementary strengths of RGB, LiDAR, and asynchronous event streams to maintain performance where conventional sensors degrade. On the CARLA Leaderboard the resulting agent records a competitive Driving Score of 77.2 and perfect Route Completion of 100 percent, indicating that event-based vision supplies useful information precisely in adverse lighting and dynamic conditions.

What carries the argument

Token-based fusion strategy that converts accumulated event frames into tokens and merges them with RGB and LiDAR tokens inside the transformer backbone of the extended InterFuser model.
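The fusion mechanism is described only at this conceptual level (the referee report below presses on exactly this point), so as a reading aid here is a minimal sketch of what token-based fusion of accumulated event frames with RGB and LiDAR tokens could look like. The projection widths, token counts, and learned modality embeddings are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class TokenFusionSketch(nn.Module):
        """Hypothetical token-level fusion of RGB, LiDAR, and DVS features.

        Each modality is projected to a shared width, tagged with a learned
        modality embedding, concatenated along the token axis, and passed
        through a standard transformer encoder. This is the generic
        "tokenize and concatenate" pattern; the paper's actual mechanism
        is unspecified.
        """

        def __init__(self, d_model=256, n_heads=8, n_layers=4):
            super().__init__()
            # Per-modality projections into the shared token space
            # (input feature widths here are placeholders).
            self.rgb_proj = nn.Linear(512, d_model)    # e.g. ResNet features
            self.lidar_proj = nn.Linear(128, d_model)  # e.g. PointNet features
            self.dvs_proj = nn.Linear(512, d_model)    # accumulated event frames
            # One learned embedding per modality so the encoder can tell
            # the three token streams apart.
            self.modality_emb = nn.Parameter(torch.zeros(3, d_model))
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, rgb_tokens, lidar_tokens, dvs_tokens):
            # Each input: (batch, n_tokens_for_that_modality, feature_width).
            tokens = torch.cat(
                [
                    self.rgb_proj(rgb_tokens) + self.modality_emb[0],
                    self.lidar_proj(lidar_tokens) + self.modality_emb[1],
                    self.dvs_proj(dvs_tokens) + self.modality_emb[2],
                ],
                dim=1,
            )
            return self.encoder(tokens)  # fused tokens for downstream heads

    # Smoke test with placeholder token counts.
    fusion = TokenFusionSketch()
    out = fusion(torch.randn(2, 100, 512), torch.randn(2, 64, 128), torch.randn(2, 100, 512))
    print(out.shape)  # torch.Size([2, 264, 256])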

If this is right

  • The agent maintains higher reliability when traditional cameras suffer motion blur at high speeds.
  • Perception stays effective in scenes with extreme brightness changes where RGB saturates or loses detail.
  • The same fusion approach can be applied to other reinforcement-learning driving policies that already use transformer backbones.
  • Event data reduces dependence on perfect synchronization between RGB and LiDAR streams.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may generalize to real-vehicle hardware if event cameras are mounted alongside existing sensors.
  • Further gains could appear if the fusion tokens are weighted dynamically according to scene conditions.
  • Similar token fusion could be tested in other domains that need low-latency vision such as drone navigation.
  • Longer evaluation runs on CARLA might reveal whether the 100 percent route completion holds across more weather and traffic variations.

Load-bearing premise

The token-based fusion method actually succeeds in letting the transformer use event information in a way that measurably improves robustness over RGB-plus-LiDAR alone.

What would settle it

Re-running the same CARLA evaluation with the DVS branch removed and observing that the Driving Score and Route Completion stay statistically unchanged or improve.
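Operationally that is one evaluation harness run twice under a single flag. A sketch of the protocol, with build_agent and run_carla_routes as hypothetical stand-ins for the authors' unpublished training and CARLA evaluation code:

    import statistics

    def ablate_dvs(seeds, build_agent, run_carla_routes):
        """Matched evaluation with and without the DVS branch.

        build_agent(seed, use_dvs) and run_carla_routes(agent) are
        hypothetical placeholders; everything except use_dvs must stay
        fixed between the two arms (seeds, routes, training schedule).
        """
        results = {True: [], False: []}
        for use_dvs in (True, False):
            for seed in seeds:  # identical seeds and route set in both arms
                agent = build_agent(seed=seed, use_dvs=use_dvs)
                results[use_dvs].append(run_carla_routes(agent)["driving_score"])
        for use_dvs, scores in results.items():
            print(f"DVS={use_dvs}: DS {statistics.mean(scores):.1f} "
                  f"± {statistics.stdev(scores):.1f} over {len(scores)} seeds")
        return results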

Figures

Figures reproduced from arXiv:2605.04355 by Kaung Sithu, Maciej Wielgosz, Min Khant Soe Oke, Mustafa Sakhai.

Figure 1: Overview of the InterFuserDVS Architecture. The model fuses RGB images, LiDAR point clouds, and DVS event frames. Specific backbones (ResNet for RGB/DVS, PointNet for LiDAR) extract features which are tokenized and processed by a global Transformer Encoder. The Decoder predicts waypoints, traffic light states, and occupancy maps, which are used by the Safety Controller to generate final control commands. T…
Figure 2: InterFuserDVS Control Loop: high-level overview of the sensor fusion, transformer processing, and control generation pipeline. Note that the Safety Validation block takes the PID command as input and can override it. Here ℒ_wp is the ℓ1 loss between predicted and ground-truth waypoints, and ℒ_traffic is a specialized Multi-View Traffic Loss (MVTL) that enforces consistency between the predicted object density…
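For orientation, the ℓ1 waypoint term named in that caption is the standard mean absolute error over predicted waypoints; a minimal sketch follows, with the MVTL term and any loss weighting left out because the excerpt does not specify them.

    import torch

    def waypoint_l1_loss(pred_wp, gt_wp):
        """Mean absolute error between predicted and ground-truth waypoints.

        pred_wp, gt_wp: (batch, n_waypoints, 2) tensors of ego-frame (x, y)
        positions. Generic form only; the paper's exact reduction and
        weighting are not given in the excerpt.
        """
        return torch.mean(torch.abs(pred_wp - gt_wp))

    loss = waypoint_l1_loss(torch.randn(4, 10, 2), torch.randn(4, 10, 2))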
Figure 3: Safety Validation Logic: the hybrid controller logic that overrides PID proposals based on traffic light states and predicted future occupancy. It specifically checks whether the vehicle is close to a junction before enforcing red light stops, preventing deadlocks. • Driving Score (DS): the primary metric, calculated as the product of Route Completion and (1 - Infraction Penalty). • Route Completion (RC): the p…
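Taking the caption's DS definition at face value, the headline numbers pin down the implied aggregate infraction penalty, since Route Completion is reported as 100 percent. (The official CARLA metric composes per-infraction penalty factors, which the truncated caption may go on to define, so read this as a back-of-envelope check.)

    # DS = RC * (1 - infraction penalty), per the caption's definition.
    rc = 1.000   # reported Route Completion (100%)
    ds = 0.772   # reported Driving Score (77.2 on the 0-100 scale)
    infraction_penalty = 1 - ds / rc
    print(f"implied aggregate infraction penalty: {infraction_penalty:.3f}")  # 0.228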
Figure 4: Future Architecture Design: integrating Spiking Neural Networks (SNN) and an Asynchronous Fusion Transformer (AFT). The pipeline preserves the asynchronous nature of DVS events through the SNN backbone, using a Discretizer to align spike features with synchronous RGB and LiDAR data before fusion.
Original abstract

Autonomous driving systems rely heavily on robust sensor fusion to perceive complex environments. Traditional setups using RGB cameras and LiDAR often struggle in high-dynamic-range scenes or high-speed scenarios due to motion blur and latency. Dynamic Vision Sensors (DVS), or event cameras, offer a paradigm shift by capturing asynchronous brightness changes with microsecond temporal resolution and high dynamic range. In this paper, we propose an extended architecture of the state-of-the-art InterFuser model, integrating DVS as an additional modality to enhance perception reliability. We introduce a novel token-based fusion strategy that incorporates accumulated event frames into the transformer-based backbone of InterFuser. Our method leverages the complementary nature of RGB, LiDAR, and DVS data. We evaluate our approach on the Car Learning to Act (CARLA) Leaderboard benchmarks, demonstrating that the inclusion of DVS improves the robustness of the driving agent, achieving a competitive Driving Score of 77.2 and a superior Route Completion of 100%. The results indicate that event-based vision is a promising direction for improving safety and performance in adverse lighting and dynamic conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes InterFuserDVS, an extension of the InterFuser architecture that adds Dynamic Vision Sensors (DVS) as a third modality. It introduces a token-based fusion mechanism to incorporate accumulated event frames into the transformer backbone alongside RGB and LiDAR inputs. The central empirical claim is that this fusion improves robustness in high-dynamic-range and high-speed driving scenarios, as evidenced by a Driving Score of 77.2 and 100% Route Completion on the CARLA Leaderboard.

Significance. If the reported gains can be isolated to the DVS modality through controlled experiments, the work would provide concrete evidence that event-based sensing complements conventional cameras and LiDAR in RL-based autonomous driving, particularly under motion blur and lighting extremes. The absence of such controls currently prevents any assessment of whether the result advances the state of multi-modal fusion.

major comments (3)
  1. [Abstract] Abstract: The reported Driving Score of 77.2 and Route Completion of 100% are presented without any matched baseline run of the original InterFuser or an ablation that removes only the DVS branch while holding architecture, RL training procedure, and CARLA scenario distribution fixed. Given the high variance typical of RL driving agents, a single scalar score does not establish that the token-based fusion produces measurable robustness gains.
  2. [Methods] Methods / fusion description: The token-based fusion strategy is introduced at a conceptual level but supplies no equations, pseudocode, or architectural diagram showing how accumulated event frames are tokenized, embedded, and cross-attended with RGB and LiDAR tokens inside the transformer backbone. Without these details the claim that DVS data are “effectively incorporated” cannot be evaluated.
  3. [Experiments] Experiments: No implementation details, training hyperparameters, random seeds, error bars, statistical tests, or leaderboard comparison table are provided. The manuscript therefore offers no way to determine whether the 77.2 score differs from prior InterFuser results by more than training stochasticity.
minor comments (1)
  1. [Abstract] Abstract: The phrase “competitive Driving Score” is used without stating the current leaderboard rank or the exact metric definitions employed by CARLA.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. We agree that the current manuscript lacks the necessary controls, technical details, and experimental rigor to substantiate the claimed benefits of DVS integration. We will perform a major revision to address all points raised.

point-by-point responses
  1. Referee: [Abstract] Abstract: The reported Driving Score of 77.2 and Route Completion of 100% are presented without any matched baseline run of the original InterFuser or an ablation that removes only the DVS branch while holding architecture, RL training procedure, and CARLA scenario distribution fixed. Given the high variance typical of RL driving agents, a single scalar score does not establish that the token-based fusion produces measurable robustness gains.

    Authors: We acknowledge the validity of this criticism. The revised manuscript will include a matched baseline evaluation of the original InterFuser under identical training conditions, random seeds, and CARLA scenario distributions. We will also add an ablation study that isolates the DVS branch by removing it while keeping all other components fixed. These additions will enable direct quantification of the contribution of the token-based fusion beyond training stochasticity. revision: yes

  2. Referee: [Methods] Methods / fusion description: The token-based fusion strategy is introduced at a conceptual level but supplies no equations, pseudocode, or architectural diagram showing how accumulated event frames are tokenized, embedded, and cross-attended with RGB and LiDAR tokens inside the transformer backbone. Without these details the claim that DVS data are “effectively incorporated” cannot be evaluated.

    Authors: We agree that the fusion mechanism requires precise specification. In the revision we will add the full mathematical formulation for event-frame accumulation, tokenization, embedding, and the cross-attention equations within the transformer. We will also include pseudocode for the fusion module and a detailed architectural diagram that illustrates the data flow from DVS events through to the shared backbone. revision: yes

  3. Referee: [Experiments] Experiments: No implementation details, training hyperparameters, random seeds, error bars, statistical tests, or leaderboard comparison table are provided. The manuscript therefore offers no way to determine whether the 77.2 score differs from prior InterFuser results by more than training stochasticity.

    Authors: We accept this observation. The revised experiments section will report complete implementation details, the full set of training hyperparameters, the number of random seeds, standard deviations across runs, results of statistical significance tests, and a comparison table that places the 77.2 score against the original InterFuser and other published CARLA leaderboard entries under consistent evaluation protocols. revision: yes
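Once per-seed scores exist, the promised significance test is a few lines. A sketch using Welch's t-test over hypothetical per-seed Driving Scores (the arrays below are illustrative placeholders, not reported results):

    import numpy as np
    from scipy import stats

    # Hypothetical per-seed Driving Scores; replace with real runs.
    baseline_ds = np.array([74.1, 76.8, 72.5, 75.9, 73.4])  # InterFuser, RGB+LiDAR
    dvs_ds = np.array([77.2, 78.5, 75.0, 79.1, 76.3])       # InterFuserDVS

    # Welch's t-test: does adding DVS shift the mean beyond seed noise?
    t_stat, p_value = stats.ttest_ind(dvs_ds, baseline_ds, equal_var=False)
    print(f"baseline {baseline_ds.mean():.1f} ± {baseline_ds.std(ddof=1):.1f}, "
          f"DVS {dvs_ds.mean():.1f} ± {dvs_ds.std(ddof=1):.1f}, "
          f"Welch t={t_stat:.2f}, p={p_value:.3f}")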

Circularity Check

0 steps flagged

No significant circularity; empirical simulation results with no derivations or self-referential reductions.

full rationale

The manuscript describes an architectural extension of InterFuser that adds a DVS branch and a token-based fusion module, then reports CARLA leaderboard metrics (Driving Score 77.2, Route Completion 100%). No equations, parameter-fitting procedures, or derivation steps appear in the provided text. The central claim rests on direct simulation outcomes rather than any quantity that is defined in terms of itself or obtained by renaming a fitted input. Self-citations, if present, are not load-bearing for any mathematical result. The evaluation chain is therefore self-contained against the external CARLA benchmark and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard domain assumptions in multi-modal sensor fusion for driving; no free parameters, new axioms beyond common practice, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: RGB, LiDAR, and DVS modalities are complementary for perception in adverse conditions
    Invoked when stating that the method leverages their complementary nature.

pith-pipeline@v0.9.0 · 5511 in / 1175 out tokens · 88020 ms · 2026-05-08T16:57:19.438760+00:00 · methodology

