AERO-VIS: Asynchronous Event-based Real-time Onboard Visual-Inertial SLAM
Pith reviewed 2026-05-11 03:36 UTC · model grok-4.3
The pith
AERO-VIS is the first purely event-based inertial SLAM system to achieve closed-loop UAV control and large-scale estimation using only onboard compute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present AERO-VIS, a stereo event-inertial SLAM system with an integrated, data-driven, robust, and performance-optimized keypoint detector. By processing the event stream asynchronously, the system dynamically adapts to downstream runtime demands, ensuring low-latency and real-time performance. When deploying AERO-VIS on a UAV, we achieve unprecedented accuracy in onboard event-based SLAM. These unique characteristics enable us to present the first purely event-based inertial SLAM system that demonstrates closed-loop UAV control and large-scale state estimation while relying solely on onboard compute.
What carries the argument
Asynchronous event-stream processing, combined with an integrated data-driven keypoint detector, in a stereo event-inertial SLAM pipeline.
If this is right
- The system runs in real time on onboard hardware without external GPUs or servers.
- Event cameras provide robustness to motion blur and high dynamic range during fast UAV maneuvers.
- Asynchronous adaptation maintains low latency even when compute load varies.
- Large-scale state estimation becomes possible using only events and inertial data.
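The claim that asynchronous adaptation holds latency down when compute load varies can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the AERO-VIS scheduler: the names `Batch`, `toy_cost`, `fixed_rate`, and `asynchronous` are hypothetical, and the runtime model (a fixed overhead plus a term proportional to the window length) is invented for illustration. It compares a consumer that processes fixed 10 ms windows with one that, whenever it becomes idle, consumes every event received so far.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Batch:
    t_start: float  # time of the oldest event in the batch (s)
    t_end: float    # time of the newest event in the batch (s)
    t_done: float   # time at which processing of the batch finished (s)

    @property
    def latency(self) -> float:
        """Delay between the newest event and the finished result."""
        return self.t_done - self.t_end


def toy_cost(window: float) -> float:
    """Hypothetical runtime model: fixed overhead plus a per-window term."""
    return 0.008 + 0.4 * window


def fixed_rate(duration: float, period: float,
               cost: Callable[[float], float]) -> List[Batch]:
    """Process consecutive windows of constant length `period`."""
    batches: List[Batch] = []
    free_at = 0.0  # time at which the worker becomes idle
    for k in range(int(round(duration / period))):
        t_start, t_end = k * period, (k + 1) * period
        begin = max(free_at, t_end)  # wait for the data and for the worker
        free_at = begin + cost(period)
        batches.append(Batch(t_start, t_end, free_at))
    return batches


def asynchronous(duration: float, min_window: float,
                 cost: Callable[[float], float]) -> List[Batch]:
    """Whenever the worker is idle, consume every event received so far."""
    batches: List[Batch] = []
    consumed = 0.0    # events up to this time are already processed
    now = min_window  # time at which the worker becomes idle
    while consumed < duration:
        t_end = max(now, consumed + min_window)
        done = t_end + cost(t_end - consumed)
        batches.append(Batch(consumed, t_end, done))
        consumed, now = t_end, done
    return batches


fixed = fixed_rate(duration=1.0, period=0.01, cost=toy_cost)
adaptive = asynchronous(duration=1.0, min_window=0.005, cost=toy_cost)
print(f"fixed-rate worst latency:   {max(b.latency for b in fixed):.4f} s")
print(f"asynchronous worst latency: {max(b.latency for b in adaptive):.4f} s")
```

In this toy model each 10 ms window costs 12 ms to process, so the fixed-rate consumer accumulates a growing backlog, while the adaptive consumer lengthens its windows until the fixed overhead is amortized and its latency settles. Whether the real system behaves this way depends on its actual runtime profile, which the abstract does not report.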
Where Pith is reading between the lines
- Similar asynchronous pipelines could be tested on other high-speed platforms such as ground robots in low-light tunnels.
- The learned detector might be swapped for alternative event-based feature extractors to measure the contribution of the data-driven component.
- Extending the stereo setup to more cameras could improve scale estimation without adding synchronized frame processing.
Load-bearing premise
The asynchronous processing and data-driven detector keep tracking accurate and free of latency spikes or drift that would break closed-loop UAV control under all tested and untested flight conditions.
What would settle it
A controlled UAV flight in which event-based tracking drifts or latency spikes appear and the vehicle loses autonomous control would falsify the claim of reliable closed-loop operation.
Original abstract
The robustness of event cameras to high dynamic range and motion blur holds the potential to improve visual odometry systems in challenging environments. Although their high temporal resolution does not require synchronous processing, most event-based odometry methods still run at fixed rates, which simplifies system design but restricts latency and throughput. In this work, we present AERO-VIS, a stereo event-inertial SLAM system with an integrated, data-driven, robust, and performance-optimized keypoint detector. By processing the event stream asynchronously, the system dynamically adapts to downstream runtime demands, ensuring low-latency and real-time performance. When deploying AERO-VIS on a UAV, we achieve unprecedented accuracy in onboard event-based SLAM. These unique characteristics enable us to present the first purely event-based inertial SLAM system that demonstrates closed-loop UAV control and large-scale state estimation while relying solely on onboard compute. A video of the experiments and the source code are available at ethz-mrl.github.io/AERO-VIS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents AERO-VIS, a stereo event-inertial SLAM system that integrates a data-driven keypoint detector and processes the event stream asynchronously to dynamically adapt to runtime demands. It claims to deliver low-latency real-time performance on UAV onboard compute, achieving unprecedented accuracy in large-scale state estimation and enabling the first demonstration of closed-loop UAV control using a purely event-based inertial SLAM pipeline. Source code and experiment videos are released.
Significance. If the experimental validation holds, the work is significant for event-based robotics as it demonstrates practical closed-loop control and large-scale onboard SLAM in high-dynamic-range and motion-blur conditions where frame-based cameras typically fail. The explicit release of source code is a clear strength that supports reproducibility and community extension. The asynchronous design addresses a key limitation of prior fixed-rate event methods.
major comments (1)
- [§5] §5 (Experimental Results): The central claim of enabling stable closed-loop UAV control and large-scale estimation rests on the asynchronous pipeline and learned detector maintaining bounded latency and low drift. However, the results do not include worst-case latency histograms, event-rate vs. latency curves, or drift-vs-speed analysis under high-dynamic regimes, leaving the robustness assumption unverified for untested flight conditions.
minor comments (1)
- [Abstract] The abstract asserts 'unprecedented accuracy' without inline quantitative values; moving at least one key metric (e.g., ATE or RPE) into the abstract would strengthen the summary.
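For readers unfamiliar with the metrics named in the minor comment, the following is a minimal sketch of how absolute trajectory error (ATE) and a translation-only relative pose error (RPE) might be computed. It rests on loudly stated assumptions: the function names are hypothetical, the trajectories are assumed to be time-synchronized lists of 3D positions, and alignment is done by centroid subtraction only. Standard evaluation tools typically use a full rigid-body alignment and full relative poses, so this is not the paper's evaluation protocol.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _centroid(points: Sequence[Vec3]) -> Vec3:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def ate_rmse(est: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """RMSE of position error after translation-only (centroid) alignment."""
    if len(est) != len(gt) or not est:
        raise ValueError("trajectories must be non-empty and equally long")
    ce, cg = _centroid(est), _centroid(gt)
    sq = 0.0
    for e, g in zip(est, gt):
        sq += sum(((e[i] - ce[i]) - (g[i] - cg[i])) ** 2 for i in range(3))
    return math.sqrt(sq / len(est))


def rpe_trans_rmse(est: Sequence[Vec3], gt: Sequence[Vec3],
                   delta: int = 1) -> float:
    """RMSE of the difference between displacements over `delta` steps.

    Simplification: ignores orientation, so both trajectories must already
    be expressed in the same frame.
    """
    if len(est) != len(gt) or len(est) <= delta:
        raise ValueError("need equally long trajectories longer than delta")
    sq = 0.0
    count = len(est) - delta
    for i in range(count):
        for k in range(3):
            d_est = est[i + delta][k] - est[i][k]
            d_gt = gt[i + delta][k] - gt[i][k]
            sq += (d_est - d_gt) ** 2
    return math.sqrt(sq / count)


# Hypothetical three-pose example with a small lateral drift at the end.
gt_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.3, 0.0)]
print(ate_rmse(est_traj, gt_traj), rpe_trans_rmse(est_traj, gt_traj))
```

ATE summarizes global consistency over the whole trajectory, whereas RPE isolates local drift per step, which is why a reviewer might want at least one of them stated in the abstract.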
Simulated Author's Rebuttal
We thank the referee for the thorough review and positive recommendation for minor revision. We are pleased that the significance of the asynchronous design and the release of source code are recognized. Below we address the major comment regarding the experimental results.
Point-by-point responses
- Referee: §5 (Experimental Results): The central claim of enabling stable closed-loop UAV control and large-scale estimation rests on the asynchronous pipeline and learned detector maintaining bounded latency and low drift. However, the results do not include worst-case latency histograms, event-rate vs. latency curves, or drift-vs-speed analysis under high-dynamic regimes, leaving the robustness assumption unverified for untested flight conditions.
  Authors: We agree that providing more detailed analysis on latency and drift would strengthen the validation of our claims. Our current experiments include multiple UAV flights in challenging conditions with varying event rates and dynamics, demonstrating real-time performance and low drift. To address this, we will include in the revised manuscript worst-case latency histograms, event-rate versus latency curves, and drift versus speed plots derived from the logged data of our high-dynamic flights. This will explicitly show the bounded latency and robustness under the tested regimes.
  Revision: yes
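The analyses requested here (worst-case latency, latency histograms, and event-rate versus latency curves) could be derived from flight logs roughly as follows. This is a hedged sketch, not the authors' tooling: the log format, a list of hypothetical `(event_rate, latency_ms)` samples, and all function names are assumptions, and the numbers in the example are made up for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_histogram(latencies_ms: Sequence[float],
                      bin_width_ms: float) -> Dict[float, int]:
    """Count samples per bin; keys are the lower edges of the bins."""
    counts: Dict[int, int] = defaultdict(int)
    for lat in latencies_ms:
        counts[int(lat // bin_width_ms)] += 1
    return {b * bin_width_ms: c for b, c in sorted(counts.items())}


def latency_by_event_rate(samples: Sequence[Tuple[float, float]],
                          rate_bin: float) -> Dict[float, Tuple[float, float]]:
    """Group (event_rate, latency_ms) samples by rate bin.

    Returns {bin lower edge: (mean latency, worst latency)}.
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for rate, lat in samples:
        groups[int(rate // rate_bin)].append(lat)
    return {b * rate_bin: (sum(v) / len(v), max(v))
            for b, v in sorted(groups.items())}


# Hypothetical log: event rate in million events per second, latency in ms.
log = [(1.0, 4.0), (1.2, 5.0), (3.1, 9.0), (3.4, 12.0)]
latencies = [lat for _, lat in log]

print("worst case:", max(latencies), "ms")
print("p99:", percentile(latencies, 99), "ms")
print("histogram:", latency_histogram(latencies, bin_width_ms=5.0))
print("by event rate:", latency_by_event_rate(log, rate_bin=1.0))
```

Reporting the per-bin worst case alongside the mean matters for the closed-loop claim, since a controller is sensitive to the tail of the latency distribution rather than its average.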
Circularity Check
No circularity: engineering system with external validation
The paper describes an integrated SLAM system (asynchronous event processing, data-driven detector, stereo event-inertial fusion, UAV closed-loop control) whose claims rest on implementation details, runtime measurements, and hardware experiments rather than any derivation chain. No equations, fitted parameters, or first-principles results are presented that reduce to their own inputs by construction. The abstract and system description contain no self-definitional loops, no 'prediction' of quantities already used in fitting, and no load-bearing uniqueness theorems imported from prior self-citations. External code release and experimental results provide independent falsifiability, keeping the work self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Event cameras output asynchronous brightness-change events that are robust to motion blur and high dynamic range.
- standard math: Inertial measurements can be integrated with visual features for state estimation in SLAM.