Autonomous FPV Flight with Translational Optical Flow and Uncertainty Mask
Pith reviewed 2026-06-27 16:45 UTC · model grok-4.3
The pith
Decomposing optical flow into translational components and adding an uncertainty mask enables robust monocular FPV quadrotor flight at speeds exceeding 11 m/s in real forests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By decomposing optical flow into translational and rotational components and utilizing only the translational flow along with an uncertainty mask from forward and backward estimate inconsistencies, a neural network policy trained in differentiable simulation can guide a quadrotor through complex forest environments at high speeds using solely monocular RGB input.
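The decomposition in the core claim can be illustrated with the classical instantaneous motion-field model of Longuet-Higgins and Prazdny, in which the rotational part of the flow does not depend on scene depth. The sketch below is an assumption-laden illustration, not the authors' implementation: it works in normalized image coordinates (focal length 1), assumes the camera angular velocity `omega` is known (for example from a gyroscope), and the function names are hypothetical.

```python
def rotational_flow(x, y, omega):
    """Flow induced purely by camera rotation at normalized image point (x, y).

    omega = (wx, wy, wz) is the camera angular velocity in rad/s (assumed known).
    This term is independent of scene depth.
    """
    wx, wy, wz = omega
    u = wx * x * y - wy * (1.0 + x * x) + wz * y
    v = wx * (1.0 + y * y) - wy * x * y - wz * x
    return u, v


def full_flow(x, y, depth, velocity, omega):
    """Ideal motion field for a static point at the given depth (used here to build a test input)."""
    tx, ty, tz = velocity
    u_rot, v_rot = rotational_flow(x, y, omega)
    u = (-tx + x * tz) / depth + u_rot
    v = (-ty + y * tz) / depth + v_rot
    return u, v


def translational_flow(x, y, flow, omega):
    """Subtract the predicted rotational flow, leaving the depth-dependent translational part."""
    u_rot, v_rot = rotational_flow(x, y, omega)
    return flow[0] - u_rot, flow[1] - v_rot


# Forward flight at 10 m/s with a 0.5 rad/s rotation about the y axis, point 5 m away.
omega = (0.0, 0.5, 0.0)
observed = full_flow(0.2, -0.1, 5.0, (0.0, 0.0, 10.0), omega)
residual = translational_flow(0.2, -0.1, observed, omega)
```

In this model the residual at the image centre is zero under pure forward motion regardless of depth, which is the low signal-to-noise situation near the focus of expansion that the abstract describes.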
What carries the argument
translational optical flow combined with uncertainty mask from forward-backward flow inconsistencies
Load-bearing premise
The uncertainty mask derived from inconsistencies between forward and backward flow estimates accurately highlights obstacle structures including those straight ahead of the camera, and the simulation-trained policy works in real flights.
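A minimal sketch of a forward-backward consistency check may help make this premise concrete. The page does not give the paper's exact formula, so everything here is an assumption: flow fields are nested lists of `(u, v)` pixel displacements, the backward flow is sampled with clamped nearest-neighbour lookup, and the `threshold` value and function names are hypothetical.

```python
import math


def fb_inconsistency(flow_fwd, flow_bwd):
    """Per-pixel forward-backward inconsistency magnitude, in pixels.

    flow_fwd[r][c] = (u, v) maps frame 1 to frame 2; flow_bwd maps frame 2 to frame 1.
    """
    h, w = len(flow_fwd), len(flow_fwd[0])
    err = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            u, v = flow_fwd[r][c]
            # Location this pixel lands on in frame 2 (nearest neighbour, clamped to the image).
            r2 = min(max(int(round(r + v)), 0), h - 1)
            c2 = min(max(int(round(c + u)), 0), w - 1)
            ub, vb = flow_bwd[r2][c2]
            # For a consistent match, the backward flow cancels the forward flow.
            err[r][c] = math.hypot(u + ub, v + vb)
    return err


def uncertainty_mask(flow_fwd, flow_bwd, threshold=1.0):
    """Boolean mask that is True where the two estimates disagree by more than threshold pixels."""
    return [[e > threshold for e in row] for row in fb_inconsistency(flow_fwd, flow_bwd)]
```

Disagreement of this kind typically arises at occlusion boundaries, which is one plausible reason such a mask could respond to obstacle edges; whether it does so reliably inside the FoE region is exactly what the premise above takes for granted.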
What would settle it
A real-world flight collision into an object straight ahead of the camera despite the uncertainty mask, or real-world speeds much lower than the 11.79 m/s achieved in tests due to transfer issues.
Original abstract
Autonomous FPV quadrotor flight in complex environments using a monocular RGB camera as the sole exteroceptive sensor remains a fundamental challenge. Recent research has shown that using optical flow as the input of a neural network can achieve end-to-end autonomous flight in cluttered scenes. However, extracting the most relevant information from the flow estimation is the key bottleneck limiting agility and robustness. Existing methods struggle to disentangle obstacle-induced optical flow from the ego-motion background flow and suffer from low signal-to-noise ratios near the focus of expansion (FoE). To address these issues, we decompose the optical flow into translational and rotational components and utilize only the translational flow, which captures scene geometry and depth cues. In addition, we introduce an uncertainty mask derived from inconsistencies between forward and backward flow estimates. This mask highlights obstacle structures, including those within the FoE region. Both cues are fed to a control policy trained in a differentiable simulation framework, which enables efficient first-order optimization across perception and control. We validate our approach through extensive experiments in both simulated and real-world forest environments. The proposed system achieves robust flight at speeds of up to 13.91 m/s in simulation and 11.79 m/s in real-world tests, with a 93.3% success rate over 30 real-world trials, nearly doubling the previously reported 6 m/s real-world speed of the monocular-RGB optical-flow UAV obstacle avoidance system.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that decomposing monocular optical flow into its translational component (discarding rotational) and augmenting it with an uncertainty mask computed from forward-backward flow inconsistencies allows a neural policy, trained end-to-end in a differentiable simulator, to achieve robust high-speed FPV flight. Reported results are 13.91 m/s peak speed in simulation, 11.79 m/s in real forest flights, and 93.3 % success over 30 real-world trials, nearly doubling the prior 6 m/s monocular-RGB baseline.
Significance. If the reported speeds and success rate are reproducible and the contribution of the translational-flow-plus-mask decomposition is isolated, the work would constitute a meaningful advance in monocular vision-based UAV navigation by addressing the low-SNR problem near the focus of expansion. The use of differentiable simulation for joint perception-control optimization is a methodological strength that could be adopted more broadly.
major comments (3)
- [Abstract and §4 (Experiments)] The headline performance numbers (13.91 m/s sim, 11.79 m/s real, 93.3% success in 30 trials) are presented without error bars, per-trial statistics, or ablation studies that isolate the uncertainty mask; without these data it is impossible to attribute the near-doubling of prior speed to the proposed decomposition rather than to other unstated factors.
- [Abstract and §3.2 (Uncertainty Mask)] The claim that the forward-backward inconsistency mask “highlights obstacle structures, including those within the FoE region” is central to the method, yet no quantitative metric (e.g., IoU with ground-truth depth edges inside the FoE on real imagery) or controlled comparison with/without the mask is supplied.
- [§3.3 and §4.2 (Sim-to-Real)] The policy is optimized in differentiable simulation and deployed zero-shot on real hardware; the manuscript supplies no domain-randomization schedule, noise-model calibration, or failure-case analysis that would substantiate transfer of the learned mapping when real flow statistics differ from the simulator.
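To give a sense of how much uncertainty a 30-trial success rate carries, the following sketch computes a Wilson score interval for a binomial proportion. It is offered only as an illustration of the kind of error bar the first comment asks for; the choice of the Wilson interval and the 95% level (`z = 1.96`) are assumptions, not something taken from the manuscript. The count of 28 successes follows from 93.3% of 30 trials.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score confidence interval for a binomial success probability."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# 28 successes out of 30 real-world trials (93.3%).
low, high = wilson_interval(28, 30)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 0.79 to 0.98, so the reported rate is compatible with a fairly wide range of true success probabilities.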
minor comments (2)
- [Abstract] The abstract states “extensive experiments in both simulated and real-world forest environments” but does not report the number of simulation episodes or the precise forest density parameters used for training versus testing.
- [§3] Notation for the forward and backward flow fields (F_f, F_b) and the exact formula for the inconsistency mask should be introduced once in §3 and used consistently thereafter.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments identify areas where additional quantitative support and methodological detail would strengthen the claims. We address each point below and commit to revisions that directly respond to the concerns raised.
Point-by-point responses
- Referee: [Abstract and §4 (Experiments)] The headline performance numbers (13.91 m/s sim, 11.79 m/s real, 93.3% success in 30 trials) are presented without error bars, per-trial statistics, or ablation studies that isolate the uncertainty mask; without these data it is impossible to attribute the near-doubling of prior speed to the proposed decomposition rather than to other unstated factors.
  Authors: We agree that error bars, per-trial statistics, and ablation studies are required to isolate the contribution of the uncertainty mask and translational decomposition. In the revised manuscript we will report standard deviations across repeated trials for both simulation and real-world speeds and success rates. We will also add an ablation table in §4 that compares the full pipeline against (i) translational flow without the mask and (ii) full optical flow without decomposition, using the same training protocol. These additions will make the attribution of the reported speed increase explicit.
  Revision: yes
- Referee: [Abstract and §3.2 (Uncertainty Mask)] The claim that the forward-backward inconsistency mask “highlights obstacle structures, including those within the FoE region” is central to the method, yet no quantitative metric (e.g., IoU with ground-truth depth edges inside the FoE on real imagery) or controlled comparison with/without the mask is supplied.
  Authors: Section 3.2 currently supports the claim with qualitative examples. We acknowledge the absence of quantitative metrics. In revision we will add a controlled ablation performed in simulation (where dense ground-truth depth is available) that measures policy success rate and collision distance with versus without the mask; we will also report the fraction of high-uncertainty pixels that coincide with depth discontinuities inside the FoE. For real imagery, pixel-accurate ground-truth depth edges are unavailable in our dataset, so we will instead provide a quantitative correlation between mask values and optical-flow magnitude near the FoE across the 30 real trials.
  Revision: partial
- Referee: [§3.3 and §4.2 (Sim-to-Real)] The policy is optimized in differentiable simulation and deployed zero-shot on real hardware; the manuscript supplies no domain-randomization schedule, noise-model calibration, or failure-case analysis that would substantiate transfer of the learned mapping when real flow statistics differ from the simulator.
  Authors: We will expand §3.3 with the exact domain-randomization schedule (ranges for lighting, texture density, camera intrinsics, and additive flow noise) and the procedure used to calibrate the noise model from real-world optical-flow statistics collected on the target hardware. In §4.2 we will add a failure-case breakdown of the two unsuccessful real-world trials together with a discussion of how the uncertainty mask influenced behavior in the 28 successful runs. These details will be included in the revised manuscript.
  Revision: yes
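The two overlap measures mentioned in the exchange above, IoU against depth edges and the fraction of high-uncertainty pixels that fall on depth discontinuities, can be sketched as follows. This is a hypothetical illustration: both inputs are assumed to be same-sized boolean grids (nested lists), and how the depth-edge map or the FoE region would be obtained is outside the scope of the sketch.

```python
def mask_overlap(mask, edges):
    """Return (precision, iou) between an uncertainty mask and a depth-edge map.

    precision: fraction of masked pixels that lie on a depth edge.
    iou: intersection over union of the two boolean grids.
    """
    inter = sum(1 for m_row, e_row in zip(mask, edges)
                for m, e in zip(m_row, e_row) if m and e)
    n_mask = sum(1 for row in mask for m in row if m)
    n_edge = sum(1 for row in edges for e in row if e)
    union = n_mask + n_edge - inter
    precision = inter / n_mask if n_mask else 0.0
    iou = inter / union if union else 0.0
    return precision, iou


# Toy 2x3 example: one of the two masked pixels coincides with an edge pixel.
mask = [[True, True, False], [False, False, False]]
edges = [[True, False, False], [False, True, False]]
precision, iou = mask_overlap(mask, edges)
```

Restricting both grids to a window around the focus of expansion before calling the function would give the FoE-specific variant the referee asks about.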
Circularity Check
No circularity; results are empirical validation of a proposed decomposition and policy.
full rationale
The paper decomposes optical flow into translational/rotational components, derives an uncertainty mask from forward-backward flow inconsistencies, and trains a policy in differentiable simulation before reporting measured speeds and success rates from separate simulation and real-world trials. These outcomes are presented as experimental results rather than any quantity that reduces by construction to fitted parameters or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked in the provided text. The derivation chain remains self-contained against external benchmarks.