SkySeg: Collaborative Onboard Semantic Segmentation with Heterogeneous UAVs in the Wild
Pith reviewed 2026-06-30 17:50 UTC · model grok-4.3
The pith
Heterogeneous UAVs collaborate via image fusion and cross-device adaptation to run semantic segmentation onboard with 3.6 times lower latency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SkySeg is a heterogeneous multi-UAV framework that combines an efficient information fusion inference method—merging low-definition wide-area images with high-definition focused-area images—with a cross-device test-time adaptation strategy that jointly corrects distribution shifts across UAVs using only unlabeled test streams.
What carries the argument
The cross-device test-time adaptation strategy paired with the information fusion inference method that combines low- and high-definition images from different UAVs.
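The summary does not say at which stage the low- and high-definition images are fused. As a purely illustrative sketch, and not SkySeg's actual method, the snippet below assumes fusion happens on predicted label maps: the coarse wide-area prediction is upsampled and the detailed focused-area prediction overwrites the region it covers. The function name `fuse_label_maps` and all of its parameters are hypothetical.

```python
import numpy as np


def fuse_label_maps(wide_labels, focus_labels, scale, top, left):
    """Upsample a coarse wide-area label map and overwrite the focused region.

    wide_labels  : (h, w) integer class map from the low-definition image.
    focus_labels : (fh, fw) integer class map from the high-definition image,
                   already expressed on the upsampled grid (an assumption).
    scale        : integer upsampling factor for the wide-area map.
    top, left    : position of the focused area on the upsampled grid.
    """
    # Nearest-neighbour upsampling keeps class ids intact (no interpolation).
    fused = np.repeat(np.repeat(wide_labels, scale, axis=0), scale, axis=1)
    fh, fw = focus_labels.shape
    if top < 0 or left < 0 or top + fh > fused.shape[0] or left + fw > fused.shape[1]:
        raise ValueError("focused area falls outside the wide-area map")
    # The detailed prediction replaces the coarse one where both are available.
    fused[top:top + fh, left:left + fw] = focus_labels
    return fused


wide = np.zeros((2, 2), dtype=np.int64)   # coarse map, all class 0
focus = np.ones((2, 2), dtype=np.int64)   # detailed map, all class 1
fused = fuse_label_maps(wide, focus, scale=2, top=1, left=1)
print(fused)
```

Overwriting is the simplest possible merge rule; a real system would also need to register the two views against each other, which this sketch takes as given.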
If this is right
- Inference latency on resource-constrained UAV hardware drops by approximately 3.6x.
- Onboard segmentation accuracy rises by 5.91% relative to single-UAV baselines.
- Average accuracy in uncontrolled outdoor environments improves by 10.91%.
- Real-time decisions become feasible during flight without relying on ground-station processing.
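As a quick arithmetic check on the first bullet, a speedup factor of s corresponds to a relative latency reduction of 1 - 1/s. The sketch below assumes the reported 3.6x is the ratio of baseline latency to SkySeg latency; the 100 ms baseline is a made-up figure for illustration, not a number from the paper.

```python
def latency_after_speedup(baseline_ms: float, speedup: float) -> float:
    """Latency after applying a speedup factor (baseline / speedup)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_ms / speedup


def relative_reduction(speedup: float) -> float:
    """Fraction of the baseline latency that is removed: 1 - 1/speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Hypothetical 100 ms baseline, chosen only to make the arithmetic concrete.
new_latency = latency_after_speedup(100.0, 3.6)
reduction = relative_reduction(3.6)
print(f"{new_latency:.1f} ms, {reduction:.1%} lower")
```

Under that reading, a 3.6x speedup removes roughly 72% of the baseline latency.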
Where Pith is reading between the lines
- The same fusion-plus-adaptation pattern could apply to other onboard perception tasks such as object detection or depth estimation on UAV fleets.
- Larger numbers of UAVs might further reduce per-device compute load while improving adaptation robustness.
- The method suggests a route to unsupervised domain adaptation for aerial imagery without collecting new labeled datasets for each environment.
Load-bearing premise
The cross-device test-time adaptation reliably corrects distribution shifts across heterogeneous UAVs using only unlabeled test streams without negative transfer.
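To make the negative-transfer risk concrete, consider one common family of test-time adaptation methods, which re-estimates feature normalization statistics from unlabeled test data. The sketch below assumes, hypothetically, that UAVs share per-channel means and variances and pool them weighted by sample count. Whether SkySeg does anything like this is not stated in the summary; `pool_feature_stats` is an illustrative name.

```python
import numpy as np


def pool_feature_stats(means, variances, counts):
    """Combine per-device feature means/variances into one pooled estimate.

    means, variances : arrays of shape (num_devices, num_channels)
    counts           : array of shape (num_devices,), samples seen per device
    """
    means = np.asarray(means, dtype=np.float64)
    variances = np.asarray(variances, dtype=np.float64)
    weights = np.asarray(counts, dtype=np.float64)
    weights = weights / weights.sum()

    pooled_mean = (weights[:, None] * means).sum(axis=0)
    # Law of total variance: within-device variance plus between-device spread.
    spread = (means - pooled_mean) ** 2
    pooled_var = (weights[:, None] * (variances + spread)).sum(axis=0)
    return pooled_mean, pooled_var


# Two hypothetical devices with equal sample counts and one feature channel.
mean, var = pool_feature_stats(
    means=[[0.0], [2.0]], variances=[[1.0], [1.0]], counts=[10, 10]
)
print(mean, var)
```

When device means differ, the pooled variance grows with the gap between them. That is one way sharing statistics across very different sensors could hurt a device rather than help it.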
What would settle it
A controlled flight test in which multiple heterogeneous UAVs record the same changing scene, the adapted model is applied, and accuracy on newly collected labeled frames shows no improvement or a drop compared with the non-adapted baseline.
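The comparison described above could be scored as follows. This is a minimal sketch that assumes mean intersection-over-union (mIoU) as the accuracy metric, which the summary does not specify; the tiny label maps are invented for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


def adaptation_gain(baseline_pred, adapted_pred, target, num_classes):
    """Positive: adaptation helped. Zero or negative: supports the falsifier."""
    adapted = mean_iou(adapted_pred, target, num_classes)
    baseline = mean_iou(baseline_pred, target, num_classes)
    return adapted - baseline


target = np.array([[0, 0], [1, 1]])     # newly labeled frame
baseline = np.array([[0, 1], [1, 1]])   # non-adapted model, one pixel wrong
adapted = np.array([[0, 0], [1, 1]])    # adapted model, all pixels right
gain = adaptation_gain(baseline, adapted, target, num_classes=2)
print(f"mIoU gain from adaptation: {gain:+.3f}")
```

A gain that is zero or negative across the labeled frames of such a flight test would count against the load-bearing premise.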
Original abstract
The demand for unmanned aerial vehicle (UAV)-based image acquisition and analysis has surged, with UAVs increasingly utilized for semantic segmentation tasks. To meet the real-time analysis requirements of UAV remote sensing missions, performing onboard computation and making decisions based on the results is a natural approach. However, deploying semantic segmentation on resource-constrained UAV platforms presents two significant challenges: 1) hardware constraints limit the ability of UAVs to perform real-time semantic segmentation, and 2) environmental variations during flight cause data distribution shifts, deviating from the original training data. To address these issues, this paper introduces SkySeg, a heterogeneous multi-UAV air-air cooperation framework that integrates computer vision and flight pattern to enable onboard semantic segmentation using low-cost sensors. SkySeg employs an efficient information fusion inference method, combining low-definition, wide-area images with high-definition, focused-area images. Additionally, it incorporates a cross-device test-time adaptation (TTA) strategy to enhance segmentation performance in dynamic environments by collaboratively addressing distribution shifts of test data streams across UAVs. Experimental results demonstrate that our SkySeg framework accelerates inference latency by approximately 3.6x, improves onboard segmentation accuracy by 5.91%, and achieves a 10.91% average accuracy gain in the wild.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SkySeg, a heterogeneous multi-UAV air-air cooperation framework for onboard semantic segmentation. It combines low-definition wide-area and high-definition focused-area image fusion with a cross-device test-time adaptation (TTA) strategy to address hardware constraints on UAVs and distribution shifts in dynamic environments. Experimental results are reported to show approximately 3.6x inference latency reduction, 5.91% onboard accuracy improvement, and 10.91% average accuracy gain in the wild.
Significance. If the empirical claims hold after proper validation, the work could meaningfully advance practical deployment of real-time semantic segmentation on resource-limited UAV platforms by demonstrating collaborative adaptation across heterogeneous devices without additional labeled data.
major comments (2)
- [Experimental Results] Experimental section: the headline claims of 3.6x latency acceleration, +5.91% onboard accuracy, and +10.91% in-the-wild gain are presented as aggregate numbers with no reported baselines, datasets, ablation studies isolating the TTA collaboration term, per-device metrics, or error bars, preventing attribution of gains to the proposed cross-device TTA mechanism.
- [Method (TTA component)] Cross-device TTA subsection: the strategy is described as correcting distribution shifts across UAVs from unlabeled streams alone, yet no quantification of negative-transfer cases, per-UAV performance tables, or controls for reliability under heterogeneous conditions is supplied, leaving the central assumption unverified.
minor comments (1)
- [Abstract and §3] The abstract and method description refer to 'low-cost sensors' and 'flight pattern' integration without specifying sensor models, resolution values, or how flight patterns are encoded into the fusion process.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications from the existing work and commitments to strengthen the presentation where needed.
- Referee: [Experimental Results] Experimental section: the headline claims of 3.6x latency acceleration, +5.91% onboard accuracy, and +10.91% in-the-wild gain are presented as aggregate numbers with no reported baselines, datasets, ablation studies isolating the TTA collaboration term, per-device metrics, or error bars, preventing attribution of gains to the proposed cross-device TTA mechanism.
  Authors: The manuscript reports results on a combination of custom heterogeneous UAV flight data and standard semantic segmentation benchmarks, with comparisons to single-UAV and non-adaptive baselines detailed in Section 4. However, we agree that the presentation would be strengthened by more explicit isolation of the TTA term. In revision we will add an ablation table, per-device breakdowns, and error bars computed over repeated runs to make attribution clearer.
  Revision: yes
- Referee: [Method (TTA component)] Cross-device TTA subsection: the strategy is described as correcting distribution shifts across UAVs from unlabeled streams alone, yet no quantification of negative-transfer cases, per-UAV performance tables, or controls for reliability under heterogeneous conditions is supplied, leaving the central assumption unverified.
  Authors: The cross-device TTA is evaluated under the heterogeneous UAV setup described in Section 3, with overall accuracy gains reported across devices. We acknowledge that explicit quantification of negative-transfer instances and expanded per-UAV tables would better verify robustness. The revision will incorporate these analyses along with additional controls for varying hardware and environmental conditions.
  Revision: yes
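The per-UAV breakdown, error bars over repeated runs, and negative-transfer count discussed in this exchange could be tabulated along the following lines. This is a hedged sketch: the UAV ids and accuracy scores are invented, and `summarize_runs` is a hypothetical helper, not something taken from the manuscript.

```python
from statistics import mean, stdev


def summarize_runs(baseline_runs, adapted_runs):
    """Per-UAV mean gain, spread across runs, and a negative-transfer flag.

    Both arguments map a UAV id to a list of accuracy scores, one per repeated
    run, paired by index (run i of the baseline vs. run i of the adapted model).
    """
    summary = {}
    for uav, base_scores in baseline_runs.items():
        gains = [a - b for a, b in zip(adapted_runs[uav], base_scores)]
        summary[uav] = {
            "mean_gain": mean(gains),
            "std_gain": stdev(gains) if len(gains) > 1 else 0.0,
            # Flag devices whose accuracy drops on average after adaptation.
            "negative_transfer": mean(gains) < 0,
        }
    return summary


# Made-up scores for two hypothetical UAVs, three runs each.
baseline = {"uav_wide": [0.60, 0.62, 0.61], "uav_focus": [0.70, 0.71, 0.69]}
adapted = {"uav_wide": [0.66, 0.67, 0.65], "uav_focus": [0.68, 0.69, 0.68]}
report = summarize_runs(baseline, adapted)
for uav, row in report.items():
    print(uav, f"{row['mean_gain']:+.3f} ± {row['std_gain']:.3f}", row["negative_transfer"])
```

Reporting the table per device rather than as one aggregate makes it visible when an average gain hides a device that got worse.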
Circularity Check
No circularity: experimental claims rest on measured outcomes, not self-referential definitions or fits
The paper describes a multi-UAV framework and reports aggregate experimental metrics (3.6x latency, +5.91% accuracy, +10.91% in-the-wild gain) as measured results from deployment. No equations, parameter-fitting procedures, or derivation steps are present that could reduce a claimed prediction to its own inputs by construction. Self-citations, if any, are not load-bearing for the central claims, which are externally falsifiable via replication on the described hardware and datasets. This is the normal case of an applied systems paper whose validity hinges on experiment design rather than algebraic self-reference.