Improving acoustic drone detection generalization through pretraining and data augmentation

Christian Rollwage; Mattes Ohlenbusch; Paul M. Reuter

arxiv: 2605.31329 · v1 · pith:NZ3DTDNInew · submitted 2026-05-29 · 📡 eess.AS

Pith reviewed 2026-06-28 20:50 UTC · model grok-4.3

classification 📡 eess.AS

keywords: acoustic drone detection, UAV detection, pretraining, data augmentation, generalization, sound event classification, false positive rate

The pith

Pretraining a compact audio classifier on general sound events before fine-tuning substantially raises true-positive rates for drone detection on unseen recordings and environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that pretraining the detector on broad sound-event classification tasks is the main driver of improved generalization in acoustic UAV detection. It further shows that a full chain of on-the-fly augmentations adds measurable gains specifically on acoustically mismatched out-of-domain test sets. A sympathetic reader would care because reliable passive detection without line of sight matters for surveillance and airspace security, and the work quantifies how much each training choice moves performance at fixed false-positive rates. The evaluation uses public drone and non-drone corpora to measure both in-domain and out-of-domain true-positive rates plus false-positive rates on traffic and environmental sound collections. Distance-dependent results on IDMT Berne 2022 indicate usable detection out to 150 m.

Core claim

Pretraining the model for broad sound-event classification before fine-tuning on diverse drone recordings is the dominant factor for robust detection, yielding substantial TPR improvements over training from scratch on all benchmarks. The full augmentation chain (pitch shifting, noise mixing, microphone transfer function simulation, spectrogram augmentation) provides additional gains on acoustically mismatched out-of-domain data, achieving the best mean TPR on the AuDroK subsets and the largest improvements on the most challenging scenarios. False-positive rates remain equally low on unfamiliar non-drone backgrounds from IDMT-TRAFFIC and ESC-50.
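To make the false-positive check concrete, the sketch below computes one false-positive rate per sound class at a fixed decision threshold, which is the kind of breakdown Figure 2 reports for ESC-50. The function name `classwise_fpr`, the `(label, score)` record layout, and the rule "flag a clip when its score exceeds the threshold" are assumptions made for illustration; the page does not describe the authors' evaluation code.

```python
from collections import defaultdict


def classwise_fpr(clips, threshold):
    """Return {class_label: false-positive rate} for non-drone clips.

    clips: iterable of (class_label, score) pairs, where every clip is
    known to contain no drone. A clip counts as a false alarm when its
    detector score is strictly greater than `threshold` (assumed rule).
    """
    alarms = defaultdict(int)
    totals = defaultdict(int)
    for label, score in clips:
        totals[label] += 1
        alarms[label] += int(score > threshold)
    return {label: alarms[label] / totals[label] for label in totals}


# Hypothetical scores for three background classes, for illustration only.
example_clips = [
    ("helicopter", 0.9), ("helicopter", 0.3),
    ("rain", 0.1), ("rain", 0.2),
    ("engine", 0.7), ("engine", 0.8),
]
per_class = classwise_fpr(example_clips, threshold=0.5)
```

A per-class view like this can show whether a low average FPR hides a few classes that trigger most of the alarms.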

What carries the argument

Compact DNN-based detector pretrained on sound-event classification then fine-tuned with on-the-fly augmentations that simulate varied acoustic conditions.
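As a rough illustration of what on-the-fly augmentation can look like, the sketch below implements two of the four named steps: noise mixing at a chosen signal-to-noise ratio and a SpecAugment-style mask on a spectrogram. Reading "spectrogram augmentation" as time and frequency masking is an assumption, as are all function names and parameters. Pitch shifting and microphone transfer function simulation are left out because they need resampling and filtering code beyond a short example.

```python
import math
import random


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the mix has the requested SNR in dB."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_sig = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_sig == 0.0 or p_noise == 0.0:
        return list(signal)  # nothing sensible to scale against
    gain = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(signal, noise)]


def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Zero one random band of frequency rows and one band of time columns.

    spec is a list of rows (frequency bins), each a list of time frames.
    The input is left unchanged; a masked copy is returned.
    """
    n_freq, n_time = len(spec), len(spec[0])
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    out = [row[:] for row in spec]
    for f in range(n_freq):
        for t in range(n_time):
            in_freq_band = f_start <= f < f_start + f_width
            in_time_band = t_start <= t < t_start + t_width
            if in_freq_band or in_time_band:
                out[f][t] = 0.0
    return out


# Drawing a fresh SNR and fresh mask positions per training example is
# what makes the augmentation "on the fly". The SNR range is hypothetical.
rng = random.Random(0)
clean = [math.sin(0.1 * i) for i in range(200)]
background = [rng.gauss(0.0, 1.0) for _ in range(200)]
noisy = mix_at_snr(clean, background, snr_db=rng.uniform(0.0, 20.0))
```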

Load-bearing premise

The chosen public datasets and target false-positive rates adequately represent the range of real-world recording conditions and unseen UAV types the detector will encounter after deployment.

What would settle it

A new test set containing drone recordings from previously unseen UAV models or recording hardware where the pretrained model shows no TPR advantage over a from-scratch model at the same target FPR would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.31329 by Christian Rollwage, Mattes Ohlenbusch, Paul M. Reuter.

**Figure 2.** Class-wise false-positive rates of the full-stack model on ESC-50 at the operating point defined by …

**Figure 3.** Detection performance on IDMT Berne 2022 as a function of drone–microphone distance, reported as …

**Figure 4.** Detection performance on IDMT Berne 2022 as a function of drone–microphone distance for different audio …
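A distance-dependent analysis of the kind shown in Figures 3 and 4 can be approximated by grouping drone recordings into distance bins and computing the true-positive rate in each bin. The sketch below is a hedged illustration only: the bin edges, the `(distance_m, score)` record layout, and the function name are hypothetical, and only the 150 m upper range comes from the page.

```python
def tpr_by_distance(records, bin_edges, threshold):
    """Return a list of (lower_m, upper_m, tpr) tuples, one per distance bin.

    records: iterable of (distance_m, score) for clips that contain a drone.
    bin_edges: increasing distances; bin i covers [edge_i, edge_{i+1}).
    tpr is None for a bin that received no clips.
    """
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for distance, score in records:
        for i in range(n_bins):
            if bin_edges[i] <= distance < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += int(score > threshold)
                break  # each clip belongs to at most one bin
    return [
        (bin_edges[i], bin_edges[i + 1],
         hits[i] / totals[i] if totals[i] else None)
        for i in range(n_bins)
    ]


# Hypothetical clips and 50 m bins up to 150 m, for illustration only.
example = [(10, 0.9), (40, 0.8), (60, 0.7), (90, 0.4), (120, 0.6), (140, 0.2)]
curve = tpr_by_distance(example, bin_edges=[0, 50, 100, 150], threshold=0.5)
```

Returning `None` for empty bins keeps a missing measurement distinct from a true-positive rate of zero.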

Abstract

Detecting unauthorized UAV flights is critical for surveillance, security, and airspace management. Acoustic drone detection, which relies on the distinctive propeller and motor sounds of UAVs, provides a low-cost, passive solution that requires no line of sight. A central challenge is generalization: reliably distinguishing drone signatures from ambient noise across unseen recording setups, environments, and UAV types (out-of-domain). Inspired by advances in large-scale audio pretraining, we develop a compact DNN-based detector and improve its generalization by (1) pretraining the model for broad sound-event classification before fine-tuning on diverse in-house and public drone recordings, and (2) applying on-the-fly augmentations (pitch shifting, noise mixing, microphone transfer function simulation, spectrogram augmentation) to expose the model to varied acoustic conditions. An ablation study quantifies the impact of each augmentation. For evaluation, we set target false-positive rates (FPR) aligned with real-world surveillance needs and report true-positive rates (TPR) on both in-domain data (public IDMT Berne 2022) and out-of-domain data (public AuDroK). Our results show that pretraining is the dominant factor for robust detection, yielding substantial TPR improvements over training from scratch on all benchmarks. The full augmentation chain provides additional gains on acoustically mismatched out-of-domain data, achieving the best mean TPR on the AuDroK subsets and the largest improvements on the most challenging scenarios. We further validate real-world applicability by measuring false positives on public non-drone corpora (IDMT-TRAFFIC and ESC-50), demonstrating equally low FPR on unfamiliar backgrounds. A distance-dependent analysis on IDMT Berne 2022 shows effective detection at distances up to 150 m.
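The evaluation protocol in the abstract fixes a target FPR and reports the TPR at that operating point. A minimal sketch of one way to do this follows, assuming the detector outputs one scalar score per clip and flags a clip when the score exceeds a threshold. The function names and toy scores are hypothetical and are not taken from the paper.

```python
import math


def threshold_for_target_fpr(negative_scores, target_fpr):
    """Pick a threshold whose empirical FPR on the negatives is <= target_fpr.

    A clip is flagged when score > threshold, so at most
    floor(target_fpr * N) of the N negative scores may lie above it.
    """
    ranked = sorted(negative_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return -math.inf  # every negative may be flagged
    # Only scores strictly above ranked[allowed] raise an alarm, so ties
    # at the threshold value never push the FPR over the target.
    return ranked[allowed]


def rate_above(scores, threshold):
    """Fraction of scores strictly greater than the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Toy scores, for illustration only.
non_drone = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
drone = [0.85, 0.95, 0.5, 0.7]
thr = threshold_for_target_fpr(non_drone, target_fpr=0.2)
fpr = rate_above(non_drone, thr)
tpr = rate_above(drone, thr)
```

Choosing the threshold on non-drone data and then reading off the TPR keeps comparisons between models at the same false-alarm budget, which is what makes the TPR numbers comparable across ablations.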

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Pretraining drives the main generalization gains here while augmentations help on mismatched data, but missing error bars and narrow dataset coverage limit how far the robustness claims can be trusted.

The main thing to know is that pretraining on general sound events gives the largest lift in true-positive rate at fixed false-positive rates across both in-domain and out-of-domain drone benchmarks, with the full augmentation stack adding further gains mainly on the acoustically mismatched AuDroK subsets.

The paper does a straightforward job of running ablations that separate pretraining from the individual augmentations (pitch shifting, noise mixing, microphone transfer function simulation, spectrogram augmentation). It evaluates on public corpora, reports performance at surveillance-relevant FPR targets, checks false positives on non-drone backgrounds (IDMT-TRAFFIC and ESC-50), and includes a distance-dependent breakdown up to 150 m. Those choices make the results more usable for someone who actually needs to deploy a detector.

The soft spots are real but not fatal. No error bars or statistical tests appear in the reported numbers, so it is impossible to tell whether the TPR differences are stable or could shift with different seeds or splits. Model architecture and hyperparameter details are also thin, which hurts reproducibility. The bigger concern is dataset representativeness: the test sets may not capture enough variation in UAV types, microphone responses, distances, or background spectra to support strong claims about unseen real-world conditions. If those corpora under-sample the deployment distribution, the reported gains may not transfer.
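One way to address the missing error bars is to repeat training with several random seeds and compare paired runs. The sketch below is an assumption about what such reporting could look like, not a description of the paper's procedure: `summarize` returns the mean and sample standard deviation, and `paired_sign_flip_p_value` runs an exact two-sided sign-flip permutation test on per-seed differences. The TPR values in the usage lines are placeholders invented for illustration.

```python
import itertools
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) of per-seed results."""
    return statistics.mean(values), statistics.stdev(values)


def paired_sign_flip_p_value(a, b):
    """Exact two-sided sign-flip permutation test on paired differences.

    a[i] and b[i] must come from the same seed or split. Under the null
    hypothesis each difference is equally likely to have either sign, so
    all 2**n sign patterns are enumerated (fine for a handful of seeds).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total


# Placeholder per-seed TPRs, invented for illustration (not paper results).
pretrained = [0.90, 0.92, 0.91, 0.93, 0.89]
from_scratch = [0.80, 0.83, 0.79, 0.84, 0.78]
mean_pre, std_pre = summarize(pretrained)
p_value = paired_sign_flip_p_value(pretrained, from_scratch)
```

With n paired runs the smallest two-sided p-value this test can produce is 2 / 2**n, so five seeds cannot go below 0.0625; reaching a 0.05 level needs at least six.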

This is a practical empirical study aimed at people working on low-cost acoustic surveillance. A reader who wants a concrete recipe for improving out-of-domain audio detection will get value from the ablations. It is incremental rather than foundational, but the quantitative comparisons on public data are new enough to be worth checking.

I would send it to peer review so the statistical reporting and dataset limitations can be addressed.

Referee Report

3 major / 1 minor

Summary. The paper claims that pretraining a compact DNN on broad sound-event classification before fine-tuning on drone recordings, combined with on-the-fly augmentations (pitch shifting, noise mixing, microphone transfer function simulation, spectrogram augmentation), improves generalization in acoustic drone detection. Pretraining is reported as the dominant factor, yielding substantial TPR gains at fixed FPR over training from scratch on in-domain (IDMT Berne 2022) and out-of-domain (AuDroK) benchmarks; the full augmentation chain adds further out-of-domain gains. Low FPR is shown on non-drone backgrounds (IDMT-TRAFFIC, ESC-50), and the distance analysis indicates effective detection up to 150 m.

Significance. If the empirical results hold, the work provides a practical recipe for improving acoustic UAV detector robustness via audio pretraining and targeted augmentations, supported by ablation quantification and evaluation on held-out public corpora. This could aid surveillance applications, with credit due for the component ablation and use of public benchmarks to demonstrate OOD gains.

major comments (3)

[Abstract] Abstract and results: the central claims of 'substantial TPR improvements' and 'pretraining is the dominant factor' rest on reported TPR/FPR numbers without error bars, confidence intervals, or statistical significance tests, which is load-bearing for assessing reliability of the ablation and generalization conclusions.
[Methods] Methods: no details are given on the DNN architecture, pretraining corpus, layer counts, or training hyperparameters (learning rate, epochs, batch size), which prevents reproduction or verification of the pretraining/fine-tuning procedure that underpins the dominant-factor claim.
[Evaluation] Evaluation: the choice of target FPR values and the specific public datasets (IDMT Berne 2022, AuDroK, IDMT-TRAFFIC, ESC-50) is presented without analysis showing they adequately sample real-world acoustic variability (microphone responses, UAV types, distances, backgrounds), which is load-bearing for the 'real-world applicability' and robust generalization claims.

minor comments (1)

[Abstract] The abstract refers to 'in-house and public drone recordings' for fine-tuning but provides no breakdown of their relative sizes or acoustic characteristics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve statistical reporting, reproducibility, and evaluation justification.

Referee: [Abstract] Abstract and results: the central claims of 'substantial TPR improvements' and 'pretraining is the dominant factor' rest on reported TPR/FPR numbers without error bars, confidence intervals, or statistical significance tests, which is load-bearing for assessing reliability of the ablation and generalization conclusions.

Authors: We agree that error bars and statistical tests are needed to support the reliability of the ablation results. In the revision we will rerun the key experiments with multiple random seeds, report mean TPR with standard deviations at the target FPRs, and add paired statistical significance tests between the pretrained and from-scratch models. revision: yes

Referee: [Methods] Methods: no details are given on the DNN architecture, pretraining corpus, layer counts, or training hyperparameters (learning rate, epochs, batch size), which prevents reproduction or verification of the pretraining/fine-tuning procedure that underpins the dominant-factor claim.

Authors: We acknowledge that the current manuscript does not provide sufficient implementation details. We will expand the Methods section with the exact DNN architecture, pretraining corpus, layer counts, and all training hyperparameters (learning rate, epochs, batch size, optimizer) to enable full reproducibility. revision: yes

Referee: [Evaluation] Evaluation: the choice of target FPR values and the specific public datasets (IDMT Berne 2022, AuDroK, IDMT-TRAFFIC, ESC-50) is presented without analysis showing they adequately sample real-world acoustic variability (microphone responses, UAV types, distances, backgrounds), which is load-bearing for the 'real-world applicability' and robust generalization claims.

Authors: These are established public benchmarks used in prior drone-detection literature. We will add a dedicated paragraph in the Evaluation section that discusses coverage of microphone responses, UAV types, distances, and backgrounds, building on the existing distance analysis and the datasets' published metadata. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical evaluation on held-out corpora

The paper is a purely empirical ML study: it trains a DNN detector with pretraining on sound-event classification followed by fine-tuning on drone recordings, applies on-the-fly augmentations, and reports TPR at fixed FPR on separate public test sets (IDMT Berne 2022 in-domain, AuDroK OOD, IDMT-TRAFFIC and ESC-50 for backgrounds). No equations, first-principles derivations, or predictions appear; all claims rest on ablation results and cross-dataset metrics. No self-citations function as load-bearing uniqueness theorems, and no fitted parameters are renamed as predictions. The evaluation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on empirical performance differences measured on fixed public datasets; no new physical model or theoretical derivation is introduced, so the ledger contains only standard machine-learning assumptions about transfer learning and data augmentation simulating domain shift.

axioms (2)

domain assumption: Pretraining on broad sound-event classification produces features that transfer to drone-propeller signatures.
Invoked when the authors state that pretraining is the dominant factor; no proof or external validation of transfer is supplied in the abstract.

domain assumption: On-the-fly augmentations adequately approximate real acoustic mismatches across recording setups and UAV types.
Central to the claim that the full augmentation chain yields additional gains on mismatched data.

Reference graph

Works this paper leans on

27 extracted references · 21 canonical work pages

[1] Z. Kaleem and M. H. Rehmani. “Amateur Drone Monitoring: State-of-the-Art Architectures, Key Enabling Technologies, and Future Research Directions”. In: IEEE Wireless Communications 25.2 (2018), pp. 150–159. DOI: 10.1109/MWC.2018.1700152

[2] R. L. Sturdivant and E. K. P. Chong. “Systems Engineering Baseline Concept of a Multispectral Drone Detection Solution for Airports”. In: IEEE Access 5 (2017), pp. 7123–7138. DOI: 10.1109/ACCESS.2017.2697979

[3] S. Al-Emadi, A. Al-Ali, A. Mohammad, and A. Al-Ali. “Audio Based Drone Detection and Identification using Deep Learning”. In: Proc. International Wireless Communications & Mobile Computing Conference (IWCMC). June 2019, pp. 459–464. DOI: 10.1109/IWCMC.2019.8766732

[4] M. Bertocco, A. Brighente, G. Ciattaglia, E. Gambi, G. Peruzzi, A. Pozzebon, and S. Spinsante. “Malicious Drone Identification by Vibration Signature Measurement: A Radar-Based Approach”. In: IEEE Transactions on Instrumentation and Measurement 74 (2025). DOI: 10.1109/TIM.2025.3571136

[5] S. Scholes, A. Ruget, G. Mora-Martín, F. Zhu, I. Gyongy, and J. Leach. “DroneSense: The Identification, Segmentation, and Orientation Detection of Drones via Neural Networks”. In: IEEE Access 10 (2022), pp. 38154–38164. DOI: 10.1109/ACCESS.2022.3162866

[6] C. Kang, Q. Huang, F. Sun, X. Liang, and L. Xu. “From classical approaches to recent advancements: A holistic review of acoustic detection for unmanned aerial vehicles”. In: AIP Advances 15.12 (Dec. 2025). DOI: 10.1063/5.0304975

[7] J. Kim, C. Park, J. Ahn, Y. Ko, J. Park, and J. C. Gallagher. “Real-time UAV sound detection and analysis system”. In: Proc. IEEE Sensors Applications Symposium (SAS). Mar. 2017. DOI: 10.1109/SAS.2017.7894058

[8] H. Liu, Z. Wei, Y. Chen, J. Pan, L. Lin, and Y. Ren. “Drone Detection Based on an Audio-Assisted Camera Array”. In: Proc. IEEE International Conference on Multimedia Big Data (BigMM). Apr. 2017, pp. 402–406. DOI: 10.1109/BigMM.2017.57

[9] M. Ohlenbusch, A. Ahrens, C. Rollwage, and J. Bitzer. “Robust Drone Detection for Acoustic Monitoring Applications”. In: Proc. European Signal Processing Conference (EUSIPCO). Jan. 2021, pp. 6–10. DOI: 10.23919/Eusipco47968.2020.9287433

[10] A. Purier, S. Bouley, and L. Pinel-Lamotte. “UAV identification from acoustic signals using statistical learning: A state-of-the-art”. In: Proc. Quiet Drones. Sept. 2024. DOI: 10.17866/rd.salford.27924897.v1

[11] J. Kim and D. Kim. “Neural Network based Real-time UAV Detection and Analysis by Sound”. In: Journal of Advanced Information Technology and Convergence 8.1 (July 2018), pp. 43–52. DOI: 10.14801/jaitc.2018.8.1.43

[12] T. Marinopoulou, A. Vafeiadis, A. Lalas, C. Rollwage, D. Hollosi, K. Votis, and D. Tzovaras. “Two Dimensional Convolutional Neural Network Frameworks Using Acoustic Nodes for UAV Security Applications”. In: Proc. Quiet Drones. Oct. 2020. DOI: 10.5281/zenodo.4543295

[13] Y. Wang, Z. Chu, I. Ku, E. C. Smith, and E. T. Matson. “A Large-Scale UAV Audio Dataset and Audio-Based UAV Classification Using CNN”. In: Proc. IEEE International Conference on Robotic Computing (IRC). Dec. 2022, pp. 186–189. DOI: 10.1109/IRC55401.2022.00039

[14] S. Kümmritz. “The Sound of Surveillance: Enhancing Machine Learning-Driven Drone Detection with Advanced Acoustic Augmentation”. In: Drones 8.3 (Mar. 2024). DOI: 10.3390/drones8030105

[15] K. He, X. Zhang, S. Ren, and J. Sun. “Deep Residual Learning for Image Recognition”. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016, pp. 770–778. DOI: 10.1109/CVPR.2016.90

[16] J. Hu, L. Shen, and G. Sun. “Squeeze-and-Excitation Networks”. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2018, pp. 7132–7141. DOI: 10.1109/CVPR.2018.00745

[17] J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter. “Audio Set: An ontology and human-labeled dataset for audio events”. In: Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017, pp. 776–780. DOI: 10.1109/ICASSP.2017.7952261

[18] D. S. Park, Y. Zhang, C.-C. Chiu, Y. Chen, B. Li, W. Chan, Q. V. Le, and Y. Wu. “Specaugment on large scale datasets”. In: Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). May 2020, pp. 6879–6883. DOI: 10.1109/ICASSP40776.2020.9053205

[19] S. Kümmritz and L. Paul. “Comprehensive Database of UAV Sounds for Machine Learning”. In: Proc. Forum Acusticum. Jan. 2024, pp. 667–674. DOI: 10.61782/fa.2023.0049

[20] C. R. Romero, A. J. T. Martinez, N. Green, and C. Asensio. DroneNoise Database. Feb. 2023. DOI: 10.17866/rd.salford.22133411.v3

[21] Soundsnap. Soundsnap - Sound Effects Library. soundsnap.com

[22] X. Yang, P. A. Naylor, S. Doclo, and J. Bitzer. “Neural Drone Localization Exploiting Signal Synthesis of Real-World Audio Data”. In: Proc. European Signal Processing Conference (EUSIPCO). Sept. 2025, pp. 256–560. DOI: 10.23919/EUSIPCO63237.2025.11226465

[23] P. Alloza, B. Vonrhein, and A. Movahed. “Sound Localization of Drones Using an Acoustic Camera”. In: Proc. Quiet Drones. Oct. 2020

[24] F. Svanström, C. Englund, and F. Alonso-Fernandez. “Real-Time Drone Detection and Tracking With Visible, Thermal and Acoustic Sensors”. In: Proc. International Conference on Pattern Recognition (ICPR). Jan. 2021, pp. 7265–7272. DOI: 10.1109/ICPR48806.2021.9413241

[25] S. Körper and J. Treichl. “Untersuchung der Geräuschemission von Drohnen / Investigation of the noise emission of drones”. In: Lärmbekämpfung 14.04 (2019), pp. 108–114. DOI: 10.37544/1863-4672-2019-04-10

[26] J. Abeßer, S. Gourishetti, A. Kátai, T. Clauß, P. Sharma, and J. Liebetrau. “IDMT-Traffic: An Open Benchmark Dataset for Acoustic Traffic Monitoring Research”. In: Proc. European Signal Processing Conference (EUSIPCO). Aug. 2021, pp. 551–555. DOI: 10.23919/EUSIPCO54536.2021.9616080

[27] K. J. Piczak. “ESC: Dataset for Environmental Sound Classification”. In: Proc. ACM International Conference on Multimedia. New York, NY, USA, Oct. 2015, pp. 1015–1018. DOI: 10.1145/2733373.2806390
