Exploring Feature Extraction Technique Parameters for Acoustic Gunshot Classification

Ryan Quinn; Sinclair Gurny

arxiv: 2606.19568 · v1 · pith:DET23GHFnew · submitted 2026-06-17 · 💻 cs.SD · cs.AI

Pith reviewed 2026-06-26 18:56 UTC · model grok-4.3

classification 💻 cs.SD cs.AI

keywords acoustic gunshot classification · feature extraction techniques · ResNet-18 · audio classification · gunshot detection · parameter tuning · top-1 accuracy · machine learning for audio

The pith

Selecting the right feature extraction technique improves top-1 accuracy in acoustic gunshot classification by up to 20%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests how different methods for turning raw gunshot audio into usable features affect a neural network's ability to classify which firearm produced the sound. It draws on 23,000 recordings spanning 85 firearms and 21 calibers to compare three extraction techniques across 12 parameter sets, all run through the same ResNet-18 model. The work shows that picking the better technique can raise top-1 accuracy by as much as 20 percent over poorer choices, while tuning parameters inside one technique adds up to another 4.7 percent. The authors focus on realistic data because existing commercial systems show uneven results and prior studies have not examined generalization closely enough. These accuracy differences matter for applications in public safety and security where reliable identification from sound alone is needed.

Core claim

Using a dataset of 23,000 gunshot recordings across 85 firearms and 21 calibers, the authors benchmark three feature extraction techniques with 12 unique parameter sets on a ResNet-18 model. The results show that choosing the correct feature extraction technique improves top-1 accuracy by up to 20%, while selecting the right parameters for a given technique improves accuracy by up to an additional 4.7%.

What carries the argument

Benchmarking of three feature extraction techniques and 12 parameter sets with ResNet-18 on a 23,000-recording acoustic gunshot dataset
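To make concrete what a "parameter set" changes, the following minimal sketch computes the shape and resolution of a short-time Fourier transform spectrogram for a few window and hop sizes. Everything specific here is an assumption for illustration: the abstract does not name the three techniques or list the 12 parameter sets, so the sample rate, clip length, and the values in `HYPOTHETICAL_PARAMETER_SETS` are placeholders, not the paper's settings.

```python
def spectrogram_shape(num_samples, n_fft, hop_length):
    """Return (frequency_bins, frames) for an STFT computed without padding."""
    if num_samples < n_fft:
        raise ValueError("clip is shorter than one analysis window")
    freq_bins = n_fft // 2 + 1                      # one-sided spectrum of a real signal
    frames = 1 + (num_samples - n_fft) // hop_length
    return freq_bins, frames


# Assumed recording properties (not taken from the paper).
SAMPLE_RATE = 44_100        # Hz
CLIP_SECONDS = 0.5
num_samples = int(SAMPLE_RATE * CLIP_SECONDS)

# Hypothetical parameter sets, chosen only to show the trade-off.
HYPOTHETICAL_PARAMETER_SETS = [
    {"n_fft": 512, "hop_length": 128},
    {"n_fft": 1024, "hop_length": 256},
    {"n_fft": 2048, "hop_length": 512},
]


def describe(params):
    """Summarize the input size and resolution implied by one parameter set."""
    bins, frames = spectrogram_shape(num_samples, **params)
    return {
        "freq_bins": bins,
        "frames": frames,
        "freq_resolution_hz": SAMPLE_RATE / params["n_fft"],
        "hop_ms": 1000 * params["hop_length"] / SAMPLE_RATE,
    }


for params in HYPOTHETICAL_PARAMETER_SETS:
    print(params, describe(params))
```

Longer windows give finer frequency resolution but fewer, coarser time frames, which matters for an impulsive sound such as a muzzle blast. The image a ResNet-18 receives therefore differs in both size and content from one parameter set to the next, even within a single technique.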

If this is right

Choosing the optimal feature extraction technique can raise top-1 accuracy by as much as 20% over less suitable techniques.
Adjusting parameters inside a single technique can add up to 4.7% more top-1 accuracy.
Systematic comparison of feature methods reveals that technique selection is more impactful than parameter tuning alone.
The large, multi-firearm dataset supports claims that these accuracy differences arise under conditions closer to real deployment than smaller prior studies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sensitivity to feature technique and parameters is likely to appear in other audio classification problems that use similar neural network backbones.
Commercial detection systems could see measurable gains by re-testing their pipelines against the parameter sets shown to perform best here.
Extending the benchmarks to include background noise, overlapping events, or distance-based attenuation would test how far the reported gains survive outside controlled recordings.
Cross-dataset validation on entirely independent recording hardware would provide a direct check on whether the 20% and 4.7% margins generalize.
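The background-noise extension suggested above could be prototyped by mixing noise into clean recordings at a controlled signal-to-noise ratio. The sketch below is one way to do that under stated assumptions: the signals are synthetic stand-ins (a decaying sinusoid and Gaussian noise), and `mix_at_snr` is a hypothetical helper, not something described in the paper.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise RMS ratio equals `snr_db`, then add it."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [c + gain * n for c, n in zip(clean, noise)]


# Synthetic stand-ins for a gunshot clip and a background recording.
rng = random.Random(0)
clean = [math.exp(-n / 200.0) * math.sin(0.3 * n) for n in range(4000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

for snr_db in (20, 10, 0):
    mixed = mix_at_snr(clean, noise, snr_db)
    residual = [m - c for m, c in zip(mixed, clean)]
    achieved = 20 * math.log10(rms(clean) / rms(residual))
    print(f"target {snr_db:>3} dB -> achieved {achieved:.2f} dB")
```

Re-running each feature extraction configuration on test clips mixed at several SNR levels would show whether the ranking of techniques is stable as conditions degrade.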

Load-bearing premise

The 23,000-recording dataset across 85 firearms and 21 calibers captures enough variation in realistic acoustic conditions for the observed accuracy gains to hold on new recordings outside the training distribution.

What would settle it

Running the same models on a fresh collection of gunshot recordings made with different microphones, in different environments, or from unseen firearms and finding that the accuracy gaps between techniques shrink to near zero or reverse.

Original abstract

Acoustic gunshot detection is a problem with applications across civilian public safety, military operations, and wildlife conservation, yet the field lacks a rigorous exploration of feature extraction techniques with a focus on generalization to realistic data. The mixed effectiveness of commercial gunshot detection and classification systems indicates an open problem that is not adequately addressed by the current literature. In this paper, we present a systematic investigation of common feature extraction techniques using a dataset of 23,000 gunshot recordings across 85 firearms and 21 calibers. We benchmark three feature extraction techniques with 12 total unique parameter sets using ResNet-18. Our results demonstrate that using the correct feature extraction technique can improve top-1 accuracy by up to 20%, and utilizing the correct parameters for a given feature extraction technique can improve that value by up to 4.7%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Choosing the right feature extraction method and parameters improves accuracy on this gunshot classification task, but the results depend on how well the dataset and splits capture real conditions.

The punchline here is that the paper finds choosing the feature extraction technique can lift top-1 accuracy by as much as 20 percent on their gunshot data, with another 4.7 percent coming from tuning the parameters within a technique. They back this with a sweep over three techniques and twelve parameter sets using a ResNet-18 classifier.

What stands out is the dataset: twenty-three thousand recordings spanning eighty-five firearms and twenty-one calibers. That gives them room to test across variety. The work is a direct empirical check on how much these standard audio features matter when the goal is generalization to realistic conditions, which the abstract correctly notes has been missing.

They do a clean job of framing the problem around practical systems in public safety and conservation. The results are presented as quantified improvements, which is useful for anyone implementing similar pipelines.

The main question is whether those gains will hold when the data is split properly and the recordings reflect real acoustic environments. The stress test flags the risk that random splits or lab conditions could make the differences look bigger than they are in the field. If the paper shows independent test sets by firearm or session and includes varied distances and noise, that concern goes away. Otherwise the numbers are harder to trust for deployment.

This kind of targeted benchmark is worth the time of a referee. It is not proposing a new method but it tests existing ones thoroughly enough that the community working on acoustic classification would benefit from seeing the details and any limitations.

Referee Report

3 major / 2 minor

Summary. The manuscript reports an empirical benchmark of three feature extraction techniques (with 12 total parameter sets) for acoustic gunshot classification. Using ResNet-18 on a dataset of 23,000 recordings spanning 85 firearms and 21 calibers, the authors claim that selecting the appropriate technique improves top-1 accuracy by up to 20% and that parameter tuning within a technique yields an additional improvement of up to 4.7%. The work positions itself as addressing a gap in rigorous, generalization-focused evaluation of feature extraction for realistic acoustic data.

Significance. If the accuracy deltas prove robust under firearm-disjoint splits and realistic acoustic variation, the systematic parameter sweep would supply practical guidance for feature choice in gunshot classification systems used in public safety and conservation. The scale of the dataset and the explicit focus on parameter sensitivity are positive attributes that could make the results actionable for practitioners.

major comments (3)

[Dataset and Experimental Setup] Dataset and splits section: No description is given of how the train/test partition was performed (e.g., by firearm ID, recording session, or random). Because the central claim is that the observed 20% and 4.7% gains reflect genuine acoustic differences rather than leakage, the absence of a firearm- or session-disjoint split is load-bearing; the reported improvements cannot be interpreted as generalization evidence without this information.
[Introduction and Methods] Experimental design: The abstract and introduction emphasize the need for evaluation on realistic data (varying distance, reverberation, background noise), yet the manuscript provides no quantitative description or controls confirming that the 23k recordings span these conditions. Without such verification, the headline accuracy deltas risk being artifacts of controlled laboratory conditions rather than transferable acoustic properties.
[Results] Results presentation: The manuscript reports point estimates of top-1 accuracy but does not include statistical significance tests, confidence intervals across multiple random seeds, or comparisons against simple baselines (e.g., MFCC-only or raw waveform). This makes it impossible to determine whether the 4.7% parameter-tuning gain exceeds training variance.
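The firearm- or session-disjoint split requested in the first major comment can be sketched as a grouped split: every recording that shares a group identifier lands on the same side of the partition. The record fields (`path`, `firearm_id`, `caliber`) and the helper `split_by_group` are hypothetical; the paper's actual metadata schema and split procedure are not described in the abstract.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group (e.g. a firearm) appears in both sets."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    group_ids = sorted(groups)                     # deterministic order before shuffling
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(len(group_ids) * test_fraction))

    test = [r for g in group_ids[:n_test] for r in groups[g]]
    train = [r for g in group_ids[n_test:] for r in groups[g]]
    return train, test


# Hypothetical metadata: 10 firearms with 5 recordings each.
records = [
    {"path": f"rec_{f}_{i}.wav", "firearm_id": f"firearm_{f}", "caliber": f"cal_{f % 3}"}
    for f in range(10)
    for i in range(5)
]

train, test = split_by_group(records, group_key="firearm_id")
train_ids = {r["firearm_id"] for r in train}
test_ids = {r["firearm_id"] for r in test}
print(len(train), len(test), train_ids & test_ids)
```

A firearm-disjoint split is only meaningful when the label is coarser than the firearm itself, for example caliber; if the label is the firearm, grouping by recording session is the applicable variant. scikit-learn's `GroupShuffleSplit` provides a library version of the same idea.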

minor comments (2)

[Methods] Provide the exact numerical values and ranges for all 12 parameter sets so that the experiments are fully reproducible.
[Results] Add a table or figure caption that explicitly lists the three feature extraction techniques and their parameter combinations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of experimental rigor that will improve the manuscript. We respond to each major comment below and will incorporate revisions as indicated.

Referee: [Dataset and Experimental Setup] Dataset and splits section: No description is given of how the train/test partition was performed (e.g., by firearm ID, recording session, or random). Because the central claim is that the observed 20% and 4.7% gains reflect genuine acoustic differences rather than leakage, the absence of a firearm- or session-disjoint split is load-bearing; the reported improvements cannot be interpreted as generalization evidence without this information.

Authors: We agree that the train/test partitioning procedure must be described explicitly, as it directly affects interpretation of the accuracy gains as evidence of generalization. The original manuscript omitted this information. In the revised version we will add a clear description of the split (performed at the recording level). To directly address the concern about potential leakage, we will also report results under firearm-disjoint splits and discuss any differences relative to the original numbers. revision: yes

Referee: [Introduction and Methods] Experimental design: The abstract and introduction emphasize the need for evaluation on realistic data (varying distance, reverberation, background noise), yet the manuscript provides no quantitative description or controls confirming that the 23k recordings span these conditions. Without such verification, the headline accuracy deltas risk being artifacts of controlled laboratory conditions rather than transferable acoustic properties.

Authors: We accept that quantitative characterization of the acoustic conditions is required to support the claim of realism. In the revision we will add a subsection summarizing available metadata on recording distances, background noise levels, and reverberation indicators present in the 23k-recording collection. Where metadata are incomplete we will note the limitation and describe any collection protocols used to ensure diversity of conditions. revision: yes

Referee: [Results] Results presentation: The manuscript reports point estimates of top-1 accuracy but does not include statistical significance tests, confidence intervals across multiple random seeds, or comparisons against simple baselines (e.g., MFCC-only or raw waveform). This makes it impossible to determine whether the 4.7% parameter-tuning gain exceeds training variance.

Authors: We agree that point estimates alone are insufficient and that statistical analysis plus baselines are needed to substantiate the reported gains. The revised Results section will include: (i) accuracy means and standard deviations over multiple random seeds, (ii) statistical significance tests comparing the best parameter sets, and (iii) additional baseline runs using raw waveforms and untuned MFCC features. These additions will allow readers to assess whether the 4.7% improvement exceeds training variability. revision: yes
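The seed-level reporting promised in the last response can be sketched as follows: summarize per-seed top-1 accuracy for two parameter sets and compute Welch's t statistic for the gap between them. The accuracy values are made-up placeholders for illustration only and do not come from the paper; the function names are likewise hypothetical.

```python
import math
import statistics


def summarize(accuracies):
    """Mean and sample standard deviation of per-seed top-1 accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def welch_t(acc_a, acc_b):
    """Welch's t statistic for the difference in mean accuracy (no p-value)."""
    mean_a, std_a = summarize(acc_a)
    mean_b, std_b = summarize(acc_b)
    standard_error = math.sqrt(std_a ** 2 / len(acc_a) + std_b ** 2 / len(acc_b))
    if standard_error == 0:
        raise ValueError("no variance across seeds; the statistic is undefined")
    return (mean_a - mean_b) / standard_error


# PLACEHOLDER per-seed accuracies, invented for this example.
runs_param_set_a = [0.80, 0.82, 0.81, 0.79, 0.83]
runs_param_set_b = [0.76, 0.78, 0.77, 0.75, 0.79]

for name, runs in (("A", runs_param_set_a), ("B", runs_param_set_b)):
    mean, std = summarize(runs)
    print(f"parameter set {name}: {mean:.3f} +/- {std:.3f} over {len(runs)} seeds")

print(f"Welch t statistic: {welch_t(runs_param_set_a, runs_param_set_b):.2f}")
```

A gap that is small relative to the seed-to-seed standard deviation would indicate that a reported parameter-tuning gain cannot be separated from training variance. Converting the statistic to a p-value requires the t distribution, which is available in SciPy but not in the standard library.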

Circularity Check

0 steps flagged

No circularity: empirical benchmark with no derivation chain

The paper is a straightforward empirical study that benchmarks three feature extraction techniques (with 12 parameter sets) on a fixed 23k-recording dataset using ResNet-18, reporting observed top-1 accuracy differences. No mathematical derivation, first-principles result, fitted parameter renamed as prediction, or self-citation load-bearing uniqueness theorem is present or invoked. The central claims are direct experimental outcomes on the given data split; they do not reduce to their inputs by construction. This matches the default expectation of no significant circularity (score 0-2) for non-derivational empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmarking study with no mathematical axioms, free parameters, or invented entities.

[1] [1]

Home | Small Arms Survey,

“Home | Small Arms Survey,” Dec. 2025. [Online]. Available: https://www.smallarmssurvey.org/

2025

[2] [2]

A Military Audio Dataset for Situational Awareness and Surveillance,

J.-W. Kim, C. Yoon, and H.-Y . Jung, “A Military Audio Dataset for Situational Awareness and Surveillance,”Scientific Data, vol. 11, no. 1, p. 668, Jun. 2024. [Online]. Available: https://www.nature.com/articles/ s41597-024-03511-w

2024

[3] [3]

Fighting Poaching through High-Precision Real-Time Gunshot Detection Using Deep Learning and SAIL,

N. Dhar, “Fighting Poaching through High-Precision Real-Time Gunshot Detection Using Deep Learning and SAIL,” inBiodiversity Information Science and Standards, vol. 10. Pensoft Publishers, Mar. 2026, p. e183432. [Online]. Available: https://biss.pensoft.net/article/183432/

2026

[4] [4]

Gunshot Audio: Muzzle Blast, Shock Waves, and Health Impact,

“Gunshot Audio: Muzzle Blast, Shock Waves, and Health Impact,” Apr

[5] [5]

Available: https://biologyinsights.com/gunshot-audio- muzzle-blast-shock-waves-and-health-impact/

[Online]. Available: https://biologyinsights.com/gunshot-audio- muzzle-blast-shock-waves-and-health-impact/

[6] [6]

Investigating Time-Frequency Representations for Audio Feature Extraction in Singing Technique Classification,

Y . Yamamoto, J. Nam, H. Terasawa, and Y . Hiraga, “Investigating Time-Frequency Representations for Audio Feature Extraction in Singing Technique Classification,” in2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2021, pp. 890–896, iSSN: 2640-0103. [Online]. Available: https://ieeexplore.iee...

work page arXiv 2021

[7] [7]

A Gunshot Recognition Method Based on Multi-Scale Spectrum Shift Module,

J. Li, J. Guo, M. Ma, Y . Zeng, C. Li, and J. Xu, “A Gunshot Recognition Method Based on Multi-Scale Spectrum Shift Module,”Electronics, vol. 11, no. 23, 2022. [Online]. Available: https://www.mdpi.com/2079-9292/11/23/3859

2022

[8] [8]

A reduced complexity MFCC-based deep neural network approach for speech enhancement,

R. Razani, H. Chung, Y . Attabi, and B. Champagne, “A reduced complexity MFCC-based deep neural network approach for speech enhancement,” in2017 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). Bilbao: IEEE, Dec. 2017, pp. 331–336. [Online]. Available: https://ieeexplore.ieee.org/ document/8388664/

work page arXiv 2017

[9] [9]

Know Your Tech: ShotSpotter,

ACLU of Oregon, “Know Your Tech: ShotSpotter,” 2022, published: Web resource. [Online]. Available: https://www.aclu-or.org/know-your- tech-shotspotter/

2022

[10] [10]

Field Evaluation of the ShotSpotter Gunshot Location System: Final Report on the Redwood City Field Trial | Office of Justice Programs

“Field Evaluation of the ShotSpotter Gunshot Location System: Final Report on the Redwood City Field Trial | Office of Justice Programs.” [On- line]. Available: https://www.ojp.gov/library/publications/field-evaluation- shotspotter-gunshot-location-system-final-report-redwood-city

[11] [11]

Efficiently Classifying Lung Sounds through Depthwise Separable CNN Models with Fused STFT and MFCC Features,

S.-Y . Jung, C.-H. Liao, Y .-S. Wu, S.-M. Yuan, and C.-T. Sun, “Efficiently Classifying Lung Sounds through Depthwise Separable CNN Models with Fused STFT and MFCC Features,”Diagnostics, vol. 11, no. 4, p. 732, Apr

[12] [12]

Available: https://www.mdpi.com/2075-4418/11/4/732

[Online]. Available: https://www.mdpi.com/2075-4418/11/4/732

2075

[13] [13]

Automatic Classification of Bird Sounds: Using MFCC and Mel Spectrogram Features with Deep Learning,

S. Carvalho and E. F. Gomes, “Automatic Classification of Bird Sounds: Using MFCC and Mel Spectrogram Features with Deep Learning,” Vietnam Journal of Computer Science, vol. 10, no. 01, pp. 39–54, Feb

[14] [14]

Available: https://www.worldscientific.com/doi/10.1142/ S2196888822500300

[Online]. Available: https://www.worldscientific.com/doi/10.1142/ S2196888822500300

[15] [15]

Deep Spectrogram Learning for Gunshot Classification: A Comparative Study of CNN Architectures and Time-Frequency Representations,

P. Doungpaisan and P. Khunarsa, “Deep Spectrogram Learning for Gunshot Classification: A Comparative Study of CNN Architectures and Time-Frequency Representations,”Journal of Imaging, vol. 11, no. 8, p. 281, Aug. 2025. [Online]. Available: https://www.mdpi.com/2313- 433X/11/8/281

2025

[16] [16]

Sound of Guns: Digital Forensics of Gun Audio Samples Meets Artificial Intelligence,

S. Raponi, G. Oligeri, and I. M. Ali, “Sound of Guns: Digital Forensics of Gun Audio Samples Meets Artificial Intelligence,”Multimedia Tools and Applications, vol. 81, pp. 30 387–30 412, 2022. [Online]. Available: https://link.springer.com/article/10.1007/s11042-022-12612-w

work page doi:10.1007/s11042-022-12612-w 2022

[17] [17]

A Fast Identification Method of Gunshot Types Based on Knowledge Distillation,

J. Li, J. Guo, X. Sun, C. Li, and L. Meng, “A Fast Identification Method of Gunshot Types Based on Knowledge Distillation,”Applied Sciences, vol. 12, no. 11, p. 5526, 2022. [Online]. Available: https://www.mdpi.com/2076-3417/12/11/5526

2022

[18] [18]

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings,

A. Shah, R. Singh, B. Raj, and A. Hauptmann, “Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings,” Jun. 2025, arXiv:2506.20609 [cs]. [Online]. Available: http://arxiv.org/abs/ 2506.20609

work page arXiv 2025

[19] [19]

Independent Channel Residual Convolutional Network for Gunshot Detection,

J. Bajzik, J. Prinosil, R. Jarina, and J. Mekyska, “Independent Channel Residual Convolutional Network for Gunshot Detection,” International Journal of Advanced Computer Science and Applications, vol. 13, no. 4, 2022. [Online]. Available: http://thesai.org/Publications/ ViewPaper?V olume=13&Issue=4&Code=IJACSA&SerialNo=108

2022

[20]

R. Lilien, “Development of Computational Methods for the Audio Analysis of Gunshots,” Cadre Research Labs, LLC, Final Research Performance Progress Report 252947, Jun. 2018. [Online]. Available: https://www.ojp.gov/pdffiles1/nij/grants/252947.pdf

[21]

R. Kabealo, S. Wyatt, A. Aravamudan, X. Zhang, D. N. Acaron, M. P. Dao, D. Elliott, A. O. Smith, C. E. Otero, L. D. Otero, G. C. Anagnostopoulos, A. M. Peter, W. Jones, and E. Lam, “A multi-firearm, multi-orientation audio dataset of gunshots,” Data in Brief, vol. 48, p. 109091, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/...

[22]

bart, “The Free Firearm Sound Library,” Mar. 2014. [Online]. Available: https://opengameart.org/content/the-free-firearm-sound-library

[23]

S. Gurny and R. Quinn, “Certus Caliber Classification Gunshot Dataset (C3GD),” May 2026. [Online]. Available: https://zenodo.org/records/20274400

[24]

T. He, Z. Zhang, H. Zhang, Z. Zhang, J. Xie, and M. Li, “Bag of Tricks for Image Classification with Convolutional Neural Networks,” Dec. 2018. [Online]. Available: https://arxiv.org/abs/1812.01187v2

[26]

J. Salamon and J. P. Bello, “Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification,” IEEE Signal Processing Letters, vol. 24, no. 3, pp. 279–283, Mar. 2017, arXiv:1608.04363 [cs]. [Online]. Available: http://arxiv.org/abs/1608.04363

[27]

“GitHub - iver56/audiomentations: A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab. · GitHub.” [Online]. Available: https://github.com/iver56/audiomentations

[28]

D. O’Shaughnessy, Speech Communication: Human and Machine. Addison-Wesley Publishing Company, 1987, google-Books-ID: mH- FQAAAAMAAJ

[29]

K. J. Piczak, “Environmental sound classification with convolutional neural networks,” in 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). Boston, MA, USA: IEEE, Sep. 2015, pp. 1–6. [Online]. Available: http://ieeexplore.ieee.org/document/7324337/

[30]

C.-h. Chen, Pattern Recognition and Artificial Intelligence: Proceedings of the Joint Workshop on Pattern Recognition and Artificial Intelligence, Held at Hyannis, Massachusetts, June 1-3, 1976. Academic Press, 1976, google-Books-ID: wW9QAAAAMAAJ

[31]

“Representing Audio Data: An In-Depth Look at STFT and MFCC.” [Online]. Available: https://www.ideas2it.com/blogs/mfcc-stft-from-audio-data

[32]

V. Tyagi and C. Wellekens, “On desensitizing the Mel-cepstrum to spurious spectral components for robust speech recognition,” in Proceedings. (ICASSP ’05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., vol. 1, Mar. 2005, pp. I/529–I/532 Vol. 1, ISSN: 2379-190X. [Online]. Available: https://ieeexplore.ieee.org/document/1415167

[33]

T. Khan, “Towards an Indoor Gunshot Detection and Notification System Using Deep Learning,” Applied System Innovation, vol. 6, no. 5, p. 94, Oct. 2023. [Online]. Available: https://www.mdpi.com/2571-5577/6/5/94

[34]

M. Sigmund and M. Hrabina, “Efficient Feature Set Developed for Acoustic Gunshot Detection in Open Space,” Elektronika ir Elektrotechnika, vol. 27, no. 4, pp. 62–68, Aug. 2021. [Online]. Available: https://eejournal.ktu.lt/index.php/elt/article/view/28877

[35]

“Choice of Hop Size | Spectral Audio Signal Processing.” [Online]. Available: https://dsprelated.com/freebooks/sasp/Choice_Hop_Size.html

[36]

S. B. Nesar, B. M. Whitaker, and R. C. Maher, “Machine Learning Analysis on Gunshot Recognition,” in 2024 Intermountain Engineering, Technology and Computing (IETC). Logan, UT, USA: IEEE, May 2024, pp. 249–254. [Online]. Available: https://ieeexplore.ieee.org/document/10564263/

[37]

R. B. Singh and H. Zhuang, “Measurements, Analysis, Classification, and Detection of Gunshot and Gunshot-like Sounds,” Sensors, vol. 22, no. 23, p. 9170, Nov. 2022. [Online]. Available: https://www.mdpi.com/1424-8220/22/23/9170

[38]

A. Mesaros, T. Heittola, T. Virtanen, and M. D. Plumbley, “Sound Event Detection: A Tutorial,” IEEE Signal Processing Magazine, vol. 38, no. 5, pp. 67–83, Sep. 2021, arXiv:2107.05463 [eess.AS]. [Online]. Available: http://arxiv.org/abs/2107.05463