Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation
Pith reviewed 2026-06-30 08:27 UTC · model grok-4.3
The pith
A curated underwater audio dataset and margin-enhanced loss with feature alignment improve zero-shot ship detection by 42.6 percent under domain shift.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that their margin-enhanced loss with feature alignment, when applied to a CNN trained on the new curated underwater dataset, delivers a 42.60 percent improvement in zero-shot ship detection on the ShipsEar dataset relative to the baseline model, thereby demonstrating increased robustness to distribution mismatch.
What carries the argument
Margin-enhanced loss with feature alignment, which mitigates class confusion arising from data imbalance, acoustic similarity, and cross-domain mismatch.
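A minimal sketch of what a pair-margin loss of this kind could look like, assuming the form described in the paper excerpt in the reference graph: weighted cross-entropy plus a minimum margin between the ship logit and the competing whale and sonar logits. The ship and whale class indices, the margin value, the penalty weight `lam`, and the hinge form are illustrative assumptions, not the authors' implementation; the feature alignment term is not shown.

```python
import math

# Hypothetical class indices: the excerpt only states that sonar is class 6.
SHIP, WHALE, SONAR = 4, 7, 6

def log_softmax(row):
    """Numerically stable log-softmax of one logit vector."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]

def ce_plus_pair_margin(logits, labels, class_weights, margin=1.0, lam=0.5):
    """Weighted cross-entropy plus a hinge penalty on ship samples.

    The penalty is positive whenever the ship logit exceeds a rival
    (whale or sonar) logit by less than `margin`.
    """
    weighted_nll, weight_sum = 0.0, 0.0
    hinge_terms = []
    for row, y in zip(logits, labels):
        w = class_weights[y]
        weighted_nll -= w * log_softmax(row)[y]
        weight_sum += w
        if y == SHIP:
            for rival in (WHALE, SONAR):
                hinge_terms.append(max(0.0, margin - (row[SHIP] - row[rival])))
    ce = weighted_nll / weight_sum  # weight-normalised mean
    penalty = sum(hinge_terms) / len(hinge_terms) if hinge_terms else 0.0
    return ce + lam * penalty
```

With uniform logits over eight classes, a ship sample incurs the full margin penalty on top of the cross-entropy, while a non-ship sample incurs only the cross-entropy term.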
If this is right
- The released dataset supplies additional training material for models operating in data-limited underwater settings.
- The curation pipeline enables reproducible experiments on imbalance mitigation and domain adaptation.
- The benchmark supports systematic comparison of cross-domain generalization in underwater acoustic classification.
- The observed robustness gain indicates that the approach can help surveillance systems handle real acoustic distribution shifts without target-domain labels.
Where Pith is reading between the lines
- The same alignment step might transfer to other audio classification tasks that face similar imbalance and domain-shift problems.
- Combining the method with additional adaptation techniques could further close the remaining performance gap between in-domain and out-of-domain accuracy.
- Evaluating the pipeline on a wider collection of underwater recording environments would test whether the robustness holds beyond the two datasets examined.
- Making the curation code public lowers the effort required for other groups to create comparable labeled resources.
Load-bearing premise
The curated labels are accurate and the 42.60 percent gain on ShipsEar is caused by the margin-enhanced loss and feature alignment rather than unstated data choices or tuning.
What would settle it
Independent re-labeling of the ShipsEar test segments followed by re-running both the baseline and the proposed method and finding no meaningful improvement from the alignment step would falsify the central performance claim.
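A sketch of how such a re-run could be scored, using the ship-recall, precision, and F1 definitions quoted in the paper excerpt in the reference graph. The label lists are made-up toy data and the helper name `ship_metrics` is hypothetical. The last two lines show that a reported gain can be read either in percentage points or as a relative change, so a re-run should state which one it reports.

```python
def ship_metrics(y_true, y_pred, ship=1):
    """Return (recall, precision, F1) for the ship class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == ship and p == ship for t, p in pairs)
    fn = sum(t == ship and p != ship for t, p in pairs)
    fp = sum(t != ship and p == ship for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1

# Toy labels (1 = ship, 0 = not ship); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0]
baseline_pred = [1, 0, 0, 0, 0, 1]
proposed_pred = [1, 1, 1, 0, 0, 1]

base_recall, _, base_f1 = ship_metrics(y_true, baseline_pred)
prop_recall, _, prop_f1 = ship_metrics(y_true, proposed_pred)

absolute_gain = prop_recall - base_recall                   # in recall units
relative_gain = (prop_recall - base_recall) / base_recall   # fraction of baseline
```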
Original abstract
Machine learning for underwater acoustics is constrained by the scarcity of publicly available labeled datasets. In contrast to air-acoustic domains, where large benchmarks enable rapid model development, underwater datasets are typically small and limited in acoustic diversity, restricting robust model training and cross-domain generalization. To help address this gap, we introduce a curated underwater audio dataset derived from an open-source maritime sound archive. The dataset contains over one thousand labeled audio segments across eight biologically and mechanically relevant acoustic classes, providing an additional resource for training models in data-limited underwater environments. Additionally, we establish a lightweight Convolutional Neural Network (CNN) baseline and propose a margin-enhanced loss with feature alignment to mitigate class confusion arising from data imbalance, acoustic similarity, and cross-domain mismatch. While the baseline achieves 96.35% in-domain accuracy, evaluation on ShipsEar reveals substantial domain shift; the proposed feature alignment improves zero-shot ship detection by 42.60%, demonstrating stronger robustness under distribution mismatch. We further release a transparent curation pipeline and reproducible benchmark to support future research on imbalance mitigation, domain adaptation, and data-efficient underwater acoustic classification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a curated underwater audio dataset of over 1000 labeled segments across eight acoustic classes derived from an open maritime archive. It establishes a lightweight CNN baseline achieving 96.35% in-domain accuracy and proposes a margin-enhanced loss together with feature alignment to mitigate imbalance and cross-domain mismatch. The central empirical claim is that the proposed feature alignment yields a 42.60% improvement in zero-shot ship detection on the external ShipsEar dataset.
Significance. If the 42.60% zero-shot gain can be shown to arise specifically from the margin-enhanced loss and feature alignment (rather than from curation or selection effects), the work would supply a useful public resource for data-scarce underwater acoustics and a practical technique for improving robustness under distribution shift. Release of the curation pipeline supports reproducibility in the domain.
major comments (2)
- [Abstract] Abstract: the claim that feature alignment improves zero-shot ship detection by 42.60% is presented without (a) baseline CNN results on identical ShipsEar partitions, (b) an ablation that removes only the alignment term, or (c) error bars or statistical tests. These controls are required to attribute the gain to the proposed loss and alignment rather than to unstated choices in dataset curation, labeling, or splitting.
- [Abstract] Abstract: no dataset statistics (class counts, segment durations, label-verification procedure) or validation protocol (cross-validation folds, hyperparameter search) are supplied. Without these, the 96.35% in-domain accuracy and the cross-domain improvement cannot be independently verified.
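Control (c) in the first comment could be approached with a paired bootstrap over the ShipsEar ship segments. The following is a minimal sketch, assuming per-segment detection flags are available for both models on the same segments in the same order; the function name, resample count, and percentile interval are illustrative choices, not a procedure taken from the manuscript.

```python
import random

def paired_bootstrap_recall_gain(base_hits, prop_hits,
                                 n_resamples=2000, alpha=0.05, seed=0):
    """Point estimate and percentile interval for the ship-recall gain.

    base_hits / prop_hits: 1 if the model detected that ship segment,
    else 0. Segments are resampled jointly so the pairing is preserved.
    """
    assert len(base_hits) == len(prop_hits)
    rng = random.Random(seed)
    n = len(base_hits)
    gains = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        gains.append(sum(prop_hits[i] - base_hits[i] for i in idx) / n)
    gains.sort()
    lo = gains[int((alpha / 2) * n_resamples)]
    hi = gains[int((1 - alpha / 2) * n_resamples) - 1]
    point = (sum(prop_hits) - sum(base_hits)) / n
    return point, lo, hi
```

An interval that excludes zero would support a real recall gain on this test set; it would not by itself separate the effect of the alignment term from the margin term, which still requires the ablation requested in (b).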
Simulated Author's Rebuttal
Thank you for the thorough review of our manuscript arXiv:2606.28988. We have carefully considered the major comments and provide point-by-point responses below. We agree that additional details and controls are needed to strengthen the claims and will incorporate revisions in the next version of the manuscript.
-
Referee: [Abstract] Abstract: the claim that feature alignment improves zero-shot ship detection by 42.60% is presented without (a) baseline CNN results on identical ShipsEar partitions, (b) an ablation that removes only the alignment term, or (c) error bars or statistical tests. These controls are required to attribute the gain to the proposed loss and alignment rather than to unstated choices in dataset curation, labeling, or splitting.
Authors: We acknowledge this point. While the manuscript body includes comparisons, the abstract does not detail the baselines or ablations. To address this, we will revise the abstract to reference the specific controls and add a dedicated ablation study in the experiments section that isolates the feature alignment term. We will also report results with error bars and appropriate statistical tests on the ShipsEar evaluation using the same partitions. This will allow clear attribution of the 42.60% gain. revision: yes
-
Referee: [Abstract] Abstract: no dataset statistics (class counts, segment durations, label-verification procedure) or validation protocol (cross-validation folds, hyperparameter search) are supplied. Without these, the 96.35% in-domain accuracy and the cross-domain improvement cannot be independently verified.
Authors: We agree that these details are crucial for reproducibility. The current manuscript provides a high-level description of the dataset (over 1000 segments, eight classes) but lacks the requested specifics. In the revised version, we will include a table with class counts and average segment durations, describe the label verification process, specify the cross-validation procedure (e.g., number of folds), and detail the hyperparameter search strategy used for the baseline CNN. This will enable independent verification of the reported accuracies. revision: yes
Circularity Check
No significant circularity; cross-domain gain measured on external ShipsEar dataset
The paper's central claims rest on a new curated dataset for training plus quantitative evaluation of a CNN baseline versus margin-enhanced loss + feature alignment on the independent external ShipsEar corpus. The 96.35% in-domain accuracy and 42.60% zero-shot lift are reported as empirical measurements on held-out partitions and a distinct domain, not as quantities defined by construction from the same fitted parameters. No self-citation chain, ansatz smuggling, or renaming of known results is invoked to support the headline numbers. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- margin hyperparameter
axioms (1)
- domain assumption: Labels assigned during curation are accurate and consistent across the eight acoustic classes.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION Machine learning (ML) has become a key technique for underwater acoustic analysis, supporting applications such as sensor monitoring, vessel detection, maritime surveillance, and sound source classification and localization [1, 2, 3, 4, 5, 6, 7]. Despite these advances, progress remains limited by the scarcity of publicly available underwat...
-
[2]
Audio Curation and Segmentation Procedure 2.1.1
METHODOLOGY 2.1. Audio Curation and Segmentation Procedure 2.1.1. Data Acquisition and Manual Curation The raw audio recordings were obtained from [18]. Because the original recordings contained introductory material, silence periods, and unrelated acoustic content, each file was manually inspected prior to segmentation. Audio portions not corresponding...
-
[3]
We introduce CE-PlusPairMargin, which augments the weighted cross-entropy with margin-based logit constraints inspired by the Adaptive-Weighted-Loss of Roy et al
and sonar (class 6), primarily due to class imbalance and spectral similarity. We introduce CE-PlusPairMargin, which augments the weighted cross-entropy with margin-based logit constraints inspired by the Adaptive-Weighted-Loss of Roy et al. [25]. For ship samples, we enforce a minimum margin between the ship logit and the competing whale and sonar logits:...
-
[4]
Evaluation Metrics Let $TP$, $TN$, $FP$, and $FN$ denote the number of true positives, true negatives, false positives, and false negatives, respectively
RESULTS AND ANALYSIS 3.1. Evaluation Metrics Let $TP$, $TN$, $FP$, and $FN$ denote the number of true positives, true negatives, false positives, and false negatives, respectively. Ship detection rate is defined as the recall of the ship class: $\mathrm{Recall}_{\mathrm{ship}} = \frac{TP}{TP + FN}$ (13). The F1 score combines precision and recall: $F_1 = \frac{2PR}{P + R}$ (14), where $P = \frac{TP}{TP + FP}$,...
-
[5]
Differences in vessel characteristics, operational behavior, and environmental conditions are not fully represented in the current dataset, limiting cross-domain generalization
CONCLUSIONS Despite the improvements achieved with margin-enhanced training, limitations still remain. Differences in vessel characteristics, operational behavior, and environmental conditions are not fully represented in the current dataset, limiting cross-domain generalization. In addition, the transfer evaluation relies on direct model reuse without...
-
[6]
Underwater digital twin sensor network-based maritime communication and monitoring using exponential hyperbolic crisp adaptive network-based fuzzy inference system,
B. A. Muthu and C. Cherubini, “Underwater digital twin sensor network-based maritime communication and monitoring using exponential hyperbolic crisp adaptive network-based fuzzy inference system,” Water, vol. 17, no. 9, 2025
2025
-
[7]
A transformer-based deep learning network for underwater acoustic target recognition,
S. Feng and X. Zhu, “A transformer-based deep learning network for underwater acoustic target recognition,” IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022
2022
-
[8]
Underwater acoustic signal classification using hierarchical audio transformer with noisy input,
Q. T. Vo and D. K. Han, “Underwater acoustic signal classification using hierarchical audio transformer with noisy input,” in IEEE 33rd Workshop on Machine Learning for Signal Processing Conference, 2023, pp. 1–6
2023
-
[9]
Modulation recognition of underwater acoustic signals using deep hybrid neural networks,
W. Zhang, X. tong Yang, C. Leng, J. Wang, and S. Mao, “Modulation recognition of underwater acoustic signals using deep hybrid neural networks,” IEEE Transactions on Wireless Communications, vol. PP, pp. 1–1, 2022
2022
-
[10]
Adaptive control attention network for underwater acoustic localization and domain adaptation,
Q. T. Vo, J. Woods, P. Chowdhury, and D. K. Han, “Adaptive control attention network for underwater acoustic localization and domain adaptation,” in 2025 33rd European Signal Processing Conference (EUSIPCO). IEEE, 2025, pp. 1–5
2025
-
[11]
Underwater target detection and localization with feature map and cnn-based classification,
T. Guo, Y. Song, Z. Kong, E. Lim, M. López-Benítez, F. Ma, and L. Yu, “Underwater target detection and localization with feature map and cnn-based classification,” 4th Int. Conf. CTISC, 2022
2022
-
[12]
Spiking attention network: A hybrid neuromorphic approach to underwater acoustic localization and zero-shot adaptation,
Q. T. Vo and D. K. Han, “Spiking attention network: A hybrid neuromorphic approach to underwater acoustic localization and zero-shot adaptation,” in ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026, pp. 14702–14706
2026
-
[13]
Esc: Dataset for environmental sound classification,
K. J. Piczak, “Esc: Dataset for environmental sound classification,” in Proceedings of the 23rd ACM international conference on Multimedia, 2015, pp. 1015–1018
2015
-
[14]
Musical genre classification of audio signals,
G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Transactions on speech and audio processing, vol. 10, no. 5, pp. 293–302, 2002
2002
-
[15]
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro, A. R. Zamir, and M. Shah, “UCF101: A dataset of 101 human actions classes from videos in the wild,” arXiv preprint arXiv:1212.0402, 2012
-
[16]
Activitynet: A large-scale video benchmark for human activity understanding,
F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. Carlos Niebles, “Activitynet: A large-scale video benchmark for human activity understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 961–970
2015
-
[17]
Sounding the call for a global library of underwater biological sounds,
M. J. Parsons, T.-H. Lin, T. A. Mooney, C. Erbe, F. Juanes, M. Lammers, S. Li, S. Linke, A. Looby, S. L. Nedelec et al., “Sounding the call for a global library of underwater biological sounds,” Frontiers in Ecology and Evolution, vol. 10, p. 810156, 2022
2022
-
[18]
Achieving domain generalization for underwater object detection by domain mixup and contrastive learning,
Y. Chen, H. Liu, P. Song, L. Dai, X. Zhang, R. Ding, and S. Li, “Achieving domain generalization for underwater object detection by domain mixup and contrastive learning,” Neurocomputing, vol. 528, pp. 20–34, 2021
2021
-
[19]
Domain-robust marine plastic detection using vision models,
S. Kataria, “Domain-robust marine plastic detection using vision models,” ArXiv, vol. abs/2510.03294, 2025
-
[20]
Unraveling complex data diversity in underwater acoustic target recognition through convolution-based mixture of experts,
Y. Xie, J. Ren, and J. Xu, “Unraveling complex data diversity in underwater acoustic target recognition through convolution-based mixture of experts,” Expert Syst. Appl., vol. 249, p. 123431, 2024
2024
-
[21]
The expansion of data science: Dataset standardization,
N. Pessanha Santos, “The expansion of data science: Dataset standardization,” Standards, vol. 3, no. 4, pp. 400–410, 2023
2023
-
[22]
Feature analysis of passive underwater targets recognition based on deep neural network,
J. Ren, Z. Huang, C. Li, X. Guo, and J. Xu, “Feature analysis of passive underwater targets recognition based on deep neural network,” in OCEANS 2019 - Marseille. IEEE, 2019, pp. 1–5
2019
-
[23]
Historic naval sound and video,
“Historic naval sound and video,” https://maritime.org/sound/#calls, accessed: 2026-02-27
2026
-
[24]
Quantization mimic: Towards very tiny cnn for object detection,
Y. Wei, X. Pan, H. Qin, W. Ouyang, and J. Yan, “Quantization mimic: Towards very tiny cnn for object detection,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 267–283
2018
-
[25]
Shipsear: An underwater vessel noise database,
D. Santos-Domínguez, S. Torres-Guijarro, A. Cardenal-López, and A. Pena-Gimenez, “Shipsear: An underwater vessel noise database,” Applied Acoustics, vol. 113, pp. 64–69, 2016
2016
-
[26]
pydub: Manipulate audio with a simple and easy high-level interface,
J. Robert, “pydub: Manipulate audio with a simple and easy high-level interface,” https://pypi.org/project/pydub/, 2018, python library for audio manipulation
2018
-
[27]
Sound event detection and localization with distance estimation,
D. A. Krause, A. Politis, and A. Mesaros, “Sound event detection and localization with distance estimation,” in 2024 32nd European Signal Processing Conference (EUSIPCO). IEEE, 2024, pp. 286–290
2024
-
[28]
Resnet-conformer network with shared weights and attention mechanism for sound event localization, detection, and distance estimation,
Q. T. Vo and D. K. Han, “Resnet-conformer network with shared weights and attention mechanism for sound event localization, detection, and distance estimation,” DCASE2024 Challenge, Tech. Rep., June 2024
2024
-
[29]
Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks,
K. R. M. Fernando and C. P. Tsokos, “Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 7, pp. 2940–2951, 2021
2021
-
[30]
Margin-aware adaptive-weighted-loss for deep learning based imbalanced data classification,
D. Roy, R. Pramanik, and R. Sarkar, “Margin-aware adaptive-weighted-loss for deep learning based imbalanced data classification,” IEEE Transactions on AI, vol. 5, pp. 776–785, 2024
2024