Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation

David K. Han; Quoc Thinh Vo

arXiv: 2606.28988 · v1 · pith:RCNTQPTZ · submitted 2026-06-27 · cs.SD · eess.AS · eess.SP

Quoc Thinh Vo, David K. Han

Pith reviewed 2026-06-30 08:27 UTC · model grok-4.3

classification: cs.SD · eess.AS · eess.SP

keywords: underwater acoustics · audio dataset · domain adaptation · ship detection · convolutional neural network · margin-enhanced loss · feature alignment · cross-domain evaluation

The pith

A curated underwater audio dataset and margin-enhanced loss with feature alignment improve zero-shot ship detection by 42.6 percent under domain shift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses data scarcity in underwater acoustics by introducing a curated dataset of over one thousand labeled audio segments spanning eight classes drawn from a maritime archive. A lightweight CNN baseline reaches 96.35 percent in-domain accuracy on the new dataset yet suffers clear performance drops on the ShipsEar collection because of distribution mismatch. Adding a margin-enhanced loss together with feature alignment reduces confusion caused by imbalance and acoustic similarity, producing a 42.60 percent gain in zero-shot ship detection. The work also supplies a transparent curation pipeline and benchmark to support further work on imbalance handling and cross-domain acoustic classification.

Core claim

The authors establish that their margin-enhanced loss with feature alignment, when applied to a CNN trained on the new curated underwater dataset, delivers a 42.60 percent improvement in zero-shot ship detection on the ShipsEar dataset relative to the baseline model, thereby demonstrating increased robustness to distribution mismatch.
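The reference-graph excerpt of the paper's evaluation section defines ship detection rate as ship-class recall, TP / (TP + FN). This page does not say whether "42.60 percent" is a relative gain or an absolute gain in percentage points, and the two readings can differ a lot. The sketch below uses hypothetical counts (not results from the paper) to show both readings side by side.

```python
def ship_recall(tp: int, fn: int) -> float:
    """Ship detection rate: recall of the ship class, TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no ship segments in the evaluation set")
    return tp / (tp + fn)


def absolute_gain_pp(baseline: float, proposed: float) -> float:
    """Improvement in percentage points, with rates given as fractions in [0, 1]."""
    return 100.0 * (proposed - baseline)


def relative_gain_pct(baseline: float, proposed: float) -> float:
    """Improvement expressed as a percentage of the baseline rate."""
    if baseline <= 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return 100.0 * (proposed - baseline) / baseline


# Hypothetical counts on 100 ship segments; chosen only for illustration.
base = ship_recall(tp=30, fn=70)
prop = ship_recall(tp=60, fn=40)
print(f"absolute: {absolute_gain_pp(base, prop):.2f} pp")
print(f"relative: {relative_gain_pct(base, prop):.2f} %")
```

Under the relative reading, a large percentage gain from a low baseline can still be a modest absolute change. Reporting both figures together with the underlying counts removes the ambiguity.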

What carries the argument

Margin-enhanced loss with feature alignment, which mitigates class confusion arising from data imbalance, acoustic similarity, and cross-domain mismatch.
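The method excerpt in the reference graph (entry [3]) describes CE-PlusPairMargin as a weighted cross-entropy augmented with a minimum margin between the ship logit and the competing whale and sonar logits, but the formula itself is truncated on this page. The following is a minimal sketch that assumes a hinge form, one margin `margin`, and a weight `lam` on the penalty. These choices, together with the ship and whale class indices, are placeholders and not the authors' definitions; only sonar as class 6 appears in the excerpt. The feature-alignment term is not specified here, so it is omitted.

```python
import math

# Class indices: SONAR = 6 follows the method excerpt; SHIP and WHALE are placeholders.
SHIP, WHALE, SONAR = 0, 5, 6


def weighted_ce(logits, target, class_weights):
    """Class-weighted cross-entropy for one sample (numerically stable log-sum-exp)."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return class_weights[target] * (log_norm - logits[target])


def pair_margin_penalty(logits, positive, competitors, margin):
    """Hinge penalty that is zero once logits[positive] beats each competitor by `margin`."""
    return sum(max(0.0, margin - (logits[positive] - logits[c])) for c in competitors)


def ce_plus_pair_margin(logits, target, class_weights, margin=1.0, lam=1.0):
    """Hypothetical reading of CE-PlusPairMargin: weighted CE, plus a pair-margin
    term applied only to ship samples against the whale and sonar logits."""
    loss = weighted_ce(logits, target, class_weights)
    if target == SHIP:
        loss += lam * pair_margin_penalty(logits, SHIP, (WHALE, SONAR), margin)
    return loss


# Eight classes with uniform weights and all-zero logits (illustrative values only).
weights = [1.0] * 8
flat = [0.0] * 8
print(ce_plus_pair_margin(flat, SHIP, weights))  # log(8) + 2 * margin
```

With all logits equal, a ship sample pays the full margin against both the whale and the sonar logits. Once the ship logit leads each of them by at least `margin`, only the weighted cross-entropy remains.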

If this is right

The released dataset supplies additional training material for models operating in data-limited underwater settings.
The curation pipeline enables reproducible experiments on imbalance mitigation and domain adaptation.
The benchmark supports systematic comparison of cross-domain generalization in underwater acoustic classification.
The observed robustness gain indicates that the approach can help surveillance systems handle real acoustic distribution shifts without target-domain labels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same alignment step might transfer to other audio classification tasks that face similar imbalance and domain-shift problems.
Combining the method with additional adaptation techniques could further close the remaining performance gap between in-domain and out-of-domain accuracy.
Evaluating the pipeline on a wider collection of underwater recording environments would test whether the robustness holds beyond the two datasets examined.
Making the curation code public lowers the effort required for other groups to create comparable labeled resources.

Load-bearing premise

The curated labels are accurate and the 42.60 percent gain on ShipsEar is caused by the margin-enhanced loss and feature alignment rather than unstated data choices or tuning.

What would settle it

Independent re-labeling of the ShipsEar test segments followed by re-running both the baseline and the proposed method and finding no meaningful improvement from the alignment step would falsify the central performance claim.
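To decide whether the alignment step gives a "meaningful" improvement, a paired test is needed, because both models are scored on the same ShipsEar segments. McNemar's test is one standard choice; the page does not say which test, if any, the paper uses. The sketch below computes the exact two-sided version from per-segment correctness flags, and the input lists are hypothetical.

```python
from math import comb


def mcnemar_exact(baseline_correct, proposed_correct):
    """Two-sided exact McNemar test on paired per-segment outcomes.

    Each argument is a sequence of booleans (one per test segment) saying
    whether that model got the segment right. Only discordant pairs carry
    information. Returns (b, c, p_value), where b counts segments only the
    baseline got right and c counts segments only the proposed model got right.
    """
    if len(baseline_correct) != len(proposed_correct):
        raise ValueError("outcome lists must be paired")
    pairs = list(zip(baseline_correct, proposed_correct))
    b = sum(1 for x, y in pairs if x and not y)
    c = sum(1 for x, y in pairs if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under H0 the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical paired outcomes on ten ship segments.
baseline = [True] + [False] * 9
proposed = [False] + [True] * 9
print(mcnemar_exact(baseline, proposed))  # (1, 9, 0.021484375)
```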

Original abstract

Machine learning for underwater acoustics is constrained by the scarcity of publicly available labeled datasets. In contrast to air-acoustic domains, where large benchmarks enable rapid model development, underwater datasets are typically small and limited in acoustic diversity, restricting robust model training and cross-domain generalization. To help address this gap, we introduce a curated underwater audio dataset derived from an open-source maritime sound archive. The dataset contains over one thousand labeled audio segments across eight biologically and mechanically relevant acoustic classes, providing an additional resource for training models in data-limited underwater environments. Additionally, we establish a lightweight Convolutional Neural Network (CNN) baseline and propose a margin-enhanced loss with feature alignment to mitigate class confusion arising from data imbalance, acoustic similarity, and cross-domain mismatch. While the baseline achieves 96.35% in-domain accuracy, evaluation on ShipsEar reveals substantial domain shift; the proposed feature alignment improves zero-shot ship detection by 42.60%, demonstrating stronger robustness under distribution mismatch. We further release a transparent curation pipeline and reproducible benchmark to support future research on imbalance mitigation, domain adaptation, and data-efficient underwater acoustic classification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The curated underwater dataset is the solid addition here; the 42.60% zero-shot gain still needs ablations to pin it on the margin loss and alignment rather than curation choices.

The paper curates over 1000 labeled segments from an open maritime archive into eight biologically and mechanically relevant acoustic classes. That is the clearest new thing: a public resource in a domain where labeled underwater audio has been scarce.
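The methodology excerpt in the reference graph (entry [2]) says each raw recording was manually inspected to discard introductory material, silence, and unrelated content before segmentation. The segment length and overlap are not stated on this page. Assuming the trimmed recordings are then cut into fixed-length windows, the boundary computation might look like the sketch below. The 5-second default, the hop, and the tail rule are all hypothetical.

```python
def segment_bounds(duration_ms, segment_ms=5000, hop_ms=None, min_tail_ms=None):
    """Return (start_ms, end_ms) windows covering a trimmed recording.

    duration_ms: recording length in integer milliseconds.
    segment_ms: window length; 5 s is an arbitrary placeholder.
    hop_ms: step between window starts; defaults to non-overlapping windows.
    min_tail_ms: shortest window kept where the recording ends early; by
        default such partial windows are dropped so all segments share one length.
    """
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    hop_ms = segment_ms if hop_ms is None else hop_ms
    if hop_ms <= 0:
        raise ValueError("hop_ms must be positive")
    min_tail_ms = segment_ms if min_tail_ms is None else min_tail_ms
    bounds = []
    for start in range(0, duration_ms, hop_ms):
        end = min(start + segment_ms, duration_ms)
        if end - start >= min_tail_ms:
            bounds.append((start, end))
    return bounds


# A hypothetical 12 s recording cut into 5 s windows.
print(segment_bounds(12_000))  # [(0, 5000), (5000, 10000)]
```

With pydub (reference [26]), `AudioSegment.from_file(path)` loads a file, `len(audio)` gives its duration in milliseconds, and `audio[start:end]` slices by milliseconds. The windows could therefore be applied as `[audio[s:e] for s, e in segment_bounds(len(audio))]`.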

They pair it with a standard CNN baseline plus a margin-enhanced loss and feature alignment step. In-domain accuracy hits 96.35%. On the separate ShipsEar set the alignment term is credited with a 42.60% lift in zero-shot ship detection.

The curation pipeline and benchmark release are useful. Anyone working on marine surveillance or bioacoustics now has another starting point and can check the exact splits and processing steps.

The soft spot is the causal claim for the 42.60% number. The abstract states the improvement but does not show an ablation that removes only the alignment term, a direct baseline comparison on identical partitions, or label-verification counts. If the curation or split inadvertently favored clearer examples, the gain could trace to data handling instead of the loss or alignment. That uncertainty is the main limit on how far the result can be trusted without more controls.

This is for people who need more underwater training data or who want to test domain-adaptation tricks in audio. A reader focused on dataset papers or incremental robustness work will find it worth a look.

It should go to peer review. The dataset contribution is concrete and the cross-domain experiment is a reasonable next step even if the method attribution needs tightening.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces a curated underwater audio dataset of over 1000 labeled segments across eight acoustic classes derived from an open maritime archive. It establishes a lightweight CNN baseline achieving 96.35% in-domain accuracy and proposes a margin-enhanced loss together with feature alignment to mitigate imbalance and cross-domain mismatch. The central empirical claim is that the proposed feature alignment yields a 42.60% improvement in zero-shot ship detection on the external ShipsEar dataset.

Significance. If the 42.60% zero-shot gain can be shown to arise specifically from the margin-enhanced loss and feature alignment (rather than from curation or selection effects), the work would supply a useful public resource for data-scarce underwater acoustics and a practical technique for improving robustness under distribution shift. Release of the curation pipeline supports reproducibility in the domain.

major comments (2)

[Abstract] Abstract: the claim that feature alignment improves zero-shot ship detection by 42.60% is presented without (a) baseline CNN results on identical ShipsEar partitions, (b) an ablation that removes only the alignment term, or (c) error bars or statistical tests. These controls are required to attribute the gain to the proposed loss and alignment rather than to unstated choices in dataset curation, labeling, or splitting.
[Abstract] Abstract: no dataset statistics (class counts, segment durations, label-verification procedure) or validation protocol (cross-validation folds, hyperparameter search) are supplied. Without these, the 96.35% in-domain accuracy and the cross-domain improvement cannot be independently verified.
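Comment (c) asks for error bars. For a ship-detection rate computed on a modest number of segments, a percentile bootstrap is one quick way to get an interval. This is a generic technique, not one the paper is reported to use, and the labels below are hypothetical.

```python
import random


def bootstrap_recall_ci(y_true, y_pred, positive, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the recall of one class.

    Recall depends only on segments whose true label is `positive`, so this
    resamples those segments with replacement (a bootstrap stratified on the
    positive class). Returns (point_estimate, lower, upper).
    """
    hits = [pred == positive for true, pred in zip(y_true, y_pred) if true == positive]
    if not hits:
        raise ValueError("no segments of the positive class")
    rng = random.Random(seed)
    stats = sorted(
        sum(rng.choice(hits) for _ in hits) / len(hits) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(hits) / len(hits), lower, upper


# Hypothetical labels: 20 ship segments (15 detected) and 5 whale segments.
y_true = ["ship"] * 20 + ["whale"] * 5
y_pred = ["ship"] * 15 + ["whale"] * 10
print(bootstrap_recall_ci(y_true, y_pred, "ship"))
```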

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the thorough review of our manuscript arXiv:2606.28988. We have carefully considered the major comments and provide point-by-point responses below. We agree that additional details and controls are needed to strengthen the claims and will incorporate revisions in the next version of the manuscript.

Point-by-point responses

Referee: [Abstract] Abstract: the claim that feature alignment improves zero-shot ship detection by 42.60% is presented without (a) baseline CNN results on identical ShipsEar partitions, (b) an ablation that removes only the alignment term, or (c) error bars or statistical tests. These controls are required to attribute the gain to the proposed loss and alignment rather than to unstated choices in dataset curation, labeling, or splitting.

Authors: We acknowledge this point. While the manuscript body includes comparisons, the abstract does not detail the baselines or ablations. To address this, we will revise the abstract to reference the specific controls and add a dedicated ablation study in the experiments section that isolates the feature alignment term. We will also report results with error bars and appropriate statistical tests on the ShipsEar evaluation using the same partitions. This will allow clear attribution of the 42.60% gain. revision: yes
Referee: [Abstract] Abstract: no dataset statistics (class counts, segment durations, label-verification procedure) or validation protocol (cross-validation folds, hyperparameter search) are supplied. Without these, the 96.35% in-domain accuracy and the cross-domain improvement cannot be independently verified.

Authors: We agree that these details are crucial for reproducibility. The current manuscript provides high-level description of the dataset (over 1000 segments, eight classes) but lacks the requested specifics. In the revised version, we will include a table with class counts and average segment durations, describe the label verification process, specify the cross-validation procedure (e.g., number of folds), and detail the hyperparameter search strategy used for the baseline CNN. This will enable independent verification of the reported accuracies. revision: yes
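The rebuttal promises to specify the cross-validation procedure. With imbalanced classes, a stratified split keeps each class represented in every fold. The sketch below is a generic stratified k-fold in plain Python; the fold count and labels are placeholders, not the paper's protocol.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split segment indices into k folds that preserve class proportions.

    Each class's indices are shuffled and dealt round-robin across the folds,
    continuing from where the previous class stopped so fold sizes stay even.
    Returns a list of (train_indices, test_indices) pairs.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    splits = []
    for fold in folds:
        test = sorted(fold)
        held_out = set(test)
        train = [idx for idx in range(len(labels)) if idx not in held_out]
        splits.append((train, test))
    return splits


# Hypothetical labels: each of the 5 test folds gets 2 ship, 1 sonar, 1 whale segment.
labels = ["ship"] * 10 + ["whale"] * 5 + ["sonar"] * 5
for train, test in stratified_k_fold(labels):
    print(len(train), [labels[i] for i in test])
```

Segments cut from the same recording can be near-duplicates. In practice, the split may therefore also need to be grouped by source file so that no recording contributes to both the training and test sides. This sketch does not do that.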

Circularity Check

0 steps flagged

No significant circularity; cross-domain gain measured on external ShipsEar dataset

Full rationale

The paper's central claims rest on a new curated dataset for training plus quantitative evaluation of a CNN baseline versus margin-enhanced loss + feature alignment on the independent external ShipsEar corpus. The 96.35% in-domain accuracy and 42.60% zero-shot lift are reported as empirical measurements on held-out partitions and a distinct domain, not as quantities defined by construction from the same fitted parameters. No self-citation chain, ansatz smuggling, or renaming of known results is invoked to support the headline numbers. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the work rests on standard supervised ML assumptions plus one domain-specific assumption about label quality; no invented physical entities appear.

free parameters (1)

margin hyperparameter
Margin-enhanced loss requires selection or tuning of a margin value that is not derived from first principles.

axioms (1)

domain assumption: Labels assigned during curation are accurate and consistent across the eight acoustic classes.
All reported accuracies and the 42.60% improvement presuppose correct ground-truth labels in the new dataset.
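One way to probe the label-accuracy assumption is to have a second annotator relabel a subset of segments and measure agreement beyond chance with Cohen's kappa. The sketch is generic; the page does not say whether the paper computed any agreement statistic, and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance if each annotator kept their own label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical class; kappa is degenerate
    return (observed - expected) / (1.0 - expected)


# Hypothetical relabeling of four segments by a second annotator.
first = ["ship", "ship", "whale", "whale"]
second = ["ship", "whale", "whale", "whale"]
print(cohens_kappa(first, second))  # 0.5
```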

pith-pipeline@v0.9.1-grok · 5731 in / 1339 out tokens · 45087 ms · 2026-06-30T08:27:03.454640+00:00 · methodology

Reference graph

Works this paper leans on

30 extracted references · 3 canonical work pages · 2 internal anchors

[1]

Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation

INTRODUCTION. Machine learning (ML) has become a key technique for underwater acoustic analysis, supporting applications such as sensor monitoring, vessel detection, maritime surveillance, and sound source classification and localization [1, 2, 3, 4, 5, 6, 7]. Despite these advances, progress remains limited by the scarcity of publicly available underwat...

2026
[2]

Audio Curation and Segmentation Procedure

METHODOLOGY. 2.1. Audio Curation and Segmentation Procedure. 2.1.1. Data Acquisition and Manual Curation. The raw audio recordings were obtained from [18]. Because the original recordings contained introductory material, silence periods, and unrelated acoustic content, each file was manually inspected prior to segmentation. Audio portions not corresponding...
[3]

We introduce CE-PlusPairMargin, which augments the weighted cross-entropy with margin-based logit constraints inspired by the Adaptive-Weighted-Loss of Roy et al.

and sonar (class 6), primarily due to class imbalance and spectral similarity. We introduce CE-PlusPairMargin, which augments the weighted cross-entropy with margin-based logit constraints inspired by the Adaptive-Weighted-Loss of Roy et al. [25]. For ship samples, we enforce a minimum margin between the ship logit and the competing whale and sonar logits:...
[4]

Evaluation Metrics: Let $TP$, $TN$, $FP$, and $FN$ denote the number of true positives, true negatives, false positives, and false negatives, respectively

RESULTS AND ANALYSIS. 3.1. Evaluation Metrics. Let $TP$, $TN$, $FP$, and $FN$ denote the number of true positives, true negatives, false positives, and false negatives, respectively. Ship detection rate is defined as the recall of the ship class: $\mathrm{Recall}_{\mathrm{ship}} = \frac{TP}{TP+FN}$ (13). The F1 score combines precision and recall: $F_1 = \frac{2PR}{P+R}$ (14), where $P = \frac{TP}{TP+FP}$, ...
[5]

Differences in vessel characteristics, operational behavior, and environmental conditions are not fully represented in the current dataset, limiting cross-domain generalization

CONCLUSIONS. Despite the improvements achieved with margin-enhanced training, limitations still remain. Differences in vessel characteristics, operational behavior, and environmental conditions are not fully represented in the current dataset, limiting cross-domain generalization. In addition, the transfer evaluation relies on direct model reuse without...
[6]

Underwater digital twin sensor network-based maritime communication and monitoring using exponential hyperbolic crisp adaptive network-based fuzzy inference system,

B. A. Muthu and C. Cherubini, “Underwater digital twin sensor network-based maritime communication and monitoring using exponential hyperbolic crisp adaptive network-based fuzzy inference system,” Water, vol. 17, no. 9, 2025

2025
[7]

A transformer-based deep learning network for underwater acoustic target recognition,

S. Feng and X. Zhu, “A transformer-based deep learning network for underwater acoustic target recognition,” IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022

2022
[8]

Underwater acoustic signal classification using hierarchical audio transformer with noisy input,

Q. T. Vo and D. K. Han, “Underwater acoustic signal classification using hierarchical audio transformer with noisy input,” in IEEE 33rd Workshop on Machine Learning for Signal Processing Conference, 2023, pp. 1–6

2023
[9]

Modulation recognition of underwater acoustic signals using deep hybrid neural networks,

W. Zhang, X. tong Yang, C. Leng, J. Wang, and S. Mao, “Modulation recognition of underwater acoustic signals using deep hybrid neural networks,” IEEE Transactions on Wireless Communications, vol. PP, pp. 1–1, 2022

2022
[10]

Adaptive control attention network for underwater acoustic localization and domain adaptation,

Q. T. Vo, J. Woods, P. Chowdhury, and D. K. Han, “Adaptive control attention network for underwater acoustic localization and domain adaptation,” in 2025 33rd European Signal Processing Conference (EUSIPCO). IEEE, 2025, pp. 1–5

2025
[11]

Underwater target detection and localization with feature map and cnn-based classification,

T. Guo, Y. Song, Z. Kong, E. Lim, M. López-Benítez, F. Ma, and L. Yu, “Underwater target detection and localization with feature map and cnn-based classification,” 4th Int. Conf. CTISC, 2022

2022
[12]

Spiking attention network: A hybrid neuromorphic approach to underwater acoustic localization and zero-shot adaptation,

Q. T. Vo and D. K. Han, “Spiking attention network: A hybrid neuromorphic approach to underwater acoustic localization and zero-shot adaptation,” in ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026, pp. 14702–14706

2026
[13]

ESC: Dataset for environmental sound classification,

K. J. Piczak, “ESC: Dataset for environmental sound classification,” in Proceedings of the 23rd ACM International Conference on Multimedia, 2015, pp. 1015–1018

2015
[14]

Musical genre classification of audio signals,

G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002

2002
[15]

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

K. Soomro, A. R. Zamir, and M. Shah, “UCF101: A dataset of 101 human actions classes from videos in the wild,” arXiv preprint arXiv:1212.0402, 2012

2012
[16]

ActivityNet: A large-scale video benchmark for human activity understanding,

F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. Carlos Niebles, “ActivityNet: A large-scale video benchmark for human activity understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 961–970

2015
[17]

Sounding the call for a global library of underwater biological sounds,

M. J. Parsons, T.-H. Lin, T. A. Mooney, C. Erbe, F. Juanes, M. Lammers, S. Li, S. Linke, A. Looby, S. L. Nedelec et al., “Sounding the call for a global library of underwater biological sounds,” Frontiers in Ecology and Evolution, vol. 10, p. 810156, 2022

2022
[18]

Achieving domain generalization for underwater object detection by domain mixup and contrastive learning,

Y. Chen, H. Liu, P. Song, L. Dai, X. Zhang, R. Ding, and S. Li, “Achieving domain generalization for underwater object detection by domain mixup and contrastive learning,” Neurocomputing, vol. 528, pp. 20–34, 2021

2021
[19]

Domain-robust marine plastic detection using vision models,

S. Kataria, “Domain-robust marine plastic detection using vision models,” ArXiv, vol. abs/2510.03294, 2025

2025
[20]

Unraveling complex data diversity in underwater acoustic target recognition through convolution-based mixture of experts,

Y. Xie, J. Ren, and J. Xu, “Unraveling complex data diversity in underwater acoustic target recognition through convolution-based mixture of experts,” Expert Syst. Appl., vol. 249, p. 123431, 2024

2024
[21]

The expansion of data science: Dataset standardization,

N. Pessanha Santos, “The expansion of data science: Dataset standardization,” Standards, vol. 3, no. 4, pp. 400–410, 2023

2023
[22]

Feature analysis of passive underwater targets recognition based on deep neural network,

J. Ren, Z. Huang, C. Li, X. Guo, and J. Xu, “Feature analysis of passive underwater targets recognition based on deep neural network,” in OCEANS 2019 - Marseille. IEEE, 2019, pp. 1–5

2019
[23]

Historic naval sound and video,

“Historic naval sound and video,” https://maritime.org/sound/#calls, accessed: 2026-02-27

2026
[24]

Quantization mimic: Towards very tiny cnn for object detection,

Y. Wei, X. Pan, H. Qin, W. Ouyang, and J. Yan, “Quantization mimic: Towards very tiny cnn for object detection,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 267–283

2018
[25]

ShipsEar: An underwater vessel noise database,

D. Santos-Domínguez, S. Torres-Guijarro, A. Cardenal-López, and A. Pena-Gimenez, “ShipsEar: An underwater vessel noise database,” Applied Acoustics, vol. 113, pp. 64–69, 2016

2016
[26]

pydub: Manipulate audio with a simple and easy high-level interface,

J. Robert, “pydub: Manipulate audio with a simple and easy high-level interface,” https://pypi.org/project/pydub/, 2018, Python library for audio manipulation

2018
[27]

Sound event detection and localization with distance estimation,

D. A. Krause, A. Politis, and A. Mesaros, “Sound event detection and localization with distance estimation,” in 2024 32nd European Signal Processing Conference (EUSIPCO). IEEE, 2024, pp. 286–290

2024
[28]

Resnet-conformer network with shared weights and attention mechanism for sound event localization, detection, and distance estimation,

Q. T. Vo and D. K. Han, “Resnet-conformer network with shared weights and attention mechanism for sound event localization, detection, and distance estimation,” DCASE2024 Challenge, Tech. Rep., June 2024

2024
[29]

Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks,

K. R. M. Fernando and C. P. Tsokos, “Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 7, pp. 2940–2951, 2021

2021
[30]

Margin-aware adaptive-weighted-loss for deep learning based imbalanced data classification,

D. Roy, R. Pramanik, and R. Sarkar, “Margin-aware adaptive-weighted-loss for deep learning based imbalanced data classification,” IEEE Transactions on AI, vol. 5, pp. 776–785, 2024

2024
