pith.

arxiv: 2504.13102 · v1 · submitted 2025-04-17 · 💻 cs.SD · cs.AI · eess.AS

A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition

Pith reviewed 2026-05-22 18:56 UTC · model grok-4.3

classification 💻 cs.SD · cs.AI · eess.AS
keywords: underwater acoustic target recognition · few-shot learning · multi-task learning · channel attention · convolutional neural network · marine bioacoustics · Watkins Marine Life Dataset · sonar signal processing

The pith

A multi-task balanced attention CNN reaches 97 percent accuracy on 27-class few-shot underwater sounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that few-shot underwater acoustic target recognition becomes practical when a convolutional network is trained jointly on classification and feature reconstruction while a channel attention layer highlights useful patterns like harmonics and quiets background noise. The central idea is that sharing a feature extractor across these two tasks lets the model learn representations that remain stable even when training examples are scarce and ocean recordings contain heavy interference. Experiments on the Watkins Marine Life Dataset report that this MT-BCA-CNN reaches 97 percent accuracy and 95 percent F1-score across 27 classes, beating both plain CNNs and prior state-of-the-art UATR methods. Ablation results are presented to argue that the attention and multi-task components reinforce each other rather than merely adding independent gains. If the approach holds, it would give marine biologists and sonar operators a concrete way to identify ships or animals from very small sets of labeled recordings.

Core claim

The central claim is that a shared feature extractor inside a CNN, optimized simultaneously for target classification and signal reconstruction, combined with a channel attention mechanism that amplifies discriminative acoustic structures such as harmonics and suppresses noise, produces 97 percent classification accuracy and 95 percent F1-score in 27-class few-shot settings on the Watkins Marine Life Dataset and outperforms standard CNN, ACNN, and existing UATR baselines.

What carries the argument

A shared CNN feature extractor trained under multi-task learning with dynamic task weighting and a channel attention module that reweights feature maps to emphasize harmonic structures.
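The summary does not spell out the attention module's internals. The sketch below assumes a CBAM-style channel attention (global average- and max-pooled channel descriptors passed through one shared two-layer MLP, summed, then squashed by a sigmoid), which is consistent with the excerpt under Figure 3 about applying a sigmoid to the sum of two branch outputs. The reduction ratio `r`, the weight shapes, and the random weights are illustrative assumptions, not the paper's values.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight the channels of a (C, H, W) feature map, CBAM-style (assumed design).

    w1 has shape (C // r, C) and w2 has shape (C, C // r); together they form a
    shared bottleneck MLP. Here they are random placeholders; in a trained
    network they would be learned. Returns (rescaled_feat, channel_weights).
    """
    avg_desc = feat.mean(axis=(1, 2))  # (C,) global average pooling per channel
    max_desc = feat.max(axis=(1, 2))   # (C,) global max pooling per channel

    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # FC -> ReLU -> FC

    # Sum the two branch outputs and squash to (0, 1): one weight per channel.
    weights = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))
    return feat * weights[:, None, None], weights


rng = np.random.default_rng(0)
C, r = 16, 4                             # channels and reduction ratio (assumed)
feat = rng.standard_normal((C, 32, 40))  # stand-in for a CNN feature map of a Mel-spectrogram
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
out, weights = channel_attention(feat, w1, w2)
print(out.shape, weights.round(2))
```

A squeeze-and-excitation block (Hu et al., reference [14] below) is the single-branch variant of this idea: it keeps only the average-pooled path.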

If this is right

  • Joint optimization of classification and reconstruction yields synergistic gains confirmed by ablation studies on the same dataset.
  • Dynamic weighting during training keeps the two tasks balanced so neither dominates the shared extractor.
  • The resulting model maintains high accuracy even when only a handful of examples per class are available.
  • Performance exceeds both conventional CNNs and prior published UATR methods under identical few-shot conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same joint-classification-plus-reconstruction pattern could be tested on other noisy few-shot audio tasks such as bird calls or industrial fault detection.
  • Evaluating the trained model on continuous ocean recordings rather than pre-segmented clips would show whether the reported accuracy survives real-time streaming conditions.
  • Pairing the architecture with simple spectrogram augmentations might push accuracy still higher in the lowest-data regimes without changing the core design.

Load-bearing premise

The premise is that the channel attention mechanism can reliably pick out harmonic structures while suppressing noise, and that the classification and reconstruction tasks produce mutual benefits rather than conflicting gradients on noisy underwater recordings.
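The second half of this premise can be probed directly by comparing the gradients that the two losses induce on the shared extractor: a negative cosine similarity means classification and reconstruction pull the shared weights in opposing directions. A minimal NumPy sketch, assuming the per-task gradients have already been flattened into vectors. The projection step is a simplified two-task version of PCGrad, a gradient-surgery method the paper does not claim to use, and the toy gradients are made-up numbers.

```python
import numpy as np


def gradient_cosine(g_a, g_b, eps=1e-12):
    """Cosine similarity between two flattened task gradients on shared weights."""
    return float(np.dot(g_a, g_b) / (np.linalg.norm(g_a) * np.linalg.norm(g_b) + eps))


def remove_conflict(g_a, g_b, eps=1e-12):
    """Simplified two-task PCGrad step: if g_a conflicts with g_b (negative dot
    product), subtract g_a's projection onto g_b; otherwise return g_a unchanged."""
    dot = np.dot(g_a, g_b)
    if dot >= 0:
        return g_a
    return g_a - dot / (np.dot(g_b, g_b) + eps) * g_b


# Toy gradients on three shared parameters (made-up numbers, not from the paper).
g_cls = np.array([1.0, 0.5, -0.2])   # from the classification loss
g_rec = np.array([-0.8, 0.1, 0.3])   # from the reconstruction loss

print(gradient_cosine(g_cls, g_rec))  # negative: the two tasks pull in opposing directions
g_cls_adj = remove_conflict(g_cls, g_rec)
print(np.dot(g_cls_adj, g_rec))       # approximately zero after projection
```

Logging this cosine throughout training on the Watkins recordings would show whether conflicts are rare (supporting the synergy claim) or frequent.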

What would settle it

Remove the channel attention module, retrain on the identical 27-class few-shot split of the Watkins dataset, and observe whether accuracy remains above 90 percent or drops to the level of a plain CNN.

Figures

Figures reproduced from arXiv: 2504.13102 by Hao Zhang, Junpeng Lu, Shumeng Sun, Wei Huang, Zhengyang Xiu, Zhenpeng Xu.

Figure 1. Target Recognition Task Process.
Against this backdrop, the emergence of deep learning technologies has brought new possibilities to underwater audio classification, marking a transformative shift in the field. The general deep-learning-based audio recognition process is illustrated in Figure 1, which encompasses the following workflow: target signal acquisition, database creation, data preprocessing, featur…
Figure 2. MT-BCA-CNN Model Architecture.
2. Methodology. In this paper, we propose a few-shot UATR method, centered on a convolutional neural network model that integrates multi-task learning with a channel attention mechanism. Below, we elaborate on the architecture and implementation details of the proposed MT-BCA-CNN, including the design of the attention module and the multi-task learning training strategy. 2.1. O…
Figure 3. Flowchart of multi-task learning implementation, where λ0 and λ1 denote the weights of the task-specific classifiers, L1 and L2 represent the task losses, and L_total is the joint loss function.
…applying a sigmoid function to the sum of the outputs from both branches. These weights recalibrate the importance of each channel in the feature map, enhancing the model's focus on critical channels (e.g., harmonic-…
Figure 4. Examples of raw signal waveforms and their corresponding Mel-spectrograms. (a) Clymene Dolphin-wave. (b) Clymene Dolphin-Mel. (c) Common Dolphin-wave. (d) Common Dolphin-Mel. (e) Beluga White Whale-wave. (f) Beluga White Whale-Mel.
Huang et al.: Preprint submitted to Elsevier
Figure 5. (a) CAM++ (Acc: 0.62). (b) ERes2Net (Acc: 0.63). (c) ResNetSE (Acc: 0.78). (d) MT-BCA-CNN (Acc: 0.97).
Figure 6. Comparison of Parameter Counts and Performance Among Three Classical Models, Baseline CNN, and Our Proposed MT-BCA-CNN on the Dataset.
3.4. Ablation Studies. To validate the effectiveness of our proposed modules (Channel Attention, CA, and Multi-Task Learning, MTL), we conducted a series of ablation experiments to evaluate their impact on classification accuracy. Using our custom dataset, we trained two var…
Figure 7. Ablation study results. (a) Only Classify Acc (0.89). (b) CNN Acc (0.91). (c) MT-CNN Acc (0.95). (d) MT-BCA-CNN Acc (0.97).
Original abstract

Underwater acoustic target recognition (UATR) is of great significance for the protection of marine diversity and national defense security. The development of deep learning provides new opportunities for UATR, but faces challenges brought by the scarcity of reference samples and complex environmental interference. To address these issues, we proposes [sic] a multi-task balanced channel attention convolutional neural network (MT-BCA-CNN). The method integrates a channel attention mechanism with a multi-task learning strategy, constructing a shared feature extractor and multi-task classifiers to jointly optimize target classification and feature reconstruction tasks. The channel attention mechanism dynamically enhances discriminative acoustic features such as harmonic structures while suppressing noise. Experiments on the Watkins Marine Life Dataset demonstrate that MT-BCA-CNN achieves 97% classification accuracy and 95% F1-score in 27-class few-shot scenarios, significantly outperforming traditional CNN and ACNN models, as well as popular state-of-the-art UATR methods. Ablation studies confirm the synergistic benefits of multi-task learning and attention mechanisms, while a dynamic weighting adjustment strategy effectively balances task contributions. This work provides an efficient solution for few-shot underwater acoustic recognition, advancing research in marine bioacoustics and sonar signal processing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes MT-BCA-CNN, a multi-task balanced channel attention CNN for few-shot underwater acoustic target recognition (UATR). It integrates channel attention for enhancing discriminative features (e.g., harmonics) while suppressing noise, with multi-task learning for joint classification and feature reconstruction using dynamic task weighting. On the Watkins Marine Life Dataset, it reports 97% classification accuracy and 95% F1-score in 27-class few-shot scenarios, outperforming traditional CNN, ACNN, and other SOTA UATR methods, with ablation studies supporting the contributions of attention and multi-task components.

Significance. If the central performance claims hold under proper generalization conditions, the work offers a practical approach to few-shot UATR by showing potential synergies between channel attention and multi-task learning on noisy marine data. The ablation studies and dynamic weighting strategy provide concrete evidence for the design choices, which could inform future bioacoustics and sonar applications if reproducibility is ensured.

major comments (2)
  1. [Experimental section] §3 (the ablation studies appear as §3.4): The description of the data splitting protocol on the Watkins Marine Life Dataset is insufficient. It does not specify whether splits are performed at the recording level (to ensure independence) or at the clip level. Given that the dataset consists of multiple short clips extracted from longer continuous recordings, clip-level random splits risk data leakage via shared background noise, hydrophone artifacts, or call patterns. Such leakage would directly undermine the validity of the reported 97% accuracy and 95% F1-score in the 27-class few-shot setting and the outperformance claims relative to baselines.
  2. [Results section] Performance tables: No error bars, standard deviations, or statistical significance tests (e.g., paired t-tests or McNemar tests) are reported for the 97% accuracy and 95% F1-score across runs or folds. Without these, the numerical superiority over baselines cannot be assessed as robust rather than due to random variation, weakening support for the central claim.
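The recording-level protocol that comment 1 asks for can be stated precisely in a few lines: group clips by their source recording, then assign whole recordings to partitions, per class, so that every class keeps examples on both sides where possible. This is a sketch under stated assumptions: the `(clip_id, recording_id, label)` tuple schema and the toy clip names are hypothetical, and it assumes each recording carries a single label.

```python
import random
from collections import defaultdict


def split_by_recording(clips, test_frac=0.2, seed=0):
    """Recording-level train/test split, stratified by class.

    clips: list of (clip_id, recording_id, label) tuples (hypothetical schema;
    assumes every recording carries a single label). All clips from one
    recording land in the same partition, so shared background noise or
    hydrophone artifacts cannot leak from train into test.
    """
    rng = random.Random(seed)
    recs_by_label = defaultdict(set)
    clips_by_rec = defaultdict(list)
    for clip_id, rec_id, label in clips:
        recs_by_label[label].add(rec_id)
        clips_by_rec[rec_id].append((clip_id, rec_id, label))

    test_recs = set()
    for label in sorted(recs_by_label):
        recs = sorted(recs_by_label[label])
        rng.shuffle(recs)
        # Hold out at least one recording per class, but never a class's only one.
        n_test = max(1, round(test_frac * len(recs))) if len(recs) > 1 else 0
        test_recs.update(recs[:n_test])

    train, test = [], []
    for rec_id in sorted(clips_by_rec):
        (test if rec_id in test_recs else train).extend(clips_by_rec[rec_id])
    return train, test


# 15 toy clips cut from 5 recordings (3 clips each); names are illustrative.
clips = [(f"clip{i}", f"rec{i // 3}", "beluga" if i < 9 else "orca") for i in range(15)]
train, test = split_by_recording(clips)
print(sorted({c[1] for c in test}))
```

Under this rule a class represented by a single recording cannot appear in both partitions; in a 27-class few-shot setting that constraint is worth reporting alongside the split.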
minor comments (2)
  1. [Abstract] The sentence 'we proposes a multi-task...' contains a subject-verb agreement error and should be corrected to 'we propose'.
  2. [Method section] Notation: The dynamic weighting parameters in the multi-task loss are introduced but lack explicit equations or initialization details, making the 'balanced' aspect harder to reproduce.
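Because the weighting rule is not given, the following shows one plausible instance only: Dynamic Weight Average (DWA, from Liu et al.'s CVPR 2019 multi-task attention network), which raises the weight of whichever task's loss has been falling more slowly. It assumes the joint loss has the form L_total = λ0·L1 + λ1·L2, pairing the symbols as in the Figure 3 caption; the temperature `T = 2.0` and the loss history are illustrative, not taken from the paper.

```python
import math


def dwa_weights(loss_history, temperature=2.0):
    """Dynamic Weight Average (DWA) task weights for the next epoch.

    loss_history: per-epoch average losses [L1, L2], oldest first.
    Returns [lambda_0, lambda_1]; the weights always sum to K = number of tasks.
    """
    if len(loss_history) < 2:
        k = len(loss_history[0]) if loss_history else 2
        return [1.0] * k  # too little history: start with equal weights
    prev, prev2 = loss_history[-1], loss_history[-2]
    # w_k = L_k(t-1) / L_k(t-2): a ratio near 1 means the task has stalled.
    ratios = [p / q for p, q in zip(prev, prev2)]
    exps = [math.exp(w / temperature) for w in ratios]
    k = len(exps)
    return [k * e / sum(exps) for e in exps]


def joint_loss(l_cls, l_rec, lambdas):
    """L_total = lambda_0 * L1 + lambda_1 * L2 (classification + reconstruction)."""
    return lambdas[0] * l_cls + lambdas[1] * l_rec


# Illustrative history: classification loss halved, reconstruction barely moved.
history = [[2.0, 1.0], [1.0, 0.9]]
lambdas = dwa_weights(history)
print(lambdas)                       # reconstruction receives the larger weight
print(joint_loss(0.8, 0.85, lambdas))
```

Stating the rule, its initialization (equal weights for the first two epochs in this sketch), and the temperature is the kind of detail that would resolve the reproducibility concern in this comment.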

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor that we will address to strengthen the paper. We provide point-by-point responses below and commit to revisions where appropriate.

Point-by-point responses
  1. Referee: [Experimental section] §3 (the ablation studies appear as §3.4): The description of the data splitting protocol on the Watkins Marine Life Dataset is insufficient. It does not specify whether splits are performed at the recording level (to ensure independence) or at the clip level. Given that the dataset consists of multiple short clips extracted from longer continuous recordings, clip-level random splits risk data leakage via shared background noise, hydrophone artifacts, or call patterns. Such leakage would directly undermine the validity of the reported 97% accuracy and 95% F1-score in the 27-class few-shot setting and the outperformance claims relative to baselines.

    Authors: We agree that the current description of the data splitting protocol is insufficient and could raise concerns about potential data leakage. In the revised manuscript, we will expand the experimental section to explicitly detail that splits were performed at the recording level: all clips derived from the same original continuous recording are assigned to the same train, validation, or test partition. We will also add a brief justification for this choice and, space permitting, include pseudocode or a flowchart illustrating the procedure to ensure reproducibility and independence of samples. Revision: yes.

  2. Referee: [Results section] Performance tables: No error bars, standard deviations, or statistical significance tests (e.g., paired t-tests or McNemar tests) are reported for the 97% accuracy and 95% F1-score across runs or folds. Without these, the numerical superiority over baselines cannot be assessed as robust rather than due to random variation, weakening support for the central claim.

    Authors: We acknowledge that the absence of variability measures and statistical tests limits the strength of the performance claims. In the revision, we will re-run the experiments with multiple random seeds (or k-fold cross-validation) and report mean accuracy and F1-score along with standard deviations in the tables. We will also add paired t-tests or McNemar tests comparing MT-BCA-CNN against the baselines, with p-values, to demonstrate that the improvements are statistically significant rather than attributable to random variation. Revision: yes.
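For the paired comparison promised here, an exact McNemar test needs only the two discordant counts on a shared test set: b, the clips the proposed model gets right and the baseline gets wrong, and c, the reverse. The sketch below uses only the Python standard library; the per-clip outcomes and per-seed accuracies are invented for illustration and are not results from the paper.

```python
from math import comb
from statistics import mean, stdev


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-clip correctness flags.

    b = clips model A gets right and model B gets wrong; c = the reverse.
    Under H0 (equal error rates), b ~ Binomial(b + c, 0.5).
    """
    b = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if a_ok and not b_ok)
    c = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if b_ok and not a_ok)
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented per-clip outcomes on a shared test set (not the paper's data):
# both right on 20 clips, only the proposed model right on 10, only the baseline right on 1.
proposed = [True] * 20 + [True] * 10 + [False] * 1
baseline = [True] * 20 + [False] * 10 + [True] * 1
p_value = mcnemar_exact(proposed, baseline)

# Reporting variability across seeds (hypothetical accuracies).
seed_acc = [0.96, 0.97, 0.95]
print(f"p = {p_value:.4f}; accuracy = {mean(seed_acc):.3f} ± {stdev(seed_acc):.3f}")
```

With 10 versus 1 discordant clips, the exact two-sided p-value is 12/1024 ≈ 0.0117; only the disagreements between the two models enter the test.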

Circularity Check

0 steps flagged

No circularity in derivation; empirical results on external dataset


The paper presents an empirical ML architecture (MT-BCA-CNN) with channel attention and multi-task learning, evaluated via accuracy/F1 on the public Watkins Marine Life Dataset against external baselines. No equations, predictions, or first-principles claims reduce by construction to fitted inputs, self-definitions, or self-citation chains. The central performance numbers are measured outcomes, not algebraically forced, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions plus one domain-specific modeling choice about feature enhancement; no new physical entities are introduced.

free parameters (1)
  • dynamic task weighting parameters
    Adjusts relative contribution of classification and reconstruction losses during joint training.
axioms (1)
  • domain assumption: channel attention dynamically enhances discriminative acoustic features such as harmonic structures while suppressing noise.
    Stated directly in the abstract as the intended behavior of the attention module on underwater signals.

pith-pipeline@v0.9.0 · 5759 in / 1419 out tokens · 62426 ms · 2026-05-22T18:56:41.033124+00:00 · methodology



Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 2 internal anchors

  1. [2]

    Mel frequency cepstral coefficient and its applications: A review

    Abdul, Z.K., Al-Talabani, A.K., 2022b. Mel frequency cepstral coefficient and its applications: A review. IEEE Access 10, 122136–122158. doi:10.1109/ACCESS.2022.3223444

  2. [3]

    Time–frequency signal processing: Today and future

    Akan, A., Karabiber Cura, O., 2021. Time–frequency signal processing: Today and future. Digital Signal Processing 119, 103216. URL: https://www.sciencedirect.com/science/article/pii/S1051200421002554, doi:10.1016/j.dsp.2021.103216

  3. [4]

    Bat detective—deep learning tools for bat acoustic signal detection

    Aodha, O., Gibb, R., Barlow, K., Browning, E., Firman, M., Freeman, R., Harder, B., Kinsey, L., Mead, G., Newson, S., Pandourski, I., Parsons, S., Russ, J., Szodoray-Parádi, A., Szodoray-Parádi, F., Tilova, E., Girolami, M., Brostow, G., Jones, K., 2018. Bat detective—deep learning tools for bat acoustic signal detection. PLOS Computational Biology 14. doi:10.1371/journal.pcbi...

  4. [5]

    Analysis of recent advancements in support vector machine

    Bist, U.S., Singh, N., 2022. Analysis of recent advancements in support vector machine. Concurrency and Computation: Practice and Experience 34, e7270. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.7270, doi:10.1002/cpe.7270

  5. [6]

    An enhanced res2net with local and global feature fusion for speaker verification

    Chen, Y., Zheng, S., Wang, H., Cheng, L., Chen, Q., Qi, J., 2023. An enhanced res2net with local and global feature fusion for speaker verification. URL: https://arxiv.org/abs/2305.12838, arXiv:2305.12838

  6. [7]

    ViP: Virtual pooling for accelerating CNN-based image classification and object detection

    Chen, Z., Zhang, J., Ding, R., Marculescu, D., 2020. ViP: Virtual pooling for accelerating CNN-based image classification and object detection. URL: https://arxiv.org/abs/1906.07912, arXiv:1906.07912

  7. [9]

    Demystifying batch normalization in ReLU networks: Equivalent convex optimization models and implicit regularization

    Ergen, T., Sahiner, A., Ozturkler, B., Pauly, J., Mardani, M., Pilanci, M., 2022b. Demystifying batch normalization in ReLU networks: Equivalent convex optimization models and implicit regularization. URL: https://arxiv.org/abs/2103.01499, arXiv:2103.01499

  8. [10]

    A transformer-based deep learning network for underwater acoustic target recognition

    Feng, S., Zhu, X., 2022. A transformer-based deep learning network for underwater acoustic target recognition. IEEE Geoscience and Remote Sensing Letters 19, 1–5. doi:10.1109/LGRS.2022.3201396

  9. [11]

    Deep learning application in plant stress imaging: a review

    Gao, Z., Luo, Z., Zhang, W., Lv, Z., Xu, Y., 2020. Deep learning application in plant stress imaging: a review. AgriEngineering 2, 430–446. doi:10.3390/agriengineering2030029

  10. [12]

    ORCA-SPY enables killer whale sound source simulation, detection, classification and localization using an integrated deep learning-based segmentation

    Hauer, C., Nöth, E., Barnhill, A., Maier, A., Guthunz, J., Hofer, H., Cheng, R.X., Barth, V., Bergler, C., 2023. ORCA-SPY enables killer whale sound source simulation, detection, classification and localization using an integrated deep learning-based segmentation. Scientific Reports. doi:10.1038/s41598-023-38132-7

  12. [14]

    Squeeze-and-excitation networks

    Hu, J., Shen, L., Sun, G., 2018. Squeeze-and-excitation networks, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141. doi:10.1109/CVPR.2018.00745

  13. [15]

    A multi-task learning framework for sound event detection using high-level acoustic characteristics of sounds

    Khandelwal, T., Das, R.K., 2023. A multi-task learning framework for sound event detection using high-level acoustic characteristics of sounds. URL: https://arxiv.org/abs/2305.10729, arXiv:2305.10729

  14. [16]

    Cross-domain multi-task learning for object detection and saliency estimation

    Khattar, A., Hegde, S., Hebbalaguppe, R., 2021. Cross-domain multi-task learning for object detection and saliency estimation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 3639–3648

  15. [17]

    Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

    Kong, Q., Cao, Y., Iqbal, T., Xu, Y., Wang, W., Plumbley, M.D., 2019. Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems. URL: https://arxiv.org/abs/1904.03476, arXiv:1904.03476

  16. [18]

    Noise robust voice conversion with the fusion of mel-spectrum enhancement and feature disentanglement

    Lele, C., Xiongwei, Z., Meng, S., Xingyu, Z., 2023. Noise robust voice conversion with the fusion of mel-spectrum enhancement and feature disentanglement. ACTA ACUSTICA 48, 1070–1080. URL: https://www.jac.ac.cn/en/article/doi/10.12395/0371-0025.2022093, doi:10.12395/0371-0025.2022093

  17. [19]

    Multitask learning for acoustic scene classification with topic-based soft labels and a mutual attention mechanism

    Leng, Y., Zhuang, J., Pan, J., Sun, C., 2023. Multitask learning for acoustic scene classification with topic-based soft labels and a mutual attention mechanism. Knowledge-Based Systems 268, 110460. URL: https://www.sciencedirect.com/science/article/pii/S0950705123002101, doi:10.1016/j.knosys.2023.110460

  18. [20]

    Underwater target recognition using convolutional recurrent neural networks with 3-D Mel-spectrogram and data augmentation

    Liu, F., Shen, T., Luo, Z., Zhao, D., Guo, S., 2021. Underwater target recognition using convolutional recurrent neural networks with 3-D Mel-spectrogram and data augmentation. Applied Acoustics 178, 107989. URL: https://www.sciencedirect.com/science/article/pii/S0003682X21000827, doi:10.1016/j.apacoust.2021.107989

  19. [21]

    A survey of underwater acoustic target recognition methods based on machine learning

    Luo, X., Chen, L., Zhou, H., Cao, H., 2023. A survey of underwater acoustic target recognition methods based on machine learning. Journal of Marine Science and Engineering 11. URL: https://www.mdpi.com/2077-1312/11/2/384, doi:10.3390/jmse11020384

  20. [22]

    Underwater acoustic signal classification based on sparse time-frequency representation and deep learning

    Miao, Y., Zakharov, Y., Sun, H., Li, J., Wang, J., 2020. Underwater acoustic signal classification based on sparse time-frequency representation and deep learning. IEEE Journal of Oceanic Engineering, 1–14. URL: https://eprints.whiterose.ac.uk/id/eprint/167766/. In press

  21. [23]

    A GTCC-based underwater HMM target classifier with fading channel compensation

    Mohammed, S.K., Hariharan, S.M., Kamal, S., 2018. A GTCC-based underwater HMM target classifier with fading channel compensation. Journal of Sensors 2018, 6593037. URL: https://onlinelibrary.wiley.com/doi/abs/10.1155/2018/6593037, doi:10.1155/2018/6593037

  22. [24]

    A review on the attention mechanism of deep learning

    Niu, Z., Zhong, G., Yu, H., 2021. A review on the attention mechanism of deep learning. Neurocomputing 452, 48–62. URL: https://www.sciencedirect.com/science/article/pii/S092523122100477X, doi:10.1016/j.neucom.2021.03.091

  23. [26]

    Comprehensive underwater object tracking benchmark dataset and underwater image enhancement with GAN

    Panetta, K., Kezebou, L., Oludare, V., Agaian, S., 2022b. Comprehensive underwater object tracking benchmark dataset and underwater image enhancement with GAN. IEEE Journal of Oceanic Engineering 47, 59–75. doi:10.1109/JOE.2021.3086907

  24. [27]

    Time–Frequency Processing: Methods and Tools

    Pulkki, V., Delikaris-Manias, S., Politis, A., 2018. Time–Frequency Processing: Methods and Tools. pp. 1–24. doi:10.1002/9781119252634.ch1

  25. [28]

    The Watkins marine mammal sound database: An online, freely accessible resource

    Sayigh, L., Daher, M.A., Allen, J., Gordon, H., Joyce, K., Stuhlmann, C., Tyack, P., 2017. The Watkins marine mammal sound database: An online, freely accessible resource. Proceedings of Meetings on Acoustics 27, 040013. URL: https://doi.org/10.1121/2.0000358, doi:10.1121/2.0000358

  26. [29]

    Differential treatment for time and frequency dimensions in mel-spectrograms: An efficient 3d spectrogram network for underwater acoustic target classification

    Tang, N., Zhou, F., Wang, Y., Zhang, H., Lyu, T., Wang, Z., Chang, L., 2023. Differential treatment for time and frequency dimensions in mel-spectrograms: An efficient 3d spectrogram network for underwater acoustic target classification. Ocean Engineering 287, 115863. URL: https://www.sciencedirect.com/science/article/pii/S0029801823022473, doi:https://do...

  27. [30]

    Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition

    Thanda, A., Venkatesan, S.M., 2017. Multi-task learning of deep neural networks for audio visual automatic speech recognition. URL: https://arxiv.org/abs/1701.02477, arXiv:1701.02477

  28. [31]

    An underwater acoustic target recognition method based on AMNet

    Wang, B., Zhang, W., Zhu, Y., Wu, C., Zhang, S., 2023a. An underwater acoustic target recognition method based on AMNet. IEEE Geoscience and Remote Sensing Letters 20, 1–5. doi:10.1109/LGRS.2023.3235659

  29. [32]

    CAM++: A fast and efficient network for speaker verification using context-aware masking

    Wang, H., Zheng, S., Chen, Y., Cheng, L., Chen, Q., 2023b. CAM++: A fast and efficient network for speaker verification using context-aware masking. CoRR abs/2303.00332. URL: https://doi.org/10.48550/arXiv.2303.00332, doi:10.48550/ARXIV.2303.00332, arXiv:2303.00332

  30. [33]

    Underwater acoustic target recognition using attention-based deep neural network

    Xiao, X., Wang, W., Ren, Q., Gerstoft, P., Ma, L., 2021. Underwater acoustic target recognition using attention-based deep neural network. JASA Express Letters 1, 106001. URL: https://doi.org/10.1121/10.0006299, doi:10.1121/10.0006299

  31. [34]

    A novel deep-learning method with channel attention mechanism for underwater target recognition

    Xue, L., Zeng, X., Jin, A., 2022. A novel deep-learning method with channel attention mechanism for underwater target recognition. Sensors 22, 5492. doi:10.3390/s22155492

  32. [35]

    An adaptive algorithm for target recognition using gaussian mixture models

    Xue, W., Jiang, T., 2018. An adaptive algorithm for target recognition using gaussian mixture models. Measurement 124, 233–… URL: https://www.sciencedirect.com/science/article/pii/S0263224118302951, doi:10.1016/j.measurement.2018.04.019

  33. [36]

    Masset, R

  34. [37]

    Cross-view scene image localization with triplet network integrating NetVLAD and fully connected layers

    Xue, Z., Zhou, Y., Qiang, Y., Liu, Y., Lin, H., 2021. Cross-view scene image localization with triplet network integrating NetVLAD and fully connected layers. National Remote Sensing Bulletin 25, 1095–1107. doi:10.11834/jrs.20210188

  35. [38]

    A deep convolutional neural network inspired by auditory perception for underwater acoustic target recognition

    Yang, H., Li, J., Shen, S., Xu, G., 2019. A deep convolutional neural network inspired by auditory perception for underwater acoustic target recognition. Sensors 19. URL: https://www.mdpi.com/1424-8220/19/5/1104, doi:10.3390/s19051104

  36. [39]

    An overview on underwater acoustic passive target recognition based on deep learning

    Zhang, Q., Da, L., Wang, C., Zhang, Y., Zhuo, J., 2023. An overview on underwater acoustic passive target recognition based on deep learning. Journal of Electronics & Information Technology 45, 4190. doi:10.11999/JEIT221301