Histogram-based Parameter-efficient Tuning for Passive and Active Sonar Classification

Alexandra Van Dine; Amirmohammad Mohammadi; Davelle Carreiro; Joshua Peeples

arXiv: 2504.15214 · v3 · submitted 2025-04-21 · cs.LG · cs.SD

Pith reviewed 2026-05-22 18:05 UTC · model grok-4.3

keywords: parameter-efficient tuning, histogram-based adaptation, sonar classification, passive sonar, active sonar, transfer learning, distributional shifts, feature modulation

The pith

Histogram-based tuning captures target domain statistics to modulate embeddings for sonar classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces histogram-based parameter-efficient tuning, or HPT, as a way to adapt large neural networks to sonar tasks without retraining the full model. HPT computes histograms on intermediate feature embeddings to capture the statistical properties of the new data domain and then uses those statistics to adjust the embeddings directly. The approach is tested on three passive sonar datasets (ShipsEar, DeepShip, and VTUAD); on VTUAD it reaches 91.8 percent accuracy compared with 89.8 percent for standard adapters. It remains competitive with other parameter-efficient methods on active sonar imagery. The method also produces representations closer to those from complete fine-tuning while keeping the added parameter count low. A reader would care because it supplies a concrete, distribution-aware alternative for efficient transfer learning in settings where labeled target data and compute resources are scarce.

Core claim

HPT captures the statistics of the target domain through histograms of intermediate feature embeddings and modulates those embeddings, outperforming conventional adapters on passive sonar classification while remaining competitive on active sonar imagery and yielding features closer to fully fine-tuned models.

What carries the argument

Histogram-based parameter-efficient tuning (HPT), which extracts distributional statistics via histograms on intermediate embeddings and applies them to modulate the features for domain adaptation.
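The mechanism described above can be made concrete with a minimal NumPy sketch. It is not the authors' implementation: the soft radial-basis binning, the names `soft_histogram` and `histogram_modulate`, the shapes, and the additive residual update are all assumptions chosen for illustration, since this page does not state the bin count, normalization, or exact mapping from histogram to modulation.

```python
import numpy as np

def soft_histogram(x, centers, widths):
    """Soft (RBF) histogram over the token axis.

    x:       (tokens, dim) intermediate embeddings for one sample
    centers: (bins, dim) bin centers (hypothetical learnable parameters)
    widths:  (bins, dim) bin widths  (hypothetical learnable parameters)
    returns: (bins, dim) average soft vote per bin and feature
    """
    # votes[t, b, d] approaches 1 when x[t, d] is close to centers[b, d]
    diff = x[:, None, :] - centers[None, :, :]
    votes = np.exp(-(widths[None, :, :] * diff) ** 2)
    return votes.mean(axis=0)

def histogram_modulate(x, centers, widths, w_out):
    """Add a histogram-derived offset to every token embedding.

    w_out: (bins,) hypothetical linear map that collapses the bin axis
    """
    hist = soft_histogram(x, centers, widths)  # (bins, dim)
    modulation = w_out @ hist                  # (dim,)
    return x + modulation[None, :]             # residual-style update

# Illustrative shapes only; none of these values come from the paper.
rng = np.random.default_rng(0)
tokens, dim, bins = 10, 8, 4
x = rng.normal(size=(tokens, dim))
centers = np.linspace(-1.5, 1.5, bins)[:, None] * np.ones((1, dim))
widths = np.ones((bins, dim))
y = histogram_modulate(x, centers, widths, np.zeros(bins))
```

With `w_out` initialized to zero the block returns its input unchanged, so a frozen backbone would start from its pretrained behavior; that initialization is a design choice of this sketch, not something the page attributes to HPT.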

If this is right

HPT reaches 91.8 percent accuracy on VTUAD compared with 89.8 percent for conventional adapters.
The method remains competitive with other parameter-efficient techniques on active sonar imagery datasets such as Watertank and Turntable.
Feature representations obtained with HPT lie closer to those of fully fine-tuned models than those from standard adapters.
HPT achieves its gains while adding only a small number of parameters relative to full fine-tuning.
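To give a sense of what "a small number of parameters" can mean, the sketch below counts the weights of a standard bottleneck adapter (a down-projection and an up-projection, each with a bias) and expresses the total as a share of a frozen backbone. The embedding width, bottleneck size, block count, and backbone size are hypothetical placeholders, not figures reported for HPT or for the adapters in the paper.

```python
def bottleneck_adapter_params(d_model: int, bottleneck: int) -> int:
    """Weights and biases of one down-projection/up-projection adapter."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up

def added_fraction(per_block: int, n_blocks: int, backbone: int) -> float:
    """Share of added trainable parameters relative to the frozen backbone."""
    return per_block * n_blocks / backbone

# Hypothetical configuration, not taken from the paper.
D_MODEL, BOTTLENECK, N_BLOCKS, BACKBONE = 768, 64, 12, 86_000_000
per_block = bottleneck_adapter_params(D_MODEL, BOTTLENECK)
print(per_block, f"{added_fraction(per_block, N_BLOCKS, BACKBONE):.2%}")
```

The same two functions can be reused to size a capacity-matched baseline: pick the bottleneck width whose count is closest to the budget of the method under comparison.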

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same histogram-modulation idea could be tested on other signal-processing tasks that face domain shifts, such as radar or medical ultrasound classification.
If histogram summaries prove robust across tasks, they might reduce the volume of target-domain labels needed for effective adaptation.
Pairing HPT with existing adapter families could produce hybrid methods that combine statistical correction with learned transformations.

Load-bearing premise

Histogram statistics drawn from intermediate feature embeddings are sufficient to capture and correct the distributional shifts that occur when adapting models to new sonar domains.

What would settle it

On a held-out sonar dataset with a clear distributional shift, if HPT shows no accuracy gain over standard adapters and produces embeddings no closer to full fine-tuning results than the baseline adapters, the central claim would be refuted.

Figures

Figures reproduced from arXiv: 2504.15214 by Alexandra Van Dine, Amirmohammad Mohammadi, Davelle Carreiro, Joshua Peeples.

**Figure 2.** Scalability comparison of HPT and adapters. HPT performance …

**Figure 3.** Loss convergence for adapters (left column) and HPT (right …

**Figure 4.** Layer-wise feature similarity for ShipsEar, DeepShip, VTUAD (shared), and VTUAD (non-shared). Histograms maintain closer …

Original abstract

Parameter-efficient transfer learning (PETL) methods adapt large artificial neural networks to downstream tasks without fine-tuning the entire model. However, existing additive methods, such as adapters, sometimes struggle to capture distributional shifts in intermediate feature embeddings. We propose a novel histogram-based parameter-efficient tuning (HPT) technique that captures the statistics of the target domain and modulates the embeddings. Experimental results on three downstream passive sonar datasets (ShipsEar, DeepShip, Vessel Type Underwater Acoustic Data (VTUAD)) demonstrate that HPT outperforms conventional adapters. Notably, HPT achieves 91.8% vs. 89.8% accuracy on VTUAD. For active sonar imagery (Watertank, Turntable), HPT is competitive with other PETL methods. Furthermore, HPT yields feature representations closer to those of fully fine-tuned models. Overall, HPT balances parameter savings and provides a distribution-aware alternative to existing adapters and shows a promising direction for transfer learning in resource-constrained environments. The code is publicly available: https://github.com/Advanced-Vision-and-Learning-Lab/HLAST_DeepShip_ParameterEfficient.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HPT gives a modest 2-percentage-point lift over adapters on one passive sonar set (VTUAD, 91.8% vs. 89.8%) by modulating embeddings with histograms, but the gain could trace to extra parameters rather than to any special handling of distributional shifts.

The paper introduces histogram-based parameter-efficient tuning as a way to adapt pretrained models to sonar classification without full fine-tuning. It modulates intermediate embeddings using statistics from the target domain and reports better accuracy than standard adapters on passive sonar tasks, reaching 91.8% on VTUAD versus 89.8% for adapters. The authors also test on active sonar imagery and note that the resulting features sit closer to those from full fine-tuning. The code release on GitHub is helpful for anyone who wants to check the implementation or try it on their own data.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Histogram-based Parameter-efficient Tuning (HPT) as a novel PETL technique that computes histograms of intermediate feature embeddings to capture target-domain statistics and modulate embeddings for sonar classification. It reports that HPT outperforms conventional adapters on passive sonar datasets (e.g., 91.8% vs. 89.8% accuracy on VTUAD) and is competitive on active sonar imagery, while producing features closer to those of full fine-tuning and remaining parameter-efficient. The code is released publicly.

Significance. If the central claim holds, HPT would provide a distribution-aware PETL alternative useful for resource-constrained sonar tasks. The public code release supports reproducibility and is a clear strength. However, the significance depends on demonstrating that the histogram mechanism specifically addresses distributional shifts in sonar data (reverberation, multipath) beyond what capacity-matched adapters achieve.

major comments (3)

[Method (HPT formulation)] The method description does not specify the histogram bin count, normalization procedure, or the exact mapping from histogram statistics to the modulation vector (e.g., learned affine parameters per bin or other transformation). Without these details it is impossible to determine whether the reported gains arise from explicit distributional correction or from any low-rank/statistical adapter of comparable parameter budget.
[Experimental results (VTUAD and ablation tables)] Table reporting VTUAD results shows a 2% absolute gain but provides no error bars, number of runs, or statistical significance test. No ablation compares HPT against adapters with matched parameter count or against a non-histogram modulation baseline, leaving open the possibility that the improvement is explained by added capacity rather than the histogram's capture of higher-order moments relevant to sonar effects.
[Feature representation analysis] The claim that HPT produces feature representations closer to full fine-tuning is stated without quantitative support such as explicit distance metrics (e.g., MMD or cosine distance on embeddings) or a dedicated table/figure comparing representations across methods.
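The second major comment asks for run-to-run variability and a significance test. One way such a check could look, assuming both methods are trained with the same set of random seeds so that accuracies can be paired, is an exact sign-flip permutation test on the per-seed differences. The function names are illustrative and no accuracy values from the paper are used here.

```python
from itertools import product
from statistics import mean, stdev

def summarize(acc):
    """Mean and sample standard deviation across seeds."""
    return mean(acc), stdev(acc)

def paired_sign_flip_pvalue(acc_a, acc_b):
    """Exact two-sided sign-flip test on per-seed accuracy differences.

    Under the null hypothesis that methods A and B are interchangeable,
    each paired difference is equally likely to carry either sign.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    observed = abs(mean(diffs))
    hits = total = 0
    # Enumerate every sign pattern; feasible for a handful of seeds.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        hits += flipped >= observed - 1e-12
    return hits / total
```

Calling `paired_sign_flip_pvalue(hpt_acc, adapter_acc)` with two equal-length lists of per-seed accuracies returns the fraction of sign patterns at least as extreme as the observed mean gap. With n seeds the smallest attainable p-value is 2 / 2^n, so a handful of seeds can only support coarse conclusions.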

minor comments (2)

[Abstract] Ensure dataset names and counts are consistent between abstract and experimental section (three passive sonar datasets are mentioned but the listed names should be verified).
[Method] Add a short equation or pseudocode block clarifying how the histogram vector is applied as a modulation to the embedding.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment point by point below and have revised the manuscript to improve clarity and provide additional evidence where appropriate.

Referee: [Method (HPT formulation)] The method description does not specify the histogram bin count, normalization procedure, or the exact mapping from histogram statistics to the modulation vector (e.g., learned affine parameters per bin or other transformation). Without these details it is impossible to determine whether the reported gains arise from explicit distributional correction or from any low-rank/statistical adapter of comparable parameter budget.

Authors: We appreciate the referee highlighting this point. The implementation details for histogram construction and modulation are present in the publicly released code, but we agree that the main text should be self-contained. In the revised manuscript we have expanded Section 3 to explicitly state the bin count, the normalization procedure applied to the histograms, and the precise transformation (a learned linear mapping) that produces the modulation vector from the histogram statistics. These additions make clear that the mechanism is designed to capture target-domain distributional information rather than functioning as a generic low-rank adapter.

Revision: yes

Referee: [Experimental results (VTUAD and ablation tables)] Table reporting VTUAD results shows a 2% absolute gain but provides no error bars, number of runs, or statistical significance test. No ablation compares HPT against adapters with matched parameter count or against a non-histogram modulation baseline, leaving open the possibility that the improvement is explained by added capacity rather than the histogram's capture of higher-order moments relevant to sonar effects.

Authors: We acknowledge that the current experimental presentation would be strengthened by reporting variability and by including targeted ablations. In the revised version we have added error bars obtained from multiple independent runs with different random seeds to the VTUAD table and included a brief discussion of statistical significance. We have also inserted a new ablation subsection that compares HPT against both parameter-count-matched adapters and a non-histogram statistical modulation baseline. The results of these ablations indicate that the histogram component contributes measurably beyond capacity alone, consistent with the distributional-shift motivation for sonar data.

Revision: yes

Referee: [Feature representation analysis] The claim that HPT produces feature representations closer to full fine-tuning is stated without quantitative support such as explicit distance metrics (e.g., MMD or cosine distance on embeddings) or a dedicated table/figure comparing representations across methods.

Authors: We thank the referee for this observation. The original manuscript supported the claim with qualitative visualization; we agree that quantitative metrics would provide stronger evidence. In the revised manuscript we have added a dedicated table and accompanying figure that report Maximum Mean Discrepancy (MMD) and mean cosine distance between the intermediate embeddings of HPT, standard adapters, and full fine-tuning. These metrics confirm that HPT embeddings lie closer to the fully fine-tuned distribution than those of the compared PETL baselines.

Revision: yes
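The two metrics named in this exchange can be sketched in a few lines of NumPy. This is a generic illustration rather than the evaluation code behind the paper: the Gaussian kernel, the default bandwidth, the biased MMD estimator, and the assumption that row i of both arrays holds the embedding of the same input are all choices made for the example.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a and rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between two embedding sets."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

def mean_cosine_distance(x, y):
    """Average 1 - cos(x_i, y_i) over embeddings of the same inputs."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return float(1.0 - (xn * yn).sum(1).mean())
```

In a comparison of this kind, `x` would hold layer outputs from the fully fine-tuned model and `y` the outputs of an adapted model on the same inputs; lower values on both metrics indicate closer representations. The kernel bandwidth strongly affects MMD, so it should be fixed across the methods being compared.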

Circularity Check

0 steps flagged

No circularity: HPT is an empirical PETL proposal validated on external sonar datasets

The paper introduces histogram-based parameter-efficient tuning as a new modulation technique that captures target-domain statistics in intermediate embeddings and is directly compared against adapters and other PETL baselines on independent public sonar datasets (ShipsEar, DeepShip, VTUAD, Watertank, Turntable). No derivation chain, uniqueness theorem, or self-citation is invoked to justify the core method; performance claims rest on reported accuracy numbers and feature-similarity metrics rather than any quantity that is fitted and then re-labeled as a prediction. The approach therefore remains self-contained against external benchmarks and exhibits no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method appears to build on standard neural network transfer learning assumptions without introducing new postulated objects.
