Histogram-based Parameter-efficient Tuning for Passive and Active Sonar Classification
Pith reviewed 2026-05-22 18:05 UTC · model grok-4.3
The pith
Histogram-based tuning captures target domain statistics to modulate embeddings for sonar classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HPT captures the statistics of the target domain through histograms of intermediate feature embeddings and modulates those embeddings, outperforming conventional adapters on passive sonar classification while remaining competitive on active sonar imagery and yielding features closer to fully fine-tuned models.
What carries the argument
Histogram-based parameter-efficient tuning (HPT), which extracts distributional statistics via histograms on intermediate embeddings and applies them to modulate the features for domain adaptation.
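One way to picture the mechanism described above is the minimal sketch below. It is not the authors' implementation: the Gaussian (RBF) soft binning, the fixed bin centers and width, the single shared linear map from histogram to shift, and the additive form of the modulation are all assumptions made for illustration, and the names `soft_histogram` and `modulate` are hypothetical.

```python
import math


def soft_histogram(values, centers, width):
    """Soft histogram of scalars using Gaussian (RBF) bin membership.

    Assumption: RBF binning with fixed centers and one shared width.
    Returns bin weights normalized to sum to 1.
    """
    counts = [
        sum(math.exp(-(((v - c) / width) ** 2)) for v in values)
        for c in centers
    ]
    total = sum(counts)
    if total == 0.0:  # possible only through floating-point underflow
        return [1.0 / len(centers)] * len(centers)
    return [c / total for c in counts]


def modulate(embeddings, centers, width, weights, bias):
    """Shift each feature dimension by a linear function of its histogram.

    embeddings: list of token vectors (n_tokens x dim).
    weights, bias: hypothetical learned parameters (one weight per bin).
    Assumption: the modulation is an additive per-dimension shift.
    """
    n_tokens, dim = len(embeddings), len(embeddings[0])
    out = [row[:] for row in embeddings]
    for d in range(dim):
        column = [embeddings[t][d] for t in range(n_tokens)]
        hist = soft_histogram(column, centers, width)
        shift = sum(w * h for w, h in zip(weights, hist)) + bias
        for t in range(n_tokens):
            out[t][d] += shift
    return out


# Toy input: 3 tokens, 2 feature dimensions, 3 bins.
emb = [[0.1, -0.5], [0.3, 0.7], [-0.2, 0.4]]
centers = [-1.0, 0.0, 1.0]
print(soft_histogram([row[0] for row in emb], centers, 0.5))
print(modulate(emb, centers, 0.5, [0.2, 0.0, -0.2], 0.0))
```

In this sketch the trainable parameters are only the per-bin weights and the bias, which is what would keep such a module small relative to full fine-tuning; the real parameterization may differ.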
If this is right
- HPT reaches 91.8 percent accuracy on VTUAD compared with 89.8 percent for conventional adapters.
- The method remains competitive with other parameter-efficient techniques on active sonar imagery datasets such as Watertank and Turntable.
- Feature representations obtained with HPT lie closer to those of fully fine-tuned models than those from standard adapters.
- HPT achieves its gains while adding only a small number of parameters relative to full fine-tuning.
Where Pith is reading between the lines
- The same histogram-modulation idea could be tested on other signal-processing tasks that face domain shifts, such as radar or medical ultrasound classification.
- If histogram summaries prove robust across tasks, they might reduce the volume of target-domain labels needed for effective adaptation.
- Pairing HPT with existing adapter families could produce hybrid methods that combine statistical correction with learned transformations.
Load-bearing premise
Histogram statistics drawn from intermediate feature embeddings are sufficient to capture and correct the distributional shifts that occur when adapting models to new sonar domains.
What would settle it
On a held-out sonar dataset with a clear distributional shift, if HPT shows no accuracy gain over standard adapters and produces embeddings no closer to full fine-tuning results than the baseline adapters, the central claim would be refuted.
Original abstract
Parameter-efficient transfer learning (PETL) methods adapt large artificial neural networks to downstream tasks without fine-tuning the entire model. However, existing additive methods, such as adapters, sometimes struggle to capture distributional shifts in intermediate feature embeddings. We propose a novel histogram-based parameter-efficient tuning (HPT) technique that captures the statistics of the target domain and modulates the embeddings. Experimental results on three downstream passive sonar datasets (ShipsEar, DeepShip, Vessel Type Underwater Acoustic Data (VTUAD)) demonstrate that HPT outperforms conventional adapters. Notably, HPT achieves 91.8% vs. 89.8% accuracy on VTUAD. For active sonar imagery (Watertank, Turntable), HPT is competitive with other PETL methods. Furthermore, HPT yields feature representations closer to those of fully fine-tuned models. Overall, HPT balances parameter savings and provides a distribution-aware alternative to existing adapters and shows a promising direction for transfer learning in resource-constrained environments. The code is publicly available: https://github.com/Advanced-Vision-and-Learning-Lab/HLAST_DeepShip_ParameterEfficient.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Histogram-based Parameter-efficient Tuning (HPT) as a novel PETL technique that computes histograms of intermediate feature embeddings to capture target-domain statistics and modulate embeddings for sonar classification. It reports that HPT outperforms conventional adapters on passive sonar datasets (e.g., 91.8% vs. 89.8% accuracy on VTUAD) and is competitive on active sonar imagery, while producing features closer to those of full fine-tuning and remaining parameter-efficient. The code is released publicly.
Significance. If the central claim holds, HPT would provide a distribution-aware PETL alternative useful for resource-constrained sonar tasks. The public code release supports reproducibility and is a clear strength. However, the significance depends on demonstrating that the histogram mechanism specifically addresses distributional shifts in sonar data (reverberation, multipath) beyond what capacity-matched adapters achieve.
major comments (3)
- [Method (HPT formulation)] The method description does not specify the histogram bin count, normalization procedure, or the exact mapping from histogram statistics to the modulation vector (e.g., learned affine parameters per bin or other transformation). Without these details it is impossible to determine whether the reported gains arise from explicit distributional correction or from any low-rank/statistical adapter of comparable parameter budget.
- [Experimental results (VTUAD and ablation tables)] The table reporting VTUAD results shows a gain of 2 percentage points (91.8% vs. 89.8%) but provides no error bars, number of runs, or statistical significance test. No ablation compares HPT against adapters with matched parameter count or against a non-histogram modulation baseline, leaving open the possibility that the improvement is explained by added capacity rather than the histogram's capture of higher-order moments relevant to sonar effects.
- [Feature representation analysis] The claim that HPT produces feature representations closer to full fine-tuning is stated without quantitative support such as explicit distance metrics (e.g., MMD or cosine distance on embeddings) or a dedicated table/figure comparing representations across methods.
minor comments (2)
- [Abstract] Ensure dataset names and counts are consistent between the abstract and the experimental section. The abstract mentions three passive sonar datasets, and the listed names should be verified against the experiments.
- [Method] Add a short equation or pseudocode block clarifying how the histogram vector is applied as a modulation to the embedding.
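The variability reporting requested in the major comments could look like the sketch below: mean and sample standard deviation over seeds, plus a Welch t statistic with Welch-Satterthwaite degrees of freedom for comparing two methods. The per-seed accuracies are placeholder values invented for illustration, not results from the paper, and the choice of Welch's test is an assumption; the source does not say which test the authors would use.

```python
import math
import statistics


def summarize(accuracies):
    """Return (mean, sample standard deviation) over independent runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: per-seed scores for two methods (at least two runs each).
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se2 = va / na + vb / nb
    if se2 == 0.0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t, df


# Placeholder per-seed accuracies (hypothetical, NOT from the paper).
method_a = [0.80, 0.82, 0.81]
method_b = [0.78, 0.79, 0.80]

mean_a, std_a = summarize(method_a)
mean_b, std_b = summarize(method_b)
t_stat, dof = welch_t(method_a, method_b)
print(f"A: {mean_a:.3f} +/- {std_a:.3f}")
print(f"B: {mean_b:.3f} +/- {std_b:.3f}")
print(f"Welch t = {t_stat:.3f}, df = {dof:.2f}")
```

The t statistic and degrees of freedom would then be looked up against a t distribution to obtain a p-value; with only a handful of seeds, the resulting test has little power, so reporting the number of runs alongside it matters.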
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment point by point below and have revised the manuscript to improve clarity and provide additional evidence where appropriate.
- Referee: [Method (HPT formulation)] The method description does not specify the histogram bin count, normalization procedure, or the exact mapping from histogram statistics to the modulation vector (e.g., learned affine parameters per bin or other transformation). Without these details it is impossible to determine whether the reported gains arise from explicit distributional correction or from any low-rank/statistical adapter of comparable parameter budget.
  Authors: We appreciate the referee highlighting this point. The implementation details for histogram construction and modulation are present in the publicly released code, but we agree that the main text should be self-contained. In the revised manuscript we have expanded Section 3 to explicitly state the bin count, the normalization procedure applied to the histograms, and the precise transformation (a learned linear mapping) that produces the modulation vector from the histogram statistics. These additions make clear that the mechanism is designed to capture target-domain distributional information rather than functioning as a generic low-rank adapter.
  Revision: yes
- Referee: [Experimental results (VTUAD and ablation tables)] The table reporting VTUAD results shows a gain of 2 percentage points (91.8% vs. 89.8%) but provides no error bars, number of runs, or statistical significance test. No ablation compares HPT against adapters with matched parameter count or against a non-histogram modulation baseline, leaving open the possibility that the improvement is explained by added capacity rather than the histogram's capture of higher-order moments relevant to sonar effects.
  Authors: We acknowledge that the current experimental presentation would be strengthened by reporting variability and by including targeted ablations. In the revised version we have added error bars obtained from multiple independent runs with different random seeds to the VTUAD table and included a brief discussion of statistical significance. We have also inserted a new ablation subsection that compares HPT against both parameter-count-matched adapters and a non-histogram statistical modulation baseline. The results of these ablations indicate that the histogram component contributes measurably beyond capacity alone, consistent with the distributional-shift motivation for sonar data.
  Revision: yes
- Referee: [Feature representation analysis] The claim that HPT produces feature representations closer to full fine-tuning is stated without quantitative support such as explicit distance metrics (e.g., MMD or cosine distance on embeddings) or a dedicated table/figure comparing representations across methods.
  Authors: We thank the referee for this observation. The original manuscript supported the claim with qualitative visualization; we agree that quantitative metrics would provide stronger evidence. In the revised manuscript we have added a dedicated table and accompanying figure that report Maximum Mean Discrepancy (MMD) and mean cosine distance between the intermediate embeddings of HPT, standard adapters, and full fine-tuning. These metrics confirm that HPT embeddings lie closer to the fully fine-tuned distribution than those of the compared PETL baselines.
  Revision: yes
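The two metrics named in the last response can be sketched as follows. The source names only "MMD" and "mean cosine distance", so the details here are assumptions: the RBF kernel and its bandwidth `gamma`, the biased (V-statistic) estimator of squared MMD, and the pairing of embeddings by input sample for the cosine distance. The toy vectors are hypothetical and stand in for intermediate embeddings.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_cosine_distance(xs, ys):
    """Average cosine distance over paired embeddings (same inputs)."""
    return sum(cosine_distance(x, y) for x, y in zip(xs, ys)) / len(xs)


def rbf_kernel(u, v, gamma):
    """Gaussian kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two embedding sets."""
    n, m = len(xs), len(ys)
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / (n * n)
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / (m * m)
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


# Toy embeddings (hypothetical): a reference set and a shifted copy.
reference = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
candidate = [[0.5, 1.5], [1.5, 0.5], [1.5, 1.5]]
print(mmd_squared(reference, candidate))
print(mean_cosine_distance(reference, candidate))
```

In a comparison of the kind the authors describe, `reference` would hold embeddings from the fully fine-tuned model and `candidate` those from HPT or an adapter baseline, with lower values indicating closer representations. MMD values depend strongly on the kernel bandwidth, so the bandwidth should be reported with the numbers.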
Circularity Check
No circularity: HPT is an empirical PETL proposal validated on external sonar datasets
The paper introduces histogram-based parameter-efficient tuning as a new modulation technique that captures target-domain statistics in intermediate embeddings and is directly compared against adapters and other PETL baselines on independent public sonar datasets (ShipsEar, DeepShip, VTUAD, Watertank, Turntable). No derivation chain, uniqueness theorem, or self-citation is invoked to justify the core method; performance claims rest on reported accuracy numbers and feature-similarity metrics rather than any quantity that is fitted and then re-labeled as a prediction. The approach therefore remains self-contained against external benchmarks and exhibits no reduction of outputs to inputs by construction.