Deep learning for echo sounder data

Ketil Malde

arxiv: 2606.10811 · v1 · pith:O2VBO2SGnew · submitted 2026-06-09 · 💻 cs.CV

Pith reviewed 2026-06-27 13:16 UTC · model grok-4.3

classification 💻 cs.CV

keywords: deep learning, echo sounder, acoustic data, echograms, underwater acoustics, machine learning, data formats, datasets

The pith

Acoustic echo sounder data requires deep learning methods designed for its intrinsic properties rather than adapted from image processing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that deep learning has transformed image and text analysis but produced only modest results when applied to underwater acoustic observations from echo sounders. It claims that the distinctive character of acoustic data means further progress depends on creating new methods instead of reusing image-based models. The main barrier today is the absence of consistent data formats and high-quality public datasets that include agreed performance targets. Removing these barriers would allow targeted research that could improve how acoustic data is interpreted for underwater monitoring.

Core claim

The central claim is that due to intrinsic properties of acoustic data, substantial advances will likely require research into deep learning methods beyond mere recycling of models and techniques from image processing. Currently, the potential for breakthroughs in method development is hindered by the lack of standard data formats and organization, and even more by the lack of readily available, high quality data sets with established performance goals. To advance the field, these shortcomings should be remedied.

What carries the argument

The intrinsic properties of acoustic data that set it apart from images and require purpose-built deep learning approaches for echograms and related recordings.

If this is right

New deep learning architectures will need to be developed specifically for the structure of acoustic recordings rather than transferred from vision tasks.
Standard data formats and consistent organization practices must be adopted across the field.
Public high-quality datasets accompanied by defined performance targets will become necessary to measure and drive progress.
Underwater observation tasks that rely on echo sounder data will see improved processing once these resources exist.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Marine monitoring applications such as fish detection or seabed mapping could gain from methods that respect acoustic signal characteristics like range-dependent resolution.
Similar data-format and benchmark gaps may exist in other sensor domains that produce non-image time-series data.
Community efforts to curate and share acoustic recordings would likely accelerate method testing even before new model families appear.
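
To make the range-dependent resolution point concrete: for a conical beam, the across-beam footprint grows linearly with range, so a fixed-size echogram pixel covers a wider slice of water at depth than near the transducer. The sketch below only illustrates that geometry; the function name `beam_footprint_width` and the 7-degree beamwidth are assumptions chosen for the example, not values from the paper.

```python
import math


def beam_footprint_width(range_m: float, beamwidth_deg: float) -> float:
    """Across-beam width (in metres) of a conical beam at a given range.

    Assumes an idealized cone with full opening angle `beamwidth_deg`;
    real transducer beam patterns are more complicated.
    """
    if range_m < 0:
        raise ValueError("range_m must be non-negative")
    half_angle_rad = math.radians(beamwidth_deg) / 2.0
    return 2.0 * range_m * math.tan(half_angle_rad)


if __name__ == "__main__":
    # Hypothetical 7-degree beam, sampled at three ranges.
    for range_m in (10.0, 100.0, 500.0):
        width = beam_footprint_width(range_m, beamwidth_deg=7.0)
        print(f"range {range_m:6.1f} m -> footprint width {width:6.2f} m")
```

Because the footprint scales with range, two adjacent columns in an echogram do not represent the same physical extent at every depth, which is one way an echogram differs from an ordinary photograph.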

Load-bearing premise

The lack of standard data formats, organization, and readily available high-quality data sets with established performance goals is the primary current hindrance to breakthroughs.

What would settle it

A public release of standardized high-quality acoustic datasets with clear benchmarks followed by rapid development of image-processing-derived models that match or exceed specialized methods on those benchmarks.

Original abstract

There is no doubt that over the last decade, techniques from the field of machine learning have revolutionized how we process and interpret data, especially images and text. For underwater observations acoustics is a primary source of information, and naturally, deep learning methods have been applied to echograms and other acoustics data, but so far with rather modest results. Here, we argue that due to intrinsic properties of acoustic data, substantial advances will likely require research into deep learning methods beyond mere recycling of models and techniques from image processing. Currently, the potential for breakthroughs in method development is hindered by the lack of standard data formats and organization, and even more by the lack of readily available, high quality data sets with established performance goals. To advance the field, these shortcomings should be remedied.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a short position paper calling for better data standards in acoustic DL, with no new methods or evidence.

The paper's core point is that echo sounder data has not seen big DL gains because off-the-shelf image models do not transfer well, and that the real blocker is missing public benchmarks plus standardized formats. It recommends fixing the data side to enable progress.

It does a reasonable job naming the practical problem: fisheries acoustics groups often work with private or poorly organized datasets, which makes it hard to compare methods or train larger models. Anyone who has tried to pull together echogram data will recognize that.

The weak part is the claim that intrinsic acoustic properties will require DL methods beyond image techniques. The text gives no examples of those properties (beam geometry, attenuation, reverberation patterns) or cases where a standard CNN fails in a way that proves the need for new architectures. Without that, the argument stays general and could apply to any domain with small or noisy data.

No experiments, derivations, or even cited failure cases appear, so the piece functions as an opinion rather than a technical contribution. The citation pattern is light and does not engage deeply with existing sonar ML work.

This is mainly for people already inside marine acoustics who want to organize community datasets. A methods researcher or someone outside the subfield will not get much from it. It does not rise to the level of needing referee time as a research paper; a perspectives or editorial slot would be a better fit if the journal has one.

Referee Report

2 major / 0 minor

Summary. The manuscript argues that deep learning has produced only modest results when applied to underwater acoustic data such as echograms, attributes this to intrinsic properties of acoustic data that make standard image-processing models insufficient, and identifies the lack of standardized data formats and high-quality public datasets with established benchmarks as the primary barriers to progress; it concludes by calling for remediation of these data shortcomings.

Significance. If the central claim were supported by concrete examples of acoustic-specific failure modes or successful tailored architectures, the perspective could usefully direct community effort toward data curation and domain-adapted methods rather than direct transfer of vision models. In its current form the argument remains an unsupported assertion, limiting its ability to shape research priorities.

major comments (2)

[Abstract] The claim that 'modest results' stem from 'intrinsic properties of acoustic data' is presented without enumerating any such properties (e.g., range-dependent spreading, frequency-dependent attenuation, multi-beam geometry, or non-stationary reverberation) or citing even one documented case in which an image-derived CNN or transformer demonstrably underperforms relative to a purpose-built acoustic model.
[Abstract] No quantitative evidence, performance tables, or literature citations are supplied to establish the baseline of 'rather modest results' against which future advances would be measured, rendering the necessity of 'research into deep learning methods beyond mere recycling' an assertion rather than a demonstrated conclusion.
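
Two of the properties named in the first comment, range-dependent spreading and frequency-dependent attenuation, can be illustrated with the textbook one-way transmission loss for spherical spreading, TL = 20 log10(R) + αR. The sketch below is a minimal illustration and is not taken from the manuscript; the function name `transmission_loss_db` and the absorption coefficients are hypothetical placeholders, not measured values.

```python
import math


def transmission_loss_db(range_m: float, alpha_db_per_m: float) -> float:
    """One-way transmission loss in dB: spherical spreading plus absorption.

    TL = 20 * log10(R) + alpha * R, with R in metres (relative to 1 m)
    and alpha in dB per metre.
    """
    if range_m <= 0:
        raise ValueError("range_m must be positive")
    return 20.0 * math.log10(range_m) + alpha_db_per_m * range_m


if __name__ == "__main__":
    # Placeholder absorption coefficients (dB/m); absorption grows with
    # frequency, so the "high" value stands in for a higher-frequency channel.
    channels = {"low-frequency channel": 0.01, "high-frequency channel": 0.05}
    for name, alpha in channels.items():
        for range_m in (50.0, 200.0, 500.0):
            tl = transmission_loss_db(range_m, alpha)
            print(f"{name}: range {range_m:5.0f} m -> TL {tl:6.1f} dB")
```

The same target therefore returns a weaker echo at longer range and at higher frequency, a dependence that has no direct counterpart in the pixel intensities of a natural image.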

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. As authors of this perspective piece, we appreciate the opportunity to clarify our arguments and will revise the manuscript to better support the claims while preserving its concise, forward-looking nature.

Referee: [Abstract] The claim that 'modest results' stem from 'intrinsic properties of acoustic data' is presented without enumerating any such properties (e.g., range-dependent spreading, frequency-dependent attenuation, multi-beam geometry, or non-stationary reverberation) or citing even one documented case in which an image-derived CNN or transformer demonstrably underperforms relative to a purpose-built acoustic model.

Authors: The manuscript is a short perspective intended to stimulate discussion rather than serve as an empirical study or exhaustive review. The listed acoustic properties are standard in the underwater acoustics literature and explain why direct transfer of image models is often insufficient. We will revise the abstract to briefly enumerate key properties and add 1-2 targeted citations to published cases demonstrating underperformance of standard CNNs/transformers on echogram data relative to acoustic-aware approaches.

revision: partial

Referee: [Abstract] No quantitative evidence, performance tables, or literature citations are supplied to establish the baseline of 'rather modest results' against which future advances would be measured, rendering the necessity of 'research into deep learning methods beyond mere recycling' an assertion rather than a demonstrated conclusion.

Authors: We agree that referencing specific performance levels from the literature would strengthen the perspective. The observation of modest results reflects the authors' synthesis of existing applications (e.g., fish classification and seabed mapping tasks). In revision we will incorporate citations to representative studies along with their reported metrics to establish a clearer baseline without turning the piece into a review article.

revision: partial

Circularity Check

0 steps flagged

Position paper with no derivation chain or quantitative claims; the central assertion is non-circular because there is no reduction to analyze.

The manuscript is an opinion piece arguing that acoustic data properties require DL methods beyond image processing and that data standardization hinders progress. No equations, fitted parameters, predictions, self-citations, or uniqueness theorems appear in the abstract or described content. The strongest claim is a forward-looking assertion without any derivation that reduces to its inputs by construction, matching the reader's assessment of score 0.0. No load-bearing steps exist to analyze under the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, fitted parameters, or new entities are introduced; the document rests on the domain assumption that acoustic data possess intrinsic properties that make image models unsuitable, but provides no further breakdown.

pith-pipeline@v0.9.1-grok · 5644 in / 1071 out tokens · 18549 ms · 2026-06-27T13:16:49.410769+00:00 · methodology

Reference graph

Works this paper leans on

28 extracted references · 2 linked inside Pith

[1] Bahlburg, Dominik, Sebastian Menze, Bjørn A Krafft, Andy D Lowther, and Bettina Meyer. 2025. "Mapping Encounters Between Antarctic Krill Fishing Vessels and Air-Breathing Krill Predators Using Acoustic Data from the Fishery." Proceedings of the National Academy of Sciences 122 (25): e2417203122.
[2] Bokhovkin, Alexey, and Evgeny Burnaev. 2019. "Boundary Loss for Remote Sensing Imagery Semantic Segmentation." In International Symposium on Neural Networks, 388–401. Springer.
[3] Brautaset, O., K. Malde, R. J. Korneliussen, and N. O. Handegard. 2020. "Acoustic Fish Detection Using Deep Learning." ICES Journal of Marine Science 77 (10): 3614–27.
[4] Coetzee, Janet. 2000. "Use of a Shoal Analysis and Patch Estimation System (SHAPES) to Characterise Sardine Schools." Aquatic Living Resources 13 (1): 1–10.
[5] Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. "ImageNet: A Large-Scale Hierarchical Image Database." In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–55. IEEE.
[6] Deng, Li. 2012. "The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]." IEEE Signal Processing Magazine 29 (6): 141–42.
[7] Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, et al. 2020. "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale." arXiv Preprint arXiv:2010.11929.
[8] Echoview Software Pty Ltd. 2025. Echoview Version 15.1. https://www.echoview.com
[9] Everingham, Mark, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. 2010. "The Pascal Visual Object Classes (VOC) Challenge." International Journal of Computer Vision 88 (2): 303–38.
[10] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
[11] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. "Deep Residual Learning for Image Recognition." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–78.
[12] Hirama, Yudai, Soichiro Yokoyama, Tomohisa Yamashita, Hidenori Kawamura, Keiji Suzuki, and Masaaki Wada. 2017. "Discriminating Fish Species by an Echo Sounder in a Set-Net Using a CNN." In 2017 21st Asia Pacific Symposium on Intelligent and Evolutionary Systems (IES), 112–15. IEEE.
[13] Korneliussen, Rolf J, Yngve Heggelund, Gavin J Macaulay, Daniel Patel, Espen Johnsen, and Inge K Eliassen. 2016. "Acoustic Identification of Marine Species Using a Feature Library." Methods in Oceanography 17: 187–205.
[14] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. 2012. "ImageNet Classification with Deep Convolutional Neural Networks." Advances in Neural Information Processing Systems 25.
[15] Li, Yangdong, Qinghong Mao, Zhuang Chen, and Guoping Zhu. 2025. "Enhancing Multi-Frequency Acoustic Signal Extraction of Antarctic Krill Euphausia Superba Using U-Net Convolutional Neural Network." Marine Ecology Progress Series 760: 55–69.
[16] Lin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. "Microsoft COCO: Common Objects in Context." In European Conference on Computer Vision, 740–55. Springer.
[17] Macaulay, Gavin, and Hector Peña. 2018. The SONAR-netCDF4 Convention for Sonar Data, Version 1.0. ICES Cooperative Research Reports (CRR).
[18] Marques, Tiago A. et al. 2020. "Detection and Classification of Fish Schools Using YOLO and Faster R-CNN on Echosounder Data." Proceedings of IEEE OCEANS.
[19] McQuinn, Ian H, and D Reid. 2022. "Description of the ICES HAC Standard Data Exchange Format, Version 1.60."
[20] Pala, Ahmet, Anna Oleynik, Ketil Malde, and Nils Olav Handegard. 2024. "Self-Supervised Feature Learning for Acoustic Data Analysis." Ecological Informatics 84: 102878.
[21] Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. "You Only Look Once: Unified, Real-Time Object Detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779–88.
[22] Redmon, Joseph, and Ali Farhadi. 2017. "YOLO9000: Better, Faster, Stronger." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7263–71.
[23] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. 2016. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6): 1137–49.
[24] Rezvanifar, Ashkan et al. 2019. "Deep Convolutional Networks for Marine Acoustic Classification." IEEE Oceans Conference.
[25] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. 2015. "U-Net: Convolutional Networks for Biomedical Image Segmentation." In International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–41. Springer.
[26] Schmidhuber, Jürgen. 2022. "Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award." Technical Report IDSIA-77-21 (v3), IDSIA, Lugano, Switzerland, 2021–2022.
[27] Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. "Intriguing Properties of Neural Networks." arXiv Preprint arXiv:1312.6199.
[28] Yassir, Anas, Said Jai Andaloussi, Ouail Ouchetto, Kamal Mamza, and Mansour Serghini. 2023. "Acoustic Fish Species Identification Using Deep Learning and Machine Learning Algorithms: A Systematic Review." Fisheries Research 266: 106790.
