Learning Objectness from Sonar Images for Class-Independent Object Detection

Matias Valdenegro-Toro

arXiv: 1907.00734 · v1 · pith:SINCQEADnew · submitted 2019-07-01 · 💻 cs.CV · cs.LG · cs.RO · eess.IV

Pith reviewed 2026-05-25 11:58 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · cs.RO · eess.IV

keywords sonar images, objectness, object detection, detection proposals, forward-looking sonar, underwater robotics, convolutional neural network, class-independent detection

The pith

A fully convolutional network regresses objectness from sonar images to generate high-recall detection proposals for unknown objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes training a fully convolutional neural network to predict an objectness score for regions in forward-looking sonar images. This score can be ranked to select a small number of candidate bounding boxes that are likely to contain objects, without needing to know the object classes in advance. The method achieves 96% recall using just 100 proposals per image, outperforming traditional proposal generators like EdgeBoxes and Selective Search that require thousands of proposals for similar recall. It also generalizes to objects not seen during training and beats a template matching approach. This approach is aimed at underwater robotics applications where training data for specific objects may be unavailable.
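The ranking step described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the box format, the function name `top_k_proposals`, and the example values are all assumptions, and the objectness scores are taken as given rather than produced by a network.

```python
from typing import List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def top_k_proposals(boxes: List[Box], scores: List[float], k: int = 100) -> List[Box]:
    """Return the k boxes with the highest objectness scores, best first."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    # Sort indices by descending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in order[:k]]


# Made-up candidates and scores, only to show the call.
candidates = [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)]
objectness = [0.2, 0.9, 0.6]
selected = top_k_proposals(candidates, objectness, k=2)
```

The default `k=100` mirrors the proposal budget quoted in the summary; everything else is illustrative.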

Core claim

The central claim is that a fully convolutional neural network can directly regress objectness values from sonar images, enabling the selection of a small set of high-recall proposals for class-independent object detection that generalizes to novel objects.

What carries the argument

The fully convolutional neural network that regresses an objectness value directly from the sonar image, used to rank and select proposals.

If this is right

96 percent recall is achieved with only 100 proposals per image.
EdgeBoxes needs 5000 proposals to reach 97 percent recall and Selective Search needs 2000 proposals to reach 95 percent recall.
The method outperforms a template matching baseline by a considerable margin.
The approach generalizes to completely new objects never seen in training.
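To make the efficiency comparison concrete, the short sketch below computes how many times more proposals each baseline needs relative to the 100-proposal budget. The numbers are the ones quoted above; the dictionary layout and variable names are illustrative assumptions.

```python
# (proposals per image, recall in percent), as quoted in the abstract.
reported = {
    "FCN objectness ranking": (100, 96),
    "EdgeBoxes": (5000, 97),
    "Selective Search": (2000, 95),
}

baseline_count = reported["FCN objectness ranking"][0]

# Ratio of each method's proposal count to the FCN ranking budget.
ratios = {name: count / baseline_count for name, (count, _) in reported.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x the proposals of FCN ranking")
```

On these figures EdgeBoxes uses 50 times as many proposals for one extra point of recall, and Selective Search uses 20 times as many for one point less.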

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Fewer proposals could reduce the computational load on subsequent classification steps in a robotic pipeline.
The same regression approach might transfer to other acoustic imaging types such as side-scan sonar without major redesign.
Real-time operation becomes more feasible on resource-limited underwater vehicles when proposal count is kept low.
Training on a broader set of marine objects could further improve robustness to novel shapes.

Load-bearing premise

The learned objectness scores will apply equally well to entirely new classes of objects never present in the training set.

What would settle it

Evaluating recall on a held-out test set containing only object categories completely absent from training; if recall with 100 proposals drops substantially below 96 percent, the generalization claim does not hold.
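A test of this kind reduces to computing recall at a fixed proposal budget. The following is a minimal sketch under stated assumptions: boxes are `(x_min, y_min, x_max, y_max)` tuples, a ground-truth object counts as recalled if any of the top-k proposals overlaps it with intersection over union of at least 0.5, and the function names are hypothetical. The summary above does not state which overlap threshold the paper uses.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    proposals: List[List[Box]],     # per image, boxes already ranked best first
    ground_truth: List[List[Box]],  # per image, annotated object boxes
    k: int = 100,
    iou_threshold: float = 0.5,     # assumed threshold, not taken from the paper
) -> float:
    """Fraction of ground-truth boxes matched by at least one top-k proposal."""
    hits = 0
    total = 0
    for ranked, objects in zip(proposals, ground_truth):
        top = ranked[:k]
        for gt in objects:
            total += 1
            if any(iou(p, gt) >= iou_threshold for p in top):
                hits += 1
    return hits / total if total else 0.0
```

Running this with `k=100` on images that contain only categories absent from training would give the number to compare against the reported 96 percent.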

Figures

Figures reproduced from arXiv: 1907.00734 by Matias Valdenegro-Toro.

**Figure 3.** Objectness thresholding results with CNN, FCN and CC TM.

**Figure 5.** Effect of the number of proposals on recall for different techniques. State of the art detection proposals methods can achieve high recall but only …

**Figure 6.** Visualization of objectness maps produced by CNN and FCN on previously unseen Forward-Looking Sonar Images. In each group: Left is the …

**Figure 7.** Sample detections produced by objectness ranking with CNN and FCN scores. We show the top …

read the original abstract

Detecting novel objects without class information is not trivial, as it is difficult to generalize from a small training set. This is an interesting problem for underwater robotics, as modeling marine objects is inherently more difficult in sonar images, and training data might not be available apriori. Detection proposals algorithms can be used for this purpose but usually requires a large amount of output bounding boxes. In this paper we propose the use of a fully convolutional neural network that regresses an objectness value directly from a Forward-Looking sonar image. By ranking objectness, we can produce high recall (96 %) with only 100 proposals per image. In comparison, EdgeBoxes requires 5000 proposals to achieve a slightly better recall of 97 %, while Selective Search requires 2000 proposals to achieve 95 % recall. We also show that our method outperforms a template matching baseline by a considerable margin, and is able to generalize to completely new objects. We expect that this kind of technique can be used in the field to find lost objects under the sea.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

An FCN objectness regressor on sonar gets high recall with few proposals but the generalization claim has no supporting details on the split or data.

The key point is that this work trains a fully convolutional network to regress objectness directly from forward-looking sonar images, reaching 96% recall with only 100 proposals, which is more efficient than EdgeBoxes or Selective Search on the reported figures. It also claims to generalize to completely new objects.

What the paper does well is demonstrate a practical efficiency gain for class-independent detection in sonar, a domain where modeling objects is hard and data can be limited. The comparison to standard proposal algorithms and a template baseline gives a clear sense of where it stands.

The soft spots are in the evidence. The abstract supplies no dataset size, no architecture or training details, and no description of how the new objects were chosen or split from training. This makes the generalization claim hard to evaluate, and the stress-test concern about whether the split is truly class-disjoint holds up based on what's here. The recall numbers are given without any measure of variability or significance, so it's unclear how robust the advantage is.

This is aimed at people doing underwater robotics or sonar vision work. A reader in that area might find the approach worth trying if the details check out, but it won't move the broader computer vision field. It deserves a serious referee because the idea is simple and the efficiency result is concrete enough to be worth checking, even with the current gaps. I would send it to review and ask for the experimental protocol and the object categories used in the split.

Referee Report

3 major / 2 minor

Summary. The paper proposes training a fully convolutional network to regress an objectness score directly from forward-looking sonar images. By ranking these scores, the method generates a small number of object proposals (100 per image) that achieve 96% recall. This is compared to EdgeBoxes (97% recall at 5000 proposals) and Selective Search (95% recall at 2000 proposals). The work also reports outperforming a template-matching baseline and claims the learned objectness generalizes to completely new object categories absent from the training set.

Significance. If the reported recall figures and generalization hold under a properly class-disjoint evaluation, the result would be useful for underwater robotics applications where novel objects must be detected with limited or no class-specific training data. The efficiency gain (high recall at far fewer proposals) is a concrete, practically relevant improvement over established proposal generators.

major comments (3)

[Abstract / Results] The headline recall numbers (96% at 100 proposals) are presented without any accompanying information on dataset size, number of images, number of object categories, or the precise train/test split protocol. This information is required to evaluate whether the generalization claim rests on a true class-disjoint partition or on shared low-level sonar features.
[Methods] No description is given of the FCN architecture (depth, filter sizes, output resolution), the regression loss, the training procedure, or any regularization. Without these details the central claim that a learned objectness measure outperforms hand-crafted proposal methods cannot be verified or reproduced.
[Results / Generalization claim] The assertion that the method 'is able to generalize to completely new objects' is load-bearing for the paper's contribution, yet no table or section lists the object categories used in training versus testing or confirms that test objects belong to categories never seen during training.
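The class-disjoint condition raised in the first and third comments can be audited mechanically once per-object category labels are available. The sketch below is only an illustration: the `(image_id, category)` sample format, the function name `shared_categories`, and the category names are made up, since the paper's actual categories are not listed here.

```python
from typing import List, Set, Tuple

# Hypothetical annotation format: one (image_id, category) pair per object.
Sample = Tuple[str, str]


def shared_categories(train: List[Sample], test: List[Sample]) -> Set[str]:
    """Return the categories that appear in both splits (empty if class-disjoint)."""
    train_cats = {category for _, category in train}
    test_cats = {category for _, category in test}
    return train_cats & test_cats


# Invented example data, not the paper's dataset.
train_split = [("img_001", "bottle"), ("img_002", "tire"), ("img_003", "bottle")]
test_split = [("img_101", "chain"), ("img_102", "tire")]

overlap = shared_categories(train_split, test_split)
if overlap:
    print("Split is not class-disjoint; shared categories:", sorted(overlap))
```

An empty result is necessary but not sufficient for the generalization claim: images that contain several objects would also need checking so that no training image shows a held-out category.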

minor comments (2)

[Abstract] Abstract: 'apriori' should be written as two words ('a priori').
[Results] The comparison tables or figures (if present) should report the exact number of test images and the number of object instances per category to allow readers to assess statistical reliability of the recall figures.
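The point about statistical reliability can be illustrated with a confidence interval on recall. The sketch below uses the Wilson score interval for a binomial proportion; the instance count of 100 is purely hypothetical, because the number of test objects is not given in the material reviewed here.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical: 96 of 100 test objects recalled. The true count is unknown.
low, high = wilson_interval(96, 100)
print(f"96% recall on 100 instances: 95% interval [{low:.3f}, {high:.3f}]")
```

With a test set this small the interval spans several points of recall, which is comparable to the one-point gaps between the methods being compared; a larger instance count would narrow it.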

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which identify important omissions that limit the paper's clarity and reproducibility. We will revise the manuscript to supply the requested details on the dataset, architecture, training, and evaluation protocol.

Referee: [Abstract / Results] The headline recall numbers (96% at 100 proposals) are presented without any accompanying information on dataset size, number of images, number of object categories, or the precise train/test split protocol. This information is required to evaluate whether the generalization claim rests on a true class-disjoint partition or on shared low-level sonar features.

Authors: We agree these details are necessary. The revised manuscript will expand both the abstract and results sections to report dataset size, number of images, object categories, and the exact train/test split protocol, explicitly stating that the evaluation uses a class-disjoint partition.
Revision: yes

Referee: [Methods] No description is given of the FCN architecture (depth, filter sizes, output resolution), the regression loss, the training procedure, or any regularization. Without these details the central claim that a learned objectness measure outperforms hand-crafted proposal methods cannot be verified or reproduced.

Authors: We acknowledge the methods section is incomplete for reproducibility. The revision will add a full specification of the FCN architecture (depth, filter sizes, output resolution), the regression loss, training procedure, and regularization.
Revision: yes

Referee: [Results / Generalization claim] The assertion that the method 'is able to generalize to completely new objects' is load-bearing for the paper's contribution, yet no table or section lists the object categories used in training versus testing or confirms that test objects belong to categories never seen during training.

Authors: We will add a table (or dedicated subsection) that enumerates the training and test object categories and confirms the test categories were never present in training, thereby documenting the class-disjoint evaluation.
Revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results on held-out data are self-contained

The paper trains an FCN to regress objectness from sonar images and reports recall metrics on held-out test images. These are direct empirical measurements, not derivations that reduce by construction to fitted inputs or self-citations. No equations, ansatzes, or uniqueness theorems are invoked that would make the 96% recall claim equivalent to the training data by definition. Generalization to new objects is an empirical claim resting on the (unshown) train/test split protocol, which does not constitute circularity under the specified patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; full training details, architecture, and dataset statistics unavailable. The central claim rests on the unstated premise that a standard CNN can learn sonar-specific objectness features that transfer to unseen objects.

axioms (1)

Domain assumption: Convolutional neural networks trained on image data can learn to regress a scalar objectness score that ranks true objects above background.
This is the implicit foundation for using an FCN to produce the objectness map.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 4 internal anchors

[1] N. Hurtós, N. Palomeras, S. Nagappa, and J. Salvi, “Automatic detection of underwater chain links using a forward-looking sonar,” in OCEANS-Bergen, 2013 MTS/IEEE. IEEE, 2013, pp. 1–7.

[2] J. Sawas, Y. Petillot, and Y. Pailhas, “Cascade of boosted classifiers for rapid detection of underwater objects,” in Proceedings of the European Conference on Underwater Acoustics, 2010.

[3] M. Valdenegro-Toro, “Submerged Marine Debris Detection with Autonomous Underwater Vehicles,” in International Conference on Robotics and Automation for Humanitarian Applications (RAHA). IEEE, 2016.

[4] J. Hosang, R. Benenson, and B. Schiele, “How good are detection proposals, really?” arXiv preprint arXiv:1406.6962, 2014.

[5] M. Valdenegro-Toro, Objectness Scoring and Detection Proposals in Forward-Looking Sonar Images with Convolutional Neural Networks. Springer International Publishing, 2016.

[6] M. Valdenegro-Toro, “End-to-End Object Detection and Recognition in Forward-Looking Sonar Images with Convolutional Neural Networks,” in Autonomous Underwater Vehicles (AUV), 2016 IEEE/OES. IEEE, 2016, pp. 144–150.

[7] I. Endres and D. Hoiem, “Category independent object proposals,” in Computer Vision–ECCV 2010. Springer, 2010, pp. 575–588.

[8] B. Alexe, T. Deselaers, and V. Ferrari, “Measuring the objectness of image windows,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 11, pp. 2189–2202, 2012.

[9] C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from edges,” in Computer Vision–ECCV 2014. Springer, 2014, pp. 391–405.

[10] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, “Selective search for object recognition,” International Journal of Computer Vision, vol. 104, no. 2, pp. 154–171, 2013.

[11] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015, pp. 91–99.

[12] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.

[13] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[14] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” arXiv preprint arXiv:1602.07360, 2016.

[15] M. Valdenegro-Toro, “Real-time convolutional networks for sonar image classification in low-power embedded systems,” CoRR, vol. abs/1709.02153, 2017.

[16] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440, 2015.

[17] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[18] J. Hosang, R. Benenson, P. Dollár, and B. Schiele, “What makes for effective detection proposals?” 2015.

[19] N. Chavali, H. Agrawal, A. Mahendru, and D. Batra, “Object-proposal evaluation protocol is ‘gameable’,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 835–844.

[20] H. Midelfart, J. Groen, and O. Midtgaard, “Template matching methods for object classification in synthetic aperture sonar images,” in Proceedings of the Underwater Acoustic Measurements Conference, no. S S, 2009.
