Learning Objectness from Sonar Images for Class-Independent Object Detection
Pith reviewed 2026-05-25 11:58 UTC · model grok-4.3
The pith
A fully convolutional network regresses objectness from sonar images to generate high-recall detection proposals for unknown objects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a fully convolutional neural network can directly regress objectness values from sonar images, enabling the selection of a small set of high-recall proposals for class-independent object detection that generalizes to novel objects.
What carries the argument
The fully convolutional neural network that regresses an objectness value directly from the sonar image, used to rank and select proposals.
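The ranking step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the network output is a 2D grid of objectness scores, and that each grid cell corresponds to a fixed-size window. The names `top_k_proposals`, `box_size`, and `stride`, and their default values, are hypothetical.

```python
import heapq


def top_k_proposals(objectness, k, box_size=96, stride=8):
    """Return up to k proposals as (score, x, y, w, h), highest score first.

    Assumption (not from the paper): objectness[r][c] scores the window whose
    top-left corner is at pixel (c * stride, r * stride).
    """
    scored = [
        (score, c * stride, r * stride, box_size, box_size)
        for r, row in enumerate(objectness)
        for c, score in enumerate(row)
    ]
    # Keep only the k windows with the largest objectness score.
    return heapq.nlargest(k, scored, key=lambda p: p[0])
```

Calling `top_k_proposals(score_map, k=100)` would then yield the 100 highest-scoring windows for one image, which is the proposal budget the review discusses.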
If this is right
- 96 percent recall is achieved with only 100 proposals per image.
- EdgeBoxes needs 5000 proposals to reach 97 percent recall and Selective Search needs 2000 proposals to reach 95 percent recall.
- The method outperforms a template matching baseline by a considerable margin.
- The approach generalizes to completely new objects never seen in training.
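The recall figures above depend on how a ground-truth object is counted as found. A minimal sketch of recall at a fixed proposal budget is given below, assuming boxes are `(x, y, w, h)` tuples and that a match means intersection-over-union of at least 0.5. That threshold and the function names are assumptions for illustration; the review does not state the paper's matching criterion.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def recall_at_k(ground_truth, ranked_proposals, k, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by any of the top-k proposals.

    ground_truth and ranked_proposals are per-image lists of boxes;
    proposals are assumed to be sorted by decreasing objectness.
    """
    hits = 0
    total = 0
    for gt_boxes, proposals in zip(ground_truth, ranked_proposals):
        top = proposals[:k]
        for gt in gt_boxes:
            total += 1
            if any(iou(gt, p) >= iou_threshold for p in top):
                hits += 1
    return hits / total if total else 0.0
```

Sweeping `k` over values such as 100, 2000, and 5000 would produce a recall-versus-proposals curve of the kind used to compare the methods.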
Where Pith is reading between the lines
- Fewer proposals could reduce the computational load on subsequent classification steps in a robotic pipeline.
- The same regression approach might transfer to other acoustic imaging types such as side-scan sonar without major redesign.
- Real-time operation becomes more feasible on resource-limited underwater vehicles when proposal count is kept low.
- Training on a broader set of marine objects could further improve robustness to novel shapes.
Load-bearing premise
The learned objectness scores will apply equally well to entirely new classes of objects never present in the training set.
What would settle it
Evaluating recall on a held-out test set containing only object categories completely absent from training; if recall with 100 proposals drops substantially below 96 percent, the generalization claim does not hold.
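One way to set up such a test is to partition the data by category rather than by image. The sketch below assumes each sample is an `(image_id, class_label)` pair; the function name and the labels in the usage example are hypothetical and not taken from the paper.

```python
def class_disjoint_split(samples, held_out_classes):
    """Split (image_id, class_label) samples so test classes never appear in training."""
    held_out = set(held_out_classes)
    train = [s for s in samples if s[1] not in held_out]
    test = [s for s in samples if s[1] in held_out]
    # Sanity check: no category may occur on both sides of the split.
    train_classes = {label for _, label in train}
    test_classes = {label for _, label in test}
    assert not (train_classes & test_classes), "split is not class-disjoint"
    return train, test


# Usage with made-up labels: hold out "tire" as the novel category.
samples = [("img0", "bottle"), ("img1", "can"), ("img2", "tire"), ("img3", "can")]
train, test = class_disjoint_split(samples, ["tire"])
```

Recall measured only on `test` would then reflect performance on categories the network never saw, which is the quantity the generalization claim depends on.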
Original abstract
Detecting novel objects without class information is not trivial, as it is difficult to generalize from a small training set. This is an interesting problem for underwater robotics, as modeling marine objects is inherently more difficult in sonar images, and training data might not be available apriori. Detection proposals algorithms can be used for this purpose but usually requires a large amount of output bounding boxes. In this paper we propose the use of a fully convolutional neural network that regresses an objectness value directly from a Forward-Looking sonar image. By ranking objectness, we can produce high recall (96 %) with only 100 proposals per image. In comparison, EdgeBoxes requires 5000 proposals to achieve a slightly better recall of 97 %, while Selective Search requires 2000 proposals to achieve 95 % recall. We also show that our method outperforms a template matching baseline by a considerable margin, and is able to generalize to completely new objects. We expect that this kind of technique can be used in the field to find lost objects under the sea.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes training a fully convolutional network to regress an objectness score directly from forward-looking sonar images. By ranking these scores, the method generates a small number of object proposals (100 per image) that achieve 96% recall. This is compared to EdgeBoxes (97% recall at 5000 proposals) and Selective Search (95% recall at 2000 proposals). The work also reports outperforming a template-matching baseline and claims the learned objectness generalizes to completely new object categories absent from the training set.
Significance. If the reported recall figures and generalization hold under a properly class-disjoint evaluation, the result would be useful for underwater robotics applications where novel objects must be detected with limited or no class-specific training data. The efficiency gain (high recall at far fewer proposals) is a concrete, practically relevant improvement over established proposal generators.
major comments (3)
- [Abstract / Results] The headline recall numbers (96 % at 100 proposals) are presented without any accompanying information on dataset size, number of images, number of object categories, or the precise train/test split protocol. This information is required to evaluate whether the generalization claim rests on a true class-disjoint partition or on shared low-level sonar features.
- [Methods] No description is given of the FCN architecture (depth, filter sizes, output resolution), the regression loss, the training procedure, or any regularization. Without these details the central claim that a learned objectness measure outperforms hand-crafted proposal methods cannot be verified or reproduced.
- [Results] Generalization claim: The assertion that the method 'is able to generalize to completely new objects' is load-bearing for the paper's contribution, yet no table or section lists the object categories used in training versus testing or confirms that test objects belong to categories never seen during training.
minor comments (2)
- [Abstract] Abstract: 'apriori' should be written as two words ('a priori').
- [Results] The comparison tables or figures (if present) should report the exact number of test images and the number of object instances per category to allow readers to assess statistical reliability of the recall figures.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which identify important omissions that limit the paper's clarity and reproducibility. We will revise the manuscript to supply the requested details on the dataset, architecture, training, and evaluation protocol.
- Referee: [Abstract / Results] The headline recall numbers (96 % at 100 proposals) are presented without any accompanying information on dataset size, number of images, number of object categories, or the precise train/test split protocol. This information is required to evaluate whether the generalization claim rests on a true class-disjoint partition or on shared low-level sonar features.
  Authors: We agree these details are necessary. The revised manuscript will expand both the abstract and results sections to report dataset size, number of images, object categories, and the exact train/test split protocol, explicitly stating that the evaluation uses a class-disjoint partition. Revision: yes.
- Referee: [Methods] No description is given of the FCN architecture (depth, filter sizes, output resolution), the regression loss, the training procedure, or any regularization. Without these details the central claim that a learned objectness measure outperforms hand-crafted proposal methods cannot be verified or reproduced.
  Authors: We acknowledge the methods section is incomplete for reproducibility. The revision will add a full specification of the FCN architecture (depth, filter sizes, output resolution), the regression loss, training procedure, and regularization. Revision: yes.
- Referee: [Results] Generalization claim: The assertion that the method 'is able to generalize to completely new objects' is load-bearing for the paper's contribution, yet no table or section lists the object categories used in training versus testing or confirms that test objects belong to categories never seen during training.
  Authors: We will add a table (or dedicated subsection) that enumerates the training and test object categories and confirms the test categories were never present in training, thereby documenting the class-disjoint evaluation. Revision: yes.
Circularity Check
No circularity; empirical results on held-out data are self-contained
The paper trains an FCN to regress objectness from sonar images and reports recall metrics on held-out test images. These are direct empirical measurements, not derivations that reduce by construction to fitted inputs or self-citations. No equations, ansatzes, or uniqueness theorems are invoked that would make the 96% recall claim equivalent to the training data by definition. Generalization to new objects is an empirical claim resting on the (unshown) train/test split protocol, which does not constitute circularity under the specified patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Convolutional neural networks trained on image data can learn to regress a scalar objectness score that ranks true objects above background.