CTFS : Collaborative Teacher Framework for Forward-Looking Sonar Image Semantic Segmentation with Extremely Limited Labels

Chengzhou Li; Guanchen Meng; Jinyuan Liu; Ping Guo; Qi Jia; Xin Fan; Yu Liu; Zhongxuan Luo; Zhu Liu

arxiv: 2603.21071 · v2 · pith:EAVR2C5Jnew · submitted 2026-03-22 · 💻 cs.CV · cs.AI

Ping Guo, Chengzhou Li, Guanchen Meng, Qi Jia, Jinyuan Liu, Zhu Liu, Yu Liu, Zhongxuan Luo, Xin Fan

Pith reviewed 2026-05-21 10:12 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords forward-looking sonar · semantic segmentation · limited labels · teacher-student framework · pseudo-label reliability · underwater imaging · multi-teacher collaboration · noise robustness


The pith

A collaborative multi-teacher setup improves forward-looking sonar semantic segmentation when only 2 percent of data is labeled.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that pairs one general teacher with several sonar-specific teachers to guide a student model through alternating training phases. This setup is paired with a consistency-based check that scores pseudo-label quality across teachers and image views to down-weight noise. The approach targets the distinctive problems of forward-looking sonar images, including speckle, shadows, and low contrast, which defeat standard semi-supervised methods when labels are scarce. A sympathetic reader would care because underwater perception tasks often face high annotation costs, so gains at the 2 percent label level could make practical deployment more feasible.
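The alternating training phases can be pictured with a small scheduling sketch. The abstract does not state the actual alternation rule, so the round-robin order, the `phase_len` parameter, and the teacher names below are assumptions for illustration only.

```python
def alternating_schedule(teachers, phase_len, total_steps):
    """Return the teacher that guides the student at each training step.

    Assumption: teachers take turns in fixed-length phases (round-robin).
    The paper's real alternation rule may differ.
    """
    if phase_len <= 0:
        raise ValueError("phase_len must be positive")
    if not teachers:
        raise ValueError("at least one teacher is required")
    return [teachers[(step // phase_len) % len(teachers)]
            for step in range(total_steps)]


# Hypothetical names: one general teacher and two sonar-specific teachers.
teachers = ["general", "sonar_a", "sonar_b"]
schedule = alternating_schedule(teachers, phase_len=2, total_steps=8)
print(schedule)
```

With `phase_len=2`, each teacher guides two consecutive steps before the next one takes over, and the cycle restarts after the last sonar-specific teacher.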

Core claim

By training a student under the alternating guidance of one general teacher and multiple sonar-specific teachers while using cross-teacher consistency to assess and down-weight unreliable pseudo-labels, the method achieves more robust feature learning for semantic segmentation of forward-looking sonar images under extremely limited supervision, delivering a 5.08 percent mIoU gain on the FLSMD dataset at the 2 percent labeled-data regime relative to prior state-of-the-art approaches.

What carries the argument

Multi-teacher collaborative mechanism consisting of one general teacher and multiple sonar-specific teachers, together with a cross-teacher reliability assessment that measures prediction consistency across views and teachers to filter noisy pseudo-labels.
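One simple way to turn prediction consistency into a reliability score is majority agreement per pixel. This is a sketch under that assumption, not the paper's definition: the function name `reliability_map` and the toy label maps are hypothetical, and flat Python lists stand in for image arrays.

```python
from collections import Counter


def reliability_map(predictions):
    """Per-pixel reliability as the fraction of predictions that agree
    with the majority label.

    predictions: list of label maps (one per teacher/view pair), each a
    flat list of integer class ids of equal length.
    Returns (pseudo_labels, reliability), both flat lists.
    """
    n_pixels = len(predictions[0])
    if any(len(p) != n_pixels for p in predictions):
        raise ValueError("all label maps must have the same length")
    pseudo, reliability = [], []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        pseudo.append(label)
        reliability.append(count / len(votes))
    return pseudo, reliability


# Toy predictions for four pixels (made-up values).
preds = [
    [0, 1, 2, 2],  # general teacher, view 1
    [0, 1, 1, 2],  # sonar teacher, view 1
    [0, 1, 0, 2],  # sonar teacher, view 2
    [0, 2, 2, 2],  # general teacher, view 2
]
pseudo, rel = reliability_map(preds)
print(pseudo)  # [0, 1, 2, 2]
print(rel)     # [1.0, 0.75, 0.5, 1.0]
```

Pixels where teachers and views disagree receive a lower score and can then be down-weighted during student training.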

If this is right

The student acquires both broad semantic structure and sonar-specific cues that single-teacher methods miss.
The impact of speckle noise, acoustic shadows, and geometric distortions on training is reduced through dynamic pseudo-label filtering.
Performance advantages appear most clearly in the extremely low-label regime exemplified by 2 percent supervision.
The framework supplies a concrete way to combine general and domain-specific knowledge sources without manual label expansion.
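The dynamic filtering mentioned in the list above can be sketched as a reliability-weighted cross-entropy on pseudo-labels. The weighting form, the function name, and the normalisation by total weight are assumptions; the abstract does not give the paper's loss.

```python
import math


def weighted_pseudo_label_loss(probs, pseudo_labels, weights, eps=1e-12):
    """Reliability-weighted cross-entropy over pixels.

    probs: per-pixel class probability lists from the student.
    pseudo_labels: per-pixel integer class ids from the teachers.
    weights: per-pixel reliability in [0, 1]; 0 removes the pixel.
    """
    total_w = sum(weights)
    if total_w == 0:
        return 0.0  # no trustworthy pixels in this batch
    loss = sum(-w * math.log(p[y] + eps)
               for p, y, w in zip(probs, pseudo_labels, weights))
    return loss / total_w


# Two pixels; the second pseudo-label is judged unreliable (weight 0).
probs = [[0.5, 0.5], [0.9, 0.1]]
loss = weighted_pseudo_label_loss(probs, [0, 1], [1.0, 0.0])
print(loss)
```

Because the second pixel has weight 0, only the first pixel contributes, so a wrong pseudo-label there cannot pull the student toward it.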

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same consistency-checking idea could be tested on other noisy imaging domains such as medical ultrasound or radar.
Integration with active learning might further lower the annotation budget needed for underwater mapping tasks.
The alternating guidance schedule could be examined for its effect on convergence speed in other multi-model semi-supervised settings.
Downstream tasks such as obstacle avoidance for autonomous underwater vehicles may see indirect gains from the improved segmentation masks.

Load-bearing premise

Consistency of predictions across the different teachers and image views can reliably detect and reduce the harm from noisy pseudo-labels that arise in forward-looking sonar data.

What would settle it

Ablating the cross-teacher reliability assessment on the FLSMD dataset at the 2 percent labeled-data setting and measuring whether the reported mIoU advantage over competing methods disappears.
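The ablation comes down to comparing mIoU with and without the reliability assessment. A minimal sketch of the metric follows; `mean_iou` is a hypothetical helper and the label maps are made-up toy values, not results from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flat label maps with three classes (illustrative only).
target = [0, 0, 1, 1, 2, 2]
with_check = [0, 0, 1, 1, 2, 1]
without_check = [0, 1, 1, 1, 2, 1]

gain = mean_iou(with_check, target, 3) - mean_iou(without_check, target, 3)
print(f"mIoU gain from the reliability check (toy data): {gain:.4f}")
```

In a real ablation the two prediction sets would come from students trained with and without the reliability step, evaluated on the same held-out FLSMD split.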

Figures

Figures reproduced from arXiv: 2603.21071 by Chengzhou Li, Guanchen Meng, Jinyuan Liu, Ping Guo, Qi Jia, Xin Fan, Yu Liu, Zhongxuan Luo, Zhu Liu.

**Figure 2.** (a) The overall architecture of CTFS, where knowledge is transferred to the student through the collaboration between the …

**Figure 3.** (a) Example of shadows in sonar images: due to the obstruction of objects during sonar propagation, a shadow is formed behind …

**Figure 4.** (a) The collection process of the FSSG dataset. (b) Sample distribution of each category in the FSSG dataset, and the visualization …

**Figure 5.** Qualitative demonstrations of different approaches on the FLSMD dataset with 2% labeled data.

**Figure 6.** Qualitative demonstrations of different approaches on the FSSG dataset with 2% labeled data.

**Figure 7.** Performance comparison of tail-class categories on the FLSMD dataset with a 2% labeled and the FSSG dataset with a 5% …

**Figure 8.** mIoU results for each parameter on the FLSMD dataset

Original abstract

As one of the most important underwater sensing technologies, forward-looking sonar exhibits unique imaging characteristics. Sonar images are often affected by severe speckle noise, low texture contrast, acoustic shadows, and geometric distortions. These factors make it difficult for traditional teacher-student frameworks to achieve satisfactory performance in sonar semantic segmentation tasks under extremely limited labeled data conditions. To address this issue, we propose a Collaborative Teacher Semantic Segmentation Framework for forward-looking sonar images. This framework introduces a multi-teacher collaborative mechanism composed of one general teacher and multiple sonar-specific teachers. By adopting a multi-teacher alternating guidance strategy, the student model can learn general semantic representations while simultaneously capturing the unique characteristics of sonar images, thereby achieving more comprehensive and robust feature modeling. Considering the challenges of sonar images, which can lead teachers to generate a large number of noisy pseudo-labels, we further design a cross-teacher reliability assessment mechanism. This mechanism dynamically quantifies the reliability of pseudo-labels by evaluating the consistency and stability of predictions across multiple views and multiple teachers, thereby mitigating the negative impact caused by noisy pseudo-labels. Notably, on the FLSMD dataset, when only 2% of the data is labeled, our method achieves a 5.08% improvement in mIoU compared to other state-of-the-art approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A domain-specific multi-teacher extension for sonar segmentation that reports a 5.08% mIoU gain at 2% labels, but the cross-teacher reliability step needs direct validation against artifact-induced errors.


The core contribution is a collaborative teacher setup for forward-looking sonar semantic segmentation under extreme label scarcity. It combines one general teacher with multiple sonar-specific teachers, uses alternating guidance to the student, and adds a cross-teacher reliability score based on prediction consistency across views and teachers to filter noisy pseudo-labels. On the FLSMD dataset with 2% labeled data, the method shows a 5.08% mIoU improvement over other approaches. This is a practical tailoring of semi-supervised ideas to sonar's specific imaging problems like speckle, shadows, and low contrast.

The alternating strategy and reliability quantification are straightforward extensions that make sense for the domain, and the empirical claim is concrete enough to be checked. If the full experiments include solid baselines, ablations on the teacher components, and statistical details, the work could help practitioners in underwater sensing where labels are costly.

The main soft spot is the reliability assessment. It assumes consistency across teachers reliably down-weights errors, but sonar artifacts often produce correlated mistakes in the same regions. Without evidence that high reliability scores actually track lower per-pixel error on a labeled validation set, it's unclear whether this mechanism separates signal from shared noise or just reinforces common failures. The abstract gives no protocol details, so the reported gain needs context on implementation and variance.

This paper is for researchers focused on semi-supervised segmentation in noisy, domain-specific imagery such as remote sensing or underwater vision. A reader working on limited-label problems in specialized sensors would get value from the design choices. It deserves a serious referee because the application is operationally relevant and the central claim is falsifiable with standard metrics. I would send it to peer review and ask specifically for ablations on the cross-teacher reliability step plus full experimental reproducibility details.

Referee Report

1 major / 2 minor

Summary. The paper proposes CTFS, a collaborative teacher framework for semantic segmentation of forward-looking sonar images under extremely limited labeled data. It combines one general teacher with multiple sonar-specific teachers via a multi-teacher alternating guidance strategy, and introduces a cross-teacher reliability assessment that quantifies pseudo-label reliability through prediction consistency and stability across views and teachers to mitigate noise from sonar artifacts such as speckle, low contrast, shadows, and distortions. The central empirical claim is a 5.08% mIoU gain over state-of-the-art methods on the FLSMD dataset when only 2% of the data is labeled.

Significance. If the reliability assessment proves robust, the work would advance semi-supervised segmentation for underwater sonar, a domain where annotation is costly and imaging artifacts are severe. The multi-teacher design and explicit handling of noisy pseudo-labels offer a targeted solution that could generalize to other noisy imaging modalities with systematic error patterns.

major comments (1)

[Cross-teacher reliability assessment mechanism (method section)] The central claim rests on the cross-teacher reliability assessment correctly down-weighting noisy pseudo-labels. However, sonar images contain systematic artifacts (acoustic shadows, geometric distortions, speckle) that can induce correlated errors across the general teacher and sonar-specific teachers. High cross-teacher consistency would then incorrectly assign high reliability to erroneous labels. The manuscript should provide an explicit check, such as the correlation between reliability scores and per-pixel error rates on a held-out labeled validation subset, to confirm the mechanism separates signal from shared artifact-induced error.
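The failure mode raised in this comment can be illustrated with a toy case in which two teachers share the same artifact-induced mistake. The label maps and the `agreement` helper are hypothetical, and binary agreement is used here as a stand-in for the paper's reliability score.

```python
def agreement(pred_a, pred_b):
    """1.0 where two teachers agree on a pixel, else 0.0."""
    return [1.0 if a == b else 0.0 for a, b in zip(pred_a, pred_b)]


# Toy ground truth: 0 = background, 1 = object. Pixels 4-5 lie in a shadow.
truth = [0, 0, 1, 1, 0, 0]
# Hypothetical case: both teachers mistake the shadow for an object.
general = [0, 0, 1, 1, 1, 1]
sonar = [0, 1, 1, 1, 1, 1]

rel = agreement(general, sonar)
wrong_but_trusted = [i for i, (g, t, r) in enumerate(zip(general, truth, rel))
                     if g != t and r == 1.0]
print(wrong_but_trusted)  # [4, 5]
```

The shadow pixels are wrong yet receive full agreement, while the only low-agreement pixel (index 1) is one where a single teacher erred. Agreement alone therefore cannot flag errors that all teachers share.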

minor comments (2)

[Abstract and Experiments] The abstract states a quantitative improvement but supplies no experimental protocol, baseline details, or statistical tests; ensure the experiments section provides these with sufficient clarity for reproducibility.
[Method] Clarify the precise mathematical definition of the reliability score (e.g., how consistency and stability are combined) and any hyperparameters involved.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The concern about potential correlated errors in the cross-teacher reliability assessment due to systematic sonar artifacts is well-taken, and we address it directly below with a commitment to strengthen the validation in the revised version.


Referee: The central claim rests on the cross-teacher reliability assessment correctly down-weighting noisy pseudo-labels. However, sonar images contain systematic artifacts (acoustic shadows, geometric distortions, speckle) that can induce correlated errors across the general teacher and sonar-specific teachers. High cross-teacher consistency would then incorrectly assign high reliability to erroneous labels. The manuscript should provide an explicit check, such as the correlation between reliability scores and per-pixel error rates on a held-out labeled validation subset, to confirm the mechanism separates signal from shared artifact-induced error.

Authors: We agree that systematic artifacts in sonar imagery could in principle produce correlated prediction errors across the general teacher and sonar-specific teachers, potentially leading the consistency-based reliability score to over-estimate label quality. Our multi-teacher alternating guidance and multi-view consistency design aim to diversify the sources of error, yet we acknowledge that an explicit empirical check is needed to confirm the mechanism’s robustness. In the revised manuscript we will add a dedicated analysis subsection that reports the correlation between the computed reliability scores and per-pixel error rates (measured against ground-truth labels) on a held-out labeled validation subset. We will include quantitative results (Pearson correlation coefficient) together with scatter plots and qualitative examples showing that low-reliability assignments align with artifact-induced errors. This addition will directly substantiate the central claim.

Revision: yes
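The promised check can be sketched as a Pearson correlation between per-pixel reliability and a 0/1 error indicator on labeled validation pixels. The numbers below are invented toy values, and `pearson` is a hand-written helper rather than anything taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Toy validation pixels: reliability score and whether the pseudo-label
# was wrong (1) or right (0) against ground truth. Values are made up.
reliability = [1.0, 0.75, 0.5, 1.0, 0.25]
error = [0, 0, 1, 0, 1]

r = pearson(reliability, error)
print(f"correlation between reliability and error: {r:.3f}")
```

A strongly negative coefficient would support the mechanism, since high reliability would then coincide with low error. A value near zero would suggest the score does not separate correct labels from shared artifact-induced mistakes.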

Circularity Check

0 steps flagged

No circularity: empirical framework with independent validation


The paper presents a Collaborative Teacher Semantic Segmentation Framework (CTFS) as an engineering solution for limited-label sonar segmentation. The central claim is an empirical performance gain (5.08% mIoU on FLSMD at 2% labels) obtained by comparing the proposed multi-teacher ensemble plus cross-teacher reliability assessment against external baselines. No equations, derivations, or first-principles results are described that reduce to fitted parameters or self-citations by construction. The reliability mechanism is motivated by domain challenges (speckle, shadows) but is not shown to be tautological with the inputs; its effectiveness is asserted via the reported dataset comparison rather than by definitional equivalence. Self-citations, if present, are not load-bearing for the core result. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no explicit free parameters, axioms, or invented entities can be extracted. The framework implicitly treats teacher-prediction consistency as a proxy for pseudo-label quality, which is a domain assumption rather than a derived result.




work page internal anchor Pith review Pith/arXiv arXiv 2010

[10] [10]

The pascal visual object classes challenge: A retrospective.Inter- national journal of computer vision, 111(1):98–136, 2015

Mark Everingham, SM Ali Eslami, Luc Van Gool, Christo- pher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes challenge: A retrospective.Inter- national journal of computer vision, 111(1):98–136, 2015. 2

work page 2015

[11] [11]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll´ar, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000– 16009, 2022. 2

work page 2022

[12] [12]

Beyond pixels: Semi-supervised semantic segmenta- tion with a multi-scale patch-based multi-label classifier

Prantik Howlader, Srijan Das, Hieu Le, and Dimitris Sama- ras. Beyond pixels: Semi-supervised semantic segmenta- tion with a multi-scale patch-based multi-label classifier. In European Conference on Computer Vision, pages 342–360. Springer, 2024. 2, 7

work page 2024

[13] [13]

Semivl: semi- supervised semantic segmentation with vision-language guidance

Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, and Federico Tombari. Semivl: semi- supervised semantic segmentation with vision-language guidance. InEuropean Conference on Computer Vision, pages 257–275. Springer, 2024. 2, 7

work page 2024

[14] [14]

Semi-supervised semantic segmentation via adaptive equalization learning.Advances in Neural In- formation Processing Systems, 34:22106–22118, 2021

Hanzhe Hu, Fangyun Wei, Han Hu, Qiwei Ye, Jinshi Cui, and Liwei Wang. Semi-supervised semantic segmentation via adaptive equalization learning.Advances in Neural In- formation Processing Systems, 34:22106–22118, 2021. 2, 7

work page 2021

[15] [15]

Physics-guided sonar image fine-grained recognition under scarce annotations

Chengzhou Li, Xiaokang Liu, Qi Jia, Jinyuan Liu, Zhiying Jiang, Longhan Feng, Yu Liu, Zhongxuan Luo, and Xin Fan. Physics-guided sonar image fine-grained recognition under scarce annotations. InProceedings of the 33rd ACM Interna- tional Conference on Multimedia, pages 1356–1365, 2025. 1

work page 2025

[16] [16]

Pseco: Pseudo labeling and consistency training for semi-supervised object detection

Gang Li, Xiang Li, Yujie Wang, Yichao Wu, Ding Liang, and Shanshan Zhang. Pseco: Pseudo labeling and consistency training for semi-supervised object detection. InEuropean Conference on Computer Vision, pages 457–472. Springer,

work page

[17] [17]

Lightweight deep learning model for underwater waste segmentation based on sonar im- ages.Waste Management, 190:63–73, 2024

Yangke Li and Xinman Zhang. Lightweight deep learning model for underwater waste segmentation based on sonar im- ages.Waste Management, 190:63–73, 2024. 6

work page 2024

[18] [18]

Rgb-sonar tracking benchmark and spatial cross-attention transformer tracker.IEEE Transactions on Circuits and Sys- tems for Video Technology, 2024

Yunfeng Li, Bo Wang, Jiuran Sun, Xueyi Wu, and Ye Li. Rgb-sonar tracking benchmark and spatial cross-attention transformer tracker.IEEE Transactions on Circuits and Sys- tems for Video Technology, 2024. 1

work page 2024

[19] [19]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014. 2

work page 2014

[20] [20]

Two teachers are better than one: Semi-supervised el- liptical object detection by dual-teacher collaborative guid- ance

Yu Liu, Longhan Feng, Qi Jia, Zezheng Liu, and Zi-Huang Cao. Two teachers are better than one: Semi-supervised el- liptical object detection by dual-teacher collaborative guid- ance. InProceedings of the 32nd ACM International Con- ference on Multimedia, pages 6355–6363, 2024. 1

work page 2024

[21] [21]

Improving semi-supervised semantic segmentation with sliced-wasserstein feature alignment and uniformity

Chen-Yi Lu, Kasra Derakhshandeh, and Somali Chaterji. Improving semi-supervised semantic segmentation with sliced-wasserstein feature alignment and uniformity. InPro- ceedings of the Computer Vision and Pattern Recognition Conference, pages 20233–20243, 2025. 2

work page 2025

[22] [22]

An underwater observation dataset for fish classification and fishery assessment.Scientific data, 5 (1):1–8, 2018

Erin McCann, Liling Li, Kevin Pangle, Nicholas Johnson, and Jesse Eickholt. An underwater observation dataset for fish classification and fishery assessment.Scientific data, 5 (1):1–8, 2018. 1

work page 2018

[23] [23]

Switching temporary teachers for semi-supervised semantic segmentation.Advances in Neural Information Processing Systems, 36:40367–40380, 2023

Jaemin Na, Jung-Woo Ha, Hyung Jin Chang, Dongyoon Han, and Wonjun Hwang. Switching temporary teachers for semi-supervised semantic segmentation.Advances in Neural Information Processing Systems, 36:40367–40380, 2023. 2, 7

work page 2023

[24] [24]

Classmix: Segmentation-based data aug- mentation for semi-supervised learning

Viktor Olsson, Wilhelm Tranheden, Juliano Pinto, and Lennart Svensson. Classmix: Segmentation-based data aug- mentation for semi-supervised learning. InProceedings of the IEEE/CVF winter conference on applications of com- puter vision, pages 1369–1378, 2021. 2 9

work page 2021

[25] [25]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023. 2, 6

work page internal anchor Pith review Pith/arXiv arXiv 2023

[26] [26]

Semi- supervised semantic segmentation with cross-consistency training

Yassine Ouali, C ´eline Hudelot, and Myriam Tami. Semi- supervised semantic segmentation with cross-consistency training. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12674– 12684, 2020. 2

work page 2020

[27] [27]

Vi- sion transformers for dense prediction

Ren ´e Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vi- sion transformers for dense prediction. InProceedings of the IEEE/CVF international conference on computer vision, pages 12179–12188, 2021. 6

work page 2021

[28] [28]

The marine de- bris dataset for forward-looking sonar semantic segmenta- tion

Deepak Singh and Matias Valdenegro-Toro. The marine de- bris dataset for forward-looking sonar semantic segmenta- tion. InProceedings of the ieee/cvf international conference on computer vision, pages 3741–3749, 2021. 2, 6

work page 2021

[29] [29]

Fixmatch: Simplifying semi-supervised learning with consistency and confidence

Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in neural information processing systems, 33:596– 608, 2020. 2

work page 2020

[30] [30]

Humble teachers teach better students for semi-supervised object detection

Yihe Tang, Weifeng Chen, Yijun Luo, and Yuting Zhang. Humble teachers teach better students for semi-supervised object detection. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 3132–3141, 2021. 6

work page 2021

[31] [31]

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results.Advances in neural information processing systems, 30, 2017

Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results.Advances in neural information processing systems, 30, 2017. 2

work page 2017

[32] [32]

Multi-clue consistency learning to bridge gaps between general and oriented object in semi- supervised detection

Chenxu Wang, Chunyan Xu, Xiang Li, YuXuan Li, Xu Guo, Ziqi Gu, and Zhen Cui. Multi-clue consistency learning to bridge gaps between general and oriented object in semi- supervised detection. InProceedings of the AAAI Conference on Artificial Intelligence, pages 7582–7590, 2025. 2

work page 2025

[33] [33]

Sonar image super- resolution based on structure-texture dual preservation.IEEE Transactions on Geoscience and Remote Sensing, 2025

Mingjie Wang, Weiling Chen, Fengquan Lan, Naveed Ur Rehman Junejo, and Tiesong Zhao. Sonar image super- resolution based on structure-texture dual preservation.IEEE Transactions on Geoscience and Remote Sensing, 2025. 1

work page 2025

[34] [34]

Consistent-teacher: Towards reducing incon- sistent pseudo-targets in semi-supervised object detection

Xinjiang Wang, Xingyi Yang, Shilong Zhang, Yijiang Li, Litong Feng, Shijie Fang, Chengqi Lyu, Kai Chen, and Wayne Zhang. Consistent-teacher: Towards reducing incon- sistent pseudo-targets in semi-supervised object detection. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 3240–3249, 2023. 6

work page 2023

[35] [35]

A dataset with multi- beam forward-looking sonar for underwater object detection

Kaibing Xie, Jian Yang, and Kang Qiu. A dataset with multi- beam forward-looking sonar for underwater object detection. Scientific Data, 9(1):739, 2022. 1

work page 2022

[36] [36]

End-to- end semi-supervised object detection with soft teacher

Mengde Xu, Zheng Zhang, Han Hu, Jianfeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, and Zicheng Liu. End-to- end semi-supervised object detection with soft teacher. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3060–3069, 2021. 2

work page 2021

[37] [37]

Revisiting weak-to-strong consistency in semi-supervised semantic segmentation

Lihe Yang, Lei Qi, Litong Feng, Wayne Zhang, and Yinghuan Shi. Revisiting weak-to-strong consistency in semi-supervised semantic segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7236–7246, 2023. 2, 7

work page 2023

[38] [38]

Unimatch v2: Pushing the limit of semi-supervised semantic segmen- tation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

Lihe Yang, Zhen Zhao, and Hengshuang Zhao. Unimatch v2: Pushing the limit of semi-supervised semantic segmen- tation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 2, 7

work page 2025

[39] [39]

Improved yolov9 for underwater side scan sonar target detection.The Computer Journal, 68(6): 591–604, 2025

Xin Yuan, Jiapeng Li, Weiwei Wang, Xiaoteng Zhou, Ning Li, and Changli Yu. Improved yolov9 for underwater side scan sonar target detection.The Computer Journal, 68(6): 591–604, 2025. 1

work page 2025

[40] [40]

Cutmix: Regu- larization strategy to train strong classifiers with localizable features

Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regu- larization strategy to train strong classifiers with localizable features. InProceedings of the IEEE/CVF international con- ference on computer vision, pages 6023–6032, 2019. 2

work page 2019

[41] [41]

Semantic under- standing of scenes through the ade20k dataset.International Journal of Computer Vision, 127(3):302–321, 2019

Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fi- dler, Adela Barriuso, and Antonio Torralba. Semantic under- standing of scenes through the ade20k dataset.International Journal of Computer Vision, 127(3):302–321, 2019. 2

work page 2019

[42] [42]

Saliency detection for underwater moving object with sonar based on motion estimation and multi- trajectory analysis.Pattern Recognition, 158:111043, 2025

Jifeng Zhu, Wenyu Cai, Meiyan Zhang, Yuxin Lin, and Mingming Liu. Saliency detection for underwater moving object with sonar based on motion estimation and multi- trajectory analysis.Pattern Recognition, 158:111043, 2025. 1

work page 2025

[43] [43]

Pseudoseg: Designing pseudo labels for semantic segmentation

Yuliang Zou, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, and Tomas Pfister. Pseudoseg: Designing pseudo labels for semantic segmentation. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenRe- view.net, 2021. 2 10

work page 2021