arxiv: 2603.21071 · v2 · pith:EAVR2C5Jnew · submitted 2026-03-22 · 💻 cs.CV · cs.AI

CTFS: Collaborative Teacher Framework for Forward-Looking Sonar Image Semantic Segmentation with Extremely Limited Labels

Pith reviewed 2026-05-21 10:12 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords forward-looking sonar, semantic segmentation, limited labels, teacher-student framework, pseudo-label reliability, underwater imaging, multi-teacher collaboration, noise robustness

The pith

A collaborative multi-teacher setup improves forward-looking sonar semantic segmentation when only 2 percent of data is labeled.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that pairs one general teacher with several sonar-specific teachers to guide a student model through alternating training phases. This setup is paired with a consistency-based check that scores pseudo-label quality across teachers and image views to down-weight noise. The approach targets the distinctive problems of forward-looking sonar images, including speckle, shadows, and low contrast, which defeat standard semi-supervised methods when labels are scarce. A sympathetic reader would care because underwater perception tasks often face high annotation costs, so gains at the 2 percent label level could make practical deployment more feasible.

Core claim

By training a student under the alternating guidance of one general teacher and multiple sonar-specific teachers while using cross-teacher consistency to assess and down-weight unreliable pseudo-labels, the method achieves more robust feature learning for semantic segmentation of forward-looking sonar images under extremely limited supervision, delivering a 5.08 percent mIoU gain on the FLSMD dataset at the 2 percent labeled-data regime relative to prior state-of-the-art approaches.
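For readers unfamiliar with the metric behind this claim, mIoU averages the per-class intersection-over-union between predicted and ground-truth masks. The sketch below is a minimal illustration on flattened label lists. The function name `mean_iou` and the choice to skip classes absent from both masks are assumptions made here, not details taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both masks are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened masks with three classes (illustrative values only).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

On these toy masks the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5.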

What carries the argument

Multi-teacher collaborative mechanism consisting of one general teacher and multiple sonar-specific teachers, together with a cross-teacher reliability assessment that measures prediction consistency across views and teachers to filter noisy pseudo-labels.
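The abstract does not specify how the alternation between teachers is scheduled. As one possible reading, the sketch below rotates the guiding teacher round-robin after a fixed number of training steps. The teacher names, the `steps_per_phase` parameter, and the round-robin order are all hypothetical.

```python
from itertools import cycle


def alternating_schedule(teachers, steps_per_phase, total_steps):
    """Yield (step, teacher) pairs, rotating the guiding teacher every phase."""
    rotation = cycle(teachers)
    current = next(rotation)
    for step in range(total_steps):
        # Assumption: a phase is a fixed number of steps, then the next teacher takes over.
        if step > 0 and step % steps_per_phase == 0:
            current = next(rotation)
        yield step, current


# Hypothetical teacher names: one general teacher and two sonar-specific teachers.
teachers = ["general", "sonar_a", "sonar_b"]
schedule = list(alternating_schedule(teachers, steps_per_phase=2, total_steps=8))
```

In a real training loop, the teacher selected at each step would supply the pseudo-labels for the student's unlabeled batch; that part is omitted here.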

If this is right

  • The student acquires both broad semantic structure and sonar-specific cues that single-teacher methods miss.
  • The impact of speckle noise, acoustic shadows, and geometric distortions on training is reduced through dynamic pseudo-label filtering.
  • Performance advantages appear most clearly in the extremely low-label regime exemplified by 2 percent supervision.
  • The framework supplies a concrete way to combine general and domain-specific knowledge sources without manual label expansion.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same consistency-checking idea could be tested on other noisy imaging domains such as medical ultrasound or radar.
  • Integration with active learning might further lower the annotation budget needed for underwater mapping tasks.
  • The alternating guidance schedule could be examined for its effect on convergence speed in other multi-model semi-supervised settings.
  • Downstream tasks such as obstacle avoidance for autonomous underwater vehicles may see indirect gains from the improved segmentation masks.

Load-bearing premise

Consistency of predictions across the different teachers and image views can reliably detect and reduce the harm from noisy pseudo-labels that arise in forward-looking sonar data.
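To make the premise concrete, here is a minimal sketch of a consistency-based reliability weight. It assumes that each teacher and view pair produces a hard label per pixel, that the pseudo-label is the majority vote, and that the weight is the fraction of sources that agree with it. The paper's actual reliability score is not given in the abstract, so this voting rule and the name `vote_reliability` are stand-ins.

```python
from collections import Counter


def vote_reliability(predictions):
    """Majority-vote pseudo-labels with agreement-based weights.

    predictions: list of per-source label lists, one per (teacher, view) pair,
    all of the same length. Returns (pseudo_labels, weights).
    """
    n_sources = len(predictions)
    labels, weights = [], []
    for pixel_votes in zip(*predictions):
        label, count = Counter(pixel_votes).most_common(1)[0]
        labels.append(label)
        weights.append(count / n_sources)  # 1.0 means every source agrees
    return labels, weights


# Toy predictions for four pixels from four hypothetical sources.
preds = [
    [0, 1, 1, 2],  # general teacher, view 1
    [0, 1, 2, 2],  # sonar teacher A, view 1
    [0, 1, 0, 2],  # sonar teacher B, view 2
    [0, 2, 1, 2],  # general teacher, view 2
]
labels, weights = vote_reliability(preds)
```

Here the third pixel receives weight 0.5 and would contribute less to the student's loss. Note the limitation this premise carries: a weight built from agreement cannot flag an error that every source makes in the same way.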

What would settle it

Ablating the cross-teacher reliability assessment on the FLSMD dataset at the 2 percent labeled-data setting and measuring whether the reported mIoU advantage over competing methods disappears.

Figures

Figures reproduced from arXiv: 2603.21071 by Chengzhou Li, Guanchen Meng, Jinyuan Liu, Ping Guo, Qi Jia, Xin Fan, Yu Liu, Zhongxuan Luo, Zhu Liu.

Figure 1: The unique characteristics of sonar images lead to
Figure 2: (a) The overall architecture of CTFS, where knowledge is transferred to the student through the collaboration between the
Figure 3: (a) Example of shadows in sonar images: due to the obstruction of objects during sonar propagation, a shadow is formed behind
Figure 4: (a) The collection process of the FSSG dataset. (b) Sample distribution of each category in the FSSG dataset, and the visualization
Figure 5: Qualitative demonstrations of different approaches on the FLSMD dataset with 2% labeled data.
Figure 6: Qualitative demonstrations of different approaches on the FSSG dataset with 2% labeled data.
Figure 7: Performance comparison of tail-class categories on the FLSMD dataset with a 2% labeled and the FSSG dataset with a 5%
Figure 8: mIoU results for each parameter on the FLSMD dataset
Original abstract

As one of the most important underwater sensing technologies, forward-looking sonar exhibits unique imaging characteristics. Sonar images are often affected by severe speckle noise, low texture contrast, acoustic shadows, and geometric distortions. These factors make it difficult for traditional teacher-student frameworks to achieve satisfactory performance in sonar semantic segmentation tasks under extremely limited labeled data conditions. To address this issue, we propose a Collaborative Teacher Semantic Segmentation Framework for forward-looking sonar images. This framework introduces a multi-teacher collaborative mechanism composed of one general teacher and multiple sonar-specific teachers. By adopting a multi-teacher alternating guidance strategy, the student model can learn general semantic representations while simultaneously capturing the unique characteristics of sonar images, thereby achieving more comprehensive and robust feature modeling. Considering the challenges of sonar images, which can lead teachers to generate a large number of noisy pseudo-labels, we further design a cross-teacher reliability assessment mechanism. This mechanism dynamically quantifies the reliability of pseudo-labels by evaluating the consistency and stability of predictions across multiple views and multiple teachers, thereby mitigating the negative impact caused by noisy pseudo-labels. Notably, on the FLSMD dataset, when only 2% of the data is labeled, our method achieves a 5.08% improvement in mIoU compared to other state-of-the-art approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes CTFS, a collaborative teacher framework for semantic segmentation of forward-looking sonar images under extremely limited labeled data. It combines one general teacher with multiple sonar-specific teachers via a multi-teacher alternating guidance strategy, and introduces a cross-teacher reliability assessment that quantifies pseudo-label reliability through prediction consistency and stability across views and teachers to mitigate noise from sonar artifacts such as speckle, low contrast, shadows, and distortions. The central empirical claim is a 5.08% mIoU gain over state-of-the-art methods on the FLSMD dataset when only 2% of the data is labeled.

Significance. If the reliability assessment proves robust, the work would advance semi-supervised segmentation for underwater sonar, a domain where annotation is costly and imaging artifacts are severe. The multi-teacher design and explicit handling of noisy pseudo-labels offer a targeted solution that could generalize to other noisy imaging modalities with systematic error patterns.

major comments (1)
  1. [Cross-teacher reliability assessment mechanism (method section)] The central claim rests on the cross-teacher reliability assessment correctly down-weighting noisy pseudo-labels. However, sonar images contain systematic artifacts (acoustic shadows, geometric distortions, speckle) that can induce correlated errors across the general teacher and sonar-specific teachers. High cross-teacher consistency would then incorrectly assign high reliability to erroneous labels. The manuscript should provide an explicit check, such as the correlation between reliability scores and per-pixel error rates on a held-out labeled validation subset, to confirm the mechanism separates signal from shared artifact-induced error.
minor comments (2)
  1. [Abstract and Experiments] The abstract states a quantitative improvement but supplies no experimental protocol, baseline details, or statistical tests; ensure the experiments section provides these with sufficient clarity for reproducibility.
  2. [Method] Clarify the precise mathematical definition of the reliability score (e.g., how consistency and stability are combined) and any hyperparameters involved.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The concern about potential correlated errors in the cross-teacher reliability assessment due to systematic sonar artifacts is well-taken, and we address it directly below with a commitment to strengthen the validation in the revised version.

Point-by-point responses
  1. Referee: The central claim rests on the cross-teacher reliability assessment correctly down-weighting noisy pseudo-labels. However, sonar images contain systematic artifacts (acoustic shadows, geometric distortions, speckle) that can induce correlated errors across the general teacher and sonar-specific teachers. High cross-teacher consistency would then incorrectly assign high reliability to erroneous labels. The manuscript should provide an explicit check, such as the correlation between reliability scores and per-pixel error rates on a held-out labeled validation subset, to confirm the mechanism separates signal from shared artifact-induced error.

    Authors: We agree that systematic artifacts in sonar imagery could in principle produce correlated prediction errors across the general teacher and sonar-specific teachers, potentially leading the consistency-based reliability score to over-estimate label quality. Our multi-teacher alternating guidance and multi-view consistency design aim to diversify the sources of error, yet we acknowledge that an explicit empirical check is needed to confirm the mechanism’s robustness. In the revised manuscript we will add a dedicated analysis subsection that reports the correlation between the computed reliability scores and per-pixel error rates (measured against ground-truth labels) on a held-out labeled validation subset. We will include quantitative results (Pearson correlation coefficient) together with scatter plots and qualitative examples showing that low-reliability assignments align with artifact-induced errors. This addition will directly substantiate the central claim.

    revision: yes
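The check requested by the referee can be sketched as follows: compute the Pearson correlation between per-pixel reliability scores and a 0/1 error indicator on held-out labeled pixels. The values below are toy data invented for illustration, and `pearson` is a plain textbook implementation, not code from the paper. A clearly negative coefficient would support the mechanism, since it means low reliability coincides with wrong pseudo-labels.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy held-out pixels: reliability score, pseudo-label, ground-truth label.
reliability = [1.0, 0.75, 0.5, 1.0, 0.25, 0.5]
pseudo = [0, 1, 1, 2, 0, 2]
truth = [0, 1, 2, 2, 1, 2]

# Error indicator: 1.0 where the pseudo-label disagrees with ground truth.
errors = [float(p != t) for p, t in zip(pseudo, truth)]
r = pearson(reliability, errors)
```

A coefficient near zero on real validation data would suggest the failure mode the referee describes, where teachers agree on labels that are wrong.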

Circularity Check

0 steps flagged

No circularity: empirical framework with independent validation

Full rationale

The paper presents a Collaborative Teacher Semantic Segmentation Framework (CTFS) as an engineering solution for limited-label sonar segmentation. The central claim is an empirical performance gain (5.08% mIoU on FLSMD at 2% labels) obtained by comparing the proposed multi-teacher ensemble plus cross-teacher reliability assessment against external baselines. No equations, derivations, or first-principles results are described that reduce to fitted parameters or self-citations by construction. The reliability mechanism is motivated by domain challenges (speckle, shadows) but is not shown to be tautological with the inputs; its effectiveness is asserted via the reported dataset comparison rather than by definitional equivalence. Self-citations, if present, are not load-bearing for the core result. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no explicit free parameters, axioms, or invented entities can be extracted. The framework implicitly treats teacher-prediction consistency as a proxy for pseudo-label quality, which is a domain assumption rather than a derived result.

pith-pipeline@v0.9.0 · 5784 in / 1092 out tokens · 67187 ms · 2026-05-21T10:12:33.974343+00:00 · methodology

