pith. machine review for the scientific record. sign in

arxiv: 2605.13674 · v1 · submitted 2026-05-13 · 💻 cs.CV · cs.AI

Recognition: unknown

Weakly Supervised Segmentation as Semantic-Based Regularization

Authors on Pith no claims yet

Pith reviewed 2026-05-14 20:24 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords weakly supervised semantic segmentationdifferentiable fuzzy logicpseudo-labelsSegment Anything Modelsemantic regularizationPascal VOC 2012optic disc segmentation
0
0 comments X

The pith

Differentiable fuzzy logic encodes weak annotations as continuous constraints to fine-tune SAM and generate higher-quality pseudo-labels for segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that weak annotations such as bounding boxes or image tags and domain priors can be expressed uniformly as logical constraints. These constraints are implemented via differentiable fuzzy logic to guide fine-tuning of the Segment Anything Model. The resulting model produces improved pseudo-labels that train a second-stage prompt-free segmentation network. Experiments on Pascal VOC 2012 and the REFUGE2 optic disc dataset report state-of-the-art accuracy that often surpasses fully supervised baselines. The method supplies a neurosymbolic route to incorporate heterogeneous supervision without heuristic prompting.

Core claim

Unifying weak annotations and domain priors as continuous logical constraints through differentiable fuzzy logic allows logic-guided fine-tuning of SAM, which yields higher-quality pseudo-labels and enables a second-stage prompt-free model to reach segmentation accuracy that frequently exceeds densely supervised performance on Pascal VOC 2012 and REFUGE2.

What carries the argument

Differentiable fuzzy logic that translates heterogeneous weak annotations and priors into continuous logical constraints for fine-tuning SAM to improve pseudo-label generation.

If this is right

  • Higher-quality pseudo-labels are obtained from the logic-refined SAM.
  • State-of-the-art segmentation accuracy is reached on Pascal VOC 2012.
  • Performance often exceeds densely supervised baselines on the REFUGE2 optic disc and cup task.
  • Heterogeneous weak labels and domain priors are incorporated in a single unified framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same constraint mechanism could be applied to other foundation models beyond SAM for broader vision tasks.
  • Domains with strong prior knowledge such as medical imaging stand to gain most from reduced reliance on dense labels.
  • Integration with iterative pseudo-label refinement loops could further lower annotation budgets.

Load-bearing premise

Differentiable fuzzy logic can translate heterogeneous weak annotations and domain priors into continuous constraints without introducing systematic biases or requiring dataset-specific tuning.

What would settle it

On Pascal VOC 2012, if logic-guided fine-tuning of SAM produces pseudo-labels whose quality is no higher than standard heuristic prompting, the accuracy gains and claim of superiority over dense supervision would be falsified.

Figures

Figures reproduced from arXiv: 2605.13674 by Andrei-Bogdan Florea, Jaron Maene, Stefano Colamonaco.

Figure 1
Figure 1. Figure 1: Overview of the proposed neurosymbolic weakly supervised segmentation framework. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Qualitative comparison of our two-stage weakly supervised segmentation pipeline on [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Qualitative comparison of selected samples from the Pascal VOC 2012 training set. The [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Additional comparison of our two-stage weakly supervised segmentation pipeline on [PITH_FULL_IMAGE:figures/full_fig_p025_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Additional comparison of our two-stage weakly supervised segmentation pipeline on [PITH_FULL_IMAGE:figures/full_fig_p026_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of our two-stage weakly supervised segmentation pipeline on the REFUGE2 [PITH_FULL_IMAGE:figures/full_fig_p027_6.png] view at source ↗
read the original abstract

Weakly supervised semantic segmentation (WSSS) trains dense pixel-level segmentation models from partial or coarse annotations such as bounding boxes, scribbles, or image-level tags. While recent work leverages foundation models such as the Segment Anything Model (SAM) to generate pseudo-labels, these approaches typically depend on heuristic prompt choices and offer limited ways to incorporate prior knowledge or heterogeneous labels. We address this gap by taking a neurosymbolic perspective: integrating differentiable fuzzy logic with deep segmentation models. Weak annotations and domain-specific priors are unified as continuous logical constraints that fine-tune SAM under weak supervision. The refined foundation model then produces improved pseudo-labels, from which we train a second-stage prompt-free segmentation model. Experiments on Pascal VOC 2012 and the REFUGE2 optic disc/cup segmentation dataset show that our logic-guided fine-tuning yields higher-quality pseudo-labels, leading to state-of-the-art segmentation accuracy that often exceeds densely supervised baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes integrating differentiable fuzzy logic with the Segment Anything Model (SAM) to encode weak annotations (bounding boxes, scribbles, image tags) and domain priors as continuous logical constraints. These constraints guide fine-tuning of SAM to produce higher-quality pseudo-labels, which then train a second-stage prompt-free segmentation network. Experiments on Pascal VOC 2012 and REFUGE2 are claimed to yield state-of-the-art results that often exceed densely supervised baselines.

Significance. If the empirical results hold after proper validation, the work would demonstrate a principled neurosymbolic route for unifying heterogeneous weak supervision in foundation-model-based segmentation, moving beyond heuristic prompting. The approach could reduce annotation costs while preserving or improving accuracy, with potential for broader adoption in medical imaging and scene understanding where priors are available.

major comments (2)
  1. [Experiments] Experiments section: the central claim of SOTA accuracy and outperformance of dense baselines on Pascal VOC 2012 and REFUGE2 is asserted without any reported mIoU values, ablation tables, statistical significance tests, or comparison numbers against the cited dense baselines, rendering the empirical contribution unevaluable.
  2. [Method] Method section on fuzzy logic constraints: the paper relies on continuous relaxations (t-norms, implications) of logical operations over bounding boxes, scribbles, and priors, yet provides no analysis or controls for known gradient pathologies and over-smoothing effects that could systematically bias pseudo-label quality toward high-confidence regions, leaving open whether reported gains stem from the neurosymbolic component or from SAM fine-tuning alone.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'state-of-the-art segmentation accuracy' should be accompanied by the specific metric (e.g., mIoU) and the exact baseline values being exceeded.
  2. Notation: ensure consistent use of symbols for fuzzy operators (AND/OR/NOT) across equations and text to avoid ambiguity in the constraint definitions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: the central claim of SOTA accuracy and outperformance of dense baselines on Pascal VOC 2012 and REFUGE2 is asserted without any reported mIoU values, ablation tables, statistical significance tests, or comparison numbers against the cited dense baselines, rendering the empirical contribution unevaluable.

    Authors: We agree that the current manuscript version does not present explicit mIoU numbers, ablation tables, or statistical tests in the experiments section, which limits evaluability of the claims. This was an oversight in the submitted draft. In the revised version we will add comprehensive tables reporting mIoU scores for our method versus all cited baselines (including densely supervised ones) on both Pascal VOC 2012 and REFUGE2, plus ablation studies and statistical significance results. revision: yes

  2. Referee: [Method] Method section on fuzzy logic constraints: the paper relies on continuous relaxations (t-norms, implications) of logical operations over bounding boxes, scribbles, and priors, yet provides no analysis or controls for known gradient pathologies and over-smoothing effects that could systematically bias pseudo-label quality toward high-confidence regions, leaving open whether reported gains stem from the neurosymbolic component or from SAM fine-tuning alone.

    Authors: We acknowledge that the method section lacks explicit analysis of gradient pathologies or over-smoothing in the chosen continuous relaxations. In the revision we will expand the method to discuss the specific t-norms and implications used, include analysis of their gradient behavior, and add controls that compare logic-guided SAM fine-tuning against plain SAM fine-tuning without the fuzzy constraints to isolate the neurosymbolic contribution. revision: yes

Circularity Check

0 steps flagged

Empirical fine-tuning loop on external benchmarks with no self-referential reductions

full rationale

The paper describes a two-stage procedure: (1) encode weak labels and priors as differentiable fuzzy constraints to fine-tune SAM, then (2) generate pseudo-labels to train a prompt-free segmentation model. All reported gains are measured on held-out external datasets (Pascal VOC 2012, REFUGE2) against dense-supervision baselines. No equation equates a reported accuracy or pseudo-label quality metric to a parameter fitted on the same data; no uniqueness theorem or ansatz is imported via self-citation to force the architecture; the fuzzy-logic encoding is presented as an explicit modeling choice whose effect is measured rather than assumed. The only minor self-citation risk is the neurosymbolic framing itself, which is not load-bearing for the numerical claims. Hence a low but non-zero circularity score reflecting normal self-reference without derivation collapse.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit equations or parameter lists; the method implicitly relies on the differentiability of fuzzy logic operators and the transferability of SAM weights, but no free parameters or invented entities are named.

pith-pipeline@v0.9.0 · 5460 in / 1092 out tokens · 52888 ms · 2026-05-14T20:24:40.043271+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

60 extracted references · 11 canonical work pages · 2 internal anchors

  1. [1]

    In: Proceedings of the IEEE conference on computer vision and pattern recognition

    Ahn, J., Kwak, S.: Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4981–4990 (2018)

  2. [2]

    2112.00390 , archivePrefix=

    Amit, T., Nachmani, E., Shaharabany, T., Wolf, L.: SegDiff: Image segmentation with diffusion probabilistic models. CoRRabs/2112.00390(2021)

  3. [3]

    Artificial Intelligence303, 103649 (2022)

    Badreddine, S., Garcez, A.d., Serafini, L., Spranger, M.: Logic tensor networks. Artificial Intelligence303, 103649 (2022)

  4. [4]

    In: 2025 International Joint Conference on Neural Networks (IJCNN)

    Bergamin, L., Dimitri, G.M., Aiolli, F.: Integrating background knowledge in medical semantic segmentation with logic tensor networks. In: 2025 International Joint Conference on Neural Networks (IJCNN). pp. 1–7. IEEE (2025)

  5. [5]

    TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

    Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y.: TransUNet: Transformers make strong encoders for medical image segmentation. CoRR abs/2102.04306(2021)

  6. [6]

    IEEE Transactions on Pattern Analysis and Machine Intelligence40(4), 834–848 (2018)

    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence40(4), 834–848 (2018)

  7. [7]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Chen, Z., Tian, Z., Zhu, J., Li, C., Du, S.: C-cam: Causal cam for weakly supervised semantic segmentation on medical image. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11676–11685 (2022)

  8. [8]

    ACM Computing Surveys57, 1 – 29 (2023)

    Chen, Z., Sun, Q.: Weakly-supervised semantic segmentation with image-level labels: From traditional models to foundation models. ACM Computing Surveys57, 1 – 29 (2023)

  9. [9]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., Girdhar, R.: Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1290–1299 (2022) 13

  10. [10]

    CoRRabs/2308.16184 (2023)

    Cheng, J., Ye, J., Deng, Z., Chen, J., Li, T., Wang, H., Su, Y., Huang, Z., Chen, J., Jiang, L., Sun, H., He, J., Zhang, S., Zhu, M., Qiao, Y.: SAM-Med2D. CoRRabs/2308.16184 (2023)

  11. [11]

    CoRRabs/2506.16129(2025)

    Colamonaco, S., Debot, D., Marra, G.: Neurosymbolic object-centric learning with distant supervision. CoRRabs/2506.16129(2025)

  12. [12]

    In: International Conference on Medical Image Computing and Computer-Assisted Intervention

    Deng, G., Zou, K., Ren, K., Wang, M., Yuan, X., Ying, S., Fu, H.: SAM-U: Multi- box prompts triggered uncertainty estimation for reliable SAM in medical image. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 368–377. Springer (2023)

  13. [13]

    arXiv preprint arXiv:2508.13697 (2025)

    Derkinderen, V., Manhaeve, R., Adriaensen, R., Van Praet, L., De Smet, L., Marra, G., De Raedt, L.: The deeplog neurosymbolic machine. arXiv preprint arXiv:2508.13697 (2025)

  14. [14]

    Artificial Intelligence244, 143–165 (2017)

    Diligenti, M., Gori, M., Sacca, C.: Semantic-based regularization for learning and inference. Artificial Intelligence244, 143–165 (2017)

  15. [15]

    In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases

    Diligenti, M., Gori, M., Scoca, V.: Learning efficiently in semantic based regularization. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 33–46. Springer (2016)

  16. [16]

    International journal of computer vision88(2), 303–338 (2010)

    Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. International journal of computer vision88(2), 303–338 (2010)

  17. [17]

    Refuge2 challenge: A treasure trove for multi-dimension analysis and evaluation in glaucoma screening.arXiv preprint arXiv:2202.08994, 2022

    Fang, H., Li, F., Wu, J., Fu, H., Sun, X., Son, J., Yu, S., Zhang, M., Yuan, C., Bian, C., et al.: REFUGE2 challenge: A treasure trove for multi-dimension analysis and evaluation in glaucoma screening. arXiv preprint arXiv:2202.08994 (2022)

  18. [18]

    In: International Conference on Machine Learning

    Fischer, M., Balunovic, M., Drachsler-Cohen, D., Gehr, T., Zhang, C., Vechev, M.: DL2: training and querying neural networks with logic. In: International Conference on Machine Learning. pp. 1931–1941. PMLR (2019)

  19. [19]

    In: CVPR

    Gupta, T., Kembhavi, A.: Visual programming: Compositional visual reasoning without training. In: CVPR. pp. 14953–14962. IEEE (2023)

  20. [20]

    Nature methods 18(2), 203–211 (2021)

    Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self- configuring method for deep learning-based biomedical image segmentation. Nature methods 18(2), 203–211 (2021)

  21. [21]

    In: 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22-25,

    Ji, Z., Veksler, O.: Weakly supervised semantic segmentation: From box to tag and back. In: 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22-25,

  22. [22]

    p. 385. BMVA Press (2021)

  23. [23]

    arXiv preprint arXiv:2305.01275 (2023)

    Jiang, P.T., Yang, Y.: Segment anything is a good pseudo-label generator for weakly supervised semantic segmentation. arXiv preprint arXiv:2305.01275 (2023)

  24. [24]

    In: European Conference on Computer Vision

    Jo, S., Pan, F., Yu, I.J., Kim, K.: Dhr: Dual features-driven hierarchical rebalancing in inter-and intra-class regions for weakly-supervised semantic segmentation. In: European Conference on Computer Vision. pp. 231–248. Springer (2024)

  25. [25]

    In: Proceedings of the IEEE/CVF international conference on computer vision

    Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 4015–4026 (2023) 14

  26. [26]

    In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K

    Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with gaussian edge potentials. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems. vol. 24. Curran Associates, Inc. (2011)

  27. [27]

    In: Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., Culotta, A

    Lempitsky, V., Zisserman, A.: Learning to count objects in images. In: Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., Culotta, A. (eds.) Advances in Neural Information Processing Systems. vol. 23. Curran Associates, Inc. (2010)

  28. [28]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Liang, Z., Wang, T., Zhang, X., Sun, J., Shen, J.: Tree energy loss: Towards sparsely annotated semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16907–16916 (2022)

  29. [29]

    In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

    Lin, C.S., Wang, C.Y., Wang, Y.C.F., Chen, M.H.: Semantic prompt learning for weakly- supervised semantic segmentation. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 8764–8774. IEEE (2025)

  30. [30]

    In: Proceedings of the IEEE conference on computer vision and pattern recognition

    Lin, D., Dai, J., Jia, J., He, K., Sun, J.: ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3159–3167 (2016)

  31. [31]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Lin, Y., Chen, M., Wang, W., Wu, B., Li, K., Lin, B., Liu, H., He, X.: Clip is also an efficient segmenter: A text-driven approach for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 15305–15314 (2023)

  32. [32]

    Nature communications15(1), 654 (2024)

    Ma, J., He, Y., Li, F., Han, L., You, C., Wang, B.: Segment anything in medical images. Nature communications15(1), 654 (2024)

  33. [33]

    In: Proceedings of the 41st International Conference on Machine Learning

    Maene, J., Derkinderen, V., De Raedt, L.: On the hardness of probabilistic neurosymbolic learning. In: Proceedings of the 41st International Conference on Machine Learning. pp. 34203–34218 (2024)

  34. [34]

    DeepLog: A Software Framework for Modular Neurosymbolic AI

    Manhaeve, R., Colamonaco, S., Derkinderen, V., Adriaensen, R., Praet, L.V., Raedt, L.D., Marra, G.: Deeplog: A software framework for modular neurosymbolic ai. arXiv preprint arXiv:2605.10279 (2026)

  35. [35]

    Advances in neural information processing systems 31(2018)

    Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., De Raedt, L.: DeepProbLog: Neural probabilistic logic programming. Advances in neural information processing systems 31(2018)

  36. [36]

    In: International Conference on Artificial Neural Networks

    Manigrasso, F., Miro, F.D., Morra, L., Lamberti, F.: Faster-ltn: a neuro-symbolic, end- to-end object detection architecture. In: International Conference on Artificial Neural Networks. pp. 40–52. Springer (2021)

  37. [37]

    Artificial Intelligence328, 104062 (2024)

    Marra, G., Dumančić, S., Manhaeve, R., De Raedt, L.: From statistical relational to neurosymbolic artificial intelligence: A survey. Artificial Intelligence328, 104062 (2024)

  38. [38]

    Medical Image Analysis89, 102918 (2023)

    Mazurowski, M.A., Dong, H., Gu, H., Yang, J., Konz, N., Zhang, Y.: Segment anything model for medical image analysis: an experimental study. Medical Image Analysis89, 102918 (2023)

  39. [39]

    Novák, V., Perfilieva, I., Mockor, J.: Mathematical principles of fuzzy logic, vol. 517. Springer Science & Business Media (1999)

  40. [40]

    International Journal of Online & Biomedical Engineering 18(10) (2022) 15

    Ouassit, Y., Ardchir, S., Yassine El Ghoumari, M., Azouazi, M.: A brief survey on weakly supervised semantic segmentation. International Journal of Online & Biomedical Engineering 18(10) (2022) 15

  41. [41]

    IEEE transactions on medical imaging36(2), 674–683 (2016)

    Rajchl, M., Lee, M.C., Oktay, O., Kamnitsas, K., Passerat-Palmbach, J., Bai, W., Damodaram, M., Rutherford, M.A., Hajnal, J.V., Kainz, B., et al.: DeepCut: Object segmentation from bounding box annotations using convolutional neural networks. IEEE transactions on medical imaging36(2), 674–683 (2016)

  42. [42]

    In: Proceedings ninth IEEE international conference on computer vision

    Ren, Malik: Learning a classification model for segmentation. In: Proceedings ninth IEEE international conference on computer vision. pp. 10–17. IEEE (2003)

  43. [43]

    Advances in Neural Information Processing Systems37, 52266–52295 (2024)

    Shindo, H., Brack, M., Sudhakaran, G., Dhami, D.S., Schramowski, P., Kersting, K.: Deisam: Segment anything with deictic prompting. Advances in Neural Information Processing Systems37, 52266–52295 (2024)

  44. [44]

    In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S

    Shukla, V., Zeng, Z., Ahmed, K., Van den Broeck, G.: A unified approach to count-based weakly supervised learning. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems. vol. 36, pp. 38709–38722. Curran Associates, Inc. (2023)

  45. [45]

    arXiv preprint arXiv:2305.01586 (2023)

    Sun, W., Liu, Z., Zhang, Y., Zhong, Y., Barnes, N.: An alternative to WSSS? an empirical study of the segment anything model (SAM) on weakly-supervised semantic segmentation problems. arXiv preprint arXiv:2305.01586 (2023)

  46. [46]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Tang, F., Xu, Z., Qu, Z., Feng, W., Jiang, X., Ge, Z.: Hunting attributes: Context prototype-aware learning for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3324–3334 (2024)

  47. [47]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Tian, Z., Shen, C., Wang, X., Chen, H.: BoxInst: High-performance instance segmentation with box annotations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5443–5452 (2021)

  48. [48]

    siam Journal on Computing8(3), 410–421 (1979)

    Valiant, L.G.: The complexity of enumeration and reliability problems. siam Journal on Computing8(3), 410–421 (1979)

  49. [49]

    Artificial Intelligence302, 103602 (2022)

    van Krieken, E., Acar, E., van Harmelen, F.: Analyzing differentiable fuzzy logic operators. Artificial Intelligence302, 103602 (2022)

  50. [50]

    In: MICCAI (1)

    Wang, S., Yu, L., Li, K., Yang, X., Fu, C., Heng, P.: Boundary and entropy-driven adversarial learning for fundus image segmentation. In: MICCAI (1). Lecture Notes in Computer Science, vol. 11764, pp. 102–110. Springer (2019)

  51. [51]

    Medical image analysis 102, 103547 (2025)

    Wu, J., Wang, Z., Hong, M., Ji, W., Fu, H., Xu, Y., Xu, M., Jin, Y.: Medical sam adapter: Adapting segment anything model for medical image segmentation. Medical image analysis 102, 103547 (2025)

  52. [52]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Wu, L., Zhong, Z., Fang, L., He, X., Liu, Q., Ma, J., Chen, H.: Sparsely annotated semantic segmentation with adaptive gaussian mixtures. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 15454–15464 (2023)

  53. [53]

    In: Proceedings of the IEEE conference on computer vision and pattern recognition

    Xu, J., Schwing, A.G., Urtasun, R.: Learning to segment under various forms of weak supervision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3781–3790 (2015)

  54. [54]

    In: Dy, J., Krause, A

    Xu, J., Zhang, Z., Friedman, T., Liang, Y., Van den Broeck, G.: A semantic loss function for deep learning with symbolic knowledge. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 5502–5511. PMLR (10–15 Jul 2018) 16

  55. [55]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Xu, Z., Tang, F., Chen, Z., Su, Y., Zhao, Z., Zhang, G., Su, J., Ge, Z.: Toward modality gap: Vision prototype learning for weakly-supervised semantic segmentation with clip. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 9023–9031 (2025)

  56. [56]

    In: Proceedings of the IEEE/CVF winter conference on applications of computer vision

    Yang, X., Gong, X.: Foundation model assisted weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp. 523–532 (2024)

  57. [57]

    In: European Conference on Computer Vision

    Yang, X., Rahmani, H., Black, S., Williams, B.M.: Weakly supervised co-training with swapping assignments for semantic segmentation. In: European Conference on Computer Vision. pp. 459–478. Springer (2024)

  58. [58]

    Yu, S., Xiao, D., Frost, S., Kanagasingam, Y.: Robust optic disc and cup segmentation with deep learning for glaucoma detection. Comput. Medical Imaging Graph.74, 61–71 (2019)

  59. [59]

    arXiv preprint arXiv:2304.13785 (2023)

    Zhang, K., Liu, D.: Customized segment anything model for medical image segmentation. arXiv preprint arXiv:2304.13785 (2023)

  60. [60]

    Self-Training

    Zhu, L., Li, Y., Fang, J., Liu, Y., Xin, H., Liu, W., Wang, X.: WeakTr: Exploring plain vision transformer for weakly-supervised semantic segmentation. arXiv preprint arXiv:2304.01184 (2023) 17 A. Additional Explanations and Validations A.1. Conditional Independence Assumption We approximatep(ϕ|x)by introducing conditional independence assumptions. More s...