
arxiv: 2303.10894 · v3 · submitted 2023-03-20 · 💻 cs.CV

M²SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation

Pith reviewed 2026-05-24 09:13 UTC · model grok-4.3

classification 💻 cs.CV
keywords medical image segmentation · subtraction network · multi-scale feature fusion · U-shaped architecture · colonoscopy segmentation · CT segmentation · ultrasound segmentation · OCT segmentation

The pith

Subtraction of adjacent-level features in a U-shaped network reduces redundancy and improves medical image segmentation accuracy across modalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that element-wise addition and concatenation in decoder fusion create redundant information that weakens complementarity between encoder levels, leading to poor localization and blurred lesion edges. It replaces those operations with a subtraction unit that produces difference features, then scales the idea to intra-layer multi-scale subtraction and inter-layer pyramidal aggregation. A training-free LossNet supervises task-aware features from bottom to top layers. The resulting M²SNet is evaluated on eleven datasets spanning colonoscopy, ultrasound, CT, and OCT segmentation tasks. A reader would care because the approach claims sharper lesion boundaries and more accurate localization for diagnosis without extra modules or training overhead.
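As a toy illustration of the contrast between the three fusion operations, the sketch below applies addition, concatenation, and subtraction to two small feature maps. It is a minimal sketch under assumptions: the maps are single-channel nested lists already resized to the same shape, the subtraction is taken as an element-wise absolute difference, and the convolution that would normally follow is omitted. The function names are hypothetical and are not taken from the paper's code.

```python
from typing import List

Feature = List[List[float]]  # toy single-channel feature map, rows x cols


def fuse_add(a: Feature, b: Feature) -> Feature:
    """Element-wise addition: responses shared by both levels are summed."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_concat(a: Feature, b: Feature) -> List[Feature]:
    """Concatenation along a channel axis: both inputs are kept verbatim."""
    return [a, b]


def subtraction_unit(a: Feature, b: Feature) -> Feature:
    """Element-wise absolute difference (assumed form; trailing conv omitted)."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Two adjacent-level maps, assumed already resized to the same 2x3 grid.
# They agree everywhere except the last column.
f_low = [[1.0, 1.0, 0.0],
         [1.0, 1.0, 0.0]]
f_high = [[1.0, 1.0, 1.0],
          [1.0, 1.0, 1.0]]

added = fuse_add(f_low, f_high)        # shared columns become 2.0
stacked = fuse_concat(f_low, f_high)   # two channels, nothing removed
diff = subtraction_unit(f_low, f_high) # only the disagreeing column is non-zero
```

Where the two levels agree, the difference is zero, so only the disagreeing column survives; addition instead doubles the shared response, and concatenation keeps both copies.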

Core claim

M²SNet builds a basic subtraction unit to compute difference features between adjacent encoder levels, expands it to an intra-layer multi-scale version that supplies both pixel-level and structure-level differences to the decoder, and arranges these units pyramidally across levels for inter-layer multi-scale aggregation; a training-free LossNet then supervises the resulting features so that the network captures detailed and structural cues simultaneously.
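One way to picture the intra-layer multi-scale unit is to compare the two feature maps through windows of several sizes and sum the absolute differences. The sketch below assumes fixed all-ones windows of sizes 1, 3, and 5 with zero padding; those sizes, the helper names, and the omission of any trailing convolution are illustrative assumptions, not details confirmed by this page.

```python
from typing import List, Sequence

Feature = List[List[float]]  # toy single-channel feature map


def box_filter(x: Feature, k: int) -> Feature:
    """Sum over a k x k window (all-ones kernel, zero padding, stride 1)."""
    h, w, r = len(x), len(x[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += x[ii][jj]
            out[i][j] = total
    return out


def multi_scale_subtraction(a: Feature, b: Feature,
                            sizes: Sequence[int] = (1, 3, 5)) -> Feature:
    """Sum of absolute differences between the two maps at each window size."""
    h, w = len(a), len(a[0])
    out = [[0.0] * w for _ in range(h)]
    for k in sizes:
        fa, fb = box_filter(a, k), box_filter(b, k)
        for i in range(h):
            for j in range(w):
                out[i][j] += abs(fa[i][j] - fb[i][j])
    return out


# The two maps differ at the centre pixel only.
a_map = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
b_map = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

pixel_only = multi_scale_subtraction(a_map, b_map, sizes=(1,))
full = multi_scale_subtraction(a_map, b_map)
```

With the 1x1 window alone the response stays at the single differing pixel (a pixel-level difference); the larger windows spread it across the neighbourhood, which is the sense in which the unit also reports structure-level differences.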

What carries the argument

The subtraction unit (SU) that produces difference features between adjacent levels, extended to intra-layer multi-scale SU and inter-layer pyramidal multi-scale SUs.
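The pyramidal arrangement can be sketched as a triangle of differences: the first row compares adjacent encoder levels, and each later row applies the same unit to adjacent entries of the row above. This layout, the summation used to aggregate a column, and all names below are assumptions made for illustration; the page only states that the units are arranged pyramidally across levels.

```python
from typing import List

Vector = List[float]  # toy feature, flattened to one dimension


def su(a: Vector, b: Vector) -> Vector:
    """Basic unit, assumed to be an element-wise absolute difference."""
    return [abs(x - y) for x, y in zip(a, b)]


def subtraction_pyramid(levels: List[Vector]) -> List[List[Vector]]:
    """Row 0 holds differences of adjacent encoder levels; each later row
    applies the same unit to adjacent entries of the previous row."""
    rows = [[su(a, b) for a, b in zip(levels, levels[1:])]]
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([su(a, b) for a, b in zip(prev, prev[1:])])
    return rows


def aggregate(rows: List[List[Vector]], index: int) -> Vector:
    """Sum every pyramid entry whose span starts at encoder level `index`."""
    picked = [row[index] for row in rows if index < len(row)]
    return [sum(vals) for vals in zip(*picked)]


# Four toy encoder levels, assumed already resized to the same length.
levels = [[4.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
rows = subtraction_pyramid(levels)
level0 = aggregate(rows, 0)  # combines spans of 2, 3, and 4 encoder levels
level2 = aggregate(rows, 2)  # only the span covering levels 2 and 3
```

Entries in later rows depend on a wider span of encoder levels, so the aggregated feature for a shallow level mixes differences taken over several ranges, while the deepest level sees only its immediate neighbour.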

If this is right

  • Lesion boundaries become sharper because redundant signals are removed at each fusion step.
  • The same architecture works across four modalities without modality-specific redesign.
  • No auxiliary fusion blocks or attention modules are required to reach competitive accuracy.
  • Supervision from bottom to top layers occurs without training an extra network.
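The idea of supervising several levels without training an extra network can be sketched as follows: pass the prediction and the ground truth through the same frozen feature extractor and add up the per-level errors. The extractor below is a stand-in made of parameter-free averaging stages; which network LossNet actually uses, the mean-squared error, and all function names are assumptions for illustration.

```python
from typing import List

Signal = List[float]  # toy 1-D "mask"; real inputs would be 2-D maps


def frozen_stage(x: Signal) -> Signal:
    """Fixed, parameter-free stage: average adjacent pairs (halves the length)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def multi_level_features(x: Signal, depth: int) -> List[Signal]:
    """Input plus `depth` progressively coarser views of it."""
    feats = [x]
    for _ in range(depth):
        feats.append(frozen_stage(feats[-1]))
    return feats


def mse(a: Signal, b: Signal) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def multi_level_loss(pred: Signal, target: Signal, depth: int = 2) -> float:
    """Sum of per-level errors; the extractor itself has nothing to train."""
    fp = multi_level_features(pred, depth)
    ft = multi_level_features(target, depth)
    return sum(mse(p, t) for p, t in zip(fp, ft))


target = [0.0, 0.0, 1.0, 1.0]
shifted = [0.0, 1.0, 1.0, 0.0]  # same area as the target, wrong position

loss_perfect = multi_level_loss(target, target)
loss_shifted = multi_level_loss(shifted, target)
```

In this toy case the shifted prediction is penalised at the finest level (0.5) and the middle level (0.25) but not at the coarsest, where only the total area is visible, so the fine levels carry the detail signal and the coarse levels the structural one.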

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The subtraction principle could be inserted into existing U-Net variants with minimal code change to test immediate gains.
  • Performance on non-medical dense-prediction tasks such as semantic segmentation of natural scenes remains untested; the mechanism is not specific to medical images, so transfer is plausible but unverified.
  • The multi-scale subtraction might interact with specific boundary-sensitive loss terms in ways not explored here.
  • Extension to volumetric 3D data would require only replacing 2D convolutions while preserving the subtraction logic.

Load-bearing premise

Subtracting adjacent-level features yields more complementary information than addition or concatenation without discarding signals needed for the segmentation task.

What would settle it

If M²SNet does not match or exceed the reported metrics of leading addition- or concatenation-based methods when re-evaluated on the same eleven datasets under identical protocols, the claimed advantage of subtraction would not hold.

Figures

Figures reproduced from arXiv: 2303.10894 by Feng Tian, Hongpeng Jia, Huchuan Lu, Lihe Zhang, Long Lv, Weibing Sun, Xiaoqi Zhao, Youwei Pang.

Figure 7. In Tab. VIII, we thoroughly compare both the efficiency

Original abstract

Accurate medical image segmentation is critical for early medical diagnosis. Most existing methods are based on U-shape structure and use element-wise addition or concatenation to fuse different level features progressively in decoder. However, both the two operations easily generate plenty of redundant information, which will weaken the complementarity between different level features, resulting in inaccurate localization and blurred edges of lesions. To address this challenge, we propose a general multi-scale in multi-scale subtraction network (M²SNet) to finish diverse segmentation from medical image. Specifically, we first design a basic subtraction unit (SU) to produce the difference features between adjacent levels in encoder. Next, we expand the single-scale SU to the intra-layer multi-scale SU, which can provide the decoder with both pixel-level and structure-level difference information. Then, we pyramidally equip the multi-scale SUs at different levels with varying receptive fields, thereby achieving the inter-layer multi-scale feature aggregation and obtaining rich multi-scale difference information. In addition, we build a training-free network "LossNet" to comprehensively supervise the task-aware features from bottom layer to top layer, which drives our multi-scale subtraction network to capture the detailed and structural cues simultaneously. Without bells and whistles, our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets of four different medical image segmentation tasks of diverse image modalities, including color colonoscopy imaging, ultrasound imaging, computed tomography (CT), and optical coherence tomography (OCT). The source code can be available at https://github.com/Xiaoqi-Zhao-DLUT/MSNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes M²SNet, a U-shaped architecture for medical image segmentation that replaces standard addition/concatenation feature fusion with a basic subtraction unit (SU) between adjacent encoder levels. This SU is extended to intra-layer multi-scale and inter-layer pyramidal multi-scale variants to capture pixel- and structure-level difference information, supplemented by a training-free LossNet for multi-level supervision. The central claim is that this subtraction-based design reduces redundancy and improves complementarity, yielding favorable performance against most SOTA methods on 11 datasets spanning colonoscopy, ultrasound, CT, and OCT modalities without additional components.

Significance. If the subtraction mechanism is shown to be the source of gains, the approach offers a lightweight, generalizable alternative to conventional fusion operations that could improve localization and boundary accuracy across modalities. The breadth of evaluation (11 datasets, 4 tasks) would support claims of robustness if ablations confirm attribution to the core design choice.

major comments (2)
  1. [§3.1] Basic Subtraction Unit: The claim that subtraction yields more complementary information than addition or concatenation is introduced as the motivating premise but is never isolated via controlled ablation (e.g., identical backbone with SU vs. add vs. concat). Without this, performance gains on the 11 datasets cannot be attributed specifically to the SU rather than the multi-scale extensions or LossNet.
  2. [§4] Experiments: No ablation tables or figures directly compare the subtraction operation against addition/concatenation baselines while holding all other components fixed; reported SOTA comparisons therefore leave open the possibility that gains arise from architecture scale or supervision rather than the difference-feature hypothesis.
minor comments (2)
  1. [Abstract] Quantitative tables, statistical significance, and per-dataset metrics are absent, making the headline claim difficult to evaluate from the summary alone.
  2. [§3.2] Notation: The distinction between intra-layer and inter-layer multi-scale SUs is described in prose but would benefit from an explicit diagram or equation set showing receptive-field differences.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. The concerns about isolating the subtraction unit's contribution are valid and will be addressed through added experiments in revision.

  1. Referee: [§3.1] Basic Subtraction Unit: The claim that subtraction yields more complementary information than addition or concatenation is introduced as the motivating premise but is never isolated via controlled ablation (e.g., identical backbone with SU vs. add vs. concat). Without this, performance gains on the 11 datasets cannot be attributed specifically to the SU rather than the multi-scale extensions or LossNet.

    Authors: We acknowledge that the motivating premise for the basic subtraction unit would be strengthened by a controlled ablation isolating SU against addition and concatenation on an identical backbone. The current manuscript presents the SU as the core operation before extending it to multi-scale variants and adding LossNet, with overall results compared to SOTA. To directly address attribution, we will add a dedicated ablation table in the revised §3 and §4 that fixes the encoder-decoder backbone and varies only the fusion operation (SU vs. add vs. concat) on representative datasets from the evaluation suite. revision: yes

  2. Referee: [§4] Experiments: No ablation tables or figures directly compare the subtraction operation against addition/concatenation baselines while holding all other components fixed; reported SOTA comparisons therefore leave open the possibility that gains arise from architecture scale or supervision rather than the difference-feature hypothesis.

    Authors: We agree that the experimental section lacks the specific controlled comparison requested. The reported results demonstrate favorable performance of the full M²SNet, but do not hold all other elements fixed while swapping only the fusion operator. In the revision we will insert new ablation experiments in §4 that perform exactly this isolation, allowing clearer attribution to the difference-feature design rather than scale or the multi-level supervision. revision: yes
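The controlled comparison requested by the referee can be sketched as a small harness in which the fusion operator is the only thing that varies between runs. Everything here is hypothetical: the configuration keys, their values, and the helper names are placeholders for illustration, and the fusion functions are toy stand-ins operating on flat lists.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Candidate fusion operators; all take the same two inputs.
FUSIONS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "concat": lambda a, b: a + b,  # list concatenation as a channel stack
    "subtract": lambda a, b: [abs(x - y) for x, y in zip(a, b)],
}

# Hypothetical shared settings. The multi-scale extension and LossNet are
# switched off so that only the basic fusion operation is under test.
BASE = {"backbone": "shared-encoder", "seed": 0,
        "multi_scale": False, "lossnet": False}


def build_configs(base: dict) -> List[dict]:
    """One config per fusion operator; every other key is copied unchanged."""
    return [{**base, "fusion": name} for name in FUSIONS]


def differing_keys(a: dict, b: dict) -> set:
    """Keys whose values differ between two configs."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


configs = build_configs(BASE)

# Guard: any two runs in the ablation may differ in the fusion operator only.
for i, left in enumerate(configs):
    for right in configs[i + 1:]:
        assert differing_keys(left, right) == {"fusion"}
```

Keeping the guard next to the config builder makes it harder for a later change (a different seed, an extra module) to slip into one arm of the comparison unnoticed.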

Circularity Check

0 steps flagged

No significant circularity; empirical performance on external public datasets


The paper introduces an architectural design (subtraction units and multi-scale extensions) motivated by an assumption about feature complementarity, then reports measured performance on eleven external public datasets across four modalities. No equations, parameters, or predictions are shown to reduce to their own inputs by construction. The central claims are empirical comparisons rather than first-principles derivations. No load-bearing self-citations, fitted-input-as-prediction, or self-definitional steps appear in the provided text. This matches the default case of a self-contained empirical ML paper whose results are falsifiable against held-out benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the untested hypothesis that subtraction yields less redundant features than addition/concatenation and on the empirical effectiveness of the newly designed SU and LossNet components.

axioms (2)
  • [domain assumption] U-shaped encoder-decoder is a suitable base architecture for the target segmentation tasks
    Invoked by building the network on top of this structure
  • [ad hoc to paper] Difference features between adjacent encoder levels are more complementary than summed or concatenated features
    Core motivation stated in the abstract for introducing the subtraction unit
invented entities (2)
  • Subtraction Unit (SU) [no independent evidence]
    purpose: Produce difference features between adjacent encoder levels at multiple scales
    New architectural block introduced to replace addition/concatenation
  • LossNet [no independent evidence]
    purpose: Provide comprehensive supervision of task-aware features from bottom to top layer without training
    New supervision module invented for the multi-scale subtraction network

pith-pipeline@v0.9.0 · 5841 in / 1357 out tokens · 21828 ms · 2026-05-24T09:13:21.508099+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sharpening Lightweight Models for Generalized Polyp Segmentation: A Boundary Guided Distillation from Foundation Models

    cs.CV 2026-04 unverdicted novelty 6.0

    LiteBounD distills complementary semantic and boundary priors from multiple vision foundation models into compact segmentation backbones via dual-path and frequency-aware mechanisms, improving performance on both seen...
