
arxiv: 1907.07449 · v1 · pith:K736HRKZnew · submitted 2019-07-17 · 💻 cs.CV

OGNet: Salient Object Detection with Output-guided Attention Module

Pith reviewed 2026-05-24 20:35 UTC · model grok-4.3

classification 💻 cs.CV
keywords salient object detection · attention mechanism · output-guided attention · F-measure loss · deep learning · computer vision · image segmentation · edge detection

The pith

An output-guided attention module using multi-scale outputs addresses blind overconfidence in salient object detection models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets a problem in attention-based salient object detection: attention modules that take the processed feature map itself as input tend toward blind overconfidence. It introduces an output-guided attention module that uses multi-scale outputs to guide the attention process instead. It also proposes a new loss, the intractable area F-measure loss, aimed at edge areas and confusing areas of an image. A sympathetic reader would care because this could make detection more accurate and reliable in challenging image regions while keeping the model lightweight. The abstract reports experiments and ablation studies on several datasets in support of these changes.

Core claim

Instead of applying the widely used self-attention module, the paper presents an output-guided attention module built with multi-scale outputs to overcome the problem of blind overconfidence. It also constructs a new loss function, the intractable area F-measure loss function, which is based on the F-measure of the hard-to-handle area to improve the detection effect of the model in the edge areas and confusing areas of an image.
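The loss is only described in words here, so the following is a minimal sketch under stated assumptions rather than the paper's definition. It assumes the hard-to-handle area is the set of pixels where two outputs from different layers disagree strongly, and that the loss is one minus a soft F-measure restricted to that area. The names `hard_area_mask` and `masked_f_measure_loss`, the disagreement threshold, and `beta2 = 0.3` are illustrative choices, not values taken from the paper. Maps are flattened to plain Python lists to keep the sketch dependency-free.

```python
def hard_area_mask(out_a, out_b, threshold=0.5):
    """Assumed definition: pixels where two outputs disagree by more than `threshold`."""
    return [1.0 if abs(a - b) > threshold else 0.0 for a, b in zip(out_a, out_b)]


def masked_f_measure_loss(pred, target, mask, beta2=0.3, eps=1e-8):
    """1 - soft F-measure, computed only over pixels where mask == 1."""
    tp = sum(m * p * t for p, t, m in zip(pred, target, mask))
    pred_sum = sum(m * p for p, m in zip(pred, mask))
    target_sum = sum(m * t for t, m in zip(target, mask))
    # F = (1 + beta^2) * TP / (beta^2 * sum(target) + sum(pred))
    f = (1.0 + beta2) * tp / (beta2 * target_sum + pred_sum + eps)
    return 1.0 - f


# Four flattened pixels; the two outputs disagree on the middle two.
shallow = [0.9, 0.1, 0.9, 0.1]
deep = [0.9, 0.9, 0.1, 0.1]
target = [1.0, 1.0, 0.0, 0.0]

mask = hard_area_mask(shallow, deep)
loss_deep = masked_f_measure_loss(deep, target, mask)
loss_shallow = masked_f_measure_loss(shallow, target, mask)
```

In this toy case the deeper output is right on both disputed pixels, so its loss is lower than that of the shallow output. Pixels outside the mask do not contribute at all, which is the sense in which such a loss would concentrate on hard regions.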

What carries the argument

The output-guided attention module: attention is guided by multi-scale outputs rather than by the processed feature map itself, which the paper identifies as the source of blind overconfidence.
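One way to picture output guidance is sketched below; it is not the authors' architecture. The sketch assumes the guidance signal is a saliency map predicted at a coarser scale, upsampled by nearest neighbour and applied to every feature channel in a residual form. The function names `upsample_nearest` and `output_guided_attention`, and the residual weighting `f * (1 + a)`, are hypothetical.

```python
def upsample_nearest(sal, factor):
    """Nearest-neighbour upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in sal:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def output_guided_attention(features, coarse_output, factor):
    """Reweight each feature channel with a saliency map predicted at a coarser scale.

    features: list of channels, each an H x W list of rows.
    coarse_output: (H / factor) x (W / factor) saliency map with values in [0, 1].
    Assumption: residual form f * (1 + a), so features are never fully zeroed.
    """
    attn = upsample_nearest(coarse_output, factor)
    return [
        [[f * (1.0 + a) for f, a in zip(f_row, a_row)]
         for f_row, a_row in zip(channel, attn)]
        for channel in features
    ]


features = [[[1.0, 1.0], [1.0, 1.0]]]  # one 2x2 feature channel
coarse = [[0.5]]                       # 1x1 saliency output from another layer
guided = output_guided_attention(features, coarse, factor=2)
```

A self-attention variant would instead derive `attn` from `features` itself, which is the pattern the paper associates with blind overconfidence. Here the weights come from a separate prediction, so the features are not asked to confirm their own activations.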

Load-bearing premise

That guiding attention with multi-scale model outputs will overcome blind overconfidence without introducing new biases or requiring extensive tuning.

What would settle it

An experiment measuring overconfidence levels in self-attention versus output-guided attention on the same backbone, or ablation results showing no improvement in edge areas with the new loss.
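The source does not say how overconfidence would be measured. As one possible proxy, the sketch below reports the mean confidence a model assigns on the pixels it gets wrong. The function name `overconfidence` and the definition of confidence as `max(p, 1 - p)` are assumptions made for illustration.

```python
def overconfidence(pred, target, threshold=0.5):
    """Mean confidence on misclassified pixels (0.0 if there are none)."""
    wrong = [
        max(p, 1.0 - p)
        for p, t in zip(pred, target)
        if (p >= threshold) != (t >= 0.5)
    ]
    return sum(wrong) / len(wrong) if wrong else 0.0


target = [0.0, 1.0, 0.0]
pred_a = [0.95, 0.9, 0.1]  # wrong on the first pixel, and very sure of it
pred_b = [0.60, 0.9, 0.1]  # wrong on the same pixel, but less sure

score_a = overconfidence(pred_a, target)
score_b = overconfidence(pred_b, target)
```

Both predictions make the same error, but `pred_a` makes it with higher confidence and so scores higher. Comparing this score between a self-attention and an output-guided variant on the same backbone would be one way to run the proposed experiment.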

Figures

Figures reproduced from arXiv: 1907.07449 by Lanyun Zhu, Shiping Zhu.

Figure 1: Some examples of output in different layers. (a) Input image; (b) ground truth; (c) output of layer 1; (d) output of layer 2; (e) output of layer 3; (f) …
Figure 2: The structure of channel attention. FC is the fully connected layer.
Figure 3: The structure of spatial attention. CAT is the concatenation of some …
Figure 4: The detailed structure of the output-guided model.
Figure 5: The detailed structure of a layer of the decoder. Feature maps from …
Figure 6: Some examples of multi output and the difference map. The output of different layers in the network is different in some areas. From the difference …
Figure 7: (a)-(e) are P-R curves on various datasets, including ECSSD, HKU-IS, DUTS-T, DUT-O and SOD. (f)-(i) are F-measure curves on various datasets, …
Figure 8: Visual comparison with 9 state-of-the-art methods. (a) Input image; (b) ground truth; (c) ours; (d) PAGRN [ …
Figure 9: Memory comparison with some methods, including Amulet[], DSS+ …
Figure 10: Comparison between output obtained by models applying IAF loss …
Figure 12: Comparison of MAE scores on five datasets of the original DSS and …
Original abstract

Attention mechanisms are widely used in salient object detection models based on deep learning, which can effectively promote the extraction and utilization of useful information by neural networks. However, most of the existing attention modules used in salient object detection are input with the processed feature map itself, which easily leads to the problem of 'blind overconfidence'. In this paper, instead of applying the widely used self-attention module, we present an output-guided attention module built with multi-scale outputs to overcome the problem of 'blind overconfidence'. We also construct a new loss function, the intractable area F-measure loss function, which is based on the F-measure of the hard-to-handle area to improve the detection effect of the model in the edge areas and confusing areas of an image. Extensive experiments and abundant ablation studies are conducted to evaluate the effect of our methods and to explore the most suitable structure for the model. Tests on several data sets show that our model performs very well, even though it is very lightweight.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes OGNet for salient object detection. It replaces standard self-attention with an output-guided attention module constructed from multi-scale outputs to address 'blind overconfidence', and introduces an intractable area F-measure loss based on F-measure of hard-to-handle regions to improve edge and confusing area detection. The paper states that extensive experiments and ablation studies confirm strong performance on several datasets while the model remains lightweight.

Significance. If the empirical results and module effectiveness are substantiated with quantitative evidence, the work could provide a lightweight alternative attention design and specialized loss for salient object detection, with potential relevance for efficient models handling difficult image regions.

major comments (3)
  1. [Abstract] The central performance claims ('performs very well', 'extensive experiments') are unsupported by any quantitative results, baselines, error bars, dataset names, or numerical comparisons, so the improvements asserted for the output-guided module and new loss cannot be evaluated.
  2. [Abstract] The output-guided attention module is described only at a high level with no equations, formulation, or derivation showing why multi-scale outputs (rather than processed feature maps) overcome blind overconfidence; this is the load-bearing structural claim but lacks any supporting analysis or isolation of the effect.
  3. [Abstract] The intractable area F-measure loss is introduced without a mathematical definition, computation details, or comparison to standard losses, leaving its claimed benefit for edge/confusing areas unassessable.
minor comments (1)
  1. The phrase 'intractable area' is used without definition or motivation; clarify its meaning and relation to the F-measure computation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for these focused comments on the abstract. We agree that the abstract as submitted is insufficiently specific and will revise it to include quantitative support, a concise formulation of the attention module, and a definition of the loss function.

Point-by-point responses
  1. Referee: [Abstract] The central performance claims ('performs very well', 'extensive experiments') are unsupported by any quantitative results, baselines, error bars, dataset names, or numerical comparisons, so the improvements asserted for the output-guided module and new loss cannot be evaluated.

    Authors: We agree that the abstract lacks concrete numbers and comparisons. In the revised version we will add the principal quantitative results (e.g., mean F-measure and MAE on the standard benchmarks), the main baselines, and the dataset names so that the performance claims can be directly assessed. Revision: yes.

  2. Referee: [Abstract] The output-guided attention module is described only at a high level with no equations, formulation, or derivation showing why multi-scale outputs (rather than processed feature maps) overcome blind overconfidence; this is the load-bearing structural claim but lacks any supporting analysis or isolation of the effect.

    Authors: The abstract indeed presents the module at a summary level. We will insert a compact equation or formulation of the output-guided attention together with a one-sentence explanation of why multi-scale outputs reduce blind overconfidence, referencing the analysis already present in the body of the paper. Revision: yes.

  3. Referee: [Abstract] The intractable area F-measure loss is introduced without a mathematical definition, computation details, or comparison to standard losses, leaving its claimed benefit for edge/confusing areas unassessable.

    Authors: We acknowledge the absence of any mathematical detail on the loss in the abstract. The revised abstract will contain a brief mathematical definition of the intractable-area F-measure loss and a short statement of how it differs from standard losses in its emphasis on hard regions. Revision: yes.
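For readers unfamiliar with the two metrics named in the first response, the sketch below shows how MAE and a thresholded F-measure are commonly computed for a saliency map. It is a generic illustration, not the paper's evaluation code. The fixed threshold of 0.5 and `beta2 = 0.3` are assumptions; the source does not state the settings the authors used.

```python
def mae(pred, target):
    """Mean absolute error between a saliency map and its binary ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def f_measure(pred, target, threshold=0.5, beta2=0.3):
    """F-measure of the map binarized at `threshold`."""
    binary = [1 if p >= threshold else 0 for p in pred]
    tp = sum(b * t for b, t in zip(binary, target))
    if tp == 0:
        return 0.0
    precision = tp / sum(binary)
    recall = tp / sum(target)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


pred = [0.9, 0.8, 0.3, 0.1]  # flattened saliency map
target = [1, 0, 1, 0]        # flattened binary ground truth

mae_score = mae(pred, target)
f_score = f_measure(pred, target)
```

Lower MAE is better and higher F-measure is better. A `beta2` below 1 weights precision more heavily than recall.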

Circularity Check

0 steps flagged

No circularity; empirical architecture proposal with experimental validation

Full rationale

The paper proposes an output-guided attention module and intractable area F-measure loss for salient object detection, claiming these address blind overconfidence and edge issues. No mathematical derivations, equations, or first-principles predictions appear in the provided text. Claims rest entirely on architectural description plus extensive experiments and ablations on datasets, with no reduction to fitted inputs, self-citations, or self-definitional steps. The derivation chain is self-contained as an empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, parameters, or assumptions to audit; no free parameters, axioms, or invented entities identifiable.

pith-pipeline@v0.9.0 · 5695 in / 1052 out tokens · 14912 ms · 2026-05-24T20:35:30.468499+00:00 · methodology

