
arxiv: 1907.07581 · v2 · pith:D52NUJNSnew · submitted 2019-07-17 · 💻 cs.CV

News Cover Assessment via Multi-task Learning

Pith reviewed 2026-05-24 20:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords news cover assessment · multi-task learning · image clarity assessment · semantic segmentation · DeepLabv3+ · CIA dataset · object salience

The pith

A multi-task network assesses news covers by jointly performing image clarity assessment and semantic segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an end-to-end network that trains image clarity assessment and semantic segmentation together to judge the suitability of news cover images. The approach targets the difficulty of subjectivity by combining signals about visual sharpness and prominent objects within one model. A shared backbone extracts multi-scale features before the tasks split into separate branches, and the outputs together inform the cover assessment. Experiments on a custom dataset built from game content show gains over models trained on each task in isolation.

Core claim

The proposed end-to-end multi-task learning network, based on a modified DeepLabv3+ model, uses its backbone for multi-scale spatial feature extraction, followed by two branches for image clarity assessment and semantic segmentation, respectively. The combined results guide news cover assessment. The network captures important content in images and performs better than single-task learning baselines on the proposed game-content-based CIA dataset.

What carries the argument

Modified DeepLabv3+ backbone with shared multi-scale spatial feature extraction followed by two task-specific branches for clarity assessment and semantic segmentation.
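
The shared-backbone, two-branch layout can be illustrated with a deliberately tiny sketch. It is not the paper's modified DeepLabv3+: the functions `backbone`, `clarity_head`, and `segmentation_head` are hypothetical stand-ins (block-average pooling at two scales, a gradient-based score, and a threshold), chosen only to show features being computed once and consumed by both branches.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale image as rows of intensities in [0, 1]


def avg_pool(img: Image, k: int) -> Image:
    """Average non-overlapping k x k blocks (image sides must be divisible by k)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[r + dr][c + dc] for dr in range(k) for dc in range(k)) / (k * k)
            for c in range(0, w, k)
        ]
        for r in range(0, h, k)
    ]


def backbone(img: Image) -> Dict[int, Image]:
    """Hypothetical stand-in for the shared encoder: features at two scales."""
    return {1: img, 2: avg_pool(img, 2)}


def clarity_head(feats: Dict[int, Image]) -> float:
    """Toy clarity score: mean absolute horizontal gradient at full resolution."""
    fmap = feats[1]
    diffs = [abs(row[c + 1] - row[c]) for row in fmap for c in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def segmentation_head(feats: Dict[int, Image], threshold: float = 0.5) -> List[List[int]]:
    """Toy segmentation: label each coarse-scale cell 1 (object) or 0 (background)."""
    return [[1 if v > threshold else 0 for v in row] for row in feats[2]]


def forward(img: Image) -> Tuple[float, List[List[int]]]:
    feats = backbone(img)  # computed once, shared by both branches
    return clarity_head(feats), segmentation_head(feats)


# A 4x4 image with a sharp vertical edge: dark left half, bright right half.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
clarity, mask = forward(image)
```

In a real implementation the two heads would be learned layers trained jointly; the point here is only that `forward` calls the feature extractor once and hands the same features to each branch.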

If this is right

  • The network simultaneously performs clarity assessment and semantic segmentation to inform cover selection.
  • Joint training yields better results than single-task baselines on the CIA dataset.
  • The model identifies salient objects while judging image quality within the same forward pass.
  • The shared feature extractor supplies information useful to both tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same joint-training structure might apply to other image-selection tasks that involve both quality and salience.
  • If the two tasks share low-level features, the approach could reduce the total labeled examples needed for effective cover assessment.
  • Performance on game-derived images leaves open whether the same gains appear on general news photography.

Load-bearing premise

That simultaneous training on clarity assessment and semantic segmentation produces meaningful gains over separate training and that these two signals adequately capture the subjective quality of a news cover.

What would settle it

A direct comparison in which single-task versions of the identical backbone achieve equal or higher accuracy than the multi-task model when both are evaluated on the CIA dataset for cover assessment.
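
One way such a comparison could be scored is sketched below, assuming each model's output has been reduced to a per-image correct/incorrect flag on the same test images. The function name `paired_bootstrap_diff`, the 95% percentile interval, and the toy flags are illustrative assumptions and are not taken from the paper.

```python
import random
from typing import List, Tuple


def paired_bootstrap_diff(
    multi_correct: List[int],
    single_correct: List[int],
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed accuracy difference, lower, upper) for multi minus single.

    Both lists hold 1 (correct) or 0 (wrong) for the same images in the same
    order. The interval is a 95% percentile bootstrap over resampled images.
    """
    if len(multi_correct) != len(single_correct) or not multi_correct:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(multi_correct)
    per_image = [m - s for m, s in zip(multi_correct, single_correct)]
    observed = sum(per_image) / n

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(n_resamples):
        sample = [per_image[rng.randrange(n)] for _ in range(n)]
        diffs.append(sum(sample) / n)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return observed, lower, upper


# Toy flags for eight images (made up for illustration only).
multi = [1, 1, 1, 0, 1, 1, 0, 1]
single = [1, 0, 1, 0, 1, 0, 0, 1]
observed, lower, upper = paired_bootstrap_diff(multi, single)
```

An interval that lies entirely above zero would favour the multi-task model; an interval that includes zero or lies below it would correspond to the outcome described above.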

Figures

Figures reproduced from arXiv: 1907.07581 by Chengwei Zhu, Shuang Zhao, Xiao Chen, Zixun Sun.

Figure 1: Multi-task learning network architecture.
Figure 2: Some example images from CIA dataset with image cla…
Figure 3: Some example images from CIA dataset using our mult…
Original abstract

Online personalized news product needs a suitable cover for the article. The news cover demands to be with high image quality, and draw readers' attention at same time, which is extraordinary challenging due to the subjectivity of the task. In this paper, we assess the news cover from image clarity and object salience perspective. We propose an end-to-end multi-task learning network for image clarity assessment and semantic segmentation simultaneously, the results of which can be guided for news cover assessment. The proposed network is based on a modified DeepLabv3+ model. The network backbone is used for multiple scale spatial feature extraction, followed by two branches for image clarity assessment and semantic segmentation, respectively. The experiment results show that the proposed model is able to capture important content in images and performs better than single-task learning baselines on our proposed game content based CIA dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes an end-to-end multi-task network based on a modified DeepLabv3+ backbone with separate branches for image clarity assessment and semantic segmentation. The results of these tasks are intended to guide news cover assessment, with the claim that the multi-task model captures important image content and outperforms single-task baselines on a proposed game-content CIA dataset.

Significance. If the empirical claims were supported by quantitative results and validated proxies, the work could contribute a multi-task approach to automated media quality assessment. However, the complete absence of metrics, ablations, or correlation studies with human judgments means the significance cannot be assessed from the current manuscript.

major comments (3)
  1. [Abstract] Abstract: the assertion that the multi-task model 'performs better than single-task learning baselines' supplies no quantitative metrics, dataset size, statistical tests, or ablation details, so the central empirical claim cannot be evaluated.
  2. [Abstract] Abstract / proposed method: no human rating study, correlation analysis, or matched-capacity control is reported to establish that clarity assessment plus semantic segmentation outputs align with subjective news-cover suitability judgments.
  3. [Abstract] Abstract: the CIA dataset is described only as 'game content based' with no size, annotation protocol, or transfer experiments to real news photographs, leaving generalizability to the target domain untested.
minor comments (1)
  1. [Abstract] Abstract contains minor grammatical awkwardness ('demands to be with high image quality, and draw readers' attention at same time') that should be revised for readability.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the constructive feedback on the abstract and the need for stronger empirical support. We address each major comment below.

  1. Referee: [Abstract] Abstract: the assertion that the multi-task model 'performs better than single-task learning baselines' supplies no quantitative metrics, dataset size, statistical tests, or ablation details, so the central empirical claim cannot be evaluated.

    Authors: We agree that the abstract lacks specific quantitative support for the performance claim. The manuscript includes experimental comparisons on the CIA dataset, but we will revise the abstract to report key metrics (e.g., performance deltas versus single-task baselines), dataset size, ablation summaries, and any statistical tests to make the central claim directly evaluable.
    revision: yes

  2. Referee: [Abstract] Abstract / proposed method: no human rating study, correlation analysis, or matched-capacity control is reported to establish that clarity assessment plus semantic segmentation outputs align with subjective news-cover suitability judgments.

    Authors: The manuscript does not include a human rating study, correlation analysis, or matched-capacity controls. The tasks were selected because clarity and object salience are established factors in visual appeal for news covers; we will expand the motivation and discussion sections to clarify this rationale. No direct human validation study was performed.
    revision: partial

  3. Referee: [Abstract] Abstract: the CIA dataset is described only as 'game content based' with no size, annotation protocol, or transfer experiments to real news photographs, leaving generalizability to the target domain untested.

    Authors: We will revise the manuscript to include the size of the CIA dataset and its annotation protocol. The current experiments are confined to the game-content domain as a controlled starting point; no transfer experiments to real news photographs were conducted, and we will add an explicit limitations discussion on generalizability.
    revision: partial

standing simulated objections not resolved
  • No human rating study or correlation analysis with subjective news-cover judgments was conducted.
  • No transfer experiments from the game-content CIA dataset to real news photographs were performed.
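
A correlation analysis of the kind the first objection asks for could be sketched as a Spearman rank correlation between model scores and human ratings. Everything below is an assumption for illustration: the helper names, the choice of Spearman's coefficient, and the five made-up score/rating pairs. No such data appears in the source.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


model_scores = [0.91, 0.40, 0.75, 0.10, 0.62]  # hypothetical model outputs
human_ratings = [5, 2, 4, 1, 3]                # hypothetical 1-5 ratings
rho = spearman(model_scores, human_ratings)
```

A value of `rho` near 1 would indicate that the model orders covers the way human raters do; a value near 0 would support the objection that clarity and segmentation outputs do not track subjective suitability.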

Circularity Check

0 steps flagged

No circularity; empirical ML comparison on held-out data


The manuscript describes a standard multi-task CNN (modified DeepLabv3+) trained end-to-end on a new dataset and reports that it outperforms single-task baselines. No equations, fitted parameters renamed as predictions, self-citation chains, or uniqueness theorems appear in the provided text. The central claim is an empirical performance delta, which is independent of the model definition and does not reduce to any input by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The performance claim depends on the unstated assumption that the two tasks share useful features and that the custom dataset is representative; no external benchmarks or formal derivations are supplied.

free parameters (1)
  • task-specific loss weights
    Balance between clarity and segmentation losses must be chosen or tuned on the training data.
axioms (1)
  • domain assumption: Multi-task learning improves feature learning when the tasks are related
    Invoked by the decision to share the DeepLabv3+ backbone between the two branches.
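
The free parameter in this ledger can be made concrete with a generic weighted multi-task objective. The symbols $\lambda_{c}$ and $\lambda_{s}$ and the loss names are assumptions for illustration; the text above does not state how the paper combines or weights its two losses.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{c}\,\mathcal{L}_{\text{clarity}}
  + \lambda_{s}\,\mathcal{L}_{\text{seg}},
\qquad \lambda_{c},\ \lambda_{s} \ge 0
```

Under this form, setting $\lambda_{s} = 0$ or $\lambda_{c} = 0$ recovers a single-task objective on the same backbone, which is the kind of baseline the multi-task model is compared against.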


