News Cover Assessment via Multi-task Learning

Chengwei Zhu; Shuang Zhao; Xiao Chen; Zixun Sun

arxiv: 1907.07581 · v2 · pith:D52NUJNSnew · submitted 2019-07-17 · 💻 cs.CV

Pith reviewed 2026-05-24 20:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords news cover assessment · multi-task learning · image clarity assessment · semantic segmentation · DeepLabv3+ · CIA dataset · object salience

The pith

A multi-task network assesses news covers by jointly evaluating image clarity and semantic segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an end-to-end network that trains image clarity assessment and semantic segmentation together to judge the suitability of news cover images. The approach targets the difficulty of subjectivity by combining signals about visual sharpness and prominent objects within one model. A shared backbone extracts multi-scale features before the tasks split into separate branches, and the outputs together inform the cover assessment. Experiments on a custom dataset built from game content show gains over models trained on each task in isolation.

Core claim

The proposed end-to-end multi-task learning network, based on a modified DeepLabv3+ model, uses its backbone for multiple scale spatial feature extraction followed by two branches for image clarity assessment and semantic segmentation respectively. The combined results guide news cover assessment. The network captures important content in images and performs better than single-task learning baselines on the proposed game content based CIA dataset.

What carries the argument

Modified DeepLabv3+ backbone with shared multi-scale spatial feature extraction followed by two task-specific branches for clarity assessment and semantic segmentation.

If this is right

The network simultaneously performs clarity assessment and semantic segmentation to inform cover selection.
Joint training yields better results than single-task baselines on the CIA dataset.
The model identifies salient objects while judging image quality within the same forward pass.
The shared feature extractor supplies information useful to both tasks.
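
One way to picture how the two outputs could jointly inform cover selection is the minimal sketch below. The combination rule is an assumption made for illustration only: the text here does not state how the paper merges the two signals, and the names `salient_fraction`, `cover_score`, and `min_fraction` are hypothetical.

```python
def salient_fraction(mask):
    """Fraction of pixels labelled as foreground (non-zero) in a segmentation mask."""
    total = sum(len(row) for row in mask)
    foreground = sum(1 for row in mask for label in row if label != 0)
    return foreground / total


def cover_score(clarity_prob, mask, min_fraction=0.1):
    """Hypothetical rule (not from the paper): reject images with too little
    salient content, otherwise score the image by its clarity probability."""
    if salient_fraction(mask) < min_fraction:
        return 0.0
    return clarity_prob


# Toy candidates: (clarity probability, segmentation mask with 0 = background).
candidates = {
    "sharp_with_subject": (0.9, [[0, 1, 1], [0, 1, 1]]),
    "sharp_but_empty": (0.95, [[0, 0, 0], [0, 0, 0]]),
    "blurry_with_subject": (0.3, [[1, 1, 1], [0, 1, 1]]),
}

# Rank candidates from most to least suitable under the hypothetical rule.
ranked = sorted(candidates, key=lambda name: cover_score(*candidates[name]), reverse=True)
print(ranked)
```

The point of the sketch is only that neither signal alone separates the three toy candidates: the sharpest image has no salient object, and the image with the largest subject is blurry.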

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same joint-training structure might apply to other image-selection tasks that involve both quality and salience.
If the two tasks share low-level features, the approach could reduce the total labeled examples needed for effective cover assessment.
Performance on game-derived images leaves open whether the same gains appear on general news photography.

Load-bearing premise

That simultaneous training on clarity assessment and semantic segmentation produces meaningful gains over separate training and that these two signals adequately capture the subjective quality of a news cover.

What would settle it

A direct comparison in which single-task versions of the identical backbone achieve equal or higher accuracy than the multi-task model when both are evaluated on the CIA dataset for cover assessment.
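
The check described above can be sketched as follows, assuming plain classification accuracy as the metric (the text here does not say which metric the paper reports). The function names and the toy predictions are hypothetical.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def single_task_matches_or_beats(single_preds, multi_preds, labels):
    """True if the single-task model is at least as accurate as the multi-task one,
    which is the outcome that would count against the paper's claim."""
    return accuracy(single_preds, labels) >= accuracy(multi_preds, labels)


# Toy held-out labels and predictions from two models sharing the same backbone.
labels = [1, 0, 1, 1, 0]
single_preds = [1, 0, 0, 1, 1]  # 3 of 5 correct
multi_preds = [1, 0, 1, 1, 1]   # 4 of 5 correct

print(single_task_matches_or_beats(single_preds, multi_preds, labels))
```

A real version of this comparison would also need repeated runs and a significance test, since a difference on a single split can be noise.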

Figures

Figures reproduced from arXiv: 1907.07581 by Chengwei Zhu, Shuang Zhao, Xiao Chen, Zixun Sun.

**Figure 2.** Some example images from CIA dataset with image cla […]

**Figure 3.** Some example images from CIA dataset using our mult […]

Original abstract

Online personalized news product needs a suitable cover for the article. The news cover demands to be with high image quality, and draw readers' attention at same time, which is extraordinary challenging due to the subjectivity of the task. In this paper, we assess the news cover from image clarity and object salience perspective. We propose an end-to-end multi-task learning network for image clarity assessment and semantic segmentation simultaneously, the results of which can be guided for news cover assessment. The proposed network is based on a modified DeepLabv3+ model. The network backbone is used for multiple scale spatial features exaction, followed by two branches for image clarity assessment and semantic segmentation, respectively. The experiment results show that the proposed model is able to capture important content in images and performs better than single-task learning baselines on our proposed game content based CIA dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The multi-task claim for news covers rests on an unshown performance gain and an unvalidated proxy for subjective quality.

The paper's main point is that a modified DeepLabv3+ trained jointly on clarity assessment and semantic segmentation beats single-task baselines on a new game-content CIA dataset for scoring news covers. That is the concrete result they report in the abstract. The combination itself is the novelty: applying these two signals together to the narrow problem of picking article thumbnails rather than a new architecture or loss. The shared backbone plus separate heads is a standard multi-task pattern, and the choice of DeepLabv3+ for the segmentation side is sensible given its multi-scale features. The work is coherent on its own terms and shows clear thinking about a commercial use case without overclaiming new theory.

The soft spots are the missing pieces that matter most. The abstract supplies no numbers, no dataset size, no ablation on loss weights, and no statistical comparison, so the superiority claim cannot be checked. The dataset is restricted to game imagery, which leaves open whether the model would behave the same on real news photographs. Most critically, there is no human rating study or correlation analysis showing that the clarity-plus-salience scores actually track what people judge as a good cover. The stress-test note is accurate on this point.

This is for engineers at news platforms who might want a quick starting implementation for cover ranking. Readers looking for reproducible CV results, general multi-task insights, or validated proxies for subjective quality will not find enough here. I would not bring it to reading group, would not cite it, and would not send it to peer review.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes an end-to-end multi-task network based on a modified DeepLabv3+ backbone with separate branches for image clarity assessment and semantic segmentation. The results of these tasks are intended to guide news cover assessment, with the claim that the multi-task model captures important image content and outperforms single-task baselines on a proposed game-content CIA dataset.

Significance. If the empirical claims were supported by quantitative results and validated proxies, the work could contribute a multi-task approach to automated media quality assessment. However, the complete absence of metrics, ablations, or correlation studies with human judgments means the significance cannot be assessed from the current manuscript.

major comments (3)

[Abstract] Abstract: the assertion that the multi-task model 'performs better than single-task learning baselines' supplies no quantitative metrics, dataset size, statistical tests, or ablation details, so the central empirical claim cannot be evaluated.
[Abstract] Abstract / proposed method: no human rating study, correlation analysis, or matched-capacity control is reported to establish that clarity assessment plus semantic segmentation outputs align with subjective news-cover suitability judgments.
[Abstract] Abstract: the CIA dataset is described only as 'game content based' with no size, annotation protocol, or transfer experiments to real news photographs, leaving generalizability to the target domain untested.

minor comments (1)

[Abstract] Abstract contains minor grammatical awkwardness ('demands to be with high image quality, and draw readers' attention at same time') that should be revised for readability.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the constructive feedback on the abstract and the need for stronger empirical support. We address each major comment below.

Referee: [Abstract] Abstract: the assertion that the multi-task model 'performs better than single-task learning baselines' supplies no quantitative metrics, dataset size, statistical tests, or ablation details, so the central empirical claim cannot be evaluated.

Authors: We agree that the abstract lacks specific quantitative support for the performance claim. The manuscript includes experimental comparisons on the CIA dataset, but we will revise the abstract to report key metrics (e.g., performance deltas versus single-task baselines), dataset size, ablation summaries, and any statistical tests to make the central claim directly evaluable.

revision: yes

Referee: [Abstract] Abstract / proposed method: no human rating study, correlation analysis, or matched-capacity control is reported to establish that clarity assessment plus semantic segmentation outputs align with subjective news-cover suitability judgments.

Authors: The manuscript does not include a human rating study, correlation analysis, or matched-capacity controls. The tasks were selected because clarity and object salience are established factors in visual appeal for news covers; we will expand the motivation and discussion sections to clarify this rationale. No direct human validation study was performed.

revision: partial

Referee: [Abstract] Abstract: the CIA dataset is described only as 'game content based' with no size, annotation protocol, or transfer experiments to real news photographs, leaving generalizability to the target domain untested.

Authors: We will revise the manuscript to include the size of the CIA dataset and its annotation protocol. The current experiments are confined to the game-content domain as a controlled starting point; no transfer experiments to real news photographs were conducted, and we will add an explicit limitations discussion on generalizability.

revision: partial

standing simulated objections not resolved

No human rating study or correlation analysis with subjective news-cover judgments was conducted.
No transfer experiments from the game-content CIA dataset to real news photographs were performed.

Circularity Check

0 steps flagged

No circularity; empirical ML comparison on held-out data

The manuscript describes a standard multi-task CNN (modified DeepLabv3+) trained end-to-end on a new dataset and reports that it outperforms single-task baselines. No equations, fitted parameters renamed as predictions, self-citation chains, or uniqueness theorems appear in the provided text. The central claim is an empirical performance delta, which is independent of the model definition and does not reduce to any input by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The performance claim depends on the unstated assumption that the two tasks share useful features and that the custom dataset is representative; no external benchmarks or formal derivations are supplied.

free parameters (1)

task-specific loss weights
Balance between clarity and segmentation losses must be chosen or tuned on the training data.
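
To make this free parameter concrete, here is a minimal sketch that assumes the joint objective is a weighted sum of a clarity classification loss and a mean per-pixel segmentation loss, both cross-entropy. The loss form, the function names, and the weights `w_clarity` and `w_seg` are assumptions for illustration; the text here does not give the paper's actual objective or weight values.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def multitask_loss(clarity_probs, clarity_label, seg_probs, seg_labels,
                   w_clarity=1.0, w_seg=1.0):
    """Weighted sum of a clarity loss and a mean per-pixel segmentation loss.

    w_clarity and w_seg are hypothetical task weights, not values from the paper.
    """
    clarity_loss = cross_entropy(clarity_probs, clarity_label)
    seg_loss = sum(
        cross_entropy(p, y) for p, y in zip(seg_probs, seg_labels)
    ) / len(seg_labels)
    return w_clarity * clarity_loss + w_seg * seg_loss


# Toy example: one image, two clarity classes, three pixels with two classes each.
loss = multitask_loss(
    clarity_probs=[0.2, 0.8], clarity_label=1,
    seg_probs=[[0.9, 0.1], [0.5, 0.5], [0.25, 0.75]], seg_labels=[0, 1, 1],
    w_clarity=1.0, w_seg=0.5,
)
print(round(loss, 4))
```

Setting `w_seg` to zero reduces the objective to the single-task clarity loss, which is why an ablation over these weights is the natural control for the multi-task claim.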

axioms (1)

domain assumption: Multi-task learning improves feature learning when the tasks are related
Invoked by the decision to share the DeepLabv3+ backbone between the two branches.
