Aesthetic Attributes Assessment of Images

Bin Zhou; Dongqing Zou; Geng Zhao; Le Wu; Shiming Ge; Xiaodong Li; Xiaokun Zhang; Xinghui Zhou; Xin Jin

arxiv: 1907.04983 · v2 · pith:XTXUHMEBnew · submitted 2019-07-11 · 💻 cs.CV

Pith reviewed 2026-05-24 23:31 UTC · model grok-4.3

classification 💻 cs.CV

keywords image aesthetic assessment · aesthetic attributes · image captioning · multi-attribute prediction · knowledge transfer · attention model · DPC-Captions dataset · AMAN network

The pith

Aesthetic assessment of images now predicts both captions and numerical scores for five separate attributes using a single network trained on mixed labeled data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces aesthetic attributes assessment as a new task that generates text captions describing up to five qualities of an image while also assigning a numerical score to each. It creates the large DPC-Captions dataset by transferring labels from the smaller fully annotated PCCD set, then trains the Aesthetic Multi-Attribute Network on the combined data. The network combines transfer learning with an attention mechanism to handle both fully and weakly labeled examples in one framework. Experiments show the model produces attribute captions and scores together and beats standard CNN-LSTM and SCA-CNN captioning baselines on standard image-caption metrics.

Core claim

Aesthetic Attributes Assessment predicts captions of five aesthetic attributes together with a numerical score for each attribute; the AMAN model, trained on a mixture of the small fully-annotated PCCD dataset and the large weakly-annotated DPC-Captions dataset obtained via knowledge transfer, jointly performs both tasks and outperforms traditional CNN-LSTM and modern SCA-CNN models on image caption evaluation criteria.

What carries the argument

Aesthetic Multi-Attribute Network (AMAN), a single framework that applies transfer learning and attention to predict multiple attribute captions and scores from mixed fully and weakly labeled image data.
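
To make the role of attention concrete, here is a minimal sketch of generic soft attention over image regions. It is not the AMAN channel and spatial attention network, whose details are not given on this page; the function names, the dot-product scoring, and the attribute names `composition` and `color` are all illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, query):
    """Generic soft attention (an assumption, not the paper's exact module):
    score each region by its dot product with the query, normalise the
    scores with softmax, and return the weights and the weighted sum."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context

# Three hypothetical image regions with 2-d features, and one hypothetical
# query vector per attribute; all values are made up for illustration.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = {"composition": [2.0, 0.0], "color": [0.0, 2.0]}
for name, query in queries.items():
    weights, context = attend(regions, query)
    print(name, [round(w, 3) for w in weights])
```

Each attribute query yields its own weighting of the same regions, which is the behaviour the summary attributes to the attention component.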

If this is right

Images receive richer feedback than a single overall score, with separate text and numeric output for each of five attributes.
Large-scale weakly labeled data can be leveraged without requiring full annotation of every image.
Attention inside the network focuses on relevant image regions when generating each attribute's caption and score.
The same architecture handles both caption generation and regression in one forward pass.
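
One way to picture joint training on mixed fully and weakly labeled data is a per-attribute loss that always includes a caption term and adds a score term only when a score label exists. The sketch below assumes exactly that; the dictionary layout, `score_weight`, and the squared-error score term are hypothetical choices, not the loss defined in the paper.

```python
import math

def caption_nll(token_probs):
    """Mean negative log-likelihood the model assigns to the reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def joint_loss(examples, score_weight=1.0):
    """Hypothetical joint objective: caption NLL for every labeled attribute,
    plus squared error on the score wherever a score target is present.
    Attributes without a score target contribute only the caption term."""
    total = 0.0
    for example in examples:
        for attr in example["attributes"].values():
            total += caption_nll(attr["token_probs"])
            if attr.get("score_target") is not None:
                error = attr["score_pred"] - attr["score_target"]
                total += score_weight * error ** 2
    return total / len(examples)

# Made-up batch: one fully labeled example and one weakly labeled example
# (caption only). Attribute names are placeholders.
batch = [
    {"attributes": {"composition": {"token_probs": [0.5, 0.5],
                                    "score_pred": 7.0, "score_target": 8.0}}},
    {"attributes": {"color": {"token_probs": [0.25],
                              "score_pred": 6.0, "score_target": None}}},
]
print(joint_loss(batch))
```

Masking the score term this way lets both kinds of example pass through the same code path, which is one plausible reading of "a single framework".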

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Tools that edit photos could use the per-attribute outputs to suggest targeted changes such as adjusting composition or color balance.
The approach may extend to other visual domains where both descriptive text and quantitative ratings are desired, such as product photography or architectural images.
If the transferred labels contain hidden biases, downstream applications risk amplifying those biases in generated captions.

Load-bearing premise

The knowledge transfer process that builds the large DPC-Captions dataset from the small PCCD dataset preserves the intended aesthetic attribute labels without adding substantial noise or bias.

What would settle it

Run the trained AMAN model on the held-out PCCD images and compare its generated attribute captions against human-written ones using BLEU, METEOR, or CIDEr; if scores fall below the CNN-LSTM baseline or if the predicted numerical scores deviate systematically from the original PCCD ground-truth ratings, the joint prediction claim does not hold.
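
The check described above could be scripted roughly as follows. This is a simplified, unsmoothed sentence-level BLEU against a single reference plus a mean signed error for the scores; published results normally come from standard evaluation toolkits, so treat the function names and the example strings as illustrative assumptions only.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU against one reference: geometric mean of clipped
    n-gram precisions (n = 1..max_n) multiplied by the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Penalise candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)

def mean_signed_error(predicted, truth):
    """Average of (prediction - truth); a value far from zero suggests
    systematic over- or under-scoring."""
    return sum(p - t for p, t in zip(predicted, truth)) / len(truth)

# Made-up caption pair and scores, for illustration only.
print(bleu("nice color and composition", "nice color and strong composition"))
print(mean_signed_error([6.0, 7.0], [7.0, 9.0]))
```

Comparing the mean of `bleu` over held-out images for AMAN and for the CNN-LSTM baseline, and inspecting `mean_signed_error` per attribute, covers both halves of the proposed test.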

Figures

Figures reproduced from arXiv: 1907.04983 by Bin Zhou, Dongqing Zou, Geng Zhao, Le Wu, Shiming Ge, Xiaodong Li, Xiaokun Zhang, Xinghui Zhou, Xin Jin.

**Figure 1.** Aesthetic Attributes Assessment of Images. We predict caption and score of each aesthetic attribute of an image.

**Figure 2.** The knowledge transfer method from PCCD to our DPC-Captions. The PCCD dataset includes 7 aesthetic attributes

**Figure 3.** Aesthetic Multi-Attribute Network (AMAN) contains Multi-Attribute Feature Network (MAFN), Channel and Spatial

**Figure 4.** The results of aesthetic multi-attribute network on DPC-Captions dataset. The predicted captions and score each

read the original abstract

Image aesthetic quality assessment has been a relatively hot topic during the last decade. Most recently, comments type assessment (aesthetic captions) has been proposed to describe the general aesthetic impression of an image using text. In this paper, we propose Aesthetic Attributes Assessment of Images, which means the aesthetic attributes captioning. This is a new formula of image aesthetic assessment, which predicts aesthetic attributes captions together with the aesthetic score of each attribute. We introduce a new dataset named DPC-Captions which contains comments of up to 5 aesthetic attributes of one image through knowledge transfer from a full-annotated small-scale dataset. Then, we propose Aesthetic Multi-Attribute Network (AMAN), which is trained on a mixture of fully-annotated small-scale PCCD dataset and weakly-annotated large-scale DPC-Captions dataset. Our AMAN makes full use of transfer learning and attention model in a single framework. The experimental results on our DPC-Captions and PCCD dataset reveal that our method can predict captions of 5 aesthetic attributes together with numerical score assessment of each attribute. We use the evaluation criteria used in image captions to prove that our specially designed AMAN model outperforms traditional CNN-LSTM model and modern SCA-CNN model of image captions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines a joint caption-plus-score task for five aesthetic attributes and builds DPC-Captions via transfer from PCCD, but supplies no check on label quality in the transferred data.

The core move is to treat aesthetic assessment as simultaneous caption generation and per-attribute scoring for five properties, trained on a mix of the small fully-labeled PCCD set and the larger DPC-Captions set produced by knowledge transfer. The AMAN model simply stacks standard transfer learning with attention, which is not novel in itself but is applied to this combined output format. That framing is the actual addition; prior work had either scores or general captions, not both tied to the same five attributes.

The experiments show the model beating CNN-LSTM and SCA-CNN on BLEU, METEOR, etc., which is consistent with the architecture but does not demonstrate that those metrics are the right ones for aesthetic attributes. The main gap is exactly the one flagged in the stress test: there is no reported agreement study, noise analysis, or held-out human check on whether the transferred labels in DPC-Captions preserve the original attribute semantics. Without that, any claimed improvement rests on an unverified training signal. The paper also gives no error bars, significance tests, or ablation on the transfer step itself.

This is a narrow but coherent extension of captioning methods into a specialized domain. It will mainly interest people already working on photo curation or aesthetic tools who need multi-attribute text output. The thinking is straightforward and the citations are appropriate for the baselines used. I would send it to review so the authors can address the label-validation question and justify the metrics; the work is solid enough on its own terms to deserve that step rather than a desk reject.

Referee Report

3 major / 2 minor

Summary. The paper introduces aesthetic attributes assessment as a new formulation of image aesthetic quality assessment: jointly predicting captions for up to five aesthetic attributes and a numerical score for each attribute. It constructs the large-scale DPC-Captions dataset via knowledge transfer from the small fully-annotated PCCD dataset, proposes the Aesthetic Multi-Attribute Network (AMAN) trained on the mixture of fully- and weakly-annotated data, and reports that AMAN outperforms CNN-LSTM and SCA-CNN baselines on standard image-caption metrics.

Significance. If the transferred labels prove reliable, the joint caption-plus-score formulation and the use of attention within a transfer-learning framework could support more fine-grained, multi-attribute aesthetic analysis at scale. The approach directly addresses the data scarcity problem common in aesthetic assessment by leveraging weak supervision.

major comments (3)

[Dataset section (DPC-Captions construction)] The construction and use of DPC-Captions is load-bearing for all training and evaluation claims, yet the manuscript provides no quantitative validation (agreement metrics, noise analysis, or held-out human verification) that the knowledge-transfer process preserves the original five-attribute semantics without systematic bias or label noise.
[Experiments section] The experimental claims of outperformance rest on standard caption metrics, but the manuscript supplies no statistical significance tests, error bars, cross-validation details, or ablation studies on the contribution of the attention or transfer components.
[Evaluation criteria paragraph] Suitability of BLEU/CIDEr-style metrics for evaluating aesthetic-attribute captions (as opposed to general scene descriptions) is not justified or compared against attribute-specific alternatives.
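
The significance testing requested in the second comment could take the form of a paired test on per-image metric values. The sketch below uses an exact sign-flip (paired permutation) test, chosen here only for illustration; the referee does not name a specific test, and the score lists are made-up numbers, not results from the paper.

```python
from itertools import product

def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided sign-flip test on the summed paired differences.
    Under the null hypothesis each difference is equally likely to have
    either sign, so every one of the 2^n sign assignments is enumerated.
    Only practical for small n; larger sets would need random sampling."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total

# Hypothetical per-image metric values for two models on the same six images.
model_a = [0.30, 0.42, 0.35, 0.50, 0.44, 0.39]
model_b = [0.25, 0.40, 0.30, 0.45, 0.41, 0.33]
print(paired_sign_flip_p_value(model_a, model_b))
```

Pairing by image matters because caption metrics vary far more between images than between models; an unpaired comparison would hide a consistent per-image gain.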

minor comments (2)

[Model section] Notation for the five attributes and the precise form of the joint loss (caption + score) should be defined explicitly with equations rather than prose.
[Tables and figures] Figure captions and table headers should clarify whether reported scores are on the PCCD test set, DPC-Captions, or both.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below, indicating planned revisions to strengthen the manuscript.

Referee: The construction and use of DPC-Captions is load-bearing for all training and evaluation claims, yet the manuscript provides no quantitative validation (agreement metrics, noise analysis, or held-out human verification) that the knowledge-transfer process preserves the original five-attribute semantics without systematic bias or label noise.

Authors: We agree that quantitative validation of the knowledge-transfer process used to construct DPC-Captions would strengthen the claims. In the revised manuscript we will add agreement metrics (e.g., Cohen's kappa on a held-out subset), a noise analysis comparing transferred labels to the original PCCD annotations, and a brief discussion of potential semantic drift across the five attributes.

revision: yes

Referee: The experimental claims of outperformance rest on standard caption metrics, but the manuscript supplies no statistical significance tests, error bars, cross-validation details, or ablation studies on the contribution of the attention or transfer components.

Authors: We acknowledge the need for greater statistical rigor. The revised version will include paired statistical significance tests on the reported metrics, error bars computed over multiple random seeds, details of the train/validation splits, and ablation studies isolating the attention mechanism and the mixed fully/weakly supervised training regime.

revision: yes

Referee: Suitability of BLEU/CIDEr-style metrics for evaluating aesthetic-attribute captions (as opposed to general scene descriptions) is not justified or compared against attribute-specific alternatives.

Authors: We will add an explicit justification paragraph noting that BLEU and CIDEr remain the de-facto standards for comparing against the CNN-LSTM and SCA-CNN baselines in the captioning literature; we will also discuss their limitations for attribute-specific text and note that attribute-specific alternatives (e.g., attribute-wise precision) could be explored in future work.

revision: partial
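
The agreement metric mentioned in the first response, Cohen's kappa, is simple to compute once transferred labels and human labels exist for the same held-out comments. The sketch below assumes one attribute label per comment; the label values and attribute names are invented placeholders, not data from DPC-Captions.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two label sequences over the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the two marginal label distributions.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)

# Hypothetical attribute assigned to six comments by the transfer step
# versus by a human annotator.
transferred = ["color", "color", "composition", "lighting", "composition", "color"]
human = ["color", "composition", "composition", "lighting", "composition", "color"]
print(round(cohens_kappa(transferred, human), 4))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one attribute dominates the comment pool.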

Circularity Check

0 steps flagged

No significant circularity; conventional supervised training on external datasets

The paper's core claims rest on training the AMAN model via standard supervised learning on the PCCD dataset and the derived DPC-Captions dataset, then evaluating with conventional image captioning metrics against baselines like CNN-LSTM and SCA-CNN. No equations, predictions, or uniqueness claims reduce reported performance to quantities defined by the fitted parameters themselves. The knowledge transfer step for creating DPC-Captions is a preprocessing choice whose fidelity is an external validity concern rather than a self-referential derivation. Any self-citations to prior dataset work are not load-bearing for the performance results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim depends on the correctness of the knowledge-transfer labeling process for DPC-Captions and on the assumption that standard captioning metrics are appropriate proxies for aesthetic attribute quality. No explicit free parameters or invented physical entities are introduced beyond the new dataset and model architecture.

invented entities (1)

DPC-Captions dataset · no independent evidence
purpose: Large-scale weakly annotated training data for the five aesthetic attributes
Constructed by knowledge transfer from the smaller PCCD dataset; no independent verification of label quality is described.

pith-pipeline@v0.9.0 · 5768 in / 1182 out tokens · 19873 ms · 2026-05-24T23:31:53.817051+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We propose Aesthetic Multi-Attribute Network (AMAN), which is trained on a mixture of fully-annotated small-scale PCCD dataset and weakly-annotated large-scale DPC-Captions dataset... channel and spatial attention network, and language generation network.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · absolute_floor_iff_bare_distinguishability · unclear

DPC-Captions which contains comments of up to 5 aesthetic attributes of one image through knowledge transfer from a full-annotated small-scale dataset.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

[1]

Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2016. SPICE: Semantic Propositional Image Caption Evaluation. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V (Lecture Notes in Computer Science), Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.), V...

work page doi:10.1007/978-3-319-46454-1_24 2016
[2]

Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2018
[3]

Jyoti Aneja, Aditya Deshpande, and Alexander G. Schwing. 2018. Convolutional Image Captioning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2018
[4]

Kuang-Yu Chang, Kung-Hung Lu, and Chu-Song Chen. 2017. Aesthetic Critiques Generation for Photos. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017. IEEE Computer Society, 3534–3543. https://doi.org/10.1109/ICCV.2017.380

work page doi:10.1109/iccv.2017.380 2017
[5]

Fuhai Chen, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, and Jinsong Su. 2018. GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2018
[6]

Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, and Tat-Seng Chua. 2017. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. 6298–6306. https://doi.org/10.1109/CVPR.2017.667

work page doi:10.1109/cvpr.2017.667 2017
[7]

Xiaowu Chen, Xin Jin, Hongyu Wu, and Qinping Zhao. 2015. Learning Templates for Artistic Portrait Lighting Analysis. IEEE Trans. Image Processing 24, 2 (2015), 608–618

work page 2015
[8]

C. Cui, H. Liu, T. Lian, L. Nie, L. Zhu, and Y. Yin. 2018. Distribution-oriented Aesthetics Assessment with Semantic-Aware Hybrid Network. IEEE Transactions on Multimedia (2018), 1–1. https://doi.org/10.1109/TMM.2018.2875357

work page doi:10.1109/tmm.2018.2875357 2018
[9]

Yubin Deng, Chen Change Loy, and Xiaoou Tang. 2017. Image Aesthetic Assessment: An experimental survey. IEEE Signal Process. Mag. 34, 4 (2017), 80–106. https://doi.org/10.1109/MSP.2017.2696576

work page doi:10.1109/msp.2017.2696576 2017
[10]

Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, and Trevor Darrell. 2017. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Trans. Pattern Anal. Mach. Intell. 39, 4 (2017), 677–691. https://doi.org/10.1109/TPAMI.2016.2599174

work page doi:10.1109/tpami 2017
[11]

Zhe Dong and Xinmei Tian. 2015. Multi-level photo quality assessment with multi-view features. Neurocomputing 168 (2015), 308–319

work page 2015
[12]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. IEEE Computer Society, 770–778

work page 2016
[13]

Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger

work page
[14]

Densely Connected Convolutional Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. 2261–2269. https://doi.org/10.1109/CVPR.2017.243

work page doi:10.1109/cvpr.2017.243 2017
[15]

X. Jin, J. Chi, S. Peng, Y. Tian, C. Ye, and X. Li. 2016. Deep Image Aesthetics Classification using Inception Modules and Fine-tuning Connected Layer. In The 8th International Conference on Wireless Communications and Signal Processing (WCSP). 1–6

work page 2016
[16]

Xin Jin, Le Wu, Xiaodong Li, Siyu Chen, Siwei Peng, Jingying Chi, Shiming Ge, Chenggen Song, and Geng Zhao. 2018. Predicting Aesthetic Score Distribution Through Cumulative Jensen-Shannon Divergence. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018. https://www.aaai.org/ocs/in...

work page 2018
[17]

Xin Jin, Mingtian Zhao, Xiaowu Chen, Qinping Zhao, and Song Chun Zhu. 2010. Learning Artistic Lighting Template from Portrait Photographs. In Computer Vision - ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part IV. 101–114

work page 2010
[18]

Yueying Kao, Ran He, and Kaiqi Huang. 2017. Deep Aesthetic Quality Assessment With Semantic Information. IEEE Trans. Image Processing 26, 3 (2017), 1482–1495. https://doi.org/10.1109/TIP.2017.2651399

work page doi:10.1109/tip.2017.2651399 2017
[19]

Yueying Kao, Kaiqi Huang, and Steve J. Maybank. 2016. Hierarchical aesthetic quality assessment using deep convolutional neural networks. Sig. Proc.: Image Comm. 47 (2016), 500–510. https://doi.org/10.1016/j.image.2016.05.004

work page doi:10.1016/j.image.2016.05.004 2016
[20]

Andrej Karpathy and Li Fei-Fei. 2017. Deep Visual-Semantic Alignments for Generating Image Descriptions. IEEE Trans. Pattern Anal. Mach. Intell. 39, 4 (2017), 664–676. https://doi.org/10.1109/TPAMI.2016.2598339

work page doi:10.1109/tpami.2016.2598339 2017
[21]

Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, and Charless Fowlkes. 2016. Photo Aesthetics Ranking Network with Attributes and Content Adaptation. In European Conference on Computer Vision (ECCV)

work page 2016
[22]

Xin Lu, Zhe Lin, Hailin Jin, Jianchao Yang, and James Zijun Wang. 2014. RAPID: Rating Pictorial Aesthetics using Deep Learning. In Proceedings of the ACM International Conference on Multimedia, MM’14, Orlando, FL, USA, November 03 - 07, 2014. 457–466

work page 2014
[23]

Ruotian Luo, Brian Price, Scott Cohen, and Gregory Shakhnarovich. 2018. Discriminability Objective for Training Descriptive Captions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2018
[24]

Shuang Ma, Jing Liu, and Chang Wen Chen. 2017. A-Lamp: Adaptive Layout-Aware Multi-patch Deep Convolutional Neural Network for Photo Aesthetic Assessment. In CVPR. IEEE Computer Society, 722–731

work page 2017
[25]

Long Mai, Hailin Jin, and Feng Liu. 2016. Composition-Preserving Deep Photo Aesthetics Assessment. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2016
[26]

Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille, and Kevin Murphy. 2016. Generation and Comprehension of Unambiguous Object Descriptions. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. 11–20. https://doi.org/10.1109/CVPR.2016.9

work page doi:10.1109/cvpr.2016.9 2016
[27]

Alexander Mathews, Lexing Xie, and Xuming He. 2018. SemStyle: Learning to Generate Stylised Image Captions Using Unaligned Text. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2018
[28]

Naila Murray, Luca Marchesotti, and Florent Perronnin. 2012. AVA: A large-scale database for aesthetic visual analysis. In IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, June 16-21, 2012. 2408–2415

work page 2012
[29]

Hossein Talebi and Peyman Milanfar. 2018. NIMA: Neural Image Assessment. IEEE Trans. Image Processing 27, 8 (2018), 3998–4011. https://doi.org/10.1109/TIP.2018.2831899

work page doi:10.1109/tip 2018
[30]

Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015. 3156–3164. https://doi.org/10.1109/CVPR.2015.7298935

work page doi:10.1109/cvpr.2015.7298935 2015
[31]

Wenshan Wang, Su Yang, Weishan Zhang, and Jiulong Zhang. 2018. Neural Aesthetic Image Reviewer. CoRR abs/1802.10240 (2018). arXiv:1802.10240 http://arxiv.org/abs/1802.10240

work page internal anchor Pith review Pith/arXiv arXiv 2018
[32]

Weining Wang, Mingquan Zhao, Li Wang, Jiexiong Huang, Chengjia Cai, and Xiangmin Xu. 2016. A multi-scene deep learning model for image aesthetic evaluation. Sig. Proc.: Image Comm. 47 (2016), 511–518

work page 2016
[33]

Ye Zhou, Xin Lu, Junping Zhang, and James Z. Wang. 2016. Joint Image and Text Representation for Aesthetics Analysis. In Proceedings of the 2016 ACM Conference on Multimedia Conference, MM 2016, Amsterdam, The Netherlands, October 15-19, 2016, Alan Hanjalic, Cees Snoek, Marcel Worring, Dick C. A. Bulterman, Benoit Huet, Aisling Kelliher, Yiannis Kompatsiar...

work page doi:10.1145/2964284.2967223 2016
