Blind Image Quality Assessment Using A Deep Bilinear Convolutional Neural Network

Dexiang Deng; Jia Yan; Kede Ma; Weixia Zhang; Zhou Wang

arXiv: 1907.02665 · v1 · pith:SDLZA43Cnew · submitted 2019-07-05 · 📡 eess.IV · cs.CV · cs.MM


Pith reviewed 2026-05-25 02:21 UTC · model grok-4.3

classification 📡 eess.IV · cs.CV · cs.MM

keywords: blind image quality assessment · bilinear pooling · convolutional neural networks · synthetic distortions · authentic distortions · fine-tuning · deep learning


The pith

A bilinear pooling of features from two pre-trained CNNs enables superior blind image quality prediction for both synthetic and authentic distortions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a model with one CNN pre-trained to classify synthetic distortion types and levels and a second pre-trained CNN for general image classification to handle authentic distortions. Features from both networks are combined through bilinear pooling into a single representation used for quality score prediction. The full model is then fine-tuned on subject-rated image databases using stochastic gradient descent. This design targets the challenge of building one system that works across distortion categories that differ in their statistics and generation processes. If the approach holds, it would allow more reliable automated quality scoring without needing a clean reference image in applications from consumer photography to medical imaging.

Core claim

The central claim is that bilinear pooling of features from a distortion-classification CNN and an image-classification CNN produces a unified representation that, after fine-tuning on target databases, yields higher correlation with human quality judgments than prior blind image quality methods on both synthetic and authentic distortion sets, with additional verification of generalizability on the Waterloo Exploration Database via group maximum differentiation competition.

What carries the argument

Bilinear pooling of feature maps from two separate convolutional neural networks, one pre-trained on large-scale synthetic distortion classification and the other on image classification, to form a quality prediction representation.
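Concretely, bilinear pooling takes the outer product of the two feature vectors at every spatial location and pools those products over the image. The NumPy sketch below is illustrative, not the authors' implementation. It assumes both branches emit feature maps on the same H × W grid, averages over locations, and applies the signed-square-root and L2 normalization used in the original bilinear CNN [22]. Whether DB-CNN uses exactly these normalization choices is an assumption here, and the function name `bilinear_pool` is hypothetical.

```python
import numpy as np


def bilinear_pool(feat1: np.ndarray, feat2: np.ndarray) -> np.ndarray:
    """Fuse two (H, W, D) feature maps into one fixed-length vector.

    feat1: (H, W, D1) activations, e.g. from the distortion-classification branch.
    feat2: (H, W, D2) activations, e.g. from the image-classification branch.
    Assumes both maps share the same H x W spatial grid.
    """
    h, w, d1 = feat1.shape
    h2, w2, d2 = feat2.shape
    if (h, w) != (h2, w2):
        raise ValueError("feature maps must share the same spatial grid")
    y1 = feat1.reshape(h * w, d1)      # one row per spatial location
    y2 = feat2.reshape(h * w, d2)
    # Average of per-location outer products: B = Y1^T Y2 / (HW), shape (D1, D2)
    b = y1.T @ y2 / (h * w)
    z = b.reshape(-1)                  # length D1 * D2, independent of H and W
    # Signed square root followed by L2 normalization
    z = np.sign(z) * np.sqrt(np.abs(z))
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z
```

Because B has shape D1 × D2 whatever H and W are, a 7 × 7 grid and a 20 × 15 grid with D1 = 4 and D2 = 3 both produce 12-dimensional vectors. This is why a bilinear head can accept whole images of arbitrary size before a regression layer maps the vector to a quality score.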

Load-bearing premise

The features from the two pre-trained CNNs remain informative enough for accurate quality prediction once combined by bilinear pooling and adjusted on human-rated images.

What would settle it

A head-to-head evaluation in which the model fails to exceed the best competing blind methods on correlation metrics across the LIVE, TID2013, CSIQ, and LIVE Challenge databases would falsify the superiority claim.
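The correlation metrics at stake are usually SRCC (Spearman rank-order, which measures prediction monotonicity) and PLCC (Pearson linear, which measures prediction accuracy). The referee report below names both. The sketch uses NumPy only, and its helper names are hypothetical. PLCC is often computed after fitting a nonlinear logistic mapping from predictions to subjective scores, as described in the VQEG report [37]. This sketch omits that step, and whether the paper applies it is not stated in this review.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                      # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def srcc(pred, mos) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(pred), average_ranks(mos))[0, 1])


def plcc(pred, mos) -> float:
    """Pearson linear correlation on raw scores (no nonlinear mapping fitted)."""
    return float(np.corrcoef(np.asarray(pred, float), np.asarray(mos, float))[0, 1])
```

With predictions `[1, 2, 3, 4]` and subjective scores `[1, 4, 9, 16]`, SRCC is 1 (up to rounding) because the ordering is perfect. PLCC is only about 0.984 because the relationship is not linear. This difference is why both numbers are normally reported.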

Figures

Figures reproduced from arXiv: 1907.02665 by Dexiang Deng, Jia Yan, Kede Ma, Weixia Zhang, Zhou Wang.

**Figure 1.** Sample distorted images synthesized from a reference image in the Waterloo Exploration Database [18]. (a) Gaussian blur. (b) White Gaussian noise.

**Figure 2.** Illustration of the five new distortion types with increasing degradation levels from left to right. (a)-(e) Contrast stretching. (f)-(j) Pink noise. (k)-(o) …

**Figure 3.** The architecture of S-CNN for synthetic distortions. We follow the style and convention in [2], and denote the parameterization of the convolution …

**Figure 4.** The structure of the proposed DB-CNN.

…can be computed by

$$\frac{\partial \ell}{\partial Y_1} = Y_2 \left(\frac{\partial \ell}{\partial B}\right)^{T} \tag{6}$$

and

$$\frac{\partial \ell}{\partial Y_2} = Y_1 \frac{\partial \ell}{\partial B}. \tag{7}$$

Bilinear pooling summarizes the spatial information and enables DB-CNN to accept an input image of arbitrary size. As a result, we can feed the whole image directly instead of patches cropped from it to DB-CNN during both training and testing.

IV. EXPERIMENTS: In this section, we first descri…

**Figure 5.** Images with different distortion types may share similar visual appearances. (a) Additive Gaussian noise. (b) Additive noise in color components. (c) …

**Figure 6.** gMAD competition results between DB-CNN and deepIQA [24]. (a) Fixed deepIQA at the low-quality level. (b) Fixed deepIQA at the high-quality …

**Figure 7.** gMAD competition results between DB-CNN and MEON [10]. (a) Fixed MEON at the low-quality level. (b) Fixed MEON at the high-quality level.
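The backward-pass formulas (6) and (7), which accompany Figure 4, can be checked numerically. The sketch assumes, as in bilinear CNNs [22], that B = Y1ᵀ Y2, with rows of Y1 and Y2 indexing spatial locations. That definition is not part of the extracted excerpt. The loss is a hypothetical stand-in, ℓ = Σ G ⊙ B, chosen so that ∂ℓ/∂B equals a fixed matrix G. The closed-form gradients are then compared against central finite differences. The sizes and the helper `numeric_grad` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 6, 4, 3                    # n = H*W spatial locations (illustrative sizes)
y1 = rng.standard_normal((n, d1))
y2 = rng.standard_normal((n, d2))
g = rng.standard_normal((d1, d2))      # stands in for dl/dB coming from later layers


def loss(a1, a2):
    # Scalar loss whose gradient with respect to B = a1^T a2 is exactly g
    return float(np.sum(g * (a1.T @ a2)))


# Closed-form gradients, Eqs. (6) and (7)
grad_y1 = y2 @ g.T                     # (n, d2) @ (d2, d1) -> (n, d1)
grad_y2 = y1 @ g                       # (n, d1) @ (d1, d2) -> (n, d2)


def numeric_grad(f, x, eps=1e-6):
    """Central finite differences, one entry at a time."""
    out = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[idx] += eps
        x_minus[idx] -= eps
        out[idx] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return out


num_y1 = numeric_grad(lambda a: loss(a, y2), y1)
num_y2 = numeric_grad(lambda a: loss(y1, a), y2)
print(np.max(np.abs(num_y1 - grad_y1)), np.max(np.abs(num_y2 - grad_y2)))
```

Both printed discrepancies should be at rounding-error level, because the loss is linear in each factor when the other is held fixed. The shape comments also show why the transpose appears in (6) but not in (7).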

Original abstract

We propose a deep bilinear model for blind image quality assessment (BIQA) that handles both synthetic and authentic distortions. Our model consists of two convolutional neural networks (CNN), each of which specializes in one distortion scenario. For synthetic distortions, we pre-train a CNN to classify image distortion type and level, where we enjoy large-scale training data. For authentic distortions, we adopt a pre-trained CNN for image classification. The features from the two CNNs are pooled bilinearly into a unified representation for final quality prediction. We then fine-tune the entire model on target subject-rated databases using a variant of stochastic gradient descent. Extensive experiments demonstrate that the proposed model achieves superior performance on both synthetic and authentic databases. Furthermore, we verify the generalizability of our method on the Waterloo Exploration Database using the group maximum differentiation competition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The bilinear model fuses a distortion-classification CNN with an ImageNet CNN via bilinear pooling to handle both synthetic and authentic distortions in one BIQA architecture, and the experiments back the performance claim without load-bearing flaws.


The main contribution is a bilinear CNN that pairs a network pre-trained to classify distortion type and level with a standard image-classification CNN, pools their features bilinearly, and fine-tunes the whole model on subject-rated data. The result is one model that handles both synthetic and authentic distortions instead of separate pipelines. The approach is new in its specific pairing and pre-training split for this task. It uses large-scale data where it exists (for the distortion branch), leverages an off-the-shelf ImageNet model for the authentic branch, and reports results across standard databases plus the Waterloo Exploration Database with the group maximum differentiation competition. That last part adds a useful check on generalizability.

The soft spots are minor. The superiority claim depends on the fine-tuning step and the choice of the two base networks, so the gains could shrink if those are swapped; the paper does not appear to include an ablation that isolates the bilinear pooling itself from simply running two CNNs. No data leakage, circular fitting, or missing controls show up in the setup. The math is standard transfer learning plus bilinear pooling, which is reproducible.

This paper is for people building or benchmarking BIQA systems who want a single trainable model for mixed distortion types. A reader who needs a practical architecture with broad database coverage will get value from it. It deserves a serious referee because the method is straightforward, the evaluation is on relevant benchmarks, and the central claim holds up on the evidence presented.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes a deep bilinear convolutional neural network for blind image quality assessment (BIQA) handling both synthetic and authentic distortions. One CNN is pre-trained to classify distortion type and level on large-scale synthetic data; a second uses a pre-trained image-classification CNN for authentic distortions. Features from both are combined via bilinear pooling into a unified representation, after which the full model is fine-tuned on subject-rated databases via a variant of SGD. The central claim is that this architecture achieves superior performance on both synthetic and authentic databases, with additional verification of generalizability on the Waterloo Exploration Database via the group maximum differentiation competition.

Significance. If the reported empirical superiority holds under proper controls, the work supplies a unified BIQA framework that avoids separate pipelines for synthetic versus authentic distortions. The combination of large-scale pre-training, bilinear pooling, and targeted fine-tuning is a direct and testable extension of transfer-learning ideas already common in the field. The explicit use of the Waterloo Exploration Database and group maximum differentiation competition provides an independent check on generalizability that is stronger than cross-database testing alone.
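In gMAD [23], one model (the defender) is held at a fixed predicted-quality level. A second model (the attacker) then picks, from the images the defender rates as roughly equal, the pair it considers best and worst, and human observers judge that pair. The sketch below is a simplified, hypothetical version of the pair-selection step only. It assumes both models' predictions over a shared image set (such as the Waterloo Exploration Database [18]) are already computed, and it approximates the defender's level set by a tolerance band `tol`. The full procedure also repeats this at several levels, swaps the two roles, and collects the human judgments.

```python
import numpy as np


def gmad_pair(defender, attacker, level, tol):
    """Select the pair the attacker separates most among images the
    defender scores within `tol` of `level`.

    Returns (index rated worst by attacker, index rated best by attacker),
    or None if fewer than two images fall in the defender's level set.
    """
    defender = np.asarray(defender, dtype=float)
    attacker = np.asarray(attacker, dtype=float)
    candidates = np.flatnonzero(np.abs(defender - level) <= tol)
    if candidates.size < 2:
        return None
    worst = candidates[np.argmin(attacker[candidates])]
    best = candidates[np.argmax(attacker[candidates])]
    return int(worst), int(best)
```

For defender scores `[0.30, 0.31, 0.29, 0.80, 0.32]` and attacker scores `[0.9, 0.1, 0.5, 0.2, 0.7]`, fixing the defender at 0.3 with `tol=0.05` returns `(1, 0)`. If observers see a clear quality gap between images 1 and 0, the defender is exposed. If they do not, the attacker is.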

minor comments (2)

[Abstract] The claim of 'superior performance' is stated without any numerical values, baseline names, or dataset sizes. Adding the key SRCC/PLCC figures and the main competing methods would make the abstract self-contained and allow readers to assess the strength of the central claim immediately.
The bilinear pooling step is described only at a high level; a short equation or explicit reference to the original bilinear CNN formulation (e.g., the outer-product operation and subsequent pooling) would clarify exactly how the two feature streams are fused.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report accurately reflects the paper's contributions and we will incorporate any minor suggestions in the revised version.

Circularity Check

0 steps flagged

No significant circularity


The paper describes an empirical architecture for blind image quality assessment: two pre-trained CNNs (one for distortion classification on synthetic data, one for image classification on authentic data), bilinear pooling of their features, and fine-tuning via SGD on subject-rated databases. The central claim is experimental superiority on held-out test sets from synthetic and authentic databases, plus a generalizability check on Waterloo Exploration. No derivation chain exists; there are no equations defining a quantity in terms of itself, no parameters fitted to a subset then renamed as predictions of a related quantity, and no load-bearing self-citations to uniqueness theorems or ansatzes. The method is self-contained transfer learning whose performance claims rest on external data splits and standard evaluation protocols rather than internal redefinition.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger records the explicit modeling choices described there. The central claim depends on the transferability of the two pre-trained feature extractors and on the effectiveness of bilinear pooling for quality regression.

free parameters (1)

fine-tuning hyperparameters
A variant of stochastic gradient descent is used to fine-tune the entire model on target databases; specific learning rates, batch sizes, and stopping criteria are not stated.
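To show where these unstated hyperparameters enter, the sketch below fits a linear quality regressor on precomputed features with minibatch Adam and a mean-squared-error loss. Adam [39] appears in the reference list, so it is a plausible candidate for the "variant of stochastic gradient descent", but that is an assumption. The paper fine-tunes the entire model, whereas this sketch trains only a linear head. The function names and default values are hypothetical and are not the paper's settings.

```python
import numpy as np


def finetune_linear_head(features, mos, lr=1e-3, batch_size=8, epochs=50,
                         beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Fit quality = features @ w + b by minibatch Adam on mean squared error.

    lr, batch_size and epochs are the free parameters the ledger flags;
    the defaults are illustrative only.
    """
    features = np.asarray(features, dtype=float)
    mos = np.asarray(mos, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = features.shape
    x = np.hstack([features, np.ones((n, 1))])     # append a bias column
    params = np.zeros(d + 1)                       # weights followed by bias
    m = np.zeros_like(params)                      # first-moment estimate
    v = np.zeros_like(params)                      # second-moment estimate
    t = 0
    for _ in range(epochs):                        # fixed epoch budget as the stopping rule
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = x[idx] @ params - mos[idx]
            grad = 2.0 * x[idx].T @ residual / len(idx)   # gradient of the batch MSE
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)                  # bias-corrected moments
            v_hat = v / (1 - beta2 ** t)
            params -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params


def mse(features, mos, params):
    x = np.hstack([np.asarray(features, float), np.ones((len(features), 1))])
    return float(np.mean((x @ params - np.asarray(mos, float)) ** 2))
```

Changing `lr`, `batch_size`, or `epochs` changes the fitted head. This is why the ledger treats them as free parameters whose values should be reported.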

axioms (2)

domain assumption: Features from a CNN pre-trained on distortion type and level classification remain useful for perceptual quality prediction after bilinear pooling.
The model architecture is built on this transfer assumption.
domain assumption: Features from a CNN pre-trained on image classification remain useful for perceptual quality prediction after bilinear pooling.
The model architecture is built on this transfer assumption.

pith-pipeline@v0.9.0 · 5675 in / 1324 out tokens · 24250 ms · 2026-05-25T02:21:55.282182+00:00 · methodology


Reference graph

Works this paper leans on

45 extracted references

[1] A. C. Bovik, Handbook of Image and Video Processing. Academic Press, 2010.

[2] J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” CoRR, vol. abs/1611.01704, 2016. [Online]. Available: http://arxiv.org/abs/1611.01704

[3] Z. Duanmu, K. Ma, and Z. Wang, “Quality-of-experience of adaptive video streaming: Exploring the space of adaptations,” in ACM Multimedia, 2017, pp. 1752–1760.

[4] Z. Wang and A. C. Bovik, Modern Image Quality Assessment. Morgan & Claypool, 2006.

[5] A. Rehman, K. Zeng, and Z. Wang, “Display device-adapted video quality-of-experience assessment,” in Human Vision and Electronic Imaging, 2015, pp. 1–11.

[6] Z. Wang and A. C. Bovik, “Reduced- and no-reference image quality assessment: The natural scene statistic model approach,” IEEE Signal Processing Magazine, vol. 28, no. 6, pp. 29–40, Nov. 2011.

[7] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, Dec. 2012.

[8] P. Ye, J. Kumar, L. Kang, and D. Doermann, “Unsupervised feature learning framework for no-reference image quality assessment,” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1098–1105.

[9] L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for no-reference image quality assessment,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1733–1740.

[10] K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, and W. Zuo, “End-to-end blind image quality assessment using deep neural networks,” IEEE Transactions on Image Processing, vol. 27, no. 3, pp. 1202–1213, Mar. 2018.

[11] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.

[12] J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, and A. C. Bovik, “Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment,” IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 130–141, Nov. 2017.

[13] D. Ghadiyaram and A. C. Bovik, “Massive online crowdsourced study of subjective and objective picture quality,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 372–387, Jan. 2016.

[14] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440–3451, Nov. 2006.

[15] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. J. Kuo, “Image database TID2013: Peculiarities, results and perspectives,” Signal Processing: Image Communication, vol. 30, pp. 57–77, Jan. 2015.

[16] J. Kim and S. Lee, “Fully deep blind image quality predictor,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 1, pp. 206–220, Feb. 2017.

[17] L. Kang, P. Ye, Y. Li, and D. Doermann, “Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks,” in IEEE International Conference on Image Processing, 2015, pp. 2791–2795.

[18] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, “Waterloo Exploration Database: New challenges for image quality assessment models,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 1004–1016, Feb. 2017.

[19] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The Pascal Visual Object Classes (VOC) Challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, Jun. 2010.

[20] D. Ghadiyaram and A. C. Bovik, “Perceptual quality prediction on authentically distorted images using a bag of features approach,” Journal of Vision, vol. 17, no. 1, pp. 32–32, Jan. 2017.

[21] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations, 2015.

[22] T.-Y. Lin, A. RoyChowdhury, and S. Maji, “Bilinear CNN models for fine-grained visual recognition,” in IEEE International Conference on Computer Vision, 2015, pp. 1449–1457.

[23] K. Ma, Q. Wu, Z. Wang, Z. Duanmu, H. Yong, H. Li, and L. Zhang, “Group MAD competition – a new methodology to compare objective image quality models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1664–1673.

[24] S. Bosse, D. Maniry, K.-R. Müller, T. Wiegand, and W. Samek, “Deep neural networks for no-reference and full-reference image quality assessment,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 206–219, Jan. 2018.

[25] K. Ma, W. Liu, T. Liu, Z. Wang, and D. Tao, “dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs,” IEEE Transactions on Image Processing, vol. 26, no. 8, pp. 3951–3964, Aug. 2017.

[26] H. Tang, N. Joshi, and A. Kapoor, “Blind image quality assessment using semi-supervised rectifier networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2877–2884.

[27] S. Bianco, L. Celona, P. Napoletano, and R. Schettini, “On the use of deep learning for blind image quality assessment,” CoRR, vol. abs/1602.05531, 2016.

[28] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, Aug. 2011.

[29] K. Ma, K. Zeng, and Z. Wang, “Perceptual quality assessment for multi-exposure image fusion,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3345–3356, Nov. 2015.

[30] J. B. Tenenbaum and W. T. Freeman, “Separating style and content,” in Advances in Neural Information Processing Systems, 1997, pp. 662–668.

[31] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in Neural Information Processing Systems, 2014, pp. 568–576.

[32] A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach, “Multimodal compact bilinear pooling for visual question answering and visual grounding,” CoRR, vol. abs/1606.01847, 2016.

[33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[34] X. Pennec, P. Fillard, and N. Ayache, “A Riemannian framework for tensor computing,” International Journal of Computer Vision, vol. 66, no. 1, pp. 41–66, Jan. 2006.

[35] E. C. Larson and D. M. Chandler, “Most apparent distortion: Full-reference image quality assessment and the role of strategy,” Journal of Electronic Imaging, vol. 19, no. 1, pp. 1–21, Jan. 2010.

[36] D. Jayaraman, A. Mittal, A. K. Moorthy, and A. C. Bovik, “Objective quality assessment of multiply distorted images,” in Signals, Systems and Computers, 2013, pp. 1693–1697.

[37] VQEG, “Final report from the video quality experts group on the validation of objective models of video quality assessment,” 2000. [Online]. Available: http://www.vqeg.org

[38] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.

[39] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.

[40] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456.

[41] A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for Matlab,” in ACM International Conference on Multimedia, 2015, pp. 689–692.

[42] W. Xue, X. Mou, L. Zhang, A. C. Bovik, and X. Feng, “Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features,” IEEE Transactions on Image Processing, vol. 23, no. 11, pp. 4850–4862, Nov. 2014.

[43] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, and D. Doermann, “Blind image quality assessment based on high order statistics aggregation,” IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4444–4457, Sep. 2016.

[44] Z. Wang and E. P. Simoncelli, “Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities,” Journal of Vision, vol. 8, no. 12, pp. 8.1–8.13, Sep. 2008.

[45] Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, “Compact bilinear pooling,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 317–326.

Weixia Zhang received the B.E. degree from Wuhan University, Wuhan, China, in 2011 and the M.S. degree in electrical and computer engineering from the University of Rochester, NY, USA, in 2013...

[1] [1]

A. C. Bovik, Handbook of Image and Video Processing . Academic Press, 2010

work page 2010

[2] [2]

End-to-end optimized image compression,

J. Ball ´e, V . Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” CoRR, vol. abs/1611.01704, 2016. [Online]. Available: http://arxiv.org/abs/1611.01704

work page arXiv 2016

[3] [3]

Quality-of-experience of adaptive video streaming: Exploring the space of adaptations,

Z. Duanmu, K. Ma, and Z. Wang, “Quality-of-experience of adaptive video streaming: Exploring the space of adaptations,” in ACM Multime- dia, 2017, pp. 1752–1760

work page 2017

[4] [4]

Wang and A

Z. Wang and A. C. Bovik, Modern Image Quality Assessment . Morgan & Claypool, 2006

work page 2006

[5] [5]

Display device-adapted video quality-of-experience assessment,

A. Rehman, K. Zeng, and Z. Wang, “Display device-adapted video quality-of-experience assessment,” in Human Vision and Electronic Imaging, 2015, pp. 1–11

work page 2015

[6] [6]

Reduced-and no-reference image quality assessment: The natural scene statistic model approach,

Z. Wang and A. C. Bovik, “Reduced-and no-reference image quality assessment: The natural scene statistic model approach,” IEEE Signal Processing Magazine, vol. 28, no. 6, pp. 29–40, Nov. 2011

work page 2011

[7] [7]

No-reference image quality assessment in the spatial domain,

A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, Dec. 2012

work page 2012

[8] [8]

Unsupervised feature learning framework for no-reference image quality assessment,

P. Ye, J. Kumar, L. Kang, and D. Doermann, “Unsupervised feature learning framework for no-reference image quality assessment,” in IEEE Conference on Computer Vision and Pattern Recognition , 2012, pp. 1098–1105

work page 2012

[9] [9]

Convolutional neural networks for no-reference image quality assessment,

L. Kang, P. Ye, Y . Li, and D. Doermann, “Convolutional neural networks for no-reference image quality assessment,” in IEEE Conference on Computer Vision and Pattern Recognition , 2014, pp. 1733–1740

work page 2014

[10] [10]

End-to- end blind image quality assessment using deep neural networks,

K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, and W. Zuo, “End-to- end blind image quality assessment using deep neural networks,” IEEE Transactions on Image Processing , vol. 27, no. 3, pp. 1202–1213, Mar. 2018

work page 2018

[11] [11]

ImageNet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition , 2009, pp. 248–255

work page 2009

[12] [12]

Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment,

J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, and A. C. Bovik, “Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment,” IEEE Signal Processing Magazine , vol. 34, no. 6, pp. 130–141, Nov. 2017

work page 2017

[13] [13]

Massive online crowdsourced study of subjective and objective picture quality,

D. Ghadiyaram and A. C. Bovik, “Massive online crowdsourced study of subjective and objective picture quality,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 372–387, Jan. 2016

work page 2016

[14] [14]

A statistical evaluation of recent full reference image quality assessment algorithms,

H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440–3451, Nov. 2006

work page 2006

[15] [15]

Image database TID2013: Peculiarities, results and perspectives,

N. Ponomarenko, L. Jin, O. Ieremeiev, V . Lukin, K. Egiazarian, J. Astola, B. V ozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. J. Kuo, “Image database TID2013: Peculiarities, results and perspectives,” Signal Pro- cessing: Image Communication , vol. 30, pp. 57–77, Jan. 2015

work page 2015

[16] [16]

Fully deep blind image quality predictor,

J. Kim and S. Lee, “Fully deep blind image quality predictor,” IEEE Journal of Selected Topics in Signal Processing , vol. 11, no. 1, pp. 206–220, Feb. 2017

work page 2017

[17] [17]

Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks,

L. Kang, P. Ye, Y . Li, and D. Doermann, “Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks,” in IEEE International Conference on Image Processing , 2015, pp. 2791–2795

work page 2015

[18] [18]

Waterloo Exploration Database: New challenges for image quality assessment models,

K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, “Waterloo Exploration Database: New challenges for image quality assessment models,” IEEE Transactions on Image Processing , vol. 26, no. 2, pp. 1004–1016, Feb. 2017

work page 2017

[19] [19]

The Pascal Visual Object Classes (VOC) Challenge,

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zis- serman, “The Pascal Visual Object Classes (VOC) Challenge,” Interna- tional Journal of Computer Vision , vol. 88, no. 2, pp. 303–338, Jun. 2010

work page 2010

[20] [20]

Perceptual quality prediction on authentically distorted images using a bag of features approach,

D. Ghadiyaram and A. C. Bovik, “Perceptual quality prediction on authentically distorted images using a bag of features approach,” Journal of Vision, vol. 17, no. 1, pp. 32–32, Jan. 2017. 11

work page 2017

[21] [21]

Very deep convolutional networks for large-scale image recognition,

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations, 2015

work page 2015

[22] [22]

Bilinear CNN models for ﬁne-grained visual recognition,

T.-Y . Lin, A. RoyChowdhury, and S. Maji, “Bilinear CNN models for ﬁne-grained visual recognition,” in IEEE International Conference on Computer Vision, 2015, pp. 1449–1457

work page 2015

[23] [23]

Group MAD competition − a new methodology to compare objective image quality models,

K. Ma, Q. Wu, Z. Wang, Z. Duanmu, H. Yong, H. Li, and L. Zhang, “Group MAD competition − a new methodology to compare objective image quality models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1664–1673

[24] S. Bosse, D. Maniry, K. R. Müller, T. Wiegand, and W. Samek, “Deep neural networks for no-reference and full-reference image quality assessment,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 206–219, Jan. 2018.

[25] K. Ma, W. Liu, T. Liu, Z. Wang, and D. Tao, “dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs,” IEEE Transactions on Image Processing, vol. 26, no. 8, pp. 3951–3964, Aug. 2017.

[26] H. Tang, N. Joshi, and A. Kapoor, “Blind image quality assessment using semi-supervised rectifier networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2877–2884.

[27] S. Bianco, L. Celona, P. Napoletano, and R. Schettini, “On the use of deep learning for blind image quality assessment,” CoRR, vol. abs/1602.05531, 2016.

[28] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, Aug. 2011.

[29] K. Ma, K. Zeng, and Z. Wang, “Perceptual quality assessment for multi-exposure image fusion,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3345–3356, Nov. 2015.

[30] J. B. Tenenbaum and W. T. Freeman, “Separating style and content,” in Advances in Neural Information Processing Systems, 1997, pp. 662–668.

[31] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in Neural Information Processing Systems, 2014, pp. 568–576.

[32] A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach, “Multimodal compact bilinear pooling for visual question answering and visual grounding,” CoRR, vol. abs/1606.01847, 2016.

[33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[34] X. Pennec, P. Fillard, and N. Ayache, “A Riemannian framework for tensor computing,” International Journal of Computer Vision, vol. 66, no. 1, pp. 41–66, Jan. 2006.

[35] E. C. Larson and D. M. Chandler, “Most apparent distortion: Full-reference image quality assessment and the role of strategy,” Journal of Electronic Imaging, vol. 19, no. 1, pp. 1–21, Jan. 2010.

[36] D. Jayaraman, A. Mittal, A. K. Moorthy, and A. C. Bovik, “Objective quality assessment of multiply distorted images,” in Signals, Systems and Computers, 2013, pp. 1693–1697.

[37] VQEG, “Final report from the video quality experts group on the validation of objective models of video quality assessment,” 2000. [Online]. Available: http://www.vqeg.org

[38] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.

[39] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.

[40] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456.

[41] A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for Matlab,” in ACM International Conference on Multimedia, 2015, pp. 689–692.

[42] W. Xue, X. Mou, L. Zhang, A. C. Bovik, and X. Feng, “Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features,” IEEE Transactions on Image Processing, vol. 23, no. 11, pp. 4850–4862, Nov. 2014.

[43] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, and D. Doermann, “Blind image quality assessment based on high order statistics aggregation,” IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4444–4457, Sep. 2016.

[44] Z. Wang and E. P. Simoncelli, “Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities,” Journal of Vision, vol. 8, no. 12, pp. 8.1–8.13, Sep. 2008.

[45] Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, “Compact bilinear pooling,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 317–326.

Weixia Zhang received the B.E. degree from Wuhan University, Wuhan, China, in 2011 and the M.S. degree in electrical and computer engineering from the University of Rochester, NY, USA, in 2013...
