pith.

arxiv: 1907.02665 · v1 · pith:SDLZA43Cnew · submitted 2019-07-05 · 📡 eess.IV · cs.CV · cs.MM

Blind Image Quality Assessment Using A Deep Bilinear Convolutional Neural Network

Pith reviewed 2026-05-25 02:21 UTC · model grok-4.3

classification 📡 eess.IV cs.CV cs.MM
keywords: blind image quality assessment, bilinear pooling, convolutional neural networks, synthetic distortions, authentic distortions, fine-tuning, deep learning

The pith

A bilinear pooling of features from two pre-trained CNNs enables superior blind image quality prediction for both synthetic and authentic distortions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a model with one CNN pre-trained to classify synthetic distortion types and levels and a second pre-trained CNN for general image classification to handle authentic distortions. Features from both networks are combined through bilinear pooling into a single representation used for quality score prediction. The full model is then fine-tuned on subject-rated image databases using stochastic gradient descent. This design targets the challenge of building one system that works across distortion categories that differ in their statistics and generation processes. If the approach holds, it would allow more reliable automated quality scoring without needing a clean reference image in applications from consumer photography to medical imaging.
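A CNN "pre-trained to classify synthetic distortion types and levels" needs one training target for each (type, level) combination. The sketch below shows one hypothetical way to fold both into a single class index, with index 0 reserved for pristine images. The four distortion names come from the Figure 1 and Figure 2 captions, and the choice of five levels follows the five panels per type in Figure 2. The paper's actual class list and ordering are not given on this page, so treat every name here as an illustrative assumption.

```python
# Hypothetical label scheme for pre-training S-CNN. The paper's actual
# class list and ordering are not given on this page.
DISTORTIONS = ["gaussian_blur", "white_gaussian_noise", "contrast_stretching", "pink_noise"]
NUM_LEVELS = 5  # assumed: five degradation levels per distortion type


def joint_label(distortion=None, level=None):
    """Map a (distortion type, level) pair to one class index; 0 means pristine."""
    if distortion is None:
        return 0
    if level is None or not 1 <= level <= NUM_LEVELS:
        raise ValueError(f"level must be an integer in 1..{NUM_LEVELS}")
    return 1 + DISTORTIONS.index(distortion) * NUM_LEVELS + (level - 1)


def decode_label(label):
    """Invert joint_label; returns (None, None) for the pristine class."""
    if label == 0:
        return None, None
    type_idx, level_idx = divmod(label - 1, NUM_LEVELS)
    return DISTORTIONS[type_idx], level_idx + 1


# Softmax output size of the distortion classifier under these assumptions.
NUM_CLASSES = 1 + len(DISTORTIONS) * NUM_LEVELS
```

A single joint softmax like this lets one cross-entropy loss supervise both the distortion type and its severity.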

Core claim

The central claim is that bilinear pooling of features from a distortion-classification CNN and an image-classification CNN produces a unified representation. After fine-tuning on target databases, this representation is claimed to correlate better with human quality judgments than prior blind image quality methods, on both synthetic and authentic distortion sets. Generalizability is further checked on the Waterloo Exploration Database via the group maximum differentiation (gMAD) competition.
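A group maximum differentiation (gMAD) competition [23] works as follows. One model, the defender, is held at a fixed predicted quality level. The other model, the attacker, looks at images the defender scores as equally good and picks the pair it rates most differently. Human observers then judge whether that pair really differs in quality. The sketch below shows only the pair-selection step, in simplified form. The function name, the plain score window of width `width`, and the toy scores are illustrative assumptions; the protocol in [23] has more detail.

```python
import numpy as np


def gmad_pair(defender, attacker, level, width):
    """Simplified gMAD pair selection (hypothetical helper).

    Among images whose defender score lies within level +/- width/2, return the
    indices of the images the attacker rates highest and lowest.
    """
    defender = np.asarray(defender, dtype=float)
    attacker = np.asarray(attacker, dtype=float)
    # Images the defender considers to be of (roughly) the same quality.
    candidates = np.flatnonzero(np.abs(defender - level) <= width / 2)
    if candidates.size < 2:
        raise ValueError("fewer than two images at this defender quality level")
    best = candidates[np.argmax(attacker[candidates])]
    worst = candidates[np.argmin(attacker[candidates])]
    return int(best), int(worst)
```

With defender scores `[0.1, 0.5, 0.52, 0.48, 0.9]` and attacker scores `[0.2, 0.9, 0.1, 0.5, 0.8]` at level 0.5 and width 0.1, the candidates are images 1 to 3, and the attacker's extreme pair is (1, 2). If observers see a clear quality gap between those two images, the defender loses that round.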

What carries the argument

Bilinear pooling of feature maps from two separate convolutional neural networks, one pre-trained on large-scale synthetic distortion classification and the other on image classification, to form a quality prediction representation.
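Below is a minimal NumPy sketch of bilinear pooling. It assumes the formulation from the bilinear CNN paper of Lin et al. [22]: outer products of the two feature vectors at each location, pooled over locations, followed by a signed square root and L2 normalization. The function name is hypothetical. Random arrays stand in for the two networks' feature maps. This page does not say how DB-CNN aligns maps of different spatial sizes, so the sketch simply requires equal sizes.

```python
import numpy as np


def bilinear_pool(f1, f2, eps=1e-12):
    """Fuse feature maps f1 (H, W, D1) and f2 (H, W, D2) into one vector of length D1 * D2.

    Outer products of the two feature vectors are averaged over all H * W
    locations, then passed through a signed square root and L2 normalization.
    """
    if f1.shape[:2] != f2.shape[:2]:
        raise ValueError("this sketch assumes both maps share the same spatial size")
    h, w = f1.shape[:2]
    y1 = f1.reshape(h * w, -1)  # (L, D1), one row per spatial location
    y2 = f2.reshape(h * w, -1)  # (L, D2)
    b = (y1.T @ y2) / (h * w)   # (D1, D2); summing instead gives the same final z
    z = (np.sign(b) * np.sqrt(np.abs(b))).ravel()
    return z / max(np.linalg.norm(z), eps)
```

Pooling over all H × W locations removes the spatial dimensions. As a result, maps of shape (7, 9, D) and (14, 5, D) both yield a vector of length D1·D2, which is why a bilinear model can take whole images of arbitrary size. Averaging rather than summing only rescales `b` by 1/L, and the final L2 normalization cancels that factor.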

Load-bearing premise

The features from the two pre-trained CNNs remain informative enough for accurate quality prediction once combined by bilinear pooling and adjusted on human-rated images.

What would settle it

A head-to-head evaluation in which the model fails to exceed the best competing blind methods on correlation metrics across the LIVE, TID2013, CSIQ, and LIVE Challenge databases would falsify the superiority claim.
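The two correlation criteria standard in BIQA are SRCC (Spearman rank-order correlation) and PLCC (Pearson linear correlation). Below is a minimal NumPy sketch of both, in which tied values receive average ranks. One caveat: IQA studies commonly fit a nonlinear logistic mapping to the predictions before computing PLCC, following VQEG practice [37]. This sketch omits that step, so its PLCC is the raw linear correlation.

```python
import numpy as np


def rankdata(x):
    """Ranks starting at 1; tied values share the average of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def plcc(pred, mos):
    """Pearson correlation between predicted scores and mean opinion scores (no logistic fit)."""
    return float(np.corrcoef(np.asarray(pred, float), np.asarray(mos, float))[0, 1])


def srcc(pred, mos):
    """Spearman rank-order correlation: the Pearson correlation of the ranks."""
    return plcc(rankdata(pred), rankdata(mos))
```

For predictions `[1, 2, 3, 4, 5]` against scores `[1, 4, 9, 16, 25]`, SRCC is 1.0 up to rounding because the relation is monotone. PLCC is below 1 because the relation is not linear. This gap is one reason both metrics are usually reported.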

Figures

Figures reproduced from arXiv: 1907.02665 by Dexiang Deng, Jia Yan, Kede Ma, Weixia Zhang, Zhou Wang.

Figure 1: Sample distorted images synthesized from a reference image in the Waterloo Exploration Database [18]. (a) Gaussian blur. (b) White Gaussian noise.
Figure 2: Illustration of the five new distortion types with increasing degradation levels from left to right. (a)-(e) Contrast stretching. (f)-(j) Pink noise. (k)-(o) …
Figure 3: The architecture of S-CNN for synthetic distortions. We follow the style and convention in [2], and denote the parameterization of the convolution …
Figure 4: The structure of the proposed DB-CNN. … can be computed by $\frac{\partial \ell}{\partial \mathbf{Y}_1} = \mathbf{Y}_2 \left(\frac{\partial \ell}{\partial \mathbf{B}}\right)^{\top}$ (6) and $\frac{\partial \ell}{\partial \mathbf{Y}_2} = \mathbf{Y}_1 \frac{\partial \ell}{\partial \mathbf{B}}$ (7). Bilinear pooling summarizes the spatial information and enables DB-CNN to accept an input image of arbitrary size. As a result, we can feed the whole image directly instead of patches cropped from it to DB-CNN during both training and testing. IV. EXPERIMENTS In this section, we first descri…
Figure 5: Images with different distortion types may share similar visual appearances. (a) Additive Gaussian noise. (b) Additive noise in color components. (c) …
Figure 6: gMAD competition results between DB-CNN and deepIQA [24]. (a) Fixed deepIQA at the low-quality level. (b) Fixed deepIQA at the high-quality …
Figure 7: gMAD competition results between DB-CNN and MEON [10]. (a) Fixed MEON at the low-quality level. (b) Fixed MEON at the high-quality level.
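The backward-pass formulas quoted in the Figure 4 excerpt can be checked numerically. The sketch below makes two assumptions. First, it takes $\mathbf{B} = \mathbf{Y}_1^{\top}\mathbf{Y}_2$, the bilinear CNN form of [22]. Second, it uses a toy loss $\ell = \sum_{ij} W_{ij} B_{ij}$, chosen only because its gradient $\partial\ell/\partial\mathbf{B} = \mathbf{W}$ is known exactly. The script then compares the analytic gradients $\mathbf{Y}_2(\partial\ell/\partial\mathbf{B})^{\top}$ and $\mathbf{Y}_1(\partial\ell/\partial\mathbf{B})$ against central finite differences. All sizes and random values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D1, D2 = 6, 4, 3                  # arbitrary toy sizes: locations, channels of each stream
Y1 = rng.standard_normal((L, D1))
Y2 = rng.standard_normal((L, D2))
W = rng.standard_normal((D1, D2))    # fixed weights defining the toy loss


def loss(Y1, Y2):
    """Toy scalar loss l = sum(W * B) with B = Y1^T Y2, so dl/dB = W."""
    return float(np.sum(W * (Y1.T @ Y2)))


dB = W
grad_Y1 = Y2 @ dB.T   # analytic dl/dY1 = Y2 (dl/dB)^T
grad_Y2 = Y1 @ dB     # analytic dl/dY2 = Y1 (dl/dB)


def numeric_grad(f, X, h=1e-6):
    """Central finite differences of f with respect to every entry of X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        Xp = X.copy()
        Xp[idx] += h
        Xm = X.copy()
        Xm[idx] -= h
        G[idx] = (f(Xp) - f(Xm)) / (2 * h)
    return G


num_Y1 = numeric_grad(lambda X: loss(X, Y2), Y1)
num_Y2 = numeric_grad(lambda X: loss(Y1, X), Y2)
print(np.allclose(grad_Y1, num_Y1, atol=1e-6), np.allclose(grad_Y2, num_Y2, atol=1e-6))
```

The shapes also confirm the transpose placement. $\mathbf{Y}_2 (\partial\ell/\partial\mathbf{B})^{\top}$ is $(L \times D_2)(D_2 \times D_1) = L \times D_1$, which matches $\mathbf{Y}_1$. No transpose is needed for $\mathbf{Y}_2$.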
Original abstract

We propose a deep bilinear model for blind image quality assessment (BIQA) that handles both synthetic and authentic distortions. Our model consists of two convolutional neural networks (CNN), each of which specializes in one distortion scenario. For synthetic distortions, we pre-train a CNN to classify image distortion type and level, where we enjoy large-scale training data. For authentic distortions, we adopt a pre-trained CNN for image classification. The features from the two CNNs are pooled bilinearly into a unified representation for final quality prediction. We then fine-tune the entire model on target subject-rated databases using a variant of stochastic gradient descent. Extensive experiments demonstrate that the proposed model achieves superior performance on both synthetic and authentic databases. Furthermore, we verify the generalizability of our method on the Waterloo Exploration Database using the group maximum differentiation competition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes a deep bilinear convolutional neural network for blind image quality assessment (BIQA) handling both synthetic and authentic distortions. One CNN is pre-trained to classify distortion type and level on large-scale synthetic data; a second uses a pre-trained image-classification CNN for authentic distortions. Features from both are combined via bilinear pooling into a unified representation, after which the full model is fine-tuned on subject-rated databases via a variant of SGD. The central claim is that this architecture achieves superior performance on both synthetic and authentic databases, with additional verification of generalizability on the Waterloo Exploration Database via the group maximum differentiation competition.

Significance. If the reported empirical superiority holds under proper controls, the work supplies a unified BIQA framework that avoids separate pipelines for synthetic versus authentic distortions. The combination of large-scale pre-training, bilinear pooling, and targeted fine-tuning is a direct and testable extension of transfer-learning ideas already common in the field. The explicit use of the Waterloo Exploration Database and group maximum differentiation competition provides an independent check on generalizability that is stronger than cross-database testing alone.

minor comments (2)
  1. [Abstract] The claim of 'superior performance' is stated without any numerical values, baseline names, or dataset sizes. Adding the key SRCC/PLCC figures and the main competing methods would make the abstract self-contained and let readers assess the strength of the central claim immediately.
  2. The bilinear pooling step is described only at a high level; a short equation or explicit reference to the original bilinear CNN formulation (e.g., the outer-product operation and subsequent pooling) would clarify exactly how the two feature streams are fused.
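The equation the second comment asks for can be written in the original bilinear CNN formulation of Lin et al. [22]. Here $\mathbf{Y}_1 \in \mathbb{R}^{L \times D_1}$ and $\mathbf{Y}_2 \in \mathbb{R}^{L \times D_2}$ stack the two networks' feature vectors $\mathbf{y}_1^{(i)}, \mathbf{y}_2^{(i)}$ at the $L$ spatial locations. The signed square root and $\ell_2$ normalization are the post-processing steps used in [22]. Whether DB-CNN applies exactly these steps is an assumption here, not something this page confirms.

```latex
\mathbf{B} = \mathbf{Y}_1^{\top}\mathbf{Y}_2
           = \sum_{i=1}^{L} \mathbf{y}_1^{(i)} \big(\mathbf{y}_2^{(i)}\big)^{\top} \in \mathbb{R}^{D_1 \times D_2},
\qquad
\mathbf{b} = \operatorname{vec}(\mathbf{B}),
\qquad
\hat{\mathbf{b}} = \operatorname{sign}(\mathbf{b}) \odot \sqrt{|\mathbf{b}|},
\qquad
\mathbf{z} = \frac{\hat{\mathbf{b}}}{\|\hat{\mathbf{b}}\|_2}
```

The sum over $i$ is what removes the spatial dimensions, so $\mathbf{z}$ has length $D_1 D_2$ regardless of image size.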

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report accurately reflects the paper's contributions and we will incorporate any minor suggestions in the revised version.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes an empirical architecture for blind image quality assessment: two pre-trained CNNs (one for distortion classification on synthetic data, one for image classification on authentic data), bilinear pooling of their features, and fine-tuning via SGD on subject-rated databases. The central claim is experimental superiority on held-out test sets from synthetic and authentic databases, plus a generalizability check on Waterloo Exploration. No derivation chain exists; there are no equations defining a quantity in terms of itself, no parameters fitted to a subset then renamed as predictions of a related quantity, and no load-bearing self-citations to uniqueness theorems or ansatzes. The method is self-contained transfer learning whose performance claims rest on external data splits and standard evaluation protocols rather than internal redefinition.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger records the explicit modeling choices described there. The central claim depends on the transferability of the two pre-trained feature extractors and on the effectiveness of bilinear pooling for quality regression.

free parameters (1)
  • fine-tuning hyperparameters
    A variant of stochastic gradient descent is used to fine-tune the entire model on target databases; specific learning rates, batch sizes, and stopping criteria are not stated.
axioms (2)
  • domain assumption: Features from a CNN pre-trained on distortion-type and level classification remain useful for perceptual quality prediction after bilinear pooling.
    The model architecture is built on this transfer assumption.
  • domain assumption: Features from a CNN pre-trained on image classification remain useful for perceptual quality prediction after bilinear pooling.
    The model architecture is built on this transfer assumption.
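The ledger notes that the fine-tuning hyperparameters are not stated. The sketch below makes concrete what those free parameters control. It trains only a linear quality head on fixed fused features, using plain minibatch SGD on mean squared error. This is a simplification. The paper fine-tunes the entire model with a variant of SGD that is not named here; the reference list does include Adam [39]. The loss choice, the function name, and the learning rate, epoch count, and batch size below are hypothetical.

```python
import numpy as np


def fit_quality_head(Z, mos, lr=0.1, epochs=200, batch_size=8, seed=0):
    """Minibatch SGD on mean squared error for a linear head q = Z @ w + b.

    Z:   (N, K) fused feature vectors, one row per image.
    mos: (N,) subjective quality scores.
    The function name and all hyperparameter defaults are hypothetical.
    """
    Z = np.asarray(Z, dtype=float)
    mos = np.asarray(mos, dtype=float)
    rng = np.random.default_rng(seed)
    n, k = Z.shape
    w, b = np.zeros(k), 0.0
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle images every epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            err = Z[idx] @ w + b - mos[idx]  # prediction error on this minibatch
            # Gradients of the minibatch mean squared error.
            w -= lr * 2.0 * Z[idx].T @ err / len(idx)
            b -= lr * 2.0 * err.mean()
    return w, b
```

Each of `lr`, `epochs`, and `batch_size` is exactly the kind of setting the ledger lists as unreported. Changing them can change which correlation scores a fine-tuned model reaches.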

pith-pipeline@v0.9.0 · 5675 in / 1324 out tokens · 24250 ms · 2026-05-25T02:21:55.282182+00:00 · methodology


Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 3 internal anchors

  1. [1]

    A. C. Bovik, Handbook of Image and Video Processing. Academic Press, 2010

  2. [2]

    End-to-end optimized image compression,

    J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” CoRR, vol. abs/1611.01704, 2016. [Online]. Available: http://arxiv.org/abs/1611.01704

  3. [3]

    Quality-of-experience of adaptive video streaming: Exploring the space of adaptations,

    Z. Duanmu, K. Ma, and Z. Wang, “Quality-of-experience of adaptive video streaming: Exploring the space of adaptations,” in ACM Multimedia, 2017, pp. 1752–1760

  4. [4]

    Modern Image Quality Assessment

    Z. Wang and A. C. Bovik, Modern Image Quality Assessment. Morgan & Claypool, 2006

  5. [5]

    Display device-adapted video quality-of-experience assessment,

    A. Rehman, K. Zeng, and Z. Wang, “Display device-adapted video quality-of-experience assessment,” in Human Vision and Electronic Imaging, 2015, pp. 1–11

  6. [6]

    Reduced- and no-reference image quality assessment: The natural scene statistic model approach,

    Z. Wang and A. C. Bovik, “Reduced- and no-reference image quality assessment: The natural scene statistic model approach,” IEEE Signal Processing Magazine, vol. 28, no. 6, pp. 29–40, Nov. 2011

  7. [7]

    No-reference image quality assessment in the spatial domain,

    A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, Dec. 2012

  8. [8]

    Unsupervised feature learning framework for no-reference image quality assessment,

    P. Ye, J. Kumar, L. Kang, and D. Doermann, “Unsupervised feature learning framework for no-reference image quality assessment,” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1098–1105

  9. [9]

    Convolutional neural networks for no-reference image quality assessment,

    L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for no-reference image quality assessment,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1733–1740

  10. [10]

    End-to-end blind image quality assessment using deep neural networks,

    K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, and W. Zuo, “End-to-end blind image quality assessment using deep neural networks,” IEEE Transactions on Image Processing, vol. 27, no. 3, pp. 1202–1213, Mar. 2018

  11. [11]

    ImageNet: A large-scale hierarchical image database,

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255

  12. [12]

    Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment,

    J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, and A. C. Bovik, “Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment,” IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 130–141, Nov. 2017

  13. [13]

    Massive online crowdsourced study of subjective and objective picture quality,

    D. Ghadiyaram and A. C. Bovik, “Massive online crowdsourced study of subjective and objective picture quality,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 372–387, Jan. 2016

  14. [14]

    A statistical evaluation of recent full reference image quality assessment algorithms,

    H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440–3451, Nov. 2006

  15. [15]

    Image database TID2013: Peculiarities, results and perspectives,

    N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. J. Kuo, “Image database TID2013: Peculiarities, results and perspectives,” Signal Processing: Image Communication, vol. 30, pp. 57–77, Jan. 2015

  16. [16]

    Fully deep blind image quality predictor,

    J. Kim and S. Lee, “Fully deep blind image quality predictor,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 1, pp. 206–220, Feb. 2017

  17. [17]

    Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks,

    L. Kang, P. Ye, Y. Li, and D. Doermann, “Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks,” in IEEE International Conference on Image Processing, 2015, pp. 2791–2795

  18. [18]

    Waterloo Exploration Database: New challenges for image quality assessment models,

    K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, “Waterloo Exploration Database: New challenges for image quality assessment models,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 1004–1016, Feb. 2017

  19. [19]

    The Pascal Visual Object Classes (VOC) Challenge,

    M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The Pascal Visual Object Classes (VOC) Challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, Jun. 2010

  20. [20]

    Perceptual quality prediction on authentically distorted images using a bag of features approach,

    D. Ghadiyaram and A. C. Bovik, “Perceptual quality prediction on authentically distorted images using a bag of features approach,” Journal of Vision, vol. 17, no. 1, pp. 32–32, Jan. 2017

  21. [21]

    Very deep convolutional networks for large-scale image recognition,

    K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations, 2015

  22. [22]

    Bilinear CNN models for fine-grained visual recognition,

    T.-Y. Lin, A. RoyChowdhury, and S. Maji, “Bilinear CNN models for fine-grained visual recognition,” in IEEE International Conference on Computer Vision, 2015, pp. 1449–1457

  23. [23]

    Group MAD competition − a new methodology to compare objective image quality models,

    K. Ma, Q. Wu, Z. Wang, Z. Duanmu, H. Yong, H. Li, and L. Zhang, “Group MAD competition − a new methodology to compare objective image quality models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1664–1673

  24. [24]

    Deep neural networks for no-reference and full-reference image quality assessment,

    S. Bosse, D. Maniry, K. R. Müller, T. Wiegand, and W. Samek, “Deep neural networks for no-reference and full-reference image quality assessment,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 206–219, Jan. 2018

  25. [25]

    dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs,

    K. Ma, W. Liu, T. Liu, Z. Wang, and D. Tao, “dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs,” IEEE Transactions on Image Processing, vol. 26, no. 8, pp. 3951–3964, Aug. 2017

  26. [26]

    Blind image quality assessment using semi-supervised rectifier networks,

    H. Tang, N. Joshi, and A. Kapoor, “Blind image quality assessment using semi-supervised rectifier networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2877–2884

  27. [27]

    On the Use of Deep Learning for Blind Image Quality Assessment

    S. Bianco, L. Celona, P. Napoletano, and R. Schettini, “On the use of deep learning for blind image quality assessment,” CoRR, vol. abs/1602.05531, 2016

  28. [28]

    FSIM: A feature similarity index for image quality assessment,

    L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, Aug. 2011

  29. [29]

    Perceptual quality assessment for multi-exposure image fusion,

    K. Ma, K. Zeng, and Z. Wang, “Perceptual quality assessment for multi-exposure image fusion,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3345–3356, Nov. 2015

  30. [30]

    Separating style and content,

    J. B. Tenenbaum and W. T. Freeman, “Separating style and content,” in Advances in Neural Information Processing Systems, 1997, pp. 662–668

  31. [31]

    Two-stream convolutional networks for action recognition in videos,

    K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in Neural Information Processing Systems, 2014, pp. 568–576

  32. [32]

    Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

    A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach, “Multimodal compact bilinear pooling for visual question answering and visual grounding,” CoRR, vol. abs/1606.01847, 2016

  33. [33]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

  34. [34]

    A Riemannian framework for tensor computing,

    X. Pennec, P. Fillard, and N. Ayache, “A Riemannian framework for tensor computing,” International Journal of Computer Vision, vol. 66, no. 1, pp. 41–66, Jan. 2006

  35. [35]

    Most apparent distortion: Full-reference image quality assessment and the role of strategy,

    E. C. Larson and D. M. Chandler, “Most apparent distortion: Full-reference image quality assessment and the role of strategy,” Journal of Electronic Imaging, vol. 19, no. 1, pp. 1–21, Jan. 2010

  36. [36]

    Objective quality assessment of multiply distorted images,

    D. Jayaraman, A. Mittal, A. K. Moorthy, and A. C. Bovik, “Objective quality assessment of multiply distorted images,” in Signals, Systems and Computers, 2013, pp. 1693–1697

  37. [37]

    Final report from the video quality experts group on the validation of objective models of video quality assessment,

    VQEG, “Final report from the video quality experts group on the validation of objective models of video quality assessment,” 2000. [Online]. Available: http://www.vqeg.org

  38. [38]

    Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,

    K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in IEEE International Conference on Computer Vision, 2015, pp. 1026–1034

  39. [39]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014

  40. [40]

    Batch normalization: Accelerating deep network training by reducing internal covariate shift,

    S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456

  41. [41]

    MatConvNet: Convolutional neural networks for Matlab,

    A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for Matlab,” in ACM International Conference on Multimedia, 2015, pp. 689–692

  42. [42]

    Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features,

    W. Xue, X. Mou, L. Zhang, A. C. Bovik, and X. Feng, “Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features,” IEEE Transactions on Image Processing, vol. 23, no. 11, pp. 4850–4862, Nov. 2014

  43. [43]

    Blind image quality assessment based on high order statistics aggregation,

    J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, and D. Doermann, “Blind image quality assessment based on high order statistics aggregation,” IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4444–4457, Sep. 2016

  44. [44]

    Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities,

    Z. Wang and E. P. Simoncelli, “Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities,” Journal of Vision, vol. 8, no. 12, pp. 8.1–8.13, Sep. 2008

  45. [45]

    Compact bilinear pooling,

    Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, “Compact bilinear pooling,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 317–326