PrISM-IQA: Image Quality Assessment Made Practical for Smartphone Photography

Jiaqi He; Kede Ma; Liang Wang; Shuyan Zhai; Weixia Zhang; Zhenjie Lee; Zufeng Zhang

arxiv: 2606.31626 · v1 · pith:AIQURHBRnew · submitted 2026-06-30 · 💻 cs.CV

Shuyan Zhai, Jiaqi He, Weixia Zhang, Liang Wang, Zhenjie Lee, Zufeng Zhang, Kede Ma

Pith reviewed 2026-07-01 06:14 UTC · model grok-4.3


keywords: image quality assessment, smartphone photography, ISP tuning, ordinal diagnosis, multi-issue prediction, severity levels, structured inference, perceptual quality


The pith

PrISM-IQA reformulates smartphone image quality assessment as multi-issue ordinal diagnosis that outputs severity levels for each ISP-relevant defect.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to align image quality assessment with the needs of smartphone image signal processor tuning by replacing a single overall score with separate ordered severity predictions for dozens of specific issues. Engineers can then see which defects are absent, minor, severe, or critical and decide which ones require intervention. The method covers both global artifacts and local, content-dependent problems across a set of 53 issues. It enforces logical consistency among predictions by combining cumulative ordinal encoding with inference rules that respect monotonicity within each issue and subsumption or exclusion relations across issues. Evaluations on a reconstructed SPAQ benchmark and a small expert-annotated real-world set show that the resulting predictions are usable for practical tuning tasks and that the learned representations transfer via linear probing.
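Linear probing, mentioned at the end of the summary above, generally means keeping a trained feature extractor frozen and fitting only a linear head on top of its outputs. The sketch below illustrates that idea in plain Python under stated assumptions: the two-dimensional "features" and the target scores are made up, and `fit_linear_probe` is a hypothetical helper, not the authors' code.

```python
def fit_linear_probe(features, targets, lr=0.1, steps=2000):
    """Fit a linear head (w, b) by gradient descent on half the mean squared error.

    `features` stand in for frozen backbone embeddings: they are never
    modified, only the weights and bias of the linear head are updated.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(dim):
                grad_w[i] += err * x[i] / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Apply the fitted linear head to one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy frozen features and quality scores (illustrative values only).
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [2.0 * x0 - 1.0 * x1 + 0.5 for x0, x1 in features]
w, b = fit_linear_probe(features, targets)
```

If a linear head alone recovers the target well, the frozen features already carry the relevant information, which is the sense in which a representation is said to transfer.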

Core claim

PrISM-IQA claims that smartphone IQA is better expressed as a multi-issue ordinal diagnosis task in which the model predicts one of four ordered severity levels for each of 53 ISP-relevant quality issues; cumulative ordinal encoding together with structured inference that encodes within-issue monotonicity and cross-issue subsumption/exclusion relations yields logically consistent outputs that match perceptual judgments and directly support actionable ISP adjustments.

What carries the argument

Cumulative ordinal encoding plus structured inference that enforces within-issue monotonicity and cross-issue subsumption and exclusion relations.
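As a rough illustration of cumulative ordinal encoding, the sketch below turns a four-level severity into three threshold bits ("worse than absent", "worse than minor", "worse than severe") and decodes per-threshold probabilities back into a level. The function names, the 0.5 threshold, and the running-minimum repair for non-monotone outputs are illustrative assumptions, not the paper's implementation.

```python
SEVERITIES = ["absent", "minor", "severe", "critical"]


def encode(level):
    """Cumulative encoding: bit k is 1 if and only if the level exceeds k."""
    return [1 if level > k else 0 for k in range(len(SEVERITIES) - 1)]


def is_monotone(probs):
    """Within-issue monotonicity: P(level > k) must not increase with k."""
    return all(a >= b for a, b in zip(probs, probs[1:]))


def enforce_monotone(probs):
    """One simple repair (an assumption): clip each value to the running minimum."""
    out, current = [], 1.0
    for p in probs:
        current = min(current, p)
        out.append(current)
    return out


def decode(probs, threshold=0.5):
    """Count how many thresholds are exceeded after the monotone repair."""
    return sum(1 for p in enforce_monotone(probs) if p > threshold)


raw = [0.9, 0.4, 0.7]           # non-monotone raw outputs for one issue
level = decode(raw)             # repaired to [0.9, 0.4, 0.4]
label = SEVERITIES[level]
```

Without the repair, the raw vector above would claim the issue is "worse than severe" while denying it is "worse than minor", which is the kind of contradiction the monotonicity constraint rules out.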

If this is right

Predictions supply an ordered severity for each of the 53 issues rather than a scalar score.
The same model handles both global image-level artifacts and local content-dependent defects.
Predictions remain logically consistent because of the monotonicity and cross-issue constraints.
Linear probing of the learned features produces transferable perceptual quality representations.
The outputs can be used directly to guide ISP parameter changes.
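To make the contrast with a scalar score concrete, here is a minimal sketch of how a per-issue severity report might be triaged. The issue names, the example diagnosis, and the "intervene at severe or worse" rule are all hypothetical; the source does not list the 53 issues or specify a triage policy.

```python
from enum import IntEnum


class Severity(IntEnum):
    ABSENT = 0
    MINOR = 1
    SEVERE = 2
    CRITICAL = 3


def triage(diagnosis, intervene_at=Severity.SEVERE):
    """Return issues at or above the intervention level, worst first."""
    flagged = [(issue, sev) for issue, sev in diagnosis.items() if sev >= intervene_at]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Hypothetical diagnosis for one photo (issue names are placeholders).
diagnosis = {
    "noise": Severity.MINOR,
    "overexposure": Severity.CRITICAL,
    "color_cast": Severity.SEVERE,
    "blur": Severity.ABSENT,
}
todo = triage(diagnosis)
```

A single score would collapse these four entries into one number; the report keeps the information an engineer needs to decide which defect to address first.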

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The diagnosis format could be applied to quality assessment in other camera pipelines that also rely on ISP-like processing stages.
If the severity outputs prove stable across devices, they might serve as training targets for automated ISP optimization loops.
Extending the same encoding and inference structure to video sequences would test whether temporal consistency can be added without breaking the per-frame diagnosis.
The approach separates diagnosis from scoring, so downstream systems could weight issues differently depending on the target use case such as portrait versus landscape photography.
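The last point, separating diagnosis from scoring, can be sketched as a downstream step that applies use-case weights to the same diagnosis. Everything here is an assumption made for illustration: the issue names, the weight values, and the simple weighted sum.

```python
def weighted_penalty(diagnosis, weights):
    """Sum severity levels (0 to 3) scaled by use-case weights; unlisted issues weigh 1.0."""
    return sum(weights.get(issue, 1.0) * level for issue, level in diagnosis.items())


# Hypothetical diagnosis and weight profiles.
diagnosis = {"skin_tone_shift": 2, "noise": 1, "sky_banding": 1}
PORTRAIT = {"skin_tone_shift": 3.0, "sky_banding": 0.5}
LANDSCAPE = {"skin_tone_shift": 0.5, "sky_banding": 2.0}

portrait_penalty = weighted_penalty(diagnosis, PORTRAIT)
landscape_penalty = weighted_penalty(diagnosis, LANDSCAPE)
```

The same diagnosis yields a higher penalty under the portrait profile than under the landscape one, without retraining or re-running the diagnostic model.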

Load-bearing premise

The relations among issues that the structured inference encodes must match how human experts actually judge perceptual quality when tuning an ISP.

What would settle it

On the expert-annotated real-world dataset, the model's severity assignments for the 53 issues show no better agreement with human labels than a baseline that ignores the ordinal and relational structure.
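One way such a comparison could be run is sketched below, assuming severity levels are coded 0 to 3 and using two simple agreement measures: exact match rate and mean absolute level error. The label vectors are invented toy data, not results from the paper.

```python
def exact_agreement(pred, gold):
    """Fraction of (image, issue) pairs where the predicted level equals the expert level."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def mean_abs_level_error(pred, gold):
    """Average distance in severity levels, so near misses cost less than far ones."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


# Toy labels for six (image, issue) pairs; purely illustrative.
gold = [0, 1, 3, 2, 0, 0]
structured = [0, 1, 2, 2, 0, 0]   # hypothetical structured model
baseline = [0, 2, 1, 2, 1, 0]     # hypothetical unstructured baseline

results = {
    name: (exact_agreement(pred, gold), mean_abs_level_error(pred, gold))
    for name, pred in [("structured", structured), ("baseline", baseline)]
}
```

The claim would be in trouble if, on real expert labels, the structured model's numbers were no better than the baseline's on both measures.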

Figures

Figures reproduced from arXiv: 2606.31626 by Jiaqi He, Kede Ma, Liang Wang, Shuyan Zhai, Weixia Zhang, Zhenjie Lee, Zufeng Zhang.

**Figure 2.** OpenISP processing pipeline used for PrISM-IQA-guided ISP tuning. The pipeline is …

**Figure 3.** Representative examples of PrISM-IQA-guided OpenISP tuning. Panels (a) and (c) …

**Figure 4.** HEX graph for the reconstructed SPAQ issue taxonomy. Colored panels list issue nodes by …

**Figure 5.** Representative images from the expert-annotated dataset. Panels illustrate common global …

Original abstract

Existing smartphone image quality assessment (IQA) methods commonly reduce perceptual quality to a single score. However, this scalar formulation is poorly aligned with practical image signal processor (ISP) tuning, where engineers must identify specific quality issues, estimate their severities, and determine whether they are acceptable or require intervention. In this work, we introduce a Practical ISP-aware Structured Model for IQA (PrISM-IQA), which reformulates smartphone IQA as a multi-issue ordinal diagnosis problem. Rather than regressing a single quality score, PrISM-IQA predicts an ordered severity level (absent, minor, severe, or critical) for each ISP-relevant issue, covering both global image-level artifacts and local content-dependent defects. To produce logically consistent predictions, PrISM-IQA combines cumulative ordinal encoding with structured inference that captures within-issue monotonicity as well as cross-issue subsumption and exclusion relations. We evaluate PrISM-IQA on a reconstructed SPAQ benchmark annotated with 53 ISP-relevant quality issues and on a small-scale expert-annotated real-world dataset. Experimental results demonstrate the effectiveness of PrISM-IQA for practical issue-level diagnosis, reveal transferable perceptual quality representations through linear probing, and further show how its predictions can support actionable and meaningful ISP tuning.
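The cross-issue relations named in the abstract can be illustrated with a small sketch that picks the most probable joint assignment among those that respect subsumption and exclusion. Several things here are assumptions: the four issue names, the two relations, the reading of "present" as any level above absent, and the brute-force search, which stands in for whatever inference the paper actually uses over its HEX graph.

```python
from itertools import product

ISSUES = ["blur", "motion_blur", "overexposure", "underexposure"]
SUBSUMES = [("blur", "motion_blur")]               # child present implies parent present
EXCLUDES = [("overexposure", "underexposure")]     # the two cannot both be present


def is_legal(assign):
    """Check one joint assignment (issue -> level 0..3) against both relation types."""
    for parent, child in SUBSUMES:
        if assign[child] > 0 and assign[parent] == 0:
            return False
    for a, b in EXCLUDES:
        if assign[a] > 0 and assign[b] > 0:
            return False
    return True


def best_legal(level_probs):
    """Exhaustively search legal assignments for the highest product of level probabilities."""
    best, best_score = None, -1.0
    for levels in product(range(4), repeat=len(ISSUES)):
        assign = dict(zip(ISSUES, levels))
        if not is_legal(assign):
            continue
        score = 1.0
        for issue, level in assign.items():
            score *= level_probs[issue][level]
        if score > best_score:
            best, best_score = assign, score
    return best


# Made-up per-issue level probabilities (absent, minor, severe, critical).
probs = {
    "blur": [0.6, 0.3, 0.1, 0.0],
    "motion_blur": [0.2, 0.7, 0.1, 0.0],
    "overexposure": [0.1, 0.2, 0.6, 0.1],
    "underexposure": [0.45, 0.5, 0.05, 0.0],
}
independent = {issue: max(range(4), key=lambda k: p[k]) for issue, p in probs.items()}
joint = best_legal(probs)
```

Taking each issue's most likely level independently reports motion blur without blur and flags both exposure issues at once; the constrained search instead returns minor blur with minor motion blur, severe overexposure, and no underexposure. Exhaustive search only works for a handful of issues, so a 53-issue taxonomy would need a more efficient inference scheme.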

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PrISM-IQA reframes smartphone IQA as per-issue ordinal diagnosis for 53 ISP problems with structured consistency constraints, but the abstract supplies no numbers or ablations.


PrISM-IQA's main contribution is reformulating image quality assessment for smartphones as predicting ordered severity levels for each of 53 ISP-specific issues rather than producing one overall score. It adds cumulative encoding and structured inference to handle monotonicity within issues and relations across them.

The paper does a good job of spotting the disconnect between typical IQA research and what ISP engineers actually do with the output. Engineers need to know which artifacts are present, how severe they are, and whether they need fixing, and the multi-issue setup with global and local defects addresses that directly. The structured part is a sensible way to avoid inconsistent predictions, such as saying an issue is both minor and critical.

The soft spots are clear from the abstract and the stress-test note. There are no numbers at all: no accuracy, no comparison to single-score methods or independent ordinal heads, and no human study on whether the outputs actually help tuning. The evaluation mentions a reconstructed SPAQ benchmark and a small expert set, but without details or ablations isolating the structured inference, it is impossible to know whether that component improves anything or whether simpler models would do as well. The weakest assumption, that logical consistency aligns with perception, is not backed by evidence here.

This is for researchers and engineers working on mobile imaging pipelines who want IQA that maps more directly to actionable decisions. Someone looking for new ways to model perceptual quality in constrained domains might find the formulation useful, but the current version reads more like a proposal than a completed result.

I would send it for peer review. The motivation is solid and the modeling choice is distinct enough that referees could help strengthen the experiments and check if the consistency claims hold.

Referee Report

2 major / 1 minor

Summary. The paper introduces PrISM-IQA, which reformulates smartphone IQA as a multi-issue ordinal diagnosis problem rather than scalar regression. It predicts one of four ordered severity levels (absent, minor, severe, critical) for each of 53 ISP-relevant issues (global and local), using cumulative ordinal encoding combined with structured inference to enforce within-issue monotonicity plus cross-issue subsumption and exclusion relations. Evaluation is described on a reconstructed SPAQ benchmark and a small expert-annotated real-world dataset, with claims that the outputs support actionable ISP tuning and yield transferable representations via linear probing.

Significance. If the central claims hold, the work addresses a genuine practical gap by shifting IQA from single scores to issue-specific, severity-ordered diagnostics that align with ISP engineering workflows. Using structured inference to guarantee logical consistency would be a clear strength if validated, and the dual evaluation (reconstructed benchmark and expert data) together with the linear-probing transfer result would constitute useful evidence of utility. However, the abstract reports no quantitative numbers or ablation results, which makes the magnitude of the advance difficult to gauge from the provided material.

major comments (2)

[Methods / Experiments] Methods / evaluation description: the central claim that structured inference produces logically consistent predictions usable for ISP tuning rests on the combination of cumulative ordinal encoding with the subsumption/exclusion constraints, yet no ablation is reported that isolates this component against independent per-issue ordinal heads. Without such a comparison (e.g., inconsistency rate or human preference on tuning decisions), it remains possible that the reported gains on the reconstructed SPAQ and expert sets are already achieved by the ordinal formulation alone.
[Experiments] Experiments section: the abstract asserts that predictions 'support actionable and meaningful ISP tuning,' but provides neither quantitative metrics (accuracy, consistency rate, correlation with expert tuning decisions) nor a concrete example of how the four-level outputs are mapped to ISP parameter adjustments. A table or figure showing these downstream results is required to substantiate the practical-utility claim.

minor comments (1)

[Abstract] Abstract: the phrase 'reconstructed SPAQ benchmark' is introduced without a citation or brief description of the reconstruction and annotation procedure; a short clause clarifying the data source would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly.


Referee: [Methods / Experiments] Methods / evaluation description: the central claim that structured inference produces logically consistent predictions usable for ISP tuning rests on the combination of cumulative ordinal encoding with the subsumption/exclusion constraints, yet no ablation is reported that isolates this component against independent per-issue ordinal heads. Without such a comparison (e.g., inconsistency rate or human preference on tuning decisions), it remains possible that the reported gains on the reconstructed SPAQ and expert sets are already achieved by the ordinal formulation alone.

Authors: We agree that an ablation isolating the structured inference component is needed to substantiate its contribution. In the revised manuscript we will add this ablation, comparing the full model against independent per-issue ordinal heads and reporting inconsistency rates on both the reconstructed SPAQ benchmark and the expert-annotated set. revision: yes

Referee: [Experiments] Experiments section: the abstract asserts that predictions 'support actionable and meaningful ISP tuning,' but provides neither quantitative metrics (accuracy, consistency rate, correlation with expert tuning decisions) nor a concrete example of how the four-level outputs are mapped to ISP parameter adjustments. A table or figure showing these downstream results is required to substantiate the practical-utility claim.

Authors: The experiments section already contains qualitative examples mapping severity predictions to ISP adjustments. To strengthen the claim we will add a table with concrete mapping examples and any available quantitative metrics (e.g., consistency rates with expert annotations). The small size of the expert dataset limits the scope of new quantitative validation. revision: partial
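The inconsistency rate promised in the first response could be measured along the following lines. This is a sketch under assumptions: each issue is predicted as three thresholded cumulative bits by independent heads, an image counts as inconsistent if any issue's bits are non-monotone (for example `[0, 1, 0]`), and the sample predictions are invented.

```python
def bits_are_valid(bits):
    """Valid cumulative codes never switch from 0 back to 1, e.g. [1, 1, 0] but not [0, 1, 0]."""
    return all(a >= b for a, b in zip(bits, bits[1:]))


def inconsistency_rate(images):
    """Fraction of images with at least one issue whose cumulative bits are invalid."""
    bad = sum(1 for issues in images if not all(bits_are_valid(b) for b in issues))
    return bad / len(images)


# Four toy images, two issues each (illustrative predictions only).
images = [
    [[1, 1, 0], [0, 0, 0]],   # consistent
    [[1, 0, 0], [1, 1, 1]],   # consistent
    [[0, 1, 0], [0, 0, 0]],   # first issue is contradictory
    [[1, 1, 1], [1, 0, 1]],   # second issue is contradictory
]
rate = inconsistency_rate(images)
```

A fuller version of the metric would also count violations of cross-issue subsumption and exclusion relations, which this sketch leaves out.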

Circularity Check

0 steps flagged

No circularity; modeling choices are explicit and non-tautological


The paper introduces PrISM-IQA as a reformulation of IQA into multi-issue ordinal diagnosis, using cumulative ordinal encoding plus structured inference to enforce monotonicity, subsumption, and exclusion. No equations, derivations, or fitted parameters are presented that reduce the output predictions to inputs by construction. The structured inference is described as an added modeling component rather than a self-definitional or fitted result. No self-citation chains or uniqueness theorems are invoked as load-bearing for the central claims. Evaluation on SPAQ and expert datasets is presented as external validation. This is a standard non-circular modeling paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only abstract available; ledger populated from stated modeling choices. The severity ordering and cross-issue relations are treated as given domain structure rather than derived.

free parameters (1)

model parameters for ordinal prediction
Any learned neural network weights or thresholds are fitted during training on the annotated data.

axioms (1)

Domain assumption: quality issues admit ordered severity levels with monotonicity, subsumption, and exclusion relations that can be captured by structured inference.
Invoked to justify the cumulative ordinal encoding and structured inference for logical consistency.




[1] [1]

TOPIQ: A top-down approach from semantics to distortions for image quality assessment.IEEE Transactions on Image Processing, 33:2404–2418, 2024

Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, and Weisi Lin. TOPIQ: A top-down approach from semantics to distortions for image quality assessment.IEEE Transactions on Image Processing, 33:2404–2418, 2024

2024

[2] [2]

Support vector ordinal regression.Neural Computation, 19(3):792–815, 2007

Wei Chu and S Sathiya Keerthi. Support vector ordinal regression.Neural Computation, 19(3):792–815, 2007

2007

[3] [3]

Pranking with ranking

Koby Crammer and Yoram Singer. Pranking with ranking. InAdvances in Neural Information Processing Systems, pages 641–647, 2001

2001

[4] [4]

Mobile computational photography: A tour.Annual Review of Vision Science, 7(1):571–604, 2021

Mauricio Delbracio, Damien Kelly, Michael S Brown, and Peyman Milanfar. Mobile computational photography: A tour.Annual Review of Vision Science, 7(1):571–604, 2021. 10

2021

[5] [5]

Large-scale object classification using label relation graphs

Jia Deng, Nan Ding, Yangqing Jia, Andrea Frome, Kevin Murphy, Samy Bengio, Yuan Li, Hartmut Neven, and Hartwig Adam. Large-scale object classification using label relation graphs. InEuropean Conference on Computer Vision, pages 48–64, 2014

2014

[6] [6]

Perceptual quality assessment of smartphone photography

Yuming Fang, Hanwei Zhu, Yan Zeng, Kede Ma, and Zhou Wang. Perceptual quality assessment of smartphone photography. InIEEE Conference on Computer Vision and Pattern Recognition, pages 3677–3686, 2020

2020

[7] [7]

SQAD: Automatic smartphone camera quality assessment and benchmarking

Zilin Fang, Andrey Ignatov, Eduard Zamfir, and Radu Timofte. SQAD: Automatic smartphone camera quality assessment and benchmarking. InIEEE International Conference on Computer Vision, pages 20532–20542, 2023

2023

[8] [8]

Pictorial structures for object recognition.International Journal of Computer Vision, 61(1):55–79, 2005

Pedro F Felzenszwalb and Daniel P Huttenlocher. Pictorial structures for object recognition.International Journal of Computer Vision, 61(1):55–79, 2005

2005

[9] [9]

A simple approach to ordinal classification

Eibe Frank and Mark Hall. A simple approach to ordinal classification. InEuropean Conference on Machine Learning, pages 145–156, 2001

2001

[10] Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, and Dacheng Tao. Deep ordinal regression network for monocular depth estimation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2002–2011, 2018.

[11] Deepti Ghadiyaram and Alan C Bovik. Massive online crowdsourced study of subjective and objective picture quality. IEEE Transactions on Image Processing, 25(1):372–387, 2016.

[12] S Alireza Golestaneh, Saba Dadsetan, and Kris M Kitani. No-reference image quality assessment via Transformers, relative ranking, and self-consistency. In IEEE Winter Conference on Applications of Computer Vision, pages 1220–1230, 2022.

[13] Stephen Gould, Tianshi Gao, and Daphne Koller. Region-based segmentation and object detection. In Advances in Neural Information Processing Systems, pages 655–663, 2009.

[14] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In IEEE Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.

[15] Xuming He, Richard S Zemel, and Miguel A Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In IEEE Conference on Computer Vision and Pattern Recognition, pages 695–702, 2004.

[16] Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132, 2000.

[17] Vlad Hosu, Hanhe Lin, Tamas Sziranyi, and Dietmar Saupe. KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment. IEEE Transactions on Image Processing, 29:4041–4056, 2020.

[18] Vlad Hosu, Lorenzo Agnolucci, Oliver Wiedemann, Daisuke Iso, and Dietmar Saupe. UHD-IQA benchmark database: Pushing the boundaries of blind photo quality assessment. In European Conference on Computer Vision, pages 467–482, 2024.

[19] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. MUSIQ: Multi-scale image quality Transformer. In IEEE International Conference on Computer Vision, pages 5148–5157, 2021.

[20] Vladimir Kolmogorov and Ramin Zabih. Computing visual correspondence with occlusions using graph cuts. In IEEE International Conference on Computer Vision, pages 508–515, 2001.

[21] John Lafferty, Andrew McCallum, and Fernando CN Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning, pages 282–289, 2001.

[22] Changsheng Li, Qingshan Liu, Jing Liu, and Hanqing Lu. Learning ordinal discriminative features for age estimation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2570–2577, 2012.

[23] Ling Li and Hsuan-Tien Lin. Ordinal regression by extended binary classification. In Advances in Neural Information Processing Systems, pages 865–872, 2006.

[24] Weiqi Li, Xuanyu Zhang, Shijie Zhao, Yabin Zhang, Junlin Li, Li Zhang, and Jian Zhang. Q-Insight: Understanding image quality via visual reinforcement learning. In Advances in Neural Information Processing Systems, pages 36802–36827, 2025.

[25] Xiaofeng Liu, Yang Zou, Yuhang Song, Chao Yang, Jane You, and B V K Vijaya Kumar. Ordinal regression with neuron stick-breaking for medical diagnosis. In European Conference on Computer Vision Workshops, pages 335–344, 2018.

[26] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

[27] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

[28] Kede Ma, Wentao Liu, Kai Zhang, Zhengfang Duanmu, Zhou Wang, and Wangmeng Zuo. End-to-end blind image quality assessment using deep neural networks. IEEE Transactions on Image Processing, 27(3):1202–1213, 2018.

[29] Peter McCullagh. Regression models for ordinal data. Journal of the Royal Statistical Society, 42(2):109–127, 1980.

[30] Anish Mittal, Anush K Moorthy, and Alan C Bovik. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12):4695–4708, 2012.

[31] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a ‘completely blind’ image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2013.

[32] Zhenxing Niu, Mo Zhou, Le Wang, Xinbo Gao, and Gang Hua. Ordinal regression with multiple output CNN for age estimation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4920–4928, 2016.

[33] Sebastian Nowozin and Christoph H Lampert. Structured learning and prediction in computer vision. Foundations and Trends in Computer Graphics and Vision, 6(3-4):185–365, 2011.

[34] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.

[35] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.

[36] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763, 2021.

[37] Shaolin Su, Qingsen Yan, Yu Zhu, Cheng Zhang, Xin Ge, Jinqiu Sun, and Yanning Zhang. Blindly assess image quality in the wild guided by a self-adaptive hyper network. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3667–3676, 2020.

[38] Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, Yasemin Altun, and Yoram Singer. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6(9):1453–1484, 2005.

[39] Toni Virtanen, Mikko Nuutinen, Mikko Vaahteranoksa, Pirkko Oittinen, and Jukka Häkkinen. CID2013: A database for evaluating no-reference image quality assessment algorithms. IEEE Transactions on Image Processing, 24(1):390–402, 2015.

[40] Zhou Wang and Alan C Bovik. Modern Image Quality Assessment. Springer, 2006.

[41] Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, and Kede Ma. VisualQuality-R1: Reasoning-induced image quality assessment via reinforcement learning to rank. In Advances in Neural Information Processing Systems, pages 88167–88190, 2025.

[42] Peng Ye, Jayant Kumar, Le Kang, and David Doermann. Unsupervised feature learning framework for no-reference image quality assessment. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1098–1105, 2012.

[43] Zhenqiang Ying, Haoran Niu, Praful Gupta, Dhruv Mahajan, Deepti Ghadiyaram, and Alan C Bovik. From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3575–3585, 2020.

[44] Weixia Zhang, Kede Ma, Jia Yan, Dexiang Deng, and Zhou Wang. Blind image quality assessment using a deep bilinear convolutional neural network. IEEE Transactions on Circuits and Systems for Video Technology, 30(1):36–47, 2020.

[45] Weixia Zhang, Kede Ma, Guangtao Zhai, and Xiaokang Yang. Uncertainty-aware blind image quality assessment in the laboratory and wild. IEEE Transactions on Image Processing, 30:3474–3486, 2021.

[46] Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, and Kede Ma. Blind image quality assessment via vision-language correspondence: A multitask learning perspective. In IEEE Conference on Computer Vision and Pattern Recognition, pages 14071–14081, 2023.

[47] Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, and Tao Kong. iBOT: Image BERT pre-training with online tokenizer. arXiv preprint arXiv:2111.07832, 2021.

[48] Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022.

Appendix

This appendix supplements the main paper with practical demonstrations, reproducibility details, and additional empirical comparisons. We first show how PrISM-IQA can be us...