
arxiv: 2606.31626 · v1 · pith:AIQURHBRnew · submitted 2026-06-30 · 💻 cs.CV

PrISM-IQA: Image Quality Assessment Made Practical for Smartphone Photography

Pith reviewed 2026-07-01 06:14 UTC · model grok-4.3

classification 💻 cs.CV
keywords image quality assessment · smartphone photography · ISP tuning · ordinal diagnosis · multi-issue prediction · severity levels · structured inference · perceptual quality

The pith

PrISM-IQA reformulates smartphone image quality assessment as multi-issue ordinal diagnosis that outputs severity levels for each ISP-relevant defect.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to align image quality assessment with the needs of smartphone image signal processor tuning by replacing a single overall score with separate ordered severity predictions for dozens of specific issues. Engineers can then see which defects are absent, minor, severe, or critical and decide which ones require intervention. The method covers both global artifacts and local, content-dependent problems across a set of 53 issues. It enforces logical consistency among predictions by combining cumulative ordinal encoding with inference rules that respect monotonicity within each issue and subsumption or exclusion relations across issues. Evaluations on a reconstructed SPAQ benchmark and a small expert-annotated real-world set show that the resulting predictions are usable for practical tuning tasks and that the learned representations transfer via linear probing.

Core claim

PrISM-IQA claims that smartphone IQA is better expressed as a multi-issue ordinal diagnosis task in which the model predicts one of four ordered severity levels for each of 53 ISP-relevant quality issues; cumulative ordinal encoding together with structured inference that encodes within-issue monotonicity and cross-issue subsumption/exclusion relations yields logically consistent outputs that match perceptual judgments and directly support actionable ISP adjustments.
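The encoding and constraints named in the core claim can be written compactly. The block below is a sketch in a standard cumulative-threshold notation, not the paper's own equations: the symbols $y_i$, $b_{i,k}$, $b_u$, and $b_v$ are assumptions introduced here, and the cross-issue relations are assumed to act on binary nodes.

```latex
% Severity of issue i: y_i \in \{0, 1, 2, 3\} = (absent, minor, severe, critical).
% Cumulative ordinal encoding: three binary indicators per issue.
b_{i,k} = \mathbb{1}\left[\, y_i \ge k \,\right], \qquad k \in \{1, 2, 3\}

% Within-issue monotonicity: a higher threshold fires only if every lower one does.
b_{i,1} \ge b_{i,2} \ge b_{i,3}

% Cross-issue relations between two binary nodes u and v (assumed form):
%   subsumption (u implies v):            b_u \le b_v
%   exclusion  (u and v never co-occur):  b_u + b_v \le 1

% A consistent assignment decodes back to a severity level by counting indicators.
\hat{y}_i = \sum_{k=1}^{3} b_{i,k}
```

Which issue pairs are linked by subsumption or exclusion is defined by the paper's issue taxonomy and is not reproduced here.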

What carries the argument

Cumulative ordinal encoding plus structured inference that enforces within-issue monotonicity and cross-issue subsumption and exclusion relations.
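To make the encoding concrete, here is a minimal sketch of cumulative ordinal encoding and a monotonicity-preserving decode for one issue. The function names, the 0.5 threshold, and the "leading run" decoding rule are illustrative assumptions; the paper's structured inference may decode differently.

```python
from typing import List

LEVELS = ["absent", "minor", "severe", "critical"]


def encode(level: int) -> List[int]:
    """Cumulative encoding: bit k-1 is 1 iff level >= k, for k = 1..3."""
    return [1 if level >= k else 0 for k in range(1, len(LEVELS))]


def decode(probs: List[float], threshold: float = 0.5) -> int:
    """Turn threshold probabilities into a level while enforcing monotonicity.

    Only the leading run of probabilities at or above the threshold is
    counted, so the implied bits are always non-increasing (never 1, 0, 1).
    This rule is an assumption made for illustration.
    """
    level = 0
    for p in probs:
        if p < threshold:
            break
        level += 1
    return level


if __name__ == "__main__":
    print(encode(2))                          # cumulative bits for "severe"
    print(LEVELS[decode([0.9, 0.7, 0.2])])    # monotone raw scores
    print(LEVELS[decode([0.9, 0.3, 0.8])])    # non-monotone raw scores
```

Stopping at the first failed threshold is the most conservative repair of a non-monotone output; other repairs (for example, isotonic smoothing of the probabilities) would be equally compatible with the description above.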

If this is right

  • Predictions supply an ordered severity for each of the 53 issues rather than a scalar score.
  • The same model handles both global image-level artifacts and local content-dependent defects.
  • Predictions remain logically consistent because of the monotonicity and cross-issue constraints.
  • Linear probing of the learned features produces transferable perceptual quality representations.
  • The outputs can be used directly to guide ISP parameter changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The diagnosis format could be applied to quality assessment in other camera pipelines that also rely on ISP-like processing stages.
  • If the severity outputs prove stable across devices, they might serve as training targets for automated ISP optimization loops.
  • Extending the same encoding and inference structure to video sequences would test whether temporal consistency can be added without breaking the per-frame diagnosis.
  • The approach separates diagnosis from scoring, so downstream systems could weight issues differently depending on the target use case such as portrait versus landscape photography.

Load-bearing premise

The relations among issues that the structured inference encodes must match how human experts actually judge perceptual quality when tuning an ISP.
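The premise is easier to evaluate with a toy picture of what relational inference does. The sketch below picks the most probable joint assignment that satisfies a set of subsumption and exclusion relations by brute-force enumeration. The node names, the relations, and the independent-probability scoring are all hypothetical; the paper's actual inference procedure is not described in the material reviewed here, and enumeration is only feasible for a handful of nodes.

```python
import math
from itertools import product
from typing import Dict, List, Optional, Tuple

Relation = Tuple[str, str]


def is_legal(state: Dict[str, int],
             implies: List[Relation],
             excludes: List[Relation]) -> bool:
    """Subsumption: a=1 requires b=1. Exclusion: a and b are never both 1."""
    if any(state[a] == 1 and state[b] == 0 for a, b in implies):
        return False
    return not any(state[a] == 1 and state[b] == 1 for a, b in excludes)


def best_legal_state(probs: Dict[str, float],
                     implies: List[Relation],
                     excludes: List[Relation]) -> Optional[Dict[str, int]]:
    """Return the highest-scoring legal assignment (exponential in node count)."""
    eps = 1e-9  # clamp so log() never sees 0
    names = list(probs)
    best, best_score = None, -math.inf
    for bits in product([0, 1], repeat=len(names)):
        state = dict(zip(names, bits))
        if not is_legal(state, implies, excludes):
            continue
        score = sum(
            math.log(max(eps, probs[n] if state[n] else 1.0 - probs[n]))
            for n in names
        )
        if score > best_score:
            best, best_score = state, score
    return best


if __name__ == "__main__":
    # Hypothetical raw scores: "noise>=severe" outscores "noise>=minor",
    # and both exposure defects are above 0.5, which is contradictory.
    raw = {"noise>=minor": 0.4, "noise>=severe": 0.7,
           "overexposed": 0.6, "underexposed": 0.55}
    print(best_legal_state(raw,
                           implies=[("noise>=severe", "noise>=minor")],
                           excludes=[("overexposed", "underexposed")]))
```

If the encoded relations do not match expert judgment, this kind of inference will confidently overwrite correct raw predictions, which is why the premise is load-bearing.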

What would settle it

The claim would be undermined if, on the expert-annotated real-world dataset, the model's severity assignments for the 53 issues agreed with human labels no better than those of a baseline that ignores the ordinal and relational structure.
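A comparison of this kind could be scored as sketched below, using exact-match rate and mean absolute level error against expert labels. Both metrics and all the numbers are illustrative assumptions; the paper's own evaluation protocol is not given in the abstract.

```python
from typing import Iterator, Sequence, Tuple

Labels = Sequence[Sequence[int]]  # images x issues, severity levels 0..3


def _pairs(pred: Labels, gold: Labels) -> Iterator[Tuple[int, int]]:
    """Yield (predicted, expert) level pairs across all images and issues."""
    for p_row, g_row in zip(pred, gold):
        yield from zip(p_row, g_row)


def exact_match(pred: Labels, gold: Labels) -> float:
    """Fraction of (image, issue) cells where the predicted level is exact."""
    pairs = list(_pairs(pred, gold))
    return sum(p == g for p, g in pairs) / len(pairs)


def mean_abs_level_error(pred: Labels, gold: Labels) -> float:
    """Average distance in severity levels; respects the ordinal scale."""
    pairs = list(_pairs(pred, gold))
    return sum(abs(p - g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up labels for 2 images x 3 issues, for illustration only.
    expert = [[0, 1, 3], [2, 0, 0]]
    structured = [[0, 1, 2], [2, 0, 0]]
    baseline = [[1, 1, 0], [2, 1, 0]]
    for name, pred in [("structured", structured), ("baseline", baseline)]:
        print(name, exact_match(pred, expert), mean_abs_level_error(pred, expert))
```

The absolute-error metric matters here because it penalizes predicting "absent" for a "critical" issue more heavily than an off-by-one miss, which exact match cannot distinguish.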

Figures

Figures reproduced from arXiv: 2606.31626 by Jiaqi He, Kede Ma, Liang Wang, Shuyan Zhai, Weixia Zhang, Zhenjie Lee, Zufeng Zhang.

Figure 1: Overview of PrISM-IQA. Given a smartphone image [caption truncated]
Figure 2: OpenISP processing pipeline used for PrISM-IQA-guided ISP tuning. The pipeline is [caption truncated]
Figure 3: Representative examples of PrISM-IQA-guided OpenISP tuning. Panels (a) and (c) [caption truncated]
Figure 4: HEX graph for the reconstructed SPAQ issue taxonomy. Colored panels list issue nodes by [caption truncated]
Figure 5: Representative images from the expert-annotated dataset. Panels illustrate common global [caption truncated]
Original abstract

Existing smartphone image quality assessment (IQA) methods commonly reduce perceptual quality to a single score. However, this scalar formulation is poorly aligned with practical image signal processor (ISP) tuning, where engineers must identify specific quality issues, estimate their severities, and determine whether they are acceptable or require intervention. In this work, we introduce a Practical ISP-aware Structured Model for IQA (PrISM-IQA), which reformulates smartphone IQA as a multi-issue ordinal diagnosis problem. Rather than regressing a single quality score, PrISM-IQA predicts an ordered severity level (absent, minor, severe, or critical) for each ISP-relevant issue, covering both global image-level artifacts and local content-dependent defects. To produce logically consistent predictions, PrISM-IQA combines cumulative ordinal encoding with structured inference that captures within-issue monotonicity as well as cross-issue subsumption and exclusion relations. We evaluate PrISM-IQA on a reconstructed SPAQ benchmark annotated with 53 ISP-relevant quality issues and on a small-scale expert-annotated real-world dataset. Experimental results demonstrate the effectiveness of PrISM-IQA for practical issue-level diagnosis, reveal transferable perceptual quality representations through linear probing, and further show how its predictions can support actionable and meaningful ISP tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PrISM-IQA, which reformulates smartphone IQA as a multi-issue ordinal diagnosis problem rather than scalar regression. It predicts one of four ordered severity levels (absent, minor, severe, critical) for each of 53 ISP-relevant issues (global and local), using cumulative ordinal encoding combined with structured inference to enforce within-issue monotonicity plus cross-issue subsumption and exclusion relations. Evaluation is described on a reconstructed SPAQ benchmark and a small expert-annotated real-world dataset, with claims that the outputs support actionable ISP tuning and yield transferable representations via linear probing.

Significance. If the central claims hold, the work addresses a genuine practical gap by shifting IQA from single scores to issue-specific, severity-ordered diagnostics that align with ISP engineering workflows. Using structured inference to guarantee logical consistency is a clear strength, provided it is validated. The dual evaluation on the reconstructed benchmark and the expert data, together with the linear-probing transfer result, would constitute useful evidence of utility. However, the abstract reports no quantitative results or ablations, which makes the magnitude of the advance difficult to gauge from the provided material.

major comments (2)
  1. [Methods / Experiments] Methods / evaluation description: the central claim that structured inference produces logically consistent predictions usable for ISP tuning rests on the combination of cumulative ordinal encoding with the subsumption/exclusion constraints, yet no ablation is reported that isolates this component against independent per-issue ordinal heads. Without such a comparison (e.g., inconsistency rate or human preference on tuning decisions), it remains possible that the reported gains on the reconstructed SPAQ and expert sets are already achieved by the ordinal formulation alone.
  2. [Experiments] Experiments section: the abstract asserts that predictions 'support actionable and meaningful ISP tuning,' but provides neither quantitative metrics (accuracy, consistency rate, correlation with expert tuning decisions) nor a concrete example of how the four-level outputs are mapped to ISP parameter adjustments. A table or figure showing these downstream results is required to substantiate the practical-utility claim.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'reconstructed SPAQ benchmark' is introduced without a citation or brief description of the reconstruction and annotation procedure; a short clause clarifying the data source would improve readability.
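The inconsistency rate requested in the first major comment could be measured along the following lines. This is a sketch under stated assumptions: each image is a mapping from issue name to three thresholded bits, cross-issue relations are checked on the "present" bit only, and the issue names and relations are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Bits = Sequence[int]           # thresholded [>=minor, >=severe, >=critical]
Image = Dict[str, Bits]        # issue name -> bits
Relation = Tuple[str, str]


def is_monotone(bits: Bits) -> bool:
    """Within-issue check: bits must be non-increasing, e.g. 1,1,0 not 1,0,1."""
    return all(lo >= hi for lo, hi in zip(bits, bits[1:]))


def is_inconsistent(image: Image,
                    implies: List[Relation],
                    excludes: List[Relation]) -> bool:
    """True if the image breaks any within-issue or cross-issue constraint."""
    if not all(is_monotone(bits) for bits in image.values()):
        return True
    present = {name: bits[0] == 1 for name, bits in image.items()}
    if any(present[a] and not present[b] for a, b in implies):
        return True
    return any(present[a] and present[b] for a, b in excludes)


def inconsistency_rate(images: List[Image],
                       implies: List[Relation],
                       excludes: List[Relation]) -> float:
    """Fraction of images with at least one violated constraint."""
    flagged = sum(is_inconsistent(img, implies, excludes) for img in images)
    return flagged / len(images)
```

Applied to independent per-issue ordinal heads, this rate is the quantity the structured model is claimed to drive to zero; reporting it for both variants would isolate the contribution of the inference step.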

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly.

  1. Referee: [Methods / Experiments] Methods / evaluation description: the central claim that structured inference produces logically consistent predictions usable for ISP tuning rests on the combination of cumulative ordinal encoding with the subsumption/exclusion constraints, yet no ablation is reported that isolates this component against independent per-issue ordinal heads. Without such a comparison (e.g., inconsistency rate or human preference on tuning decisions), it remains possible that the reported gains on the reconstructed SPAQ and expert sets are already achieved by the ordinal formulation alone.

    Authors: We agree that an ablation isolating the structured inference component is needed to substantiate its contribution. In the revised manuscript we will add this ablation, comparing the full model against independent per-issue ordinal heads and reporting inconsistency rates on both the reconstructed SPAQ benchmark and the expert-annotated set. revision: yes

  2. Referee: [Experiments] Experiments section: the abstract asserts that predictions 'support actionable and meaningful ISP tuning,' but provides neither quantitative metrics (accuracy, consistency rate, correlation with expert tuning decisions) nor a concrete example of how the four-level outputs are mapped to ISP parameter adjustments. A table or figure showing these downstream results is required to substantiate the practical-utility claim.

    Authors: The experiments section already contains qualitative examples mapping severity predictions to ISP adjustments. To strengthen the claim we will add a table with concrete mapping examples and any available quantitative metrics (e.g., consistency rates with expert annotations). The small size of the expert dataset limits the scope of new quantitative validation. revision: partial
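The mapping table promised in the second response could take a shape like the sketch below, which turns per-issue severity levels into a prioritized list of interventions. Everything specific here is hypothetical: the issue names, the module names, and the "intervene at severe or worse" rule are placeholders, not the paper's taxonomy or OpenISP's actual parameters.

```python
from typing import Dict, List, Tuple

LEVELS = ("absent", "minor", "severe", "critical")

# Hypothetical issue -> ISP stage lookup, for illustration only.
HYPOTHETICAL_MODULE: Dict[str, str] = {
    "luma_noise": "denoise",
    "color_cast": "white_balance",
    "oversharpening": "sharpen",
}


def triage(pred: Dict[str, int],
           intervene_at: int = 2) -> List[Tuple[str, str, str]]:
    """List issues needing intervention, most severe first.

    Issues below `intervene_at` are treated as acceptable and omitted.
    Ties are broken alphabetically so the output is deterministic.
    """
    flagged = [(name, lvl) for name, lvl in pred.items() if lvl >= intervene_at]
    flagged.sort(key=lambda item: (-item[1], item[0]))
    return [
        (name, LEVELS[lvl], HYPOTHETICAL_MODULE.get(name, "unmapped"))
        for name, lvl in flagged
    ]


if __name__ == "__main__":
    prediction = {"luma_noise": 3, "color_cast": 1,
                  "oversharpening": 2, "banding": 2}
    for issue, severity, module in triage(prediction):
        print(f"{issue}: {severity} -> inspect {module}")
```

A table in this form, validated against expert tuning decisions, is the kind of evidence the referee asks for; the sketch only shows the bookkeeping, not how large a parameter change each severity level should trigger.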

Circularity Check

0 steps flagged

No circularity; modeling choices are explicit and non-tautological

The paper introduces PrISM-IQA as a reformulation of IQA into multi-issue ordinal diagnosis, using cumulative ordinal encoding plus structured inference to enforce monotonicity, subsumption, and exclusion. No equations, derivations, or fitted parameters are presented that reduce the output predictions to inputs by construction. The structured inference is described as an added modeling component rather than a self-definitional or fitted result. No self-citation chains or uniqueness theorems are invoked as load-bearing for the central claims. Evaluation on SPAQ and expert datasets is presented as external validation. This is a standard non-circular modeling paper.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Only abstract available; ledger populated from stated modeling choices. The severity ordering and cross-issue relations are treated as given domain structure rather than derived.

free parameters (1)
  • model parameters for ordinal prediction
    Any learned neural network weights or thresholds are fitted during training on the annotated data.
axioms (1)
  • domain assumption: Quality issues admit ordered severity levels with monotonicity, subsumption, and exclusion relations that can be captured by structured inference.
    Invoked to justify the cumulative ordinal encoding and structured inference for logical consistency.

pith-pipeline@v0.9.1-grok · 5773 in / 1172 out tokens · 21601 ms · 2026-07-01T06:14:54.827621+00:00 · methodology


Appendix

This appendix supplements the main paper with practical demonstrations, reproducibility details, and additional empirical comparisons. We first show how PrISM-IQA can be us...