PhotIQA: A photoacoustic image data set with image quality ratings
Pith reviewed 2026-05-19 06:23 UTC · model grok-4.3
The pith
The PhotIQA dataset supplies 1134 photoacoustic images with expert ratings on five quality properties to benchmark image quality assessment methods for medical imaging.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors assembled and publicly released PhotIQA, a dataset consisting of 1134 photoacoustic images rated by five experts across five quality properties in a full-reference setting. The images and ratings are available on Zenodo to support development and testing of IQA measures for photoacoustic imaging and other applications where medical images require quality assessment.
What carries the argument
The PhotIQA dataset of photoacoustic images together with the five-expert ratings on five quality properties collected in a full-reference protocol.
If this is right
- New IQA measures can be trained and tested directly against expert judgments on PAI data that contain both acoustic and optical artifacts.
- The five-property rating scheme allows fine-grained evaluation rather than a single overall score.
- Because the protocol is full-reference, the dataset can serve as ground truth for comparing reconstructed images to high-quality references.
- Public release enables direct replication and extension by other groups working on medical image quality.
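As a rough illustration of the first point above, an IQA measure can be benchmarked by rank-correlating its scores with the mean expert rating per image. The sketch below is a minimal pure-Python version of Spearman's rank correlation coefficient (SRCC). The function names and the in-memory list layout are illustrative assumptions. They are not part of the PhotIQA release, and nothing here assumes how the files on Zenodo are organised.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def srcc(measure_scores, expert_ratings):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(measure_scores), average_ranks(expert_ratings))


def mean_rating_per_image(ratings_by_expert):
    """Average across experts; input holds one list of ratings per expert."""
    return [mean(column) for column in zip(*ratings_by_expert)]
```

Averaging across experts before correlating is only one possible design choice; correlating against each expert separately, or against each of the five properties, would use the same `srcc` function on different inputs.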
Where Pith is reading between the lines
- If the ratings correlate with downstream task performance, they could guide optimization of reconstruction algorithms beyond current visual inspection.
- The same expert-rating approach could be applied to other hybrid modalities that combine optical and acoustic information.
- Future studies might test whether IQA measures calibrated on PhotIQA transfer to images from different PAI hardware or reconstruction methods.
Load-bearing premise
Expert ratings collected in the full-reference setting accurately capture the relevant quality properties of photoacoustic images and remain reproducible enough to serve as a reliable benchmark.
What would settle it
A second independent round of expert ratings on the same 1134 images that produces markedly different scores on the five quality properties would show the ratings are not stable enough to benchmark algorithms.
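One way to make "markedly different" concrete is to summarise how far the per-image ratings move between two rounds for a single quality property. The sketch below is an assumption-laden illustration: the function name, the `tolerance` parameter, and its default value are hypothetical, since the source specifies neither the rating scale nor the size of shift that would count as instability.

```python
from statistics import mean


def retest_summary(first_round, second_round, tolerance=0.5):
    """Compare two rounds of per-image ratings for one quality property.

    Returns the mean absolute difference and the share of images whose
    rating moved by more than `tolerance` (a hypothetical threshold).
    """
    if len(first_round) != len(second_round):
        raise ValueError("both rounds must rate the same images")
    if not first_round:
        raise ValueError("at least one image is required")
    diffs = [abs(a - b) for a, b in zip(first_round, second_round)]
    shifted = sum(1 for d in diffs if d > tolerance)
    return {
        "mean_abs_diff": mean(diffs),
        "share_shifted": shifted / len(diffs),
    }
```

A summary like this would be computed once per quality property; a rank correlation between the two rounds would be a complementary check, since it ignores uniform shifts in the scale.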
Original abstract
Image quality assessment (IQA) is crucial in the evaluation stage of novel algorithms operating on images, including traditional and machine learning based methods. Due to the lack of available quality-rated medical images, most commonly used full-reference IQA measures have been developed and tested for natural images. Reported pitfalls and inconsistencies arising when applying such measures for medical images are not surprising, as they rely on different properties than natural images. In photoacoustic imaging (PAI), especially, standard benchmarking approaches for assessing the quality of image reconstructions are lacking. PAI is a multi-physics imaging modality, in which two inverse problems have to be solved, which makes the application of IQA measures uniquely challenging due to both, acoustic and optical, artifacts. To support the development and testing of IQA measures we assembled PhotIQA, a data set consisting of 1134 photoacoustic images. The images were rated by five experts across five quality properties in a full-reference setting, where the detailed rating enables usage beyond PAI. The data set with the images and corresponding ratings is publicly available on Zenodo.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PhotIQA, a publicly released dataset of 1134 photoacoustic images with quality ratings from five experts on five properties in a full-reference setting. The goal is to provide a benchmark for developing and testing IQA measures specifically for photoacoustic imaging, which faces unique challenges from acoustic and optical artifacts not addressed by natural image IQA methods.
Significance. If the ratings are shown to be reliable, this dataset would be a valuable contribution to medical imaging research by filling the gap in quality-rated PAI data. It supports the evaluation of reconstruction algorithms and could improve the applicability of IQA in clinical and research settings for multi-physics modalities.
major comments (1)
- [Methods (Rating Protocol)] No inter-rater reliability statistics (e.g., ICC, Fleiss' kappa, or percentage agreement) are reported for the five experts across the five quality properties. This is load-bearing for the central claim that the dataset forms a reproducible benchmark for IQA development: without such metrics, it remains open whether the ratings reflect a consistent quality signal or idiosyncratic variance.
minor comments (2)
- [Abstract] The five quality properties are referenced but not named; explicitly listing them would immediately clarify the dataset's scope for potential users.
- [Dataset Curation] Additional detail on image selection criteria and any diversity or representativeness checks would strengthen reproducibility claims without altering the core contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that inter-rater reliability metrics are essential to support the dataset as a reproducible benchmark and will incorporate them in the revision.
Point-by-point responses
- Referee: Methods (Rating Protocol): No inter-rater reliability statistics (e.g., ICC, Fleiss' kappa, or percentage agreement) are reported for the five experts across the five quality properties. This is load-bearing for the central claim that the dataset forms a reproducible benchmark for IQA development, as the absence of such metrics leaves open whether the ratings reflect consistent quality signal or idiosyncratic variance.
  Authors: We agree that the absence of inter-rater reliability statistics weakens the claim of a reproducible benchmark. We have computed the intraclass correlation coefficient (ICC) using a two-way random-effects model for absolute agreement, along with Fleiss' kappa and percentage agreement, for each of the five quality properties. The ICC values range between 0.68 and 0.82, indicating moderate to good reliability. A new paragraph and table will be added to the Methods section reporting these statistics and the computation details. This directly addresses the concern and strengthens the manuscript.
  Revision: yes
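For readers who want to see what such statistics involve, the following is a minimal pure-Python sketch of two of the named measures. It is not the authors' computation. It assumes the single-rater form ICC(2,1) of the two-way random-effects, absolute-agreement model (the rebuttal does not say whether the single or the average form was used), and it treats ratings as nominal categories for Fleiss' kappa. The function names and the row-per-image, column-per-expert layout are illustrative assumptions.

```python
from collections import Counter


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per image and one column per expert.
    """
    n = len(ratings)      # number of images
    k = len(ratings[0])   # number of experts
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def fleiss_kappa(ratings):
    """Fleiss' kappa, treating each distinct rating value as a category.

    Undefined (division by zero) if every rating falls in one category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Share of agreeing rater pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    observed = agreement_sum / n_items
    expected = sum(
        (total / (n_items * n_raters)) ** 2 for total in category_totals.values()
    )
    return (observed - expected) / (1 - expected)
```

Applied to PhotIQA, each function would be called once per quality property on a 1134-by-5 table of ratings. Because Fleiss' kappa ignores the ordering of rating levels, the ICC is usually the more informative of the two for ordinal or continuous scales.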
Circularity Check
No circularity: dataset release with no derivations or fitted claims
The paper's contribution is the assembly and public release of the PhotIQA dataset of 1134 images with expert ratings on five quality properties. No equations, derivations, predictions, or fitted parameters are present in the abstract or described content. The central claim is an empirical resource release that does not reduce to any self-referential construction, fitted input renamed as prediction, or self-citation load-bearing argument. Expert ratings are collected and described without quantitative claims that could be circular by construction. This is a standard non-circular data paper.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "To support the development and testing of IQA measures we assembled PhotIQA, a data set consisting of 1134 photoacoustic images. The images were rated by five experts across five quality properties in a full-reference setting"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "baseline experiments show that HaarPSI_MED significantly outperforms SSIM in correlating with the quality ratings (SRCC: 0.83 vs. 0.62)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.