pith. sign in

arxiv: 2507.03478 · v2 · submitted 2025-07-04 · 📡 eess.IV · cs.CV

PhotIQA: A photoacoustic image data set with image quality ratings

Pith reviewed 2026-05-19 06:23 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords photoacoustic imagingimage quality assessmentdatasetexpert ratingsfull-reference IQAmedical imagingPAI
0
0 comments X

The pith

The PhotIQA dataset supplies 1134 photoacoustic images with expert ratings on five quality properties to benchmark image quality assessment methods for medical imaging.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Photoacoustic imaging solves two inverse problems and therefore produces both acoustic and optical artifacts that differ from those in natural images. Standard full-reference IQA measures developed on natural scenes therefore perform inconsistently on PAI reconstructions. The authors assembled and released PhotIQA, a set of 1134 images rated by five experts on five distinct quality properties in a full-reference protocol. A sympathetic reader cares because the dataset supplies the missing benchmark that lets researchers develop and validate IQA measures tailored to this multi-physics modality.

Core claim

The authors assembled and publicly released PhotIQA, a dataset consisting of 1134 photoacoustic images rated by five experts across five quality properties in a full-reference setting. The images and ratings are available on Zenodo to support development and testing of IQA measures for photoacoustic imaging and other applications where medical images require quality assessment.

What carries the argument

The PhotIQA dataset of photoacoustic images together with the five-expert ratings on five quality properties collected in a full-reference protocol.

If this is right

  • New IQA measures can be trained and tested directly against expert judgments on PAI data that contain both acoustic and optical artifacts.
  • The five-property rating scheme allows fine-grained evaluation rather than a single overall score.
  • Because the protocol is full-reference, the dataset can serve as ground truth for comparing reconstructed images to high-quality references.
  • Public release enables direct replication and extension by other groups working on medical image quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the ratings correlate with downstream task performance, they could guide optimization of reconstruction algorithms beyond current visual inspection.
  • The same expert-rating approach could be applied to other hybrid modalities that combine optical and acoustic information.
  • Future studies might test whether IQA measures calibrated on PhotIQA transfer to images from different PAI hardware or reconstruction methods.

Load-bearing premise

Expert ratings collected in the full-reference setting accurately capture the relevant quality properties of photoacoustic images and remain reproducible enough to serve as a reliable benchmark.

What would settle it

A second independent round of expert ratings on the same 1134 images that produces markedly different scores on the five quality properties would show the ratings are not stable enough to benchmark algorithms.

Figures

Figures reproduced from arXiv: 2507.03478 by Anna Breger, Carola-Bibiane Sch\"onlieb, Clemens Karner, Ian Selby, Janek Gr\"ohl, Jonathan Weir-McCall, Lara-Sophie Witt, Merle Duch\^ene, Thomas R Else, Tom Rix.

Figure 1
Figure 1. Figure 1: Two examples of images in PhotIQA, references (a) and the reconstructions from the described algorithms (b-d). Algorithm 1 (b) corrects a reconstructed PA image by using the light fluence obtained from simulations. Algorithms 2 and 3 (c-d) are deep-learning models trained to estimate the absorption coefficient. 4 [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The speedyIQA annotation app allows setting a task and rating cat [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Four examples of quality ratings from both annotators for the different de￾tailed quality properties. (Top) reference image and (bottom) reconstructed, assessed image. • ”filename”: File name of the distorted image file • TASK+” 1”: Ratings from the first expert • TASK+” 2”: Ratings from the second expert 6 [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Box plot of the absolute differences (top) and the absolute differences of z-scores (bottom) of both raters with the median (green line) and mean (striped green line) [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The distribution of the ratings given by both experts for all quality prop￾erties. regarding the intensity values, as it directly computes the pixel-wise difference. In line with previous experiments, besides HaarPSI [17], MS-SSIM [27], IW￾SSIM [24], LPIPS [26], and GMSD [25] show promising behaviors. 8 [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Four examples of rating disagreements corresponding to box plot outliers in [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
read the original abstract

Image quality assessment (IQA) is crucial in the evaluation stage of novel algorithms operating on images, including traditional and machine learning based methods. Due to the lack of available quality-rated medical images, most commonly used full-reference IQA measures have been developed and tested for natural images. Reported pitfalls and inconsistencies arising when applying such measures for medical images are not surprising, as they rely on different properties than natural images. In photoacoustic imaging (PAI), especially, standard benchmarking approaches for assessing the quality of image reconstructions are lacking. PAI is a multi-physics imaging modality, in which two inverse problems have to be solved, which makes the application of IQA measures uniquely challenging due to both, acoustic and optical, artifacts. To support the development and testing of IQA measures we assembled PhotIQA, a data set consisting of 1134 photoacoustic images. The images were rated by five experts across five quality properties in a full-reference setting, where the detailed rating enables usage beyond PAI. The data set with the images and corresponding ratings is publicly available on Zenodo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents PhotIQA, a publicly released dataset of 1134 photoacoustic images with quality ratings from five experts on five properties in a full-reference setting. The goal is to provide a benchmark for developing and testing IQA measures specifically for photoacoustic imaging, which faces unique challenges from acoustic and optical artifacts not addressed by natural image IQA methods.

Significance. If the ratings are shown to be reliable, this dataset would be a valuable contribution to medical imaging research by filling the gap in quality-rated PAI data. It supports the evaluation of reconstruction algorithms and could improve the applicability of IQA in clinical and research settings for multi-physics modalities.

major comments (1)
  1. [Methods (Rating Protocol)] Methods (Rating Protocol): No inter-rater reliability statistics (e.g., ICC, Fleiss' kappa, or percentage agreement) are reported for the five experts across the five quality properties. This is load-bearing for the central claim that the dataset forms a reproducible benchmark for IQA development, as the absence of such metrics leaves open whether the ratings reflect consistent quality signal or idiosyncratic variance.
minor comments (2)
  1. [Abstract] Abstract: The five quality properties are referenced but not named; explicitly listing them would immediately clarify the dataset's scope for potential users.
  2. [Dataset Curation] Dataset Curation: Additional detail on image selection criteria and any diversity or representativeness checks would strengthen reproducibility claims without altering the core contribution.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that inter-rater reliability metrics are essential to support the dataset as a reproducible benchmark and will incorporate them in the revision.

read point-by-point responses
  1. Referee: Methods (Rating Protocol): No inter-rater reliability statistics (e.g., ICC, Fleiss' kappa, or percentage agreement) are reported for the five experts across the five quality properties. This is load-bearing for the central claim that the dataset forms a reproducible benchmark for IQA development, as the absence of such metrics leaves open whether the ratings reflect consistent quality signal or idiosyncratic variance.

    Authors: We agree that the absence of inter-rater reliability statistics weakens the claim of a reproducible benchmark. We have computed the intraclass correlation coefficient (ICC) using a two-way random-effects model for absolute agreement, along with Fleiss' kappa and percentage agreement, for each of the five quality properties. The ICC values range between 0.68 and 0.82, indicating moderate to good reliability. A new paragraph and table will be added to the Methods section reporting these statistics and the computation details. This directly addresses the concern and strengthens the manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset release with no derivations or fitted claims

full rationale

The paper's contribution is the assembly and public release of the PhotIQA dataset of 1134 images with expert ratings on five quality properties. No equations, derivations, predictions, or fitted parameters are present in the abstract or described content. The central claim is an empirical resource release that does not reduce to any self-referential construction, fitted input renamed as prediction, or self-citation load-bearing argument. Expert ratings are collected and described without quantitative claims that could be circular by construction. This is a standard non-circular data paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a data-release contribution with no mathematical derivations, physical models, or fitted parameters; the only implicit premises are standard assumptions about expert rating reliability and image representativeness for the PAI domain.

pith-pipeline@v0.9.0 · 5762 in / 1114 out tokens · 49248 ms · 2026-05-19T06:23:19.759953+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

  1. [1]

    Photoacoustics 32, 100539 (2023)

    Assi, H., Cao, R., et al: A review of a strategic roadmapping ex- ercise to advance clinical translation of photoacoustic imaging: From current barriers to future adoption. Photoacoustics 32, 100539 (2023). https://doi.org/https://doi.org/10.1016/j.pacs.2023.100539

  2. [2]

    IEEE Access 7, 140030–140070 (09 2019)

    Athar, S., Wang, Z.: A comprehensive performance evaluation of image quality assessment algorithms. IEEE Access 7, 140030–140070 (09 2019). https://doi.org/10.1109/ACCESS.2019.2943319 10

  3. [3]

    Journal of Imaging Informatics in Medicine (2025)

    Breger, A., Biguri, A., Landman, M.S., Selby, I., Amberg, N., Brunner, E., Gr¨ ohl, J., Hatamikia, S., Karner, C., Ning, L., Dittmer, S., Roberts, M., Sch¨ onlieb, C.B., Collaboration, A.C.: A study of why we need to reassess full reference image quality assessment with medical images. Journal of Imaging Informatics in Medicine (2025). https://doi.org/10....

  4. [4]

    In: Proceedings of 2024 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD), Springer Lecture Notes in Electrical Engineering (2024)

    Breger, A., Karner, C., Selby, I., Gr¨ ohl, J., Dittmer, S., Lilley, E., Babar, J., Beckford, J., Sadler, T.J., Shahipasand, S., Thavakumar, A., Roberts, M., Sch¨ onlieb, C.B.: A study on the adequacy of common iqa measures for medical images. In: Proceedings of 2024 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD), Springe...

  5. [5]

    Magn Reson Imaging 34(6), 820–831 (Jul 2016)

    Chow, L.S., Rajagopal, H., Paramesran, R.: Correlation between subjective and objective assessment of magnetic resonance (mr) images. Magn Reson Imaging 34(6), 820–831 (Jul 2016). https://doi.org/10.1016/j.mri.2016.03.006

  6. [6]

    J Biomed Opt 17(6), 061202 (Jun 2012)

    Cox, B., Laufer, J.G., Arridge, S.R., Beard, P.C.: Quantitative spectroscopic photoacoustic imaging: a review. J Biomed Opt 17(6), 061202 (Jun 2012). https://doi.org/10.1117/1.JBO.17.6.061202

  7. [7]

    medRxiv pp

    Else, T.R., Loreno, C., Groves, A., Cox, B.T., Gr¨ ohl, J., Modolell, I., Bohndiek, S.E., Roshan, A.: The confounding effects of skin colour in photoacoustic imaging. medRxiv pp. 2025–03 (2025)

  8. [8]

    IEEE Trans Med Imaging PP (Nov 2023)

    Gr¨ ohl, J., Else, T.R., Hacker, L., Bunce, E.V., Sweeney, P.W., Bohndiek, S.E.: Moving beyond simulation: data-driven quantitative photoacoustic imag- ing using tissue-mimicking phantoms. IEEE Trans Med Imaging PP (Nov 2023). https://doi.org/10.1109/TMI.2023.3331198

  9. [9]

    arXiv preprint arXiv:2505.24514 (2025)

    Gr¨ ohl, J., Kunyansky, L., Poimala, J., Else, T.R., Di Cecio, F., Bohndiek, S.E., Cox, B.T., Hauptmann, A.: Digital twins enable full-reference quality assessment of photoacoustic image reconstructions. arXiv preprint arXiv:2505.24514 (2025)

  10. [10]

    Photoacoustics 22, 100241 (2021)

    Gr¨ ohl, J., Schellenberg, M., Dreher, K., Maier-Hein, L.: Deep learning for biomed- ical photoacoustic imaging: A review. Photoacoustics 22, 100241 (2021)

  11. [11]

    Benchmarking transferability of self-supervised pretrain- ingformulti-organsegmentationondifferentmodalities

    Karner, C., Gr¨ ohl, J., Selby, I., Babar, J., Beckford, J., Else, T.R., Sadler, T.J., Shahipasand, S., Thavakumar, A., Roberts, M., Rudd, J.H., Sch¨ onlieb, C.B., Weir-McCall, J.R., Breger, A.: Parameter choices in haarpsi for iqa with medical images. In: 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). pp. 1–5 (2025). https://doi.org...

  12. [12]

    IEEE Access 11, 14154–14168 (2023)

    Kastryulin, S., Zakirov, J., Pezzotti, N., Dylov, D.V.: Image quality assess- ment for magnetic resonance imaging. IEEE Access 11, 14154–14168 (2023). https://doi.org/10.1109/ACCESS.2023.3243466

  13. [13]

    Medical Image Analysis 99, 103343 (2025)

    Lee, W., Wagner, F., Galdran, A., Shi, Y., Xia, W., Wang, G., Mou, X., Ahamed, M.A., Imran, A.A.Z., Oh, J.E., Kim, K., Baek, J.T., Lee, D., Hong, B., Tem- pelman, P., Lyu, D., Kuiper, A., van Blokland, L., Calisto, M.B., Hsieh, S., Han, M., Baek, J., Maier, A., Wang, A., Gold, G.E., Choi, J.H.: Low-dose com- puted tomography perceptual image quality asses...

  14. [14]

    Journal of Digital Imaging 36(6), 2623–2634 (2023)

    Ohashi, K., Nagatani, Y., Yoshigoe, M., Iwai, K., Tsuchiya, K., Hino, A., Kida, Y., Yamazaki, A., Ishida, T.: Applicability evaluation of full-reference image quality assessment methods for computed tomography images. Journal of Digital Imaging 36(6), 2623–2634 (2023). https://doi.org/10.1007/s10278-023-00875-0

  15. [15]

    Nature Reviews Bioengineering 3(3), 193–212 (Mar 2025)

    Park, J., Choi, S., Knieling, F., Clingman, B., Bohndiek, S., Wang, L.V., Kim, C.: Clinical translation of photoacoustic imaging. Nature Reviews Bioengineering 3(3), 193–212 (Mar 2025). https://doi.org/10.1038/s44222-024-00240-y

  16. [16]

    Pickering, J.W., Prahl, S.A., van Wieringen, N., Beek, J.F., Sterenborg, H.J.C.M., van Gemert, M.J.C.: Double-integrating-sphere system for measur- ing the optical properties of tissue. Appl. Opt. 32(4), 399–410 (Feb 1993). https://doi.org/10.1364/AO.32.000399

  17. [17]

    Reisenhofer, S

    Reisenhofer, R., Bosse, S., Kutyniok, G., Wiegand, T.: A haar wavelet-based perceptual similarity index for image quality assessment. Signal Process. Image Commun. 61, 33–43 (2018). https://doi.org/10.1016/j.image.2017.11.001

  18. [18]

    J Med Imaging (Bellingham) 4(3), 035501 (Jul 2017)

    Renieblas, G.P., Nogu´ es, A.T., Gonz´ alez, A.M., G´ omez-Leon, N., Del Castillo, E.G.: Structural similarity index family for image quality assessment in radiological images. J Med Imaging (Bellingham) 4(3), 035501 (Jul 2017). https://doi.org/10.1117/1.JMI.4.3.035501

  19. [19]

    arXiv preprint arXiv:2504.12772 (2025)

    Rietberg, M.T., Gr¨ ohl, J., Else, T.R., Bohndiek, S.E., Manohar, S., Cox, B.T.: Artifacts in photoacoustic imaging: Origins and mitigations. arXiv preprint arXiv:2504.12772 (2025)

  20. [20]

    Selby, I.: Github repository speedyiqa (March 2024), https://github.com/ selbs/speedy_iqa

  21. [21]

    IEEE Transactions on Image Processing 15(11), 3440–3451 (2006)

    Sheikh, H., Sabir, M., Bovik, A.: A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing 15(11), 3440–3451 (2006). https://doi.org/10.1109/TIP.2006.881959

  22. [23]

    Wang, A.C

    Wang, Z., Bovik, A., Sheikh, H., Simoncelli, E.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861

  23. [24]

    IEEE Transactions on Image Processing 20(5), 1185–1198 (2011)

    Wang, Z., Li, Q.: Information content weighting for perceptual image quality assessment. IEEE Transactions on Image Processing 20(5), 1185–1198 (2011). https://doi.org/10.1109/TIP.2010.2092435

  24. [25]

    IEEE Transactions on Im- age Processing 23(2), 684–695 (2014)

    Xue, W., Zhang, L., Mou, X., Bovik, A.C.: Gradient magnitude similarity devia- tion: A highly efficient perceptual image quality index. IEEE Transactions on Im- age Processing 23(2), 684–695 (2014). https://doi.org/10.1109/TIP.2013.2293423

  25. [26]

    Zhang, P

    Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreason- able effectiveness of deep features as a perceptual metric. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 586–595 (2018). https://doi.org/10.1109/CVPR.2018.00068

  26. [27]

    In: Proceedings of the 37th IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA (2003) 12

    Zhou Wang, E.P.S., Bovik, A.C.: Multi-scale structural similarity for image qual- ity assessment. In: Proceedings of the 37th IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA (2003) 12