pith.

arxiv: 1907.06296 · v1 · pith:36VJ6CO3 · submitted 2019-07-14 · 💻 cs.CV

Perceptually Motivated Method for Image Inpainting Comparison

Pith reviewed 2026-05-24 21:20 UTC · model grok-4.3

classification 💻 cs.CV
keywords: image inpainting · subjective evaluation · objective quality metrics · perceptual quality · human study · realism assessment · image quality

The pith

A human study of nine inpainting algorithms yields objective metrics that track perceived realism.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper notes that no standard way exists to judge inpainting results because realism depends on human perception and current objective metrics fail to match it. The authors therefore ran a subjective comparison in which observers rated outputs from nine current algorithms. From the collected human ratings they derived new objective metrics. These metrics show strong alignment with the subjective scores. The work supplies both a benchmark set of human judgments and practical metrics that future algorithms can be measured against.

Core claim

By conducting a subjective comparison of nine state-of-the-art inpainting algorithms, the authors establish a set of objective quality metrics that exhibit high correlation with human judgments of realism in inpainted images.

What carries the argument

The subjective comparison study that supplies human ratings of realism, from which new objective metrics are fitted to predict those ratings.
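This summary does not say how individual observer responses become per-algorithm scores. If the comparison collects pairwise preferences (the reference list includes Bradley and Terry's paired-comparison model [2], though its exact role is not stated here), a standard way to aggregate them is to fit Bradley–Terry strengths. The sketch below is an assumption-laden illustration, not the authors' procedure: the function name `bradley_terry`, the minorization-maximization update it uses, and the vote counts are all hypothetical choices.

```python
import math


def bradley_terry(wins, iters=500, tol=1e-10):
    """Fit Bradley-Terry strengths from a pairwise win matrix.

    wins[i][j] is the number of times item i was preferred over item j.
    Assumes every item has at least one win and every pair is compared;
    otherwise the maximum-likelihood estimate may not exist.
    Returns log-strengths, which act as additive quality scores.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # MM update: total wins divided by sum over j of n_ij / (p_i + p_j).
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        # Strengths are identified only up to scale, so fix their sum to n.
        scale = n / sum(new_p)
        new_p = [x * scale for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return [math.log(x) for x in p]


# Hypothetical vote counts for three methods, 10 comparisons per pair.
wins = [[0, 8, 9],
        [2, 0, 6],
        [1, 4, 0]]
scores = bradley_terry(wins)
print([round(s, 2) for s in scores])
```

Log-strengths are convenient because their differences map directly to win probabilities: P(i beats j) = 1 / (1 + exp(s_j - s_i)).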

If this is right

  • Future inpainting algorithms can be ranked and improved using the fitted metrics without new human studies each time.
  • The collected human ratings serve as a fixed benchmark dataset for validating any new objective measure.
  • Development pipelines can target the metrics directly to produce outputs that better match observer preferences.
  • Standardized evaluation reduces reliance on ad-hoc visual inspection when comparing methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same human-rating approach could be applied to related editing tasks such as denoising or super-resolution if the perceptual cues overlap.
  • Training inpainting networks with the new metrics as a loss term might directly optimize for human-like results.
  • Metrics fitted to one study may need periodic re-calibration if observer preferences or image distributions shift over time.

Load-bearing premise

The image set, observer pool, and study design produce ratings that reflect stable, general human perception of inpainting quality.
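Whether the ratings reflect stable perception rather than the quirks of one observer pool can be partly checked inside a single study with split-half reliability. A minimal sketch, assuming each observer gives a numeric score to every algorithm (the study's actual response format is not stated here); `split_half_reliability` is a hypothetical helper and the data are synthetic.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(ratings, seed=0):
    """ratings[o][a] is the score observer o gave algorithm a.

    Splits observers at random into two halves, averages each half per
    algorithm, and correlates the two profiles. The Spearman-Brown formula
    then projects that correlation to the full observer pool.
    """
    observers = list(range(len(ratings)))
    random.Random(seed).shuffle(observers)
    half = len(observers) // 2
    groups = (observers[:half], observers[half:])
    n_alg = len(ratings[0])
    profiles = [[sum(ratings[o][a] for o in g) / len(g) for a in range(n_alg)]
                for g in groups]
    r_half = pearson(profiles[0], profiles[1])
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Synthetic data: 40 observers rate 9 algorithms whose latent quality is 0..8.
rng = random.Random(42)
latent = list(range(9))
ratings = [[q + rng.gauss(0, 1) for q in latent] for _ in range(40)]
r_half, r_full = split_half_reliability(ratings)
print(f"split-half r = {r_half:.3f}, Spearman-Brown = {r_full:.3f}")
```

A high split-half value only shows internal consistency; it cannot rule out biases shared by every observer or introduced by the chosen images, which is why the premise still needs an external study.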

What would settle it

A new subjective study using different images or observers produces rankings that the proposed metrics predict poorly.
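The test described above can be made concrete: score the nine algorithms with a proposed metric, take the ranking from an independent study, and ask whether their agreement beats chance. A minimal sketch using Kendall's tau with a Monte Carlo permutation test; the statistic, the helper names, and the numbers are illustrative assumptions, not the paper's protocol.

```python
import itertools
import random


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in itertools.combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def permutation_p_value(metric, subjective, n_perm=2000, seed=0):
    """One-sided p-value for H0: the metric orders algorithms no better than chance."""
    rng = random.Random(seed)
    observed = kendall_tau(metric, subjective)
    shuffled = list(subjective)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kendall_tau(metric, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical scores for nine algorithms: a metric and a new, independent study.
metric_scores = [0.91, 0.85, 0.80, 0.72, 0.66, 0.60, 0.51, 0.45, 0.30]
new_study = [2.1, 1.7, 1.5, 1.1, 0.8, 0.2, -0.3, -0.9, -1.4]
tau, p = permutation_p_value(metric_scores, new_study)
print(f"tau = {tau:.2f}, p = {p:.4f}")
```

With only nine algorithms, moderate agreement (say tau near 0.3) is hard to distinguish from chance, so a settling study would ideally add images or algorithms as well as new observers.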

Figures

Figures reproduced from arXiv: 1907.06296 by Dmitry Vatolin, Ivan Molodetskikh, Mikhail Erofeev.

Figure 1. Images for the subjective inpainting comparison. The black square in the center is the area to be inpainted.
Figure 2. Three images from our test set, inpainted by three human …
Figure 3. Subjective-comparison results across three images in …
Figure 4. Results of the subjective study comparing images inpainted by human artists with images inpainted by conventional and …
Figure 5. Comparison of inpainting results from Artist #1 and statis…
Figure 7. Inpainting quality estimated by VGG-16 for one image at …
Figure 8. Mean Pearson and Spearman correlations between objective inpainting-quality metrics and subjective human comparisons …
Figure 9. Mean Pearson and Spearman correlations between objective inpainting-quality metrics and subjective human comparisons …
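Figures 8 and 9 report mean Pearson and Spearman correlations between objective metrics and the subjective comparisons. The captions are truncated, so the averaging scheme is not visible here; this sketch assumes one plausible reading, correlating metric and subjective scores across algorithms for each image and then averaging over images. The helper names and data are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def mean_correlations(metric_by_image, subjective_by_image):
    """Each argument maps image id -> scores, one per algorithm, in the same order."""
    images = sorted(metric_by_image)
    pear = [pearson(metric_by_image[k], subjective_by_image[k]) for k in images]
    spear = [spearman(metric_by_image[k], subjective_by_image[k]) for k in images]
    return sum(pear) / len(pear), sum(spear) / len(spear)


# Hypothetical scores for four algorithms on two images.
metric = {"img_a": [0.9, 0.7, 0.4, 0.1], "img_b": [0.2, 0.8, 0.6, 0.3]}
subjective = {"img_a": [1.5, 1.0, 0.2, -0.8], "img_b": [-0.5, 1.2, 0.9, 0.1]}
mean_p, mean_s = mean_correlations(metric, subjective)
print(f"mean Pearson = {mean_p:.3f}, mean Spearman = {mean_s:.3f}")
```

Spearman's rho rewards a metric that orders the algorithms correctly even when its scale is nonlinear in perceived quality, which is why reporting it alongside Pearson is informative.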
Original abstract

The field of automatic image inpainting has progressed rapidly in recent years, but no one has yet proposed a standard method of evaluating algorithms. This absence is due to the problem's challenging nature: image-inpainting algorithms strive for realism in the resulting images, but realism is a subjective concept intrinsic to human perception. Existing objective image-quality metrics provide a poor approximation of what humans consider more or less realistic. To improve the situation and to better organize both prior and future research in this field, we conducted a subjective comparison of nine state-of-the-art inpainting algorithms and propose objective quality metrics that exhibit high correlation with the results of our comparison.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper conducts a subjective comparison of nine state-of-the-art image inpainting algorithms on a chosen image set and proposes objective quality metrics that exhibit high correlation with the subjective results, aiming to address the lack of standard perceptual evaluation methods in the field.

Significance. If the subjective study design produces a stable ground truth and the metrics generalize, the work would fill an important gap by supplying perceptually grounded evaluation tools for inpainting research. The explicit grounding in human judgments is a constructive contribution, though the internal fitting process limits immediate adoption without further validation evidence.

major comments (2)
  1. [Abstract] The claim that the proposed metrics 'exhibit high correlation' gives no information on participant count, image selection, statistical tests, study size, or validation procedure, preventing assessment of whether the data-to-claim link is load-bearing.
  2. [Subjective comparison / metric sections] The objective metrics are defined via correlation with the authors' own subjective data; without reported cross-validation, hold-out images, or external benchmarks, the central claim that these metrics provide a reliable perceptual standard rests on internal fitting whose generalizability is untested.
minor comments (1)
  1. [Abstract / Introduction] The abstract and introduction could more explicitly state the number of images and participants to allow readers to gauge the scale of the study immediately.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater transparency in reporting our subjective study and for questioning the generalizability of the proposed metrics. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] The claim that the proposed metrics 'exhibit high correlation' gives no information on participant count, image selection, statistical tests, study size, or validation procedure, preventing assessment of whether the data-to-claim link is load-bearing.

    Authors: We agree that the abstract should supply these details to allow readers to evaluate the strength of the reported correlations. In the revised manuscript we will expand the abstract to state the number of participants, image selection criteria, statistical tests, study size, and validation procedure. revision: yes

  2. Referee: [Subjective comparison / metric sections] The objective metrics are defined via correlation with the authors' own subjective data; without reported cross-validation, hold-out images, or external benchmarks, the central claim that these metrics provide a reliable perceptual standard rests on internal fitting whose generalizability is untested.

    Authors: The referee correctly notes that the metrics were fitted to the authors' subjective ratings. The original manuscript does not report cross-validation or hold-out testing. We will add a cross-validation analysis within the existing dataset to the metric-construction section; external benchmarks lie outside the scope of the present work. revision: partial
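The promised cross-validation could look like the following, under a loudly hypothetical setup: a metric with one tunable parameter (here a weight `alpha` blending two base scores) that is fitted to subjective data. Each image is held out in turn, `alpha` is fitted on the remaining images, and correlation is measured only on the held-out image. The blend form, helper names, and synthetic data are assumptions, not the authors' metric.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def blend(f1, f2, alpha):
    """Hypothetical metric: a convex blend of two base scores per algorithm."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(f1, f2)]


def fit_alpha(images, grid):
    """Pick the blend weight with the highest mean correlation on `images`."""
    def mean_r(alpha):
        return sum(pearson(blend(f1, f2, alpha), subj)
                   for f1, f2, subj in images) / len(images)
    return max(grid, key=mean_r)


def leave_one_image_out(images, grid):
    """images: list of (base1, base2, subjective) tuples, one per image.

    Fits alpha without the held-out image, then scores only that image.
    """
    held_out = []
    for k in range(len(images)):
        train = images[:k] + images[k + 1:]
        alpha = fit_alpha(train, grid)
        f1, f2, subj = images[k]
        held_out.append(pearson(blend(f1, f2, alpha), subj))
    return held_out


# Synthetic study: 6 images x 9 algorithms; subjective = 0.7*base1 + 0.3*base2 + noise.
rng = random.Random(1)
images = []
for _ in range(6):
    f1 = [rng.random() for _ in range(9)]
    f2 = [rng.random() for _ in range(9)]
    subj = [0.7 * a + 0.3 * b + rng.gauss(0, 0.05) for a, b in zip(f1, f2)]
    images.append((f1, f2, subj))
grid = [i / 10 for i in range(11)]
held_out_r = leave_one_image_out(images, grid)
print([round(r, 3) for r in held_out_r])
```

Holding out whole images, rather than individual algorithm outputs, matters because outputs for the same image share content and would otherwise leak information between training and test folds.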

Circularity Check

1 step flagged

Objective metrics proposed and validated solely against authors' own subjective comparison data

specific steps
  1. fitted input called prediction [Abstract]
    "we conducted a subjective comparison of nine state-of-the-art inpainting algorithms and propose objective quality metrics that exhibit high correlation with the results of our comparison."

    The objective metrics are proposed specifically because they correlate with the subjective comparison performed in the same work. The 'high correlation' result is therefore produced by the selection or construction of the metrics to fit the study's outputs, rather than by testing pre-existing metrics against independent data.

full rationale

The paper's central contribution is a subjective study of nine inpainting algorithms followed by the proposal of objective metrics that 'exhibit high correlation with the results of our comparison.' This directly matches the fitted_input_called_prediction pattern: the subjective rankings serve as the fitted input, and the metrics are selected or designed to match them, rendering the reported high correlation a consequence of the fitting process rather than an independent test. No external benchmarks, hold-out sets, or cross-study validation are indicated in the provided text, so the validation chain reduces to the study data itself. This is a moderate circularity burden (score 6) because the claim of perceptual motivation rests on internal consistency with the authors' chosen images, participants, and protocol.
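The circularity concern can be illustrated with a toy simulation, which is a sketch under stated assumptions rather than a model of this paper: if many candidate metrics are screened against a single study and the best-correlating one is kept, a high in-sample correlation appears even when every candidate is pure noise, and it disappears when the chosen metric is compared with independent replications. The assumption that all algorithms are truly equal, and all numbers, are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)
N_ALGORITHMS = 9
N_CANDIDATES = 2000


def run_study():
    # Assumption: all algorithms are truly equal, so scores are pure observer noise.
    return [rng.gauss(0, 1) for _ in range(N_ALGORITHMS)]


study = run_study()
# Candidate "metrics" that carry no information: random scores per algorithm.
candidates = [[rng.gauss(0, 1) for _ in range(N_ALGORITHMS)]
              for _ in range(N_CANDIDATES)]
# The fitting step: keep whichever candidate best matches this one study.
best = max(candidates, key=lambda c: pearson(c, study))
in_sample = pearson(best, study)
# Re-test the selected metric against 100 independent replications of the study.
replications = [pearson(best, run_study()) for _ in range(100)]
held_out = sum(replications) / len(replications)
print(f"in-sample r = {in_sample:.2f}, mean replication r = {held_out:.2f}")
```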

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger is limited to the core domain assumption visible in the text.

axioms (1)
  • domain assumption: Aggregated human subjective judgments constitute a reliable and stable ground truth for perceptual realism in inpainted images.
    The paper treats the subjective comparison results as the reference against which objective metrics are judged.

pith-pipeline@v0.9.0 · 5634 in / 1137 out tokens · 22199 ms · 2026-05-24T21:20:08.210415+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 4 internal anchors

  1. J. H. Bappy, A. K. Roy-Chowdhury, J. Bunk, L. Nataraj, and B. S. Manjunath. Exploiting spatial structure for localizing manipulated image regions. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  2. R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
  3. F. Chollet. Xception: Deep learning with depthwise separable convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  4. A. Criminisi, P. Pérez, and K. Toyama. Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 13(9):1200–1212, 2004.
  5. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
  6. K. He and J. Sun. Statistics of patch offsets for image completion. In European Conference on Computer Vision, pages 16–29. Springer, 2012.
  7. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  8. K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.
  9. S. Iizuka, E. Simo-Serra, and H. Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG), 36(4):107, 2017.
  10. J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, …
  11. H. Li, G. Li, L. Lin, H. Yu, and Y. Yu. Context-aware semantic inpainting. IEEE Transactions on Cybernetics, 2018.
  12. H. Li, W. Luo, X. Qiu, and J. Huang. Image forgery localization via integrating tampering possibility maps. IEEE Transactions on Information Forensics and Security, 12(5):1240–1252, 2017.
  13. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
  14. C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In The European Conference on Computer Vision (ECCV), September 2018.
  15. G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro. Image inpainting for irregular holes using partial convolutions. In The European Conference on Computer Vision (ECCV), September 2018.
  16. P. Liu, X. Qi, P. He, Y. Li, M. R. Lyu, and I. King. Semantically consistent image completion with fine-grained details. arXiv preprint arXiv:1711.09345, 2017.
  17. T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
  18. D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  19. C.-M. Pun, X.-C. Yuan, and X.-L. Bi. Image forgery detection using adaptive oversegmentation and feature point matching. IEEE Transactions on Information Forensics and Security, 10(8):1705–1716, 2015.
  20. R. Salloum, Y. Ren, and C.-C. J. Kuo. Image splicing localization using a multi-task fully convolutional network (MFCN). Journal of Visual Communication and Image Representation, 51:201–209, 2018.
  21. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  22. Y. Song, C. Yang, Z. Lin, H. Li, Q. Huang, and C. J. Kuo. Image inpainting using multi-scale feature image translation. arXiv preprint arXiv:1711.08590, 2, 2017.
  23. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  24. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  25. A. Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23–34, …
  26. D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  27. Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  28. Z. Yan, X. Li, M. Li, W. Zuo, and S. Shan. Shift-Net: Image inpainting via deep feature rearrangement. In The European Conference on Computer Vision (ECCV), September 2018.
  29. C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, and H. Li. High-resolution image inpainting using multi-scale neural patch synthesis. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  30. J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang. Free-form image inpainting with gated convolution. arXiv preprint arXiv:1806.03589, 2018.
  31. J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang. Generative image inpainting with contextual attention. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  32. P. Zhou, X. Han, V. I. Morariu, and L. S. Davis. Learning rich features for image manipulation detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  33. X. Zhu, Y. Qian, X. Zhao, B. Sun, and Y. Sun. A deep learning approach to patch-based image inpainting forensics. Signal Processing: Image Communication, 67:90–99, 2018.