arXiv: 1907.05283 · v1 · pith:2GEO4RITnew · submitted 2019-07-08 · 💻 cs.CV · cs.LG · eess.IV · stat.ML

A Comparison of Super-Resolution and Nearest Neighbors Interpolation Applied to Object Detection on Satellite Data

Pith reviewed 2026-05-25 01:00 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · eess.IV · stat.ML
keywords super-resolution · object detection · satellite imagery · nearest neighbors · xView · YOLOv2 · image upscaling

The pith

Nearest neighbors upscaling matches deep super-resolution for object detection accuracy on satellite images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares Multi-scale Deep Super-Resolution (MDSR) against simple nearest-neighbors interpolation as a preprocessing step before running object detection on satellite imagery. It introduces a pipeline that tiles large scenes, upscales the tiles by a factor of four, applies YOLOv2 detection, and stitches the labels back together. Upscaling by this factor raises average precision (AP) for a generalized vehicle class by 23 percent. Yet the two upscaling methods produce detection scores that differ by only 0.0002 AP, even though the super-resolved images look sharper to the eye. This suggests that learned super-resolution yields negligible benefit for the downstream detection task under the tested conditions.
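
To make the simpler of the two upscalers concrete, here is a minimal sketch of nearest-neighbor interpolation at an integer factor, written with NumPy. The function name `nn_upscale` and the toy tile are illustrative assumptions, not code from the paper; the idea is only that each source pixel is copied into a factor-by-factor block.

```python
import numpy as np


def nn_upscale(image: np.ndarray, factor: int = 4) -> np.ndarray:
    """Nearest-neighbor upscaling by an integer factor.

    Every pixel is repeated `factor` times along height (axis 0)
    and width (axis 1). Channels, if present, are left untouched.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


# Toy 2x3 single-channel tile (hypothetical data, for illustration only).
tile = np.arange(6, dtype=np.uint8).reshape(2, 3)
up = nn_upscale(tile, 4)
print(up.shape)  # (8, 12)
```

Because the output contains only copies of input pixels, this kind of upscaling adds no new image content; it only changes the pixel grid the detector sees.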

Core claim

When satellite images from the xView dataset are upscaled by a factor of four from 30 cm to an effective 7.5 cm ground sample distance before object detection, the Multi-scale Deep Super-Resolution model and nearest-neighbor interpolation produce nearly identical results, with a difference of only 0.0002 in average precision.
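
The "effective" ground sample distance quoted here is plain arithmetic: the native GSD divided by the upscaling factor. The sketch below works that out and shows how many pixels an object would span before and after upscaling. The helper names and the 4.5 m vehicle length are assumptions chosen for illustration; they do not come from the paper.

```python
def effective_gsd_cm(native_gsd_cm: float, upscale_factor: float) -> float:
    """Ground distance covered by one pixel after upscaling."""
    if upscale_factor <= 0:
        raise ValueError("upscale_factor must be positive")
    return native_gsd_cm / upscale_factor


def pixels_spanned(object_length_cm: float, gsd_cm: float) -> float:
    """Number of pixels an object of the given length covers along one axis."""
    return object_length_cm / gsd_cm


native = 30.0                              # cm per pixel, as stated for xView
upscaled = effective_gsd_cm(native, 4)     # 7.5 cm per pixel
vehicle_cm = 450.0                         # hypothetical 4.5 m vehicle

print(upscaled)                            # 7.5
print(pixels_spanned(vehicle_cm, native))  # 15.0
print(pixels_spanned(vehicle_cm, upscaled))  # 60.0
```

The smaller effective GSD describes the pixel grid only; it does not mean the sensor captured finer ground detail.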

What carries the argument

A multi-stage tiling and label-stitching pipeline that applies either MDSR or nearest-neighbor upscaling at 4x before YOLOv2 detection.
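
The tiling and label-stitching idea can be sketched as follows. This is a simplified single-stage version under stated assumptions: the tile size, the `(x_min, y_min, x_max, y_max)` box format, and the function names are all hypothetical, and the paper's actual multi-stage scheme is not described in enough detail on this page to reproduce. The sketch only shows how a box detected in an upscaled tile could be mapped back into scene coordinates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def tile_origins(width: int, height: int, tile: int) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping tiles covering a scene.

    Tiles on the right and bottom edges may be partial.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    return [(x, y) for y in range(0, height, tile) for x in range(0, width, tile)]


def stitch_box(box: Box, origin: Tuple[int, int], factor: int) -> Box:
    """Map a box from upscaled-tile pixels back to scene pixels.

    Coordinates are divided by the upscaling factor, then shifted
    by the tile's origin in the original scene.
    """
    ox, oy = origin
    x0, y0, x1, y1 = box
    return (ox + x0 / factor, oy + y0 / factor, ox + x1 / factor, oy + y1 / factor)


# Hypothetical 1000x600 scene cut into 500-pixel tiles.
origins = tile_origins(1000, 600, 500)
# A detection reported inside the 4x-upscaled tile whose origin is (500, 0).
scene_box = stitch_box((40, 80, 120, 160), origin=(500, 0), factor=4)
print(origins)    # [(0, 0), (500, 0), (0, 500), (500, 500)]
print(scene_box)  # (510.0, 20.0, 530.0, 40.0)
```

A real pipeline would also need to merge duplicate detections where tiles overlap, which this sketch leaves out.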

Load-bearing premise

The chosen MDSR model, YOLOv2 architecture, xView dataset, and tiling pipeline are representative enough that the near-equivalence between super-resolution and nearest neighbors would hold for other super-resolution methods, detectors, or satellite datasets.

What would settle it

Repeating the experiment on a different satellite dataset or with a different detector and measuring an average-precision gap larger than 0.01 between MDSR and nearest neighbors would disprove the reported equivalence.

Figures

Figures reproduced from arXiv: 1907.05283 by Cem Safak Sahin, Evan Koester.

Figure 2: Upscaling a Single xView Object. We extracted a 48x48 ...
Figure 3: Object Detection Pipeline using multi-stage tiling, ...
Figure 4: PR Curve demonstrating the effects of 1-Stage tiling ...
Figure 5: Object Detection Results on a Parking Lot Scene with column (a) showing a 1-stage tiling schema without upsampling, ...

Original abstract

As Super-Resolution (SR) has matured as a research topic, it has been applied to additional topics beyond image reconstruction. In particular, combining classification or object detection tasks with a super-resolution preprocessing stage has yielded improvements in accuracy especially with objects that are small relative to the scene. While SR has shown promise, a study comparing SR and naive upscaling methods such as Nearest Neighbors (NN) interpolation when applied as a preprocessing step for object detection has not been performed. We apply the topic to satellite data and compare the Multi-scale Deep Super-Resolution (MDSR) system to NN on the xView challenge dataset. To do so, we propose a pipeline for processing satellite data that combines multi-stage image tiling and upscaling, the YOLOv2 object detection architecture, and label stitching. We compare the effects of training models using an upscaling factor of 4, upscaling images from 30cm Ground Sample Distance (GSD) to an effective GSD of 7.5cm. Upscaling by this factor significantly improves detection results, increasing Average Precision (AP) of a generalized vehicle class by 23 percent. We demonstrate that while SR produces upscaled images that are more visually pleasing than their NN counterparts, object detection networks see little difference in accuracy with images upsampled using NN obtaining nearly identical results to the MDSRx4 enhanced images with a difference of 0.0002 AP between the two methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript compares Multi-scale Deep Super-Resolution (MDSR) against Nearest Neighbors (NN) interpolation as 4x upscaling preprocessing for YOLOv2 object detection on the xView satellite dataset. It reports that upscaling improves Average Precision (AP) for a vehicle class by 23%, yet MDSR and NN produce nearly identical detection performance (AP difference of 0.0002). A multi-stage tiling and label-stitching pipeline is proposed to handle the data.

Significance. If the reported equivalence is statistically reliable, the result would indicate that learned super-resolution provides negligible benefit over naive interpolation for downstream object detection on satellite imagery, potentially simplifying large-scale preprocessing pipelines without loss of accuracy.

major comments (1)
  1. [Abstract] The central claim that NN and MDSR yield 'nearly identical' results (difference of 0.0002 AP) is presented as a single scalar without error bars, standard deviations, confidence intervals, or results from repeated training runs. Given the stochastic elements in YOLOv2 (initialization, data order, augmentations) and the multi-stage tiling pipeline, this difference cannot be distinguished from run-to-run variance, undermining the equivalence conclusion.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript comparing MDSR and nearest-neighbor interpolation for object detection preprocessing. The single major comment concerns the statistical reliability of the reported AP difference. We address this point directly below and outline revisions to improve the presentation of results.

point-by-point responses
  1. Referee: [Abstract] The central claim that NN and MDSR yield 'nearly identical' results (difference of 0.0002 AP) is presented as a single scalar without error bars, standard deviations, confidence intervals, or results from repeated training runs. Given the stochastic elements in YOLOv2 (initialization, data order, augmentations) and the multi-stage tiling pipeline, this difference cannot be distinguished from run-to-run variance, undermining the equivalence conclusion.

    Authors: We agree that the manuscript presents the 0.0002 AP difference from single training runs without variance estimates or repeated experiments, which limits the strength of the equivalence claim given the stochastic nature of YOLOv2 training. While the absolute difference remains negligible relative to the 23% AP gain from 4x upscaling, we acknowledge this does not fully rule out run-to-run effects. In the revised manuscript we will add results from multiple independent training runs (different random seeds) for both methods, reporting mean AP values with standard deviations and updating the abstract and results sections accordingly to provide a more robust basis for the comparison.

    revision: yes
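
The repeated-run reporting discussed in this exchange could look like the sketch below. The AP values are placeholders invented purely for illustration; they are not results from the paper or from any experiment. The check at the end is a crude heuristic (gap between means versus the larger per-method standard deviation), not a formal equivalence test.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize(ap_runs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of AP across training runs."""
    return mean(ap_runs), stdev(ap_runs)


def gap_within_noise(ap_a: Sequence[float], ap_b: Sequence[float]) -> Tuple[float, bool]:
    """Return the absolute gap between mean APs and whether it is
    no larger than the bigger of the two run-to-run standard deviations."""
    mean_a, sd_a = summarize(ap_a)
    mean_b, sd_b = summarize(ap_b)
    gap = abs(mean_a - mean_b)
    return gap, gap <= max(sd_a, sd_b)


# PLACEHOLDER values, one per hypothetical random seed. Not real results.
ap_mdsr = [0.610, 0.604, 0.615]
ap_nn = [0.612, 0.606, 0.609]

gap, within = gap_within_noise(ap_mdsr, ap_nn)
print(f"gap between means: {gap:.4f}, within run-to-run noise: {within}")
```

With only a handful of seeds the standard deviation is itself uncertain, so a result like this would support, rather than prove, the equivalence claim.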

Circularity Check

0 steps flagged

Purely empirical comparison; no derivation or self-referential reduction

The paper reports experimental results from applying two fixed upscaling methods (MDSR and nearest-neighbor interpolation) as preprocessing before training YOLOv2 on the xView dataset. No equations, fitted parameters, uniqueness theorems, or ansatzes are introduced whose outputs are then presented as independent predictions. The central numerical claim (0.0002 AP difference) is a direct measurement on held-out data, not a quantity that reduces to its own inputs by construction. Self-citations are absent from the load-bearing steps. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical machine-learning comparison study; no free parameters, mathematical axioms, or invented entities are introduced.
