arxiv: 1907.04404 · v1 · pith:QDR56NECnew · submitted 2019-07-09 · 💻 cs.CV

A New Stereo Benchmarking Dataset for Satellite Images

Pith reviewed 2026-05-25 00:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords satellite stereo · disparity ground truth · benchmarking dataset · multi-date images · DSM fusion · LiDAR alignment · stereo rectification · WorldView images

The pith

A new public dataset supplies groundtruthed disparities for stereo pairs from multi-date satellite images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a benchmarking dataset consisting of stereo-rectified satellite images and their associated ground truth disparities for ten areas of interest. Disparities are derived by building a fused digital surface model from the stereo pairs and aligning it to 30-centimeter LiDAR data, with additional validation through human-annotated points in two of the areas. The dataset includes building masks to indicate regions where stereo matching should be reliable despite seasonal differences in the images, along with metadata on acquisition parameters. Rectification accuracy reaches levels comparable to prior state-of-the-art stereo datasets, and all images are released publicly.

Core claim

The authors establish a dataset of multi-date satellite stereo pairs with disparities groundtruthed via fused DSM construction from the pairs followed by alignment to 30 cm LiDAR, and they demonstrate through quantitative human point evaluation in two AOIs that the disparities are accurate while rectification matches existing benchmarks.

What carries the argument

The fused DSM from stereo pairs aligned to LiDAR, which generates the ground truth disparities.
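
As a rough illustration of what registering a LiDAR height grid to a fused DSM can involve, the sketch below searches over small integer pixel shifts and a vertical offset. It is not the authors' alignment procedure, which this page does not describe in detail. The function name `estimate_alignment`, the NaN-for-missing-data convention, and the median-based cost are all assumptions made for the example.

```python
import numpy as np

def estimate_alignment(dsm, lidar, max_shift=3):
    """Brute-force search for an integer (dy, dx) shift and a vertical offset dz
    such that dsm[y + dy, x + dx] is approximately lidar[y, x] + dz.

    Both inputs are 2-D height grids of the same shape; NaN marks missing data.
    Hypothetical helper for illustration only.
    """
    if dsm.shape != lidar.shape:
        raise ValueError("grids must have the same shape")
    h, w = dsm.shape
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping windows of the two grids under this candidate shift.
            d = dsm[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            l = lidar[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            diff = d - l
            valid = np.isfinite(diff)
            if not valid.any():
                continue
            # Robust vertical offset, then robust residual spread as the cost.
            dz = np.median(diff[valid])
            cost = np.median(np.abs(diff[valid] - dz))
            if best is None or cost < best[0]:
                best = (cost, dy, dx, float(dz))
    if best is None:
        raise ValueError("no overlapping valid pixels")
    return best[1], best[2], best[3]
```

A real pipeline would likely work with sub-pixel shifts and georeferenced coordinates; the integer grid search here only conveys the idea of treating the LiDAR as the reference and solving for the residual offset.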

If this is right

  • Stereo reconstruction algorithms can now be tested on multi-date satellite data with known seasonal variations.
  • Building masks enable evaluation focused on reliable matching regions.
  • Included metadata on dates, angles, and intersection angles supports detailed analysis of stereo pairs.
  • Accuracy analyses provide benchmarks for the quality of the ground truth.
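
To make the role of the building masks concrete, here is a minimal sketch of scoring a predicted disparity map only inside a mask. The function name `masked_disparity_errors`, the 3-pixel bad-pixel threshold, and the use of NaN for missing ground truth are assumptions for illustration; the dataset's actual file formats and recommended metrics are not specified on this page.

```python
import numpy as np

def masked_disparity_errors(pred, gt, mask, bad_thresh=3.0):
    """Return (mean absolute error, bad-pixel rate) over mask pixels
    that have finite predicted and ground-truth disparities.

    bad_thresh is an assumed threshold in pixels, not a value from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = np.asarray(mask).astype(bool) & np.isfinite(gt) & np.isfinite(pred)
    if not valid.any():
        raise ValueError("no valid pixels inside the mask")
    err = np.abs(pred[valid] - gt[valid])
    return float(err.mean()), float((err > bad_thresh).mean())
```

Restricting the statistics to the mask keeps regions with seasonal appearance changes, such as vegetation, from dominating the score.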

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset may support development of algorithms robust to seasonal changes in vegetation and lighting.
  • The validation approach could extend to creating ground truth for other satellite stereo collections.
  • Public availability allows community-wide comparison of stereo methods on real-world satellite imagery.

Load-bearing premise

Aligning the fused DSM constructed from the stereo pairs to 30 cm LiDAR produces accurate ground truth disparities that human annotated points can independently confirm without major systematic errors from fusion or alignment.

What would settle it

Discrepancies between the groundtruthed disparities and a larger set of human annotated points across more AOIs, or independent checks revealing consistent biases in the fused DSM alignment.

Figures

Figures reproduced from arXiv: 1907.04404 by Avinash C. Kak, Bharath Comandur, Sonali Patil, Tanmay Prakash.

Figure 1: Qualitative results for our groundtruthed disparity maps from the top four largest AOIs. From left to right the …
Figure 2: A stereo rectified chip pair from the MP2 AOI, …
Figure 3: Overall pipeline of our approach, showing differ…
Figure 4: The input raw data includes RPC files for camera parameters and NTF files for raw sensor data. SRTM (Shuttle …
Figure 5: Overview of our groundtruthing process. Referring to labels 1-4 in the figure, (1) We map points from stereo …
Figure 6: Average chip-level rectification error (absolute y-error) distributions for all eleven datasets.
Figure 7: Disparity errors (using human annotated ground …
Figure 8: Stereo Matching results – “τ < 25” denotes results on pairs where the time interval is less than a month and “100 < τ < 150” denotes results on pairs where the interval is between 100 to 250 days.
7. Conclusion: We have contributed a large benchmarking stereo dataset for out-of-date satellite images and also provided a framework for how such a dataset can be constructed. In the dataset we make available, …
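
The Figure 6 caption reports rectification quality as an absolute y-error. In a well-rectified pair, corresponding points lie on the same image row, so one plausible way to compute such a figure is the mean row difference over matched keypoints. The sketch below assumes that interpretation; the function name `mean_abs_y_error` and the `(x, y)` point layout are hypothetical, and the paper's exact procedure may differ.

```python
import numpy as np

def mean_abs_y_error(pts_left, pts_right):
    """Mean absolute row (y) difference between matched points.

    pts_left, pts_right: arrays of shape (N, 2) holding (x, y) pixel
    coordinates of corresponding points in the two rectified chips.
    """
    pts_left = np.asarray(pts_left, dtype=float)
    pts_right = np.asarray(pts_right, dtype=float)
    if pts_left.shape != pts_right.shape or pts_left.ndim != 2 or pts_left.shape[1] != 2:
        raise ValueError("expected two (N, 2) arrays of matched points")
    if pts_left.shape[0] == 0:
        raise ValueError("need at least one match")
    # Only the y coordinates matter: x differences are the disparity itself.
    return float(np.mean(np.abs(pts_left[:, 1] - pts_right[:, 1])))
```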
Original abstract

In order to facilitate further research in stereo reconstruction with multi-date satellite images, the goal of this paper is to provide a set of stereo-rectified images and the associated groundtruthed disparities for 10 AOIs (Area of Interest) drawn from two sources: 8 AOIs from IARPA's MVS Challenge dataset and 2 AOIs from the CORE3D-Public dataset. The disparities were groundtruthed by first constructing a fused DSM from the stereo pairs and by aligning 30 cm LiDAR with the fused DSM. Unlike the existing benckmarking datasets, we have also carried out a quantitative evaluation of our groundtruthed disparities using human annotated points in two of the AOIs. Additionally, the rectification accuracy in our dataset is comparable to the same in the existing state-of-the-art stereo datasets. In general, we have used the WorldView-3 (WV3) images for the dataset, the exception being the UCSD area for which we have used both WV3 and WorldView-2 (WV2) images. All of the dataset images are now in the public domain. Since multi-date satellite images frequently include images acquired in different seasons (which creates challenges in finding corresponding pairs of pixels for stereo), our dataset also includes for each image a building mask over which the disparities estimated by stereo should prove reliable. Additional metadata included in the dataset includes information about each image's acquisition date and time, the azimuth and elevation angles of the camera, and the intersection angles for the two views in a stereo pair. Also included in the dataset are both quantitative and qualitative analyses of the accuracy of the groundtruthed disparity maps. Our dataset is available for download at https://engineering.purdue.edu/RVL/Database/SatStereo/index.html.
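
The abstract mentions per-image azimuth and elevation angles alongside per-pair intersection angles. As a hedged sketch of how the two relate, the angle between the views can be taken as the angle between their line-of-sight unit vectors. The convention used here (azimuth clockwise from north, elevation above the horizon, local east-north-up frame) and the function names are assumptions; the dataset's own metadata conventions should be checked before relying on this.

```python
import math

def view_vector(azimuth_deg, elevation_deg):
    """Unit vector toward the sensor in a local east-north-up frame.

    Assumes azimuth is measured clockwise from north and elevation
    is measured up from the horizon, both in degrees.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az),
            math.cos(el) * math.cos(az),
            math.sin(el))

def intersection_angle_deg(az1, el1, az2, el2):
    """Angle in degrees between the two viewing directions."""
    v1 = view_vector(az1, el1)
    v2 = view_vector(az2, el2)
    dot = sum(a * b for a, b in zip(v1, v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))
```

For example, two views at 60 degrees elevation with opposite azimuths are each 30 degrees off zenith on opposite sides, giving an intersection angle of 60 degrees.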

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a new public stereo benchmarking dataset for multi-date satellite images, covering 10 AOIs (8 from IARPA MVS Challenge, 2 from CORE3D-Public) with stereo-rectified WorldView-3 (and some WorldView-2) image pairs, groundtruthed disparities, building masks, acquisition metadata, and accuracy analyses. Ground truth disparities are constructed by fusing a DSM from the input stereo pairs and aligning it to 30 cm LiDAR; quantitative validation against human-annotated points is provided for two AOIs, and rectification accuracy is stated to be comparable to existing datasets.

Significance. If the ground truth holds, the dataset would be a useful addition for evaluating stereo methods on challenging multi-date satellite imagery with seasonal changes, as it supplies building masks to focus on reliable regions and includes human-point validation absent from prior benchmarks. The public release and metadata on intersection angles and acquisition times are practical strengths for reproducibility.

major comments (2)
  1. [Abstract] Abstract (groundtruthing pipeline): constructing the fused DSM directly from the same stereo pairs whose disparities are being benchmarked risks systematic bias (e.g., from seasonal appearance changes, building edges, or low intersection angles) that propagates into the LiDAR alignment; the manuscript provides no quantitative analysis of error propagation or exclusion criteria for such failure modes.
  2. [Abstract] Abstract (validation coverage): human-annotated point validation is reported for only two of ten AOIs; the remaining eight AOIs rest entirely on the untested fusion-alignment chain, so the central claim that the released disparities constitute reliable ground truth for benchmarking is not fully supported across the dataset.
minor comments (2)
  1. [Abstract] Abstract: 'benckmarking' is a typo for 'benchmarking'.
  2. [Abstract] Abstract: the statement that rectification accuracy is 'comparable' to state-of-the-art datasets lacks a specific quantitative table or cited reference values for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback on our paper. We address each major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract] Abstract (groundtruthing pipeline): constructing the fused DSM directly from the same stereo pairs whose disparities are being benchmarked risks systematic bias (e.g., from seasonal appearance changes, building edges, or low intersection angles) that propagates into the LiDAR alignment; the manuscript provides no quantitative analysis of error propagation or exclusion criteria for such failure modes.

    Authors: The ground truth disparities are ultimately derived from the LiDAR data after alignment with the fused DSM. The fusion process aggregates information from multiple stereo pairs per AOI, which are acquired under varying conditions, thereby reducing the impact of any single pair's biases such as those from seasonal changes or low intersection angles. The alignment step uses the independent LiDAR as the reference to correct the fused DSM. Although a dedicated quantitative error propagation study was not included, the human validation results for two AOIs indicate that the final disparities are accurate. We will revise the manuscript to include additional discussion on the robustness of the pipeline and potential limitations.

    Revision: partial

  2. Referee: [Abstract] Abstract (validation coverage): human-annotated point validation is reported for only two of ten AOIs; the remaining eight AOIs rest entirely on the untested fusion-alignment chain, so the central claim that the released disparities constitute reliable ground truth for benchmarking is not fully supported across the dataset.

    Authors: The human-annotated validation is presented for two AOIs to provide quantitative evidence of the pipeline's accuracy in representative cases. The same fusion and alignment procedure is applied consistently to all ten AOIs, and the manuscript includes quantitative and qualitative accuracy analyses for the entire dataset. The LiDAR alignment provides an independent high-accuracy reference for all AOIs. We maintain that this supports the reliability of the ground truth across the dataset, though we acknowledge the value of additional validation where possible.

    Revision: no

Circularity Check

0 steps flagged

No circularity: dataset construction relies on external LiDAR and human annotations with no derivations or self-referential steps

full rationale

The paper is a dataset release describing construction of ground-truth disparities via fused DSM from stereo pairs aligned to external 30 cm LiDAR, plus human-point validation in two AOIs. No equations, predictions, fitted parameters, or derivation chains exist. Validation uses independent external sources (LiDAR, human annotations) rather than internal consistency. No self-citations or ansatzes are load-bearing for any claimed result. This matches the default non-circular case for empirical dataset papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset release paper. No mathematical derivations, free parameters, axioms, or invented entities are present; the ground truth depends on external 30 cm LiDAR data and human annotations.

pith-pipeline@v0.9.0 · 5868 in / 1270 out tokens · 26783 ms · 2026-05-25T00:09:14.281429+00:00 · methodology

