A New Stereo Benchmarking Dataset for Satellite Images
Pith reviewed 2026-05-25 00:09 UTC · model grok-4.3
The pith
A new public dataset supplies groundtruthed disparities for stereo pairs from multi-date satellite images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors release a dataset of multi-date satellite stereo pairs whose disparities are groundtruthed by constructing a fused DSM from the pairs and aligning it with 30 cm LiDAR. A quantitative evaluation against human-annotated points in two AOIs is used to show that the disparities are accurate, and the rectification accuracy is reported as comparable to that of existing benchmarks.
What carries the argument
The fused DSM from stereo pairs aligned to LiDAR, which generates the ground truth disparities.
If this is right
- Stereo reconstruction algorithms can now be tested on multi-date satellite data with known seasonal variations.
- Building masks enable evaluation focused on reliable matching regions.
- Included metadata on dates, angles, and intersection angles supports detailed analysis of stereo pairs.
- Accuracy analyses provide benchmarks for the quality of the ground truth.
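A mask-restricted evaluation of the kind these points suggest could be sketched as follows. This is a minimal illustration, not the dataset's own tooling: the function name `masked_disparity_errors`, the use of nested lists, the `None` marker for missing ground truth, and the 1-pixel `bad_threshold` are all assumptions made for the example.

```python
def masked_disparity_errors(pred, gt, mask, bad_threshold=1.0):
    """Return (mean absolute error, bad-pixel fraction) over masked pixels.

    pred, gt and mask are same-shaped 2D lists (hypothetical layout).
    gt entries may be None where no ground truth exists; mask entries
    are truthy where the pixel should be evaluated (e.g. on buildings).
    """
    abs_errors = []
    for pred_row, gt_row, mask_row in zip(pred, gt, mask):
        for p, g, m in zip(pred_row, gt_row, mask_row):
            # Only score pixels inside the mask that have ground truth.
            if m and g is not None:
                abs_errors.append(abs(p - g))
    if not abs_errors:
        raise ValueError("mask selects no pixels with valid ground truth")
    mae = sum(abs_errors) / len(abs_errors)
    # Fraction of scored pixels whose error exceeds the threshold.
    bad = sum(e > bad_threshold for e in abs_errors) / len(abs_errors)
    return mae, bad
```

Restricting the score to masked pixels keeps seasonally unstable regions such as vegetation from dominating the error, which appears to be the motivation for shipping the masks.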
Where Pith is reading between the lines
- The dataset may support development of algorithms robust to seasonal changes in vegetation and lighting.
- The validation approach could extend to creating ground truth for other satellite stereo collections.
- Public availability allows community-wide comparison of stereo methods on real-world satellite imagery.
Load-bearing premise
Aligning the fused DSM constructed from the stereo pairs to 30 cm LiDAR produces accurate ground truth disparities, which human-annotated points can independently confirm without major systematic errors from fusion or alignment.
What would settle it
Discrepancies between the groundtruthed disparities and a larger set of human annotated points across more AOIs, or independent checks revealing consistent biases in the fused DSM alignment.
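Such a check could be sketched as below, reporting the signed bias separately from the RMSE so that a consistent offset is not hidden by averaging magnitudes. The function name `point_check` and the `(row, col, human_disparity)` tuple layout are hypothetical; how the paper actually derives a disparity from each annotated point is not stated here.

```python
import math


def point_check(gt_disparity, annotated_points):
    """Compare ground-truth disparities with human-annotated values.

    gt_disparity: 2D list of disparities (hypothetical layout).
    annotated_points: list of (row, col, human_disparity) tuples.
    Returns (signed mean residual, RMSE) in pixels.
    """
    if not annotated_points:
        raise ValueError("no annotated points supplied")
    residuals = [gt_disparity[r][c] - d for r, c, d in annotated_points]
    n = len(residuals)
    bias = sum(residuals) / n  # a nonzero value hints at a systematic offset
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    return bias, rmse
```

A near-zero bias with a large RMSE would point to noisy but unbiased ground truth, whereas a bias of the same sign across AOIs would point to the fusion or alignment step.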
Original abstract
In order to facilitate further research in stereo reconstruction with multi-date satellite images, the goal of this paper is to provide a set of stereo-rectified images and the associated groundtruthed disparities for 10 AOIs (Area of Interest) drawn from two sources: 8 AOIs from IARPA's MVS Challenge dataset and 2 AOIs from the CORE3D-Public dataset. The disparities were groundtruthed by first constructing a fused DSM from the stereo pairs and by aligning 30 cm LiDAR with the fused DSM. Unlike the existing benckmarking datasets, we have also carried out a quantitative evaluation of our groundtruthed disparities using human annotated points in two of the AOIs. Additionally, the rectification accuracy in our dataset is comparable to the same in the existing state-of-the-art stereo datasets. In general, we have used the WorldView-3 (WV3) images for the dataset, the exception being the UCSD area for which we have used both WV3 and WorldView-2 (WV2) images. All of the dataset images are now in the public domain. Since multi-date satellite images frequently include images acquired in different seasons (which creates challenges in finding corresponding pairs of pixels for stereo), our dataset also includes for each image a building mask over which the disparities estimated by stereo should prove reliable. Additional metadata included in the dataset includes information about each image's acquisition date and time, the azimuth and elevation angles of the camera, and the intersection angles for the two views in a stereo pair. Also included in the dataset are both quantitative and qualitative analyses of the accuracy of the groundtruthed disparity maps. Our dataset is available for download at https://engineering.purdue.edu/RVL/Database/SatStereo/index.html
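The relation between the per-image azimuth and elevation angles and the per-pair intersection angle can be illustrated with a short sketch. It takes the intersection angle to be the angle between the two viewing directions, which is one common definition and may differ from the one used in the dataset. The function names and the angle convention (azimuth clockwise from north, elevation above the horizon, both in degrees) are likewise assumptions; the result does not depend on the azimuth origin as long as both views share it.

```python
import math


def view_vector(azimuth_deg, elevation_deg):
    """Unit vector (east, north, up) pointing from the ground to the sensor.

    Assumes azimuth is clockwise from north and elevation is measured
    up from the horizon; this convention is an assumption of the sketch.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az),
            math.cos(el) * math.cos(az),
            math.sin(el))


def intersection_angle_deg(az1, el1, az2, el2):
    """Angle in degrees between the two viewing directions."""
    v1 = view_vector(az1, el1)
    v2 = view_vector(az2, el2)
    dot = sum(a * b for a, b in zip(v1, v2))
    # Clamp against rounding so acos never sees a value outside [-1, 1].
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))
```

For example, two views at 45 degrees elevation from opposite azimuths (0 and 180 degrees) are 90 degrees apart, while two nadir views coincide regardless of azimuth.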
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a new public stereo benchmarking dataset for multi-date satellite images, covering 10 AOIs (8 from IARPA MVS Challenge, 2 from CORE3D-Public) with stereo-rectified WorldView-3 (and some WorldView-2) image pairs, groundtruthed disparities, building masks, acquisition metadata, and accuracy analyses. Ground truth disparities are constructed by fusing a DSM from the input stereo pairs and aligning it to 30 cm LiDAR; quantitative validation against human-annotated points is provided for two AOIs, and rectification accuracy is stated to be comparable to existing datasets.
Significance. If the ground truth holds, the dataset would be a useful addition for evaluating stereo methods on challenging multi-date satellite imagery with seasonal changes, as it supplies building masks to focus on reliable regions and includes human-point validation absent from prior benchmarks. The public release and metadata on intersection angles and acquisition times are practical strengths for reproducibility.
major comments (2)
- [Abstract] Abstract (groundtruthing pipeline): constructing the fused DSM directly from the same stereo pairs whose disparities are being benchmarked risks systematic bias (e.g., from seasonal appearance changes, building edges, or low intersection angles) that propagates into the LiDAR alignment; the manuscript provides no quantitative analysis of error propagation or exclusion criteria for such failure modes.
- [Abstract] Abstract (validation coverage): human-annotated point validation is reported for only two of ten AOIs; the remaining eight AOIs rest entirely on the untested fusion-alignment chain, so the central claim that the released disparities constitute reliable ground truth for benchmarking is not fully supported across the dataset.
minor comments (2)
- [Abstract] Abstract: 'benckmarking' is a typo for 'benchmarking'.
- [Abstract] Abstract: the statement that rectification accuracy is 'comparable' to state-of-the-art datasets lacks a specific quantitative table or cited reference values for direct comparison.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our paper. We address each major comment point by point below.
- Referee: [Abstract] Abstract (groundtruthing pipeline): constructing the fused DSM directly from the same stereo pairs whose disparities are being benchmarked risks systematic bias (e.g., from seasonal appearance changes, building edges, or low intersection angles) that propagates into the LiDAR alignment; the manuscript provides no quantitative analysis of error propagation or exclusion criteria for such failure modes.
  Authors: The ground truth disparities are ultimately derived from the LiDAR data after alignment with the fused DSM. The fusion process aggregates information from multiple stereo pairs per AOI, which are acquired under varying conditions, thereby reducing the impact of any single pair's biases such as those from seasonal changes or low intersection angles. The alignment step uses the independent LiDAR as the reference to correct the fused DSM. Although a dedicated quantitative error propagation study was not included, the human validation results for two AOIs indicate that the final disparities are accurate. We will revise the manuscript to include additional discussion on the robustness of the pipeline and potential limitations.
  Revision: partial
- Referee: [Abstract] Abstract (validation coverage): human-annotated point validation is reported for only two of ten AOIs; the remaining eight AOIs rest entirely on the untested fusion-alignment chain, so the central claim that the released disparities constitute reliable ground truth for benchmarking is not fully supported across the dataset.
  Authors: The human-annotated validation is presented for two AOIs to provide quantitative evidence of the pipeline's accuracy in representative cases. The same fusion and alignment procedure is applied consistently to all ten AOIs, and the manuscript includes quantitative and qualitative accuracy analyses for the entire dataset. The LiDAR alignment provides an independent high-accuracy reference for all AOIs. We maintain that this supports the reliability of the ground truth across the dataset, though we acknowledge the value of additional validation where possible.
  Revision: no
Circularity Check
No circularity: dataset construction relies on external LiDAR and human annotations with no derivations or self-referential steps
The paper is a dataset release describing construction of ground-truth disparities via fused DSM from stereo pairs aligned to external 30 cm LiDAR, plus human-point validation in two AOIs. No equations, predictions, fitted parameters, or derivation chains exist. Validation uses independent external sources (LiDAR, human annotations) rather than internal consistency. No self-citations or ansatzes are load-bearing for any claimed result. This matches the default non-circular case for empirical dataset papers.