Geometric Flood Depth Estimation: Fusing Transformer-Based Segmentation with Digital Elevation Models
Pith reviewed 2026-05-12 01:32 UTC · model grok-4.3
The pith
Flood depth is estimated geometrically from aerial images by fusing transformer segmentation masks with elevation models to determine a single water surface level.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present a geometric Water Surface Elevation approach in which Mask2Former segmentation masks are fused with Digital Elevation Models to identify the water-land boundary, calculate a global water surface elevation Z_water, and compute per-pixel flood depths under the principle of local hydrostatic equilibrium.
What carries the argument
The Water Surface Elevation workflow that fuses transformer-based flood masks with DEMs to locate the boundary, set a global Z_water, and derive per-pixel depths from elevation differences.
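The first step of this workflow, locating the water-land boundary in a segmentation mask, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the 4-neighbour boundary test are our assumptions, and the mask and DEM are assumed to be co-registered NumPy rasters.

```python
import numpy as np

def water_land_boundary(mask: np.ndarray) -> np.ndarray:
    """Boundary = water pixels with at least one non-water 4-neighbour.

    `mask` is a boolean raster where True marks flooded pixels.
    """
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if all four of its direct neighbours are water.
    all_water_neighbours = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] &
        padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return mask & ~all_water_neighbours

# Toy 5x5 scene: a 3x3 flooded square; the boundary is its 8-pixel ring.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
boundary = water_land_boundary(mask)
```

In the paper's workflow, the DEM elevations at these boundary pixels would then feed the global Z_water estimate, and per-pixel depth follows from elevation differences inside the mask.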
If this is right
- High-performance 2D segmentation directly yields 3D volumetric flood information from monocular imagery.
- The pipeline avoids the computational delay of full hydrodynamic simulations for post-disaster use.
- The method is demonstrated on the FloodNet and CRASAR-U-DROIDS datasets for practical validation.
- Per-pixel depths become available once the water-land boundary is identified from the fused data.
Where Pith is reading between the lines
- High-resolution DEMs would be essential for accurate depths in areas with steep terrain.
- The approach could be tested on time-series imagery to track how depths evolve during a flood event.
- Where flow is present the single-level assumption may need adjustment, suggesting possible hybrid use with simpler flow models.
Load-bearing premise
A single global water surface elevation can represent the entire water body while local hydrostatic equilibrium holds without major flow or wind effects.
What would settle it
Direct measurements of differing water surface heights at separate boundary points or evidence of strong currents that violate hydrostatic equilibrium would disprove the central geometric calculation.
Original abstract
Post-disaster situational awareness relies heavily on understanding both the extent and the volume of floodwaters. While 2D semantic segmentation provides accurate flood masking, it lacks the vertical dimension required to assess navigability and structural risk. This paper presents a geometric "Water Surface Elevation" approach for estimating flood depth from monocular aerial imagery. Our pipeline utilizes Mask2Former, a state-of-the-art transformer-based segmentation model, to generate precise 2D flood masks. These masks are fused with Digital Elevation Models (DEMs) to identify the water-land boundary, calculate a global water surface elevation ($Z_{water}$), and compute per-pixel depth based on the principle of local hydrostatic equilibrium. We evaluate this workflow using the FloodNet and CRASAR-U-DROIDS datasets, demonstrating how high-performance segmentation can be leveraged to extract 3D volumetric data from 2D imagery without the latency of hydrodynamic simulations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a geometric method for estimating flood depths from monocular aerial imagery. It employs Mask2Former for 2D flood segmentation, fuses the resulting masks with DEMs to extract the water-land boundary, computes a single global water surface elevation Z_water from boundary pixel elevations, and derives per-pixel depths as Z_water minus local DEM values under the assumption of local hydrostatic equilibrium. The workflow is evaluated on the FloodNet and CRASAR-U-DROIDS datasets to show extraction of 3D volumetric information without running hydrodynamic simulations.
Significance. If validated, the approach offers a fast, parameter-free alternative to simulation-based methods for obtaining volumetric flood data from standard 2D imagery and DEMs, which could be valuable for rapid post-disaster situational awareness. The use of a state-of-the-art transformer segmentation model and the direct geometric derivation (no fitted parameters) are strengths that align with needs in computer vision for disaster applications. However, the significance is tempered by the untested flat-surface assumption and lack of depth-specific quantitative validation.
major comments (3)
- [Method] The exact procedure for computing the global Z_water from boundary pixels (e.g., mean, median, maximum, or another statistic of DEM elevations at the water-land interface) is not specified. This choice is load-bearing for all subsequent per-pixel depth values and must be stated explicitly, ideally with a formula.
- [Evaluation] The reported experiments focus on segmentation performance but provide no quantitative metrics for depth estimation accuracy (e.g., MAE, RMSE against ground-truth bathymetry or hydrodynamic reference solutions) on FloodNet or CRASAR-U-DROIDS. Without such validation, the central claim of reliable 3D volumetric extraction cannot be assessed.
- [Introduction/Method] The assumption of a single global Z_water (i.e., a perfectly level water surface under local hydrostatic equilibrium) is stated but not stress-tested. No analysis or examples address potential violations from flow-induced slopes, wind setup, or non-hydrostatic effects, which would directly invalidate the per-pixel depth formula Z_water - DEM.
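The depth metrics the referee asks for are straightforward to compute once a reference depth raster exists. A hedged sketch, with hypothetical toy rasters (the paper reports no such ground truth):

```python
import numpy as np

def depth_errors(pred: np.ndarray, truth: np.ndarray):
    """MAE and RMSE over pixels that have a valid ground-truth depth."""
    valid = ~np.isnan(truth)          # NaN marks pixels with no reference depth
    diff = pred[valid] - truth[valid]
    return float(np.abs(diff).mean()), float(np.sqrt((diff ** 2).mean()))

# Toy 2x2 depth rasters in metres; one pixel lacks a reference measurement.
pred  = np.array([[1.0, 2.0], [0.5, 0.0]])
truth = np.array([[1.2, 1.8], [0.5, np.nan]])
mae, rmse = depth_errors(pred, truth)
```

Masking invalid pixels before averaging matters here, since high-water-mark surveys and hydrodynamic references rarely cover an entire scene.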
minor comments (3)
- [Abstract] The phrase "precise 2D flood masks" should be qualified with the specific segmentation metrics (e.g., mIoU) achieved on the evaluation datasets.
- [Related Work] The related-work discussion appears limited; add references to prior geometric or DEM-fusion approaches to flood depth estimation to better contextualize the contribution.
- [Figures] Ensure all figures showing depth maps include color bars, scale bars, and quantitative error visualizations where available.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions have been made to improve clarity and address limitations.
Point-by-point responses
-
Referee: [Method] The exact procedure for computing the global Z_water from boundary pixels (e.g., mean, median, maximum, or another statistic of DEM elevations at the water-land interface) is not specified. This choice is load-bearing for all subsequent per-pixel depth values and must be stated explicitly, ideally with a formula.
Authors: We agree that the aggregation method for Z_water was not explicitly detailed. In the revised manuscript, the Method section now specifies that Z_water is computed as the median of DEM elevations at water-land boundary pixels (chosen for robustness to DEM noise and boundary misclassifications). The formula added is Z_water = median({DEM(p) | p in boundary pixels}), along with pseudocode for the full pipeline from mask to depths. revision: yes
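The median rule stated in this response can be sketched directly. This is a minimal reading of the stated formula, not the authors' released code; the zero-clip for DEM cells that sit above the water line is our assumption, added so dry high ground inside the mask cannot receive negative depth.

```python
import numpy as np

def flood_depths(dem, mask, boundary):
    """Z_water = median of DEM elevations at water-land boundary pixels;
    per-pixel depth = Z_water - DEM inside the flood mask."""
    z_water = float(np.median(dem[boundary]))
    depth = np.clip(z_water - dem, 0.0, None)  # assumption: clip cells above the line to zero
    depth[~mask] = 0.0                         # depth is only defined where water is detected
    return z_water, depth

# Toy 1x5 transect: water between two shoreline pixels at 1.0 m elevation.
dem      = np.array([[2.0, 1.0, 0.5, 1.0, 2.0]])
mask     = np.array([[False, True, True, True, False]])
boundary = np.array([[False, True, False, True, False]])
z_water, depth = flood_depths(dem, mask, boundary)
# z_water = 1.0; depth at the deepest pixel = 0.5 m
```

The median's robustness to DEM noise and boundary misclassification, as the authors argue, is the main reason to prefer it over the mean here.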
-
Referee: [Evaluation] The reported experiments focus on segmentation performance but provide no quantitative metrics for depth estimation accuracy (e.g., MAE, RMSE against ground-truth bathymetry or hydrodynamic reference solutions) on FloodNet or CRASAR-U-DROIDS. Without such validation, the central claim of reliable 3D volumetric extraction cannot be assessed.
Authors: We acknowledge the absence of quantitative depth metrics. The FloodNet and CRASAR-U-DROIDS datasets provide only 2D segmentation ground truth and lack bathymetry or depth references, precluding direct MAE/RMSE computation against hydrodynamic solutions. We have added a dedicated limitations subsection noting this constraint and have included qualitative depth map visualizations. Future extensions will target datasets with depth annotations. revision: partial
-
Referee: [Introduction/Method] The assumption of a single global Z_water (i.e., a perfectly level water surface under local hydrostatic equilibrium) is stated but not stress-tested. No analysis or examples address potential violations from flow-induced slopes, wind setup, or non-hydrostatic effects, which would directly invalidate the per-pixel depth formula Z_water - DEM.
Authors: The flat-surface assumption is foundational, and we have expanded both the Introduction and Method to discuss its applicability. For the post-disaster scenes in our datasets, water bodies are largely quiescent with limited flow over the imaged scales, supporting the local hydrostatic approximation. We added analysis of error sources (e.g., wind setup inducing <5 cm slopes over 100 m) and note that deviations would be detectable as inconsistencies at boundaries. This provides a fast baseline while acknowledging cases where full hydrodynamics would be needed. revision: yes
Circularity Check
No circularity: the depth computation applies the stated geometric principle directly to boundary data.
full rationale
The paper's core workflow extracts a water-land boundary from the Mask2Former mask fused with DEM elevations, sets a single global Z_water from those boundary values, and subtracts local DEM heights to obtain per-pixel depths under the local hydrostatic equilibrium assumption. This is a direct geometric calculation with no parameter fitting, no self-referential equations, and no load-bearing self-citations or imported ansatzes described in the abstract or method outline. The result is not equivalent to its inputs by construction; it encodes an explicit physical modeling choice whose validity can be checked against external bathymetry or hydrodynamic references.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Local hydrostatic equilibrium holds, so the water surface elevation is constant across the flooded region.
Reference graph
Works this paper leans on
- [1] S. Mungkasi and S. G. Roberts, "Validation of ANUGA hydraulic model using exact solutions to shallow water wave problems," Journal of Physics: Conference Series, vol. 423, no. 1, p. 012029, Apr. 2013. https://doi.org/10.1088/1742-6596/423/1/012029
- [2] G. W. Brunner, "HEC-RAS River Analysis System: Hydraulic Reference Manual, Version 1.0," 1995.
- [3] J.-M. Hervouet, Hydrodynamics of Free Surface Flows: Modelling with the Finite Element Method. John Wiley & Sons, 2007.
- [4] Federal Emergency Management Agency, "Hurricane Harvey Flood Depth Rasters," https://www.fema.gov/flood-maps, 2017. Flood depth grids derived from post-event high-water mark surveys and hydraulic modeling.
- [5] J. Soria-Ruiz, Y. M. Fernandez-Ordonez, and J. P. Ambrosio-Ambrosio, "Extent and depth of flooding using SAR Sentinel-1 and machine learning algorithms," in IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2023, pp. 2246–2249.
- [6] J. Soria-Ruiz, Y. M. Fernandez-Ordoñez, J. P. Ambrosio-Ambrosio, and M. A. Escalona-Maurice, "Sentinel-1 SAR and LiDAR to detect extent and depth flood using random forests machine learning," in IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2022, pp. 5113–5116.
- [7] M. Rahnemoonfar, T. Chowdhury, A. Sarkar, D. Varshney, M. Yari, and R. R. Murphy, "FloodNet: A high resolution aerial imagery dataset for post flood scene understanding," IEEE Access, vol. 9, pp. 89644–89654, 2021.
- [8] B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, "Masked-attention mask transformer for universal image segmentation," 2022.
- [9] T. Manzini, P. Perali, R. Karnik, and R. Murphy, "CRASAR-U-DROIDS: A large scale benchmark dataset for building alignment and damage assessment in georectified sUAS imagery," arXiv preprint arXiv:2407.17673, 2024.
- [10] M. Rahnemoonfar, T. Chowdhury, and R. Murphy, "RescueNet: A high resolution UAV semantic segmentation dataset for natural disaster damage assessment," Scientific Data, vol. 10, no. 1, p. 913, 2023.
- [11] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, "ENet: A deep neural network architecture for real-time semantic segmentation," arXiv preprint arXiv:1606.02147, 2016.
- [12] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 801–818.
- [13] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in Neural Information Processing Systems, vol. 34, pp. 12077–12090, 2021.
- [14] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz et al., "Attention U-Net: Learning where to look for the pancreas," arXiv preprint arXiv:1804.03999, 2018.
- [15] Z. Song and Y. Tuo, "Automated flood depth estimates from online traffic sign images: Explorations of a convolutional neural network-based method," Sensors, vol. 21, no. 16, p. 5614, 2021.
- [16] S. Liu, W. Zheng, X. Wang, H. Xiong, J. Cheng, C. Yong, W. Zhang, and X. Zou, "A novel depth measurement method for urban flooding based on surveillance video images and a floating ruler," Natural Hazards, vol. 119, no. 3, pp. 1967–1989, 2023.
- [17] M. Chini, R. Pelich, Y. Li, R. Hostache, J. Zhao, C. Di Mauro, and P. Matgen, "SAR-based flood mapping, where we are and future challenges," in 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021, pp. 884–886.
- [18] S. Krishnan, C. Crosby, V. Nandigam, M. Phan, C. Cowart, C. Baru, and R. Arrowsmith, "OpenTopography: A services oriented architecture for community access to lidar topography," in Proceedings of the 2nd International Conference on Computing for Geospatial Research & Applications, 2011, pp. 1–8.
- [19] F. Cian, M. Marconcini, P. Ceccato, and C. Giupponi, "Flood depth estimation by means of high-resolution SAR images and LiDAR data," Natural Hazards and Earth System Sciences, vol. 18, no. 11, pp. 3063–3084, 2018.
- [20] D. Y. Hancock, J. Fischer, J. M. Lowe, W. Snapp-Childs, M. Pierce, S. Marru, J. E. Coulter, M. Vaughn, B. Beck, N. Merchant, E. Skidmore, and G. Jacobs, "Jetstream2: Accelerating cloud computing via Jetstream," in Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions (PEARC '21). New York, NY, USA: Association f...
- [21] T. J. Boerner, S. Deems, T. R. Furlani, S. L. Knuth, and J. Towns, "ACCESS: Advancing Innovation: NSF's Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support," in Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good (PEARC '23). New York, NY, USA: Association for Computing Machinery, 2023, p. 173...