Data-efficient flood depth prediction through domain-aware coreset selection and tabular foundation models

Adithi Srinath; Ali Mostafavi; Junwei Ma; Lipai Huang; Manas Singh

arxiv: 2606.05265 · v1 · pith:OEXC6TLJnew · submitted 2026-06-03 · 💻 cs.LG

Lipai Huang, Adithi Srinath, Manas Singh, Junwei Ma, Ali Mostafavi

Pith reviewed 2026-06-28 07:26 UTC · model grok-4.3

classification 💻 cs.LG

keywords flood depth prediction · coreset selection · tabular foundation models · domain adaptation · data efficiency · watershed transfer · surrogate modeling

The pith

Domain-aware coreset selection lets tabular foundation models predict flood depths with 0.7% of training data while transferring across watersheds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a small, carefully chosen subset of simulation data can train a tabular foundation model to predict flood depths almost as accurately as one trained on the full dataset. By stratifying storms according to their return period and the watershed they most affect, then selecting hexagons with a target-aware spatial method, the approach creates a representative coreset. This allows the model to generalize to new watersheds without any additional training or labeled data for those areas. The result is a fast, transferable surrogate that reduces the data and computation needed for near-real-time flood forecasting in multiple locations.

Core claim

A domain-aware coreset construction pipeline conditions a tabular foundation model at inference time: it stratifies storms by return period and most-affected watershed, then samples hexagons with a target-aware spatial selector. With only 0.7% of the per-watershed training pool, this yields a mean R² of 0.663 across nine Houston-area watersheds, which is 98.5% of the full supervised reference of 0.673. The model also transfers to held-out watersheds without task-specific retraining while outperforming a coreset-trained supervised baseline.
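The headline ratio can be checked with a few lines of arithmetic. The sketch below uses only the two R² values quoted above. The last step is an inference rather than a reported figure: it assumes the 0.7% coreset corresponds to the N = 50k context size named in the Figure 5 caption.

```python
# Values quoted in the abstract
r2_coreset = 0.663   # mean R² with 0.7% of the per-watershed pool
r2_full = 0.673      # full supervised reference

recovery = r2_coreset / r2_full      # fraction of the reference R² retained
gap = r2_full - r2_coreset           # absolute R² gap
print(f"recovery = {recovery:.1%}, absolute gap = {gap:.3f}")

# Assumption (not stated in the abstract): 0.7% of the pool equals N = 50k rows
coreset_rows = 50_000
coreset_fraction = 0.007
implied_pool = coreset_rows / coreset_fraction
print(f"implied per-watershed pool: about {implied_pool:,.0f} rows")
```

Under that assumption the implied pool is on the order of seven million rows per watershed, which is consistent with the abstract's remark that supervised surrogates need millions of training rows per watershed.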

What carries the argument

The domain-aware coreset construction pipeline, which conditions the tabular foundation model at inference time through stratification by return period and watershed plus target-aware spatial selection.
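A minimal sketch of the first stage, storm stratification, may make the idea concrete. It assumes proportional allocation across (watershed, return-period bin) strata with at least one storm per non-empty stratum; the paper's actual allocation rule is not given in the abstract, and the dictionary keys `id`, `watershed`, and `rp_bin` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_event_sample(events, n_events, seed=0):
    """Pick roughly n_events storms spread over (watershed, return-period) strata.

    `events` is a list of dicts with hypothetical keys 'id', 'watershed', 'rp_bin'.
    Proportional allocation is an assumption, not the paper's documented rule.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for ev in events:
        strata[(ev["watershed"], ev["rp_bin"])].append(ev["id"])

    total = len(events)
    selected = []
    for key in sorted(strata):
        ids = strata[key]
        # Proportional quota, but never zero and never more than the stratum holds
        quota = max(1, round(n_events * len(ids) / total))
        quota = min(quota, len(ids))
        selected.extend(rng.sample(ids, quota))
    return selected

# Toy archive: 60 storms, 2 watersheds x 3 return-period bins (illustrative only)
toy_events = [
    {"id": i, "watershed": "AB"[i % 2], "rp_bin": ["2yr", "10yr", "100yr"][i % 3]}
    for i in range(60)
]
print(sorted(stratified_event_sample(toy_events, n_events=12)))
```

Because of the one-per-stratum floor and rounding, the returned count can differ slightly from `n_events` when strata are very uneven.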

If this is right

The model achieves near-supervised accuracy with far less data.
It enables transfer to held-out watersheds without retraining.
It outperforms supervised baselines trained on the same coreset in transfer settings.
On real storms it exceeds the supervised reference on a far out-of-distribution case but trails it on a mostly in-distribution one.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the coreset selection generalizes, similar pipelines could apply to other environmental simulations requiring data efficiency.
Testing the method on watersheds outside the Houston area would confirm broader applicability.
The approach suggests that foundation models can serve as flexible surrogates when paired with smart data selection rather than full retraining.

Load-bearing premise

The domain-aware stratification and spatial selector create a coreset representative enough for the tabular foundation model to generalize across watersheds without per-watershed fine-tuning.
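One plausible reading of a "target-aware spatial selector" is sketched below: hexagons are binned by a per-hexagon depth summary, then greedy farthest-point sampling spreads the picks in space within each bin. This is an illustrative guess at the idea, not the paper's selector; the `(x, y, mean_depth)` layout and the bin count are assumptions.

```python
import math

def farthest_point_sample(coords, k):
    """Greedy farthest-point sampling; returns indices of up to k spread-out points."""
    if k <= 0 or not coords:
        return []
    chosen = [0]
    dist = [math.dist(c, coords[0]) for c in coords]
    while len(chosen) < min(k, len(coords)):
        nxt = max(range(len(coords)), key=dist.__getitem__)
        chosen.append(nxt)
        # Track each point's distance to its closest already-chosen point
        dist = [min(d, math.dist(c, coords[nxt])) for d, c in zip(dist, coords)]
    return chosen

def target_aware_select(hexagons, n_hex, n_bins=4):
    """Bin hexagons by depth summary, then spread picks spatially inside each bin.

    `hexagons` is a list of (x, y, mean_depth) tuples (hypothetical layout);
    coordinates are assumed distinct, as hexagon centroids would be.
    """
    order = sorted(range(len(hexagons)), key=lambda i: hexagons[i][2])
    bin_size = math.ceil(len(order) / n_bins)
    per_bin = max(1, n_hex // n_bins)
    selected = []
    for b in range(n_bins):
        members = order[b * bin_size:(b + 1) * bin_size]
        if not members:
            continue
        coords = [hexagons[i][:2] for i in members]
        selected.extend(members[p] for p in farthest_point_sample(coords, per_bin))
    return selected

# Toy 10 x 10 mesh where depth grows toward one corner (illustrative only)
toy_mesh = [(float(x), float(y), float(x + y)) for x in range(10) for y in range(10)]
print(len(target_aware_select(toy_mesh, n_hex=20)))
```

Binning on the target keeps both shallow and deep hexagons in the coreset, while the spatial step guards against clustering; whether that is enough for representativeness is exactly the premise in question.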

What would settle it

The transfer claim would be falsified if, on a new set of held-out watersheds, the model fell below the coreset-trained supervised baseline or dropped well below the 98.5% recovery of the supervised reference.
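The test described here can be written down as a small check. The sketch below generates leave-one-out splits over watersheds and applies the two conditions to mean held-out R². The `min_recovery` tolerance is an analyst's choice, not a value from the paper, and the numbers in the usage lines are placeholders rather than results.

```python
def leave_one_out_splits(watersheds):
    """Yield (source_watersheds, held_out_target) pairs, one per watershed."""
    for target in watersheds:
        sources = [w for w in watersheds if w != target]
        yield sources, target

def transfer_claim_holds(r2_model, r2_baseline, r2_reference, min_recovery):
    """Return True if the model beats the baseline and keeps enough of the reference.

    r2_model / r2_baseline: dicts mapping held-out watershed -> R².
    min_recovery: hypothetical tolerance on mean R² relative to the reference.
    """
    mean_model = sum(r2_model.values()) / len(r2_model)
    mean_base = sum(r2_baseline.values()) / len(r2_baseline)
    beats_baseline = mean_model > mean_base
    keeps_recovery = mean_model / r2_reference >= min_recovery
    return beats_baseline and keeps_recovery

# Placeholder watershed names and scores, for illustration only
names = ["W1", "W2", "W3"]
for sources, target in leave_one_out_splits(names):
    print(f"condition on {sources}, evaluate on {target}")

print(transfer_claim_holds({"W1": 0.60, "W2": 0.70},
                           {"W1": 0.50, "W2": 0.60},
                           r2_reference=0.70, min_recovery=0.90))
```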

Figures

Figures reproduced from arXiv: 2606.05265 by Adithi Srinath, Ali Mostafavi, Junwei Ma, Lipai Huang, Manas Singh.

**Figure 1.** Conceptual overview of the proposed approach. (1) Physics-based flood simulation archive: a HEC-RAS knowledge base of 592 synthetic storm events across nine Houston-area watersheds, on the order of 10⁸ event-hexagon rows, with storm metadata, watershed boundaries, and NOAA Atlas 14 return-period labels. (2) Domain-aware coreset construction: a two-stage pipeline compresses the archive into a compact, hydro…

**Figure 2.** The nine Houston-area watersheds, dissolved from the HEC-RAS simulation mesh. Basemap rendered with contextily (https://contextily.readthedocs.io/en/latest/index.html) using OpenStreetMap and CARTO tiles. The database contains 592 synthetic storm events generated by applying a Rasterized Time-series Resampling Method to historic storms in the area, with durations from 1 to 33 hours and hourly rainfall gr…

**Figure 3.** Per-watershed event allocation at N_e = 50. Bars stacked by return period bin, with x-axis labels showing the selected / training-pool ratio per watershed.

**Figure 4.** Spatial distribution of selected hexagons for Brays Bayou at N = 10k (N_h = 200) under all five Stage 2 methods. Panel titles include the mean nearest-neighbor distance. The sparser sampling at the watershed’s southwestern arm reflects the underlying mesh, which thins out where the watershed narrows. … the per-hexagon depth signal compresses each hexagon’s response across the Stage 1 event mix into a single t…

**Figure 5.** Within-watershed R² scaling with coreset size N. Mean R² across the nine watersheds for the six coreset-based models under FL-Depth, with the Full-KB-XGB reference shown as the dotted line. … rises monotonically with N and reaches R² = 0.663 at N = 50k, the highest among coreset-based models and a 98.5% recovery of the Full-KB-XGB reference at 0.673. The same fine-tuning recipe applied to v2.6 yields TabPFN-…

**Figure 6.** Cross-watershed LOO mean R² versus context size N under the geo (left) and all (right) source-selection modes, averaged over nine held-out target watersheds. … outside the synthetic training envelope on the cumulative-rainfall and duration axes and therefore serves as a fully out-of-distribution test. Imelda has cells both inside and outside this envelope across the nine watersheds.
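The Figure 4 caption reports a mean nearest-neighbor distance for each selection method. A brute-force version of that statistic is sketched below, assuming planar (x, y) hexagon centroids and Euclidean distance; the paper's exact distance convention and units are not given in the excerpt.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point (O(n^2))."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

# Corners of a unit square: every nearest neighbor sits at distance 1
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(mean_nearest_neighbor_distance(square))
```

A larger value for the same number of selected hexagons indicates a more evenly spread selection; a small value signals clustering.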

Original abstract

Near-real-time flood depth prediction demands surrogate models that are accurate, fast, and transferable across watersheds. Supervised surrogates can match physics-based simulators in accuracy but need millions of training rows per watershed and cannot extrapolate beyond their original mesh. We propose a domain-aware coreset construction pipeline that conditions a tabular foundation model at inference time. The pipeline stratifies storms by return period and most-affected watershed, then samples hexagons with a target-aware spatial selector. With 0.7% of the per-watershed training pool, the model attains a mean $R^2$ of 0.663 across nine Houston-area watersheds, within 98.5% of the supervised reference ($R^2$ = 0.673). It transfers to held-out watersheds without task-specific retraining, staying ahead of a coreset-trained supervised baseline. On real storms it exceeds the supervised reference on a far out-of-distribution case and trails it on a mostly in-distribution one. Domain-aware coreset construction lets tabular foundation models deliver data-efficient, watershed-transferable flood predictions without per-watershed training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper nearly matches full-data R² with 0.7% of the training pool by pairing a domain-aware coreset with inference-time conditioning of a tabular foundation model, and it shows cross-watershed transfer, but the representativeness step lacks supporting checks.

The main thing to know is that the work reports a mean R² of 0.663 across nine Houston watersheds using only 0.7% of the per-watershed training data, which is 98.5% of the full supervised reference at 0.673, and that the model transfers to held-out watersheds without retraining while beating a coreset-trained supervised baseline.

What is new is the specific pipeline: stratify storms by return period and most-affected watershed, then apply a target-aware spatial selector to build the coreset, and condition a tabular foundation model only at inference time. This targets data-efficient, transferable flood depth surrogates. The paper does well by grounding the claim in concrete numbers on real watersheds and by testing on both held-out sites and actual storms, where performance sometimes exceeds the reference on out-of-distribution cases.
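Conditioning "only at inference time" means the coreset is supplied as context and no weights are updated. The sketch below illustrates that interface with a deliberately simple stand-in: a nearest-neighbor averager whose `fit` merely stores the context rows. It is not a tabular foundation model and not the paper's model; the class name and parameters are hypothetical.

```python
import math

class ContextConditionedRegressor:
    """Stand-in for an in-context model: `fit` only stores the context rows."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X_context, y_context):
        # No gradient steps: the coreset is simply kept as context
        self.X_ = [tuple(x) for x in X_context]
        self.y_ = list(y_context)
        return self

    def predict(self, X_query):
        preds = []
        for q in X_query:
            ranked = sorted(range(len(self.X_)), key=lambda i: math.dist(q, self.X_[i]))
            nearest = ranked[: self.k]
            preds.append(sum(self.y_[i] for i in nearest) / len(nearest))
        return preds

# Swapping the context changes predictions without any retraining step
features = [(0.0,), (1.0,), (2.0,)]
model = ContextConditionedRegressor(k=1)
print(model.fit(features, [0.0, 1.0, 2.0]).predict([(1.0,)]))
print(model.fit(features, [0.0, 10.0, 20.0]).predict([(1.0,)]))
```

The point of the illustration is the workflow: moving to a new watershed would mean swapping the context rows, not retraining. A real tabular foundation model would replace the nearest-neighbor rule with a pretrained network that reads the context.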

The soft spot is the representativeness assumption. The headline transfer result rests on the stratification plus spatial selector producing a coreset whose coverage is sufficient for generalization without per-watershed fine-tuning, yet the abstract and stress-test note give no distribution-matching metrics, no Wasserstein distances, and no ablations that isolate the contribution of each coreset step. Error bars and pre-specification details for the 0.7% fraction are also missing from the provided summary. These gaps make it hard to judge how much the result depends on the Houston data versus the method.

This is for applied ML researchers or hydrologists working on surrogate modeling for flood forecasting. A reader focused on tabular foundation models or coreset techniques for regression could extract useful empirical patterns.

It deserves peer review because the quantitative transfer numbers are specific enough to be worth verifying in full, even if the validation needs tightening.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a domain-aware coreset construction pipeline that stratifies storms by return period and most-affected watershed, then applies a target-aware spatial selector to sample a small subset of hexagons. This coreset is used to condition a tabular foundation model at inference time for flood depth prediction. The central empirical claim is that 0.7% of the per-watershed training pool yields a mean R² of 0.663 across nine Houston-area watersheds (98.5% of the full supervised reference R²=0.673), with transfer to held-out watersheds without task-specific retraining, where it stays ahead of a coreset-trained supervised baseline. On real storms the abstract reports mixed results against the supervised reference: better on a far out-of-distribution case, worse on a mostly in-distribution one.

Significance. If the transfer results hold under rigorous validation, the work would demonstrate a practical route to data-efficient, watershed-transferable surrogates that reduce the millions of rows typically required per watershed while leveraging foundation models for zero-shot adaptation. This addresses a key bottleneck in real-time flood modeling and could generalize to other physics-based simulation domains where labeled data are expensive to generate.

major comments (2)

[Abstract / Methods] Abstract and Methods (coreset pipeline description): The transfer claim to held-out watersheds without per-watershed fine-tuning rests on the unverified assumption that stratification by return period plus the target-aware spatial selector produces a coreset whose coverage of the input-output distribution is sufficient for the foundation model. No quantitative distribution-matching metrics (e.g., Wasserstein distance on hydraulic variables or depth) or ablations that remove either the stratification step or the spatial selector are reported, leaving the representativeness of the 0.7% coreset as a load-bearing but unsupported step.
[Results] Results (performance reporting): The headline R² figures (0.663 vs. 0.673) are presented without error bars, standard deviations across runs, or details on the number of independent trials. This makes it impossible to determine whether the 98.5% retention is statistically distinguishable from the supervised reference or sensitive to the particular 0.7% selection.
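The distribution-matching check requested in the first major comment could look like the following sketch, which computes the one-dimensional Wasserstein-1 distance between two empirical samples by integrating the gap between their empirical CDFs. The toy "depth" values are synthetic placeholders; which variables to compare and how to aggregate across them are left open, as in the report. SciPy's `scipy.stats.wasserstein_distance` provides a library implementation of the same 1-D quantity.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two 1-D empirical samples."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    i = j = 0
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Advance both empirical CDFs up to and including `left`
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        total += abs(i / len(a) - j / len(b)) * (right - left)
    return total

# Synthetic pool of 1000 values in [0, 10); not data from the paper
pool = [k / 100 for k in range(1000)]
spread_subset = pool[::100]   # 10 rows spread across the range
biased_subset = pool[:10]     # 10 rows from the shallow end only

print(wasserstein_1d(pool, spread_subset))
print(wasserstein_1d(pool, biased_subset))
```

A subset that covers the range scores a much smaller distance than one drawn from a single end, which is the kind of evidence the referee asks for when comparing the 0.7% coreset with the full pool.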

minor comments (1)

[Abstract] The abstract states that the model 'exceeds the supervised reference on a far out-of-distribution case' but provides no quantitative definition of 'far out-of-distribution' or the specific storm characteristics used for that comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments. We address each major comment below, indicating the revisions we will make to strengthen the manuscript.

Referee: [Abstract / Methods] Abstract and Methods (coreset pipeline description): The transfer claim to held-out watersheds without per-watershed fine-tuning rests on the unverified assumption that stratification by return period plus the target-aware spatial selector produces a coreset whose coverage of the input-output distribution is sufficient for the foundation model. No quantitative distribution-matching metrics (e.g., Wasserstein distance on hydraulic variables or depth) or ablations that remove either the stratification step or the spatial selector are reported, leaving the representativeness of the 0.7% coreset as a load-bearing but unsupported step.

Authors: We agree that the current manuscript lacks explicit quantitative support for the coreset's distributional coverage. In the revision we will add (i) Wasserstein distances computed on hydraulic variables (rainfall, terrain slope, roughness) and output depths between the 0.7% coreset and the full per-watershed pool, and (ii) ablation results that successively disable return-period stratification and the target-aware spatial selector while measuring transfer R² on held-out watersheds. These additions will directly quantify the contribution of each pipeline component.
Revision: yes

Referee: [Results] Results (performance reporting): The headline R² figures (0.663 vs. 0.673) are presented without error bars, standard deviations across runs, or details on the number of independent trials. This makes it impossible to determine whether the 98.5% retention is statistically distinguishable from the supervised reference or sensitive to the particular 0.7% selection.

Authors: The observation is correct; variability statistics are absent from the reported headline numbers. We will revise the Results section to report mean R² together with standard deviation across ten independent coreset draws and inference runs, and we will state the exact number of trials. This will permit readers to evaluate both statistical distinguishability from the full-supervised baseline and sensitivity to the particular 0.7% selection.
Revision: yes
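The variability report promised in this response amounts to computing R² per run and summarizing across runs. A minimal sketch follows, using the standard definition R² = 1 − SS_res / SS_tot and the sample standard deviation; the per-draw scores in the usage lines are placeholders, not values from the paper.

```python
import statistics

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def summarize_runs(r2_values):
    """Mean and sample standard deviation of R² over independent coreset draws."""
    return statistics.mean(r2_values), statistics.stdev(r2_values)

# Placeholder scores for a handful of hypothetical draws
draw_scores = [0.65, 0.66, 0.67, 0.66]
mean_r2, std_r2 = summarize_runs(draw_scores)
print(f"R² = {mean_r2:.3f} ± {std_r2:.3f} over {len(draw_scores)} draws")
```

Reporting the spread alongside the mean is what would let a reader judge whether a gap of about 0.01 in R² exceeds draw-to-draw noise.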

Circularity Check

0 steps flagged

No circularity; empirical claims rest on reported performance without self-referential definitions or derivations

full rationale

The paper reports experimental results on flood depth prediction using domain-aware coreset selection and tabular foundation models, achieving specific R² values with 0.7% of training data. No equations, derivations, or mathematical chains are present in the abstract or described content. The central claims concern empirical generalization across watersheds and comparison to baselines, which are evaluated via held-out performance metrics rather than any quantity defined in terms of itself or fitted parameters renamed as predictions. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked in a way that reduces the result to its inputs. This is a standard empirical ML paper whose validity hinges on experimental design and data, not circular construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated or derivable from the provided text.

pith-pipeline@v0.9.1-grok · 5735 in / 1107 out tokens · 23783 ms · 2026-06-28T07:26:15.137380+00:00 · methodology
