pith. machine review for the scientific record.

arxiv: 2605.13689 · v1 · submitted 2026-05-13 · 📊 stat.ME

Recognition: unknown

Moving beyond spatial and random cross-validation in environmental modelling: a call for prediction-domain adaptive evaluation

Authors on Pith · no claims yet

Pith reviewed 2026-05-14 17:49 UTC · model grok-4.3

classification 📊 stat.ME
keywords cross-validation · spatial modelling · environmental modelling · prediction accuracy · map evaluation · interpolation · extrapolation

The pith

Prediction-domain adaptive cross-validation provides reliable accuracy estimates across interpolation and extrapolation scenarios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that standard random and spatial cross-validation methods are each suited to only one extreme of the prediction domain: random cross-validation to training points randomly distributed over the prediction area, spatial cross-validation to full extrapolation. It proposes a new category, prediction-domain adaptive evaluation, in which the evaluation flexibly adapts to the particular prediction situation. This matters because most real-world cases in environmental modelling fall between random sampling and full extrapolation, where fixed methods yield unreliable accuracy estimates; the consequences extend to model tuning and the trustworthiness of spatial predictions in ecology. The authors ground the argument empirically by reproducing an earlier simulation study that compares the different methods.

Core claim

We advocate for prediction-domain adaptive evaluation as a new category of cross-validation methods that flexibly adapt to the prediction situation, yielding the most reliable estimates of map accuracy across different scenarios.

What carries the argument

Prediction-domain adaptive evaluation methods, which adjust cross-validation to the specific prediction domain instead of using fixed random or spatial approaches.

If this is right

  • Random cross-validation is suitable when training points are randomly distributed in the prediction area.
  • Spatial cross-validation performs better in extrapolation situations.
  • Most cases lie on a continuum between these two extremes.
  • Adaptive methods can give the most reliable accuracy estimates across scenarios.
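The contrast in the first two bullets can be made concrete with a small illustration (an editorial sketch, not code from the paper; the 1-nearest-neighbour model, the strip width, and the synthetic response are all invented for demonstration): on spatially autocorrelated data, random folds place test points next to training points, while spatial folds hold out whole blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
xy = rng.uniform(0, 100, size=(n, 2))                 # training point coordinates
y = np.sin(xy[:, 0] / 10) + 0.1 * rng.normal(size=n)  # spatially structured response

def nn_predict(train_xy, train_y, test_xy):
    # 1-nearest-neighbour prediction: the simplest spatial interpolator.
    d = np.linalg.norm(test_xy[:, None, :] - train_xy[None, :, :], axis=2)
    return train_y[d.argmin(axis=1)]

def cv_mae(fold_ids, k=5):
    # Mean absolute error averaged over k held-out folds.
    errs = []
    for f in range(k):
        te = fold_ids == f
        pred = nn_predict(xy[~te], y[~te], xy[te])
        errs.append(np.abs(pred - y[te]).mean())
    return float(np.mean(errs))

random_mae = cv_mae(rng.integers(0, 5, size=n))                   # location-blind folds
spatial_mae = cv_mae(np.minimum(xy[:, 0] // 20, 4).astype(int))   # vertical-strip folds
```

With a spatially smooth response, the strip-based folds typically report a higher error because each held-out strip lies farther from its training data; which estimate is "right" depends on where the prediction domain sits on the continuum, which is exactly the paper's point.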

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Developing concrete algorithms for this adaptive category could improve practical model evaluation in environmental science.
  • This call might lead to new research on estimating the appropriate adaptation based on data distribution.
  • Reproducing simulation studies provides a way to empirically test and refine such methods.
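The second bullet can be made operational with a rough distance diagnostic (an editorial sketch, not a method from the paper; the statistic and the synthetic areas are invented): compare nearest-neighbour distances from prediction locations to the training sample against the training sample's own internal nearest-neighbour distances, yielding an index of where a task sits between interpolation and extrapolation.

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.uniform(0, 100, size=(200, 2))        # training locations
pred_near = rng.uniform(0, 100, size=(1000, 2))   # prediction area covering the samples
pred_far = rng.uniform(100, 200, size=(1000, 2))  # disjoint area (extrapolation-like)

def nn_dist(points, ref):
    # Distance from each point to its nearest reference point (brute force).
    d = np.linalg.norm(points[:, None, :] - ref[None, :, :], axis=2)
    return d.min(axis=1)

# Reference scale: typical training-to-training nearest-neighbour distance.
dtt = np.linalg.norm(train[:, None, :] - train[None, :, :], axis=2)
np.fill_diagonal(dtt, np.inf)                     # exclude each point's self-distance
scale = float(np.median(dtt.min(axis=1)))

ratio_near = float(np.median(nn_dist(pred_near, train))) / scale  # ~1: interpolation
ratio_far = float(np.median(nn_dist(pred_far, train))) / scale    # >>1: extrapolation
```

A ratio near 1 suggests random cross-validation is representative, while a large ratio signals an extrapolation setting; an adaptive method could, in principle, tune its fold geometry to match this distance distribution.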

Load-bearing premise

Practical methods in the prediction-domain adaptive category can be developed that outperform random and spatial cross-validation in providing reliable accuracy estimates.

What would settle it

A study demonstrating that adaptive methods do not provide superior or more reliable accuracy estimates compared to existing methods across the continuum would falsify the proposal.

Figures

Figures reproduced from arXiv: 2605.13689 by Hanna Meyer, Jakub Nowosad, Jan Linnenbrink.

Figure 1. The upper row (A) shows sampling designs ranging from randomly distributed training points (left) through … [caption truncated; full figure at source]
Figure 2. In our simulation study, we generated four training sampling designs (random, biased, clustered, and …) [caption truncated; full figure at source]
Original abstract

With the growing application of spatial predictive modeling in ecology, the question of how to appropriately evaluate the resulting maps has gained increasing attention. While there is consensus that map accuracy is ideally estimated using an independent probability sample of the prediction area, there is still no agreement on the most appropriate way to conduct an evaluation for the common case when such a sample is not available. Cross-validation, which involves multiple train-test splits, is commonly applied not only to estimate final model accuracy but also to guide model tuning and selection. Many different spatial and non-spatial approaches to cross-validation have been proposed, and approaches in both groups have faced substantial criticism. It has been shown that random cross-validation methods are suitable when the training points are randomly distributed in the prediction area, while spatial cross-validation is better suited towards extrapolation situations. In practice, however, there is a continuum and most cases are between those two extremes. To address this gap, we advocate for a new category of cross-validation methods to account for this: prediction-domain adaptive evaluation. Methods in this category flexibly adapt to the prediction situation, yielding most reliable estimates of map accuracy across different scenarios. To ground this perspective empirically, we reproduce a simulation study that was used in earlier research and systematically compare different evaluation methods and discuss their purpose.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript argues that random cross-validation suits interpolation while spatial cross-validation suits extrapolation, but most real-world cases lie on a continuum between these extremes. It therefore advocates a new category of 'prediction-domain adaptive evaluation' methods that flexibly adapt to the prediction situation to yield more reliable map-accuracy estimates. The perspective is grounded by reproducing an earlier simulation study that systematically compares existing evaluation methods.

Significance. If concrete, implementable methods belonging to the proposed prediction-domain adaptive category can be developed and shown to reduce error in accuracy estimation relative to random or spatial cross-validation across the interpolation-extrapolation continuum, the work would meaningfully advance model-evaluation practice in spatial environmental modeling. The reproduction of the prior simulation study supplies a useful empirical anchor for the discussion and highlights the practical limitations of current approaches.

major comments (3)
  1. [Abstract] The claim that prediction-domain adaptive methods 'flexibly adapt to the prediction situation, yielding most reliable estimates of map accuracy across different scenarios' is presented as the central recommendation, yet the manuscript introduces no new algorithm, pseudocode, or quantitative comparison demonstrating the superiority of any member of this category.
  2. [Simulation study] While the reproduced study compares known random and spatial cross-validation methods, it contains no member of the advocated prediction-domain adaptive category, so the manuscript provides no direct empirical evidence that such methods outperform existing ones on the continuum.
  3. [Discussion] The call for prediction-domain adaptive evaluation rests on the untested premise that practical methods in this category can be constructed and will demonstrably improve accuracy estimation; no feasibility argument, example implementation, or falsifiable prediction is supplied to support this premise.
minor comments (1)
  1. [Abstract] The term 'prediction-domain adaptive evaluation' is introduced without a concise operational definition; adding one sentence that distinguishes it from both random and spatial CV would improve clarity for readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for highlighting the need for clearer distinctions between advocacy and empirical demonstration in our perspective. We address each major comment below, proposing targeted revisions to better align the manuscript's claims with its scope as a call for future methodological development rather than a presentation of new methods.

Point-by-point responses
  1. Referee: [Abstract] The claim that prediction-domain adaptive methods 'flexibly adapt to the prediction situation, yielding most reliable estimates of map accuracy across different scenarios' is presented as the central recommendation, yet the manuscript introduces no new algorithm, pseudocode, or quantitative comparison demonstrating the superiority of any member of this category.

    Authors: We agree that the abstract phrasing implies a stronger empirical foundation than the manuscript provides. The paper is a perspective advocating for the development of prediction-domain adaptive evaluation methods, not introducing or testing any specific member of that category. We will revise the abstract to frame the recommendation as a call for future research to create and validate such adaptive methods, grounded in the identified shortcomings of random and spatial cross-validation. revision: yes

  2. Referee: [Simulation study] While the reproduced study compares known random and spatial cross-validation methods, it contains no member of the advocated prediction-domain adaptive category, so the manuscript provides no direct empirical evidence that such methods outperform existing ones on the continuum.

    Authors: The reproduced simulation study is included solely to demonstrate the limitations of existing random and spatial cross-validation approaches across interpolation-extrapolation scenarios, providing an empirical basis for why a new category is needed. It is not designed to test prediction-domain adaptive methods, as none are developed here. We will add explicit language in the simulation section to clarify its role as an anchor for the perspective rather than a comparative evaluation of the proposed category. revision: partial

  3. Referee: [Discussion] The call for prediction-domain adaptive evaluation rests on the untested premise that practical methods in this category can be constructed and will demonstrably improve accuracy estimation; no feasibility argument, example implementation, or falsifiable prediction is supplied to support this premise.

    Authors: We acknowledge that the manuscript offers no concrete implementation or direct test of feasibility for prediction-domain adaptive methods. As a perspective piece, its primary aim is to identify the gap and encourage development of such methods. We will expand the discussion to include a brief outline of potential directions (e.g., adapting domain-adaptation or distance-weighted sampling techniques to the prediction domain) to provide initial feasibility grounding and falsifiable hypotheses for future work, without claiming to resolve the challenge. revision: yes
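The distance-weighted direction mentioned in this response can be sketched with a toy density-ratio weighting (an editorial illustration, not the authors' method; the 1-D setting, the histogram binning, and the error model are all invented): held-out errors are reweighted so that the evaluated distribution resembles the prediction domain.

```python
import numpy as np

rng = np.random.default_rng(2)
train = rng.uniform(0, 100, size=300)   # held-out locations, uniform over [0, 100)
pred = rng.uniform(50, 100, size=1000)  # prediction domain concentrated at high values

# Density-ratio weights from histograms: w(x) proportional to p_pred(x) / p_train(x).
bins = np.linspace(0, 100, 21)
p_train, _ = np.histogram(train, bins=bins, density=True)
p_pred, _ = np.histogram(pred, bins=bins, density=True)
idx = np.clip(np.digitize(train, bins) - 1, 0, len(bins) - 2)
w = p_pred[idx] / np.maximum(p_train[idx], 1e-12)
w /= w.sum()

# Toy per-point CV errors that grow with location, so the two summaries differ.
errors = train / 100
plain_mae = float(errors.mean())          # ordinary CV average (~0.5 here)
weighted_mae = float((w * errors).sum())  # prediction-domain-weighted average (~0.75)
```

When the prediction domain is shifted relative to the training sample, the weighted summary emphasises errors in the region actually being predicted; how to estimate such weights robustly in realistic spatial settings is exactly the open question this rebuttal points to.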

Circularity Check

0 steps flagged

No significant circularity: the perspective advocates a new category on the basis of a reproduced external simulation, without self-referential derivations.

Full rationale

The manuscript contains no equations, fitted parameters, or derivation chain. It reproduces a simulation study from earlier external research and compares existing random and spatial cross-validation methods. The proposal for 'prediction-domain adaptive evaluation' is a conceptual category without any self-definition, fitted-input prediction, or load-bearing self-citation that reduces the central claim to its own inputs by construction. The argument relies on external literature and empirical comparison of known approaches, making it self-contained against the circularity criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The claim depends on the domain assumption that adaptive methods are feasible and superior, plus the newly introduced conceptual category itself, which lacks independent evidence or implementation.

axioms (2)
  • domain assumption Independent probability samples of the prediction area are ideal for accuracy estimation but frequently unavailable
    Explicitly stated in the opening of the abstract.
  • domain assumption Real prediction situations form a continuum between random and spatial extremes
    Stated directly: 'in practice, however, there is a continuum and most cases are between those two extremes.'
invented entities (1)
  • prediction-domain adaptive evaluation no independent evidence
    purpose: A new category of cross-validation methods that flexibly adapt to the prediction situation
    Introduced in the abstract as the advocated solution; no specific algorithm or independent evidence provided.

pith-pipeline@v0.9.0 · 5534 in / 1366 out tokens · 58239 ms · 2026-05-14T17:49:23.597654+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

45 extracted references · 35 canonical work pages · 1 internal anchor

  1. [1]

    Nature Communications , author =

    Spatial validation reveals poor predictive performance of large-scale ecological mapping models , volume =. Nature Communications , author =. 2020 , pages =

  2. [2]

    Nature Communications , author =

    Machine learning-based global maps of ecological variables and the challenge of assessing them , volume =. Nature Communications , author =. 2022 , pages =

  3. [3]

    Methods in Ecology and Evolution , volume =

    Meyer, Hanna and Pebesma, Edzer , title =. Methods in Ecology and Evolution , volume =. doi:https://doi.org/10.1111/2041-210X.13650 , url =. https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/2041-210X.13650 , abstract =

  4. [4]

    Wadoux and Gerard B.M

    Alexandre M.J.-C. Wadoux and Gerard B.M. Heuvelink and Sytze. Spatial cross-validation is not the right way to evaluate map accuracy , journal =. 2021 , issn =. doi:https://doi.org/10.1016/j.ecolmodel.2021.109692 , url =

  5. [5]

    Brenning , title =

    A. Brenning , title =. 2012 IEEE International Geoscience and Remote Sensing Symposium , year =. doi:10.1109/IGARSS.2012.6352393 , file =

  6. [6]

    and Bahn, Volker and Ciuti, Simone and Boyce, Mark S

    Roberts, David R. and Bahn, Volker and Ciuti, Simone and Boyce, Mark S. and Elith, Jane and Guillera-Arroita, Gurutzeta and Hauenstein, Severin and Lahoz-Monfort, Jos\'es J. and Schr. Ecography , year =. doi:10.1111/ecog.02881 , publisher =

  7. [7]

    Ecological Modelling , year =

    Patrick Schratz and Jannes Muenchow and Eugenia Iturritxa and Jakob Richter and Alexander Brenning , title =. Ecological Modelling , year =. doi:https://doi.org/10.1016/j.ecolmodel.2019.06.002 , keywords =

  8. [8]

    Environmental Modelling & Software , year =

    Hanna Meyer and Christoph Reudenbach and Tomislav Hengl and Marwan Katurji and Thomas Nauss , title =. Environmental Modelling & Software , year =. doi:https://doi.org/10.1016/j.envsoft.2017.12.001 , keywords =

  9. [9]

    and Guillera-Arroita, Gurutzeta , title =

    Valavi, Roozbeh and Elith, Jane and Lahoz-Monfort, Jose J. and Guillera-Arroita, Gurutzeta , title =. bioRxiv , year =. doi:10.1101/357798 , eprint =

  10. [10]

    International Journal of Geographical Information Science , year =

    Jonne Pohjankukka and Tapio Pahikkala and Paavo Nevalainen and Jukka Heikkonen , title =. International Journal of Geographical Information Science , year =. doi:10.1080/13658816.2017.1346255 , eprint =

  11. [11]

    Methods in Ecology and Evolution , author =

    Nearest neighbour distance matching. Methods in Ecology and Evolution , author =. 2022 , note =. doi:10.1111/2041-210X.13851 , language =

  12. [12]

    Ecological Modelling , author =

    Importance of spatial predictor variable selection in machine learning applications –. Ecological Modelling , author =. 2019 , pages =. doi:10.1016/j.ecolmodel.2019.108815 , abstract =

  13. [13]

    and Mil\`a, C

    Linnenbrink, J. and Mil\`a, C. and Ludwig, M. and Meyer, H. , TITLE =. Geoscientific Model Development , VOLUME =. 2024 , NUMBER =

  14. [14]

    Journal of Statistical Software , author=

    mlr3spatiotempcv: Spatiotemporal Resampling Methods for Machine Learning in R , volume=. Journal of Statistical Software , author=. 2024 , pages=. doi:10.18637/jss.v111.i07 , number=

  15. [15]

    2023 , eprint =

    Assessing the performance of spatial cross-validation approaches for models of spatially structured data , author =. 2023 , eprint =. doi:10.48550/arXiv.2303.07334 , url =

  16. [16]

    2023 , url =

    Walid Ghariani , title =. 2023 , url =

  17. [17]

    Journal of Open Source Software , volume =

    Uieda, Leonardo , year =. Journal of Open Source Software , volume =

  18. [18]

    Nature Climate Change , author =

    Estimated carbon dioxide emissions from tropical deforestation improved by carbon-density maps , volume =. Nature Climate Change , author =. 2012 , pages =. doi:10.1038/nclimate1354 , language =

  19. [19]

    Ecological Informatics , author =

    Dealing with clustered samples for assessing map accuracy by cross-validation , volume =. Ecological Informatics , author =. 2022 , note =. doi:10.1016/j.ecoinf.2022.101665 , language =

  20. [20]

    International Journal of Applied Earth Observation and Geoinformation , author =

    Spatial+:. International Journal of Applied Earth Observation and Geoinformation , author =. 2023 , pages =. doi:10.1016/j.jag.2023.103364 , language =

  21. [21]

    Ecological Informatics , author =

    A dissimilarity-adaptive cross-validation method for evaluating geospatial machine learning predictions with clustered samples , volume =. Ecological Informatics , author =. 2025 , pages =. doi:10.1016/j.ecoinf.2025.103287 , language =

  22. [22]

    Global Ecology and Biogeography , volume =

    Ludwig, Marvin and Moreno-Martinez, Alvaro and Hölzel, Norbert and Pebesma, Edzer and Meyer, Hanna , title =. Global Ecology and Biogeography , volume =. doi:https://doi.org/10.1111/geb.13635 , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/geb.13635 , abstract =

  23. [23]

    and Jandt, Ute and Jansen, Florian and Jiménez-Alfaro, Borja and Kattge, Jens and Levesley, Aurora and Pillar, Valério D

    Sabatini, Francesco Maria and Lenoir, Jonathan and Hattab, Tarek and Arnst, Elise Aimee and Chytrý, Milan and Dengler, Jürgen and De Ruffray, Patrice and Hennekens, Stephan M. and Jandt, Ute and Jansen, Florian and Jiménez-Alfaro, Borja and Kattge, Jens and Levesley, Aurora and Pillar, Valério D. and Purschke, Oliver and Sandel, Brody and Sultana, Fahmida...

  24. [24]

    Batjes, N. H. and Ribeiro, E. and van Oostrum, A. and Leenaars, J. and Hengl, T. and Mendes de Jesus, J. , TITLE =. Earth System Science Data , VOLUME =. 2017 , NUMBER =

  25. [25]

    Scientific Data , author =

    The. Scientific Data , author =. 2020 , pages =. doi:10.1038/s41597-020-0534-3 , abstract =

  26. [26]

    GBIF Home Page , year =

  27. [27]

    Nature , author =

    Soil nematode abundance and functional group composition at a global scale , volume =. Nature , author =. 2019 , pages =. doi:10.1038/s41586-019-1418-6 , language =

  28. [28]

    Science , author =

    The global tree restoration potential , volume =. Science , author =. 2019 , pages =. doi:10.1126/science.aax0848 , language =

  29. [29]

    New Phytologist , author =

    Global models and predictions of plant diversity based on advanced machine learning techniques , volume =. New Phytologist , author =. 2023 , pages =. doi:10.1111/nph.18533 , language =

  30. [30]

    Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome , year =. The

  31. [31]

    International Journal of Remote Sensing , author =

    Sampling designs for accuracy assessment of land cover , volume =. International Journal of Remote Sensing , author =. 2009 , pages =. doi:10.1080/01431160903131000 , language =

  32. [32]

    Remote Sensing of Environment , author =

    Practical. Remote Sensing of Environment , author =. 2000 , pages =. doi:10.1016/S0034-4257(99)00090-5 , language =

  33. [33]

    and Orr, Michael C

    Hughes, Alice C. and Orr, Michael C. and Ma, Keping and Costello, Mark J. and Waller, John and Provoost, Pieter and Yang, Qinmin and Zhu, Chaodong and Qiao, Huijie , title =. Ecography , volume =. doi:https://doi.org/10.1111/ecog.05926 , url =. https://nsojournals.onlinelibrary.wiley.com/doi/pdf/10.1111/ecog.05926 , year =

  34. [34]

    European Journal of Soil Science , author =

    Sampling for validation of digital soil maps , volume =. European Journal of Soil Science , author =. 2011 , pages =. doi:10.1111/j.1365-2389.2011.01364.x , language =

  35. [35]

    2025 , note =

    CAST: 'caret' Applications for Spatial-Temporal Models , author =. 2025 , note =

  36. [36]

    Natural Hazards and Earth System Sciences , author =

    Spatial prediction models for landslide hazards: review, comparison and evaluation , volume =. Natural Hazards and Earth System Sciences , author =. 2005 , pages =. doi:10.5194/nhess-5-853-2005 , language =

  37. [37]

    Ecography , volume =

    Huang, Hongwei and Zhang, Zhixin and Bede-Fazekas, Ákos and Mammola, Stefano and Gu, Jiqi and Zhou, Jinxin and Qu, Junmei and Lin, Qiang , title =. Ecography , volume =. doi:https://doi.org/10.1111/ecog.07354 , year =

  38. [38]

    Ecography , author =

    Projecting spatiotemporal bioclimatic niche dynamics of endemic. Ecography , author =. 2026 , pages =. doi:10.1002/ecog.08067 , language =

  39. [39]

    Ecology and Evolution , author =

    Modeling. Ecology and Evolution , author =. 2026 , pages =. doi:10.1002/ece3.72031 , language =

  40. [40]

    Kuhn, Max and Johnson, Kjell , year =. Applied. doi:10.1007/978-1-4614-6849-3 , language =

  41. [41]

    Statistics for spatial data , isbn =

  42. [42]

    Brus and J.J

    D.J. Brus and J.J. Random sampling or geostatistical modelling? Choosing between design-based and model-based sampling strategies for soil (with discussion) , journal =. 1997 , issn =. doi:https://doi.org/10.1016/S0016-7061(97)00072-4 , url =

  43. [43]

    Nature Communications , author =

    Crowdsourced biodiversity monitoring fills gaps in global plant trait mapping , volume =. Nature Communications , author =. 2026 , pages =. doi:10.1038/s41467-026-68996-y , language =

  44. [44]

    Aligning Validation with Deployment: Target-Weighted Cross-Validation for Spatial Prediction

    Brenning, Alexander and Suesse, Thomas , year =. Aligning. doi:10.48550/ARXIV.2603.29981 , urldate =

  45. [45]

    Xue, Peipei and Minasny, Budiman and Román Dobarco, Mercedes and Wadoux, Alexandre M. J.-C. and Padarian Campusano, Jose and Bissett, Andrew and de Caritat, Patrice and McBratney, Alex , title =. Global Change Biology , volume =. doi:https://doi.org/10.1111/gcb.70268 , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/gcb.70268 , note =