pith. sign in

arxiv: 2606.21475 · v1 · pith:7HXD4VXSnew · submitted 2026-06-19 · 💻 cs.LG

Deep Learning for Soil Moisture Estimation: Fusing Satellite Data with Optimally-Lagged Meteorological Features

Pith reviewed 2026-06-26 14:32 UTC · model grok-4.3

classification 💻 cs.LG
keywords soil moisture estimationdeep learningsatellite remote sensingmeteorological lagscross-correlationCNN-LSTMprecision agriculturesubsurface moisture
0
0 comments X

The pith

Incorporating time-lagged meteorological data and soil depth information improves deep learning models for estimating soil moisture from satellite observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether accounting for the time it takes meteorological conditions to affect soil moisture, and how moisture moves between soil depths, can enhance predictions made by neural networks that also use satellite data. It develops a method using cross-correlation to pick the best lags for each variable and depth. The authors apply this to data from seven plots in Spain and compare three neural network designs under different feature sets. If the approach works, it would mean more accurate soil moisture maps for farming in dry areas without needing extra sensors everywhere.

Core claim

The study shows that determining optimal temporal lags (0-30 days) for meteorological variables and inter-depth lags (0-15 days) using cross-correlation, and incorporating them into CNN, LSTM, and CNN-LSTM models, leads to better soil moisture prediction, with the hybrid model achieving R^2 of 0.930 on held-out data.

What carries the argument

The Cross-Correlation Function (CCF) methodology to determine optimal temporal lags between meteorological variables and soil moisture, as well as inter-depth lags describing vertical moisture propagation from the surface to deeper layers.

If this is right

  • Meteorological variables with optimal lags improve performance compared to using satellite data alone.
  • Including subsurface depth information is decisive for accurate predictions across all tested model architectures.
  • A per-pixel CNN achieves the strongest single-patch result with R^2 of 0.877.
  • A pooled CNN-LSTM hybrid reaches the highest overall performance with R^2 of 0.930.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The lag selection process might transfer to estimating other delayed environmental processes such as groundwater response.
  • Retraining the lag finder on new regions could be necessary if physical conditions differ substantially from the original plots.
  • The multi-patch training strategy indicates that pooling data from multiple sites aids generalization within similar agricultural settings.

Load-bearing premise

The lags selected via cross-correlation on the training plots capture generalizable physical delays rather than dataset-specific correlations or noise.

What would settle it

Retraining and testing the models on data from a different semi-arid region without re-computing the lags from cross-correlation, and checking if the performance improvement over the satellite-only baseline holds or vanishes.

Figures

Figures reproduced from arXiv: 2606.21475 by Adrian Canovas-Rodriguez, Antonio F. Skarmeta, Aurora Gonz\'alez Vidal.

Figure 2
Figure 2. Figure 2: Aerial views of the seven study plots (Patch-1 to Patch-7) located in the Region of Murcia, Spain. 2. Materials and Methods 2.1. Study Area and Experimental Plots The study was conducted in the Region of Murcia, southeastern Spain (approximately 37° 55′ N, 1° 28′ W), a semi-arid Mediterranean environment with hot summers, mild winters, a mean annual temperature near 18°C, and scarce, irregular rainfall (30… view at source ↗
Figure 1
Figure 1. Figure 1: Location of the seven study plots on the orthophoto of the Region of Murcia. Seven rainfed and irrigated plots (hereafter Patch-1 to Patch-7), distributed across the La Palmera, Torre Pacheco and “PSB Producci’on Vegetal” estates, were instrumented [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overall processing workflow from Sentinel-2 and meteorological inputs through feature engineering (including CCF-based lag estimation) to deep learning models and evaluation. near-instantaneous surface–subsurface coupling on a well￾connected plot (Patch-3, 𝑟 ≈ 0.95 to 50 cm) but 14–15-day lags and weak correlation on a plot with a distinct, poorly connected vertical profile (Patch-1, |𝑟| < 0.26). Aligning … view at source ↗
Figure 4
Figure 4. Figure 4: Temporal evolution of soil moisture measured by depth sensors across the study plots. Each line represents a different measurement depth (10–50 cm). Satellite Image Acquisition and Processing Sentinel-2 imagery was acquired via the SentinelHub library, which interfaces with the Copernicus API through the SHConfig() authentication mechanism. For each acquisi￾tion date and plot, a GeoTIFF raster containing a… view at source ↗
read the original abstract

Accurate soil moisture estimation in semi-arid agricultural regions requires integrating remote sensing and meteorological information while accounting for the delayed response of soil moisture to atmospheric forcing. This study introduces a Cross-Correlation Function (CCF) methodology to determine optimal temporal lags (0-30 days) between meteorological variables and soil moisture, as well as inter-depth lags (0-15 days) describing vertical moisture propagation from the surface (10 cm) to deeper layers (20-50 cm). The approach was validated across seven agricultural plots in southeastern Spain. Three deep learning architectures, each targeting a distinct prediction granularity, were evaluated under five feature configurations ranging from satellite-only to full satellite-meteorology-depth fusion: a CNN for per-pixel estimation within each plot, an LSTM for frame-level (daily plot-mean) prediction, and a CNN-LSTM hybrid operating on sliding windows with pooled multi-patch training. Models were assessed on held-out data to measure genuine generalisation. Meteorological variables improved performance over the satellite-only baseline, while subsurface depth information proved decisive across all architectures. The per-pixel CNN achieved the strongest single-patch result (R^2 = 0.877, RMSE = 2.28), with a seven-patch average R^2 of 0.535, representing an improvement of +1.00 over the satellite-only baseline. The pooled CNN-LSTM hybrid obtained the highest overall performance (R^2 = 0.930, CVRMSE = 8.0%). These results demonstrate that explicitly modelling atmospheric and vertical subsurface delays substantially improves soil moisture estimation for precision agriculture.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that a Cross-Correlation Function (CCF) approach can identify optimal temporal lags (0-30 days for meteorological variables, 0-15 days for inter-depth propagation) between satellite observations, meteorological drivers, and soil moisture at multiple depths. These lagged features are then fused into three deep-learning architectures (per-pixel CNN, LSTM, and CNN-LSTM hybrid) and evaluated on held-out data from seven agricultural plots in southeastern Spain. The central empirical result is that adding the optimally lagged meteorological and subsurface-depth features produces substantial gains over a satellite-only baseline, with the pooled CNN-LSTM hybrid reaching R² = 0.930 and the per-pixel CNN achieving a seven-plot average R² = 0.535 (+1.00 over baseline).

Significance. If the reported gains are shown to arise from physically transferable delay modeling rather than plot-specific lag selection, the work would provide concrete evidence that explicit incorporation of atmospheric and vertical propagation delays improves deep-learning soil-moisture retrievals for precision agriculture. The concrete held-out metrics and the comparison across five feature configurations constitute a clear, falsifiable demonstration of the value of the lagged-feature strategy.

major comments (2)
  1. [Methods (CCF methodology)] Methods (CCF lag selection): The manuscript does not state whether the cross-correlation functions used to select the 0-30-day meteorological lags and 0-15-day inter-depth lags were computed exclusively on the training subset of the seven plots or on the full dataset. Because the reported performance lift (e.g., +1.00 R² for the per-pixel CNN) is attributed to these lags, any leakage from held-out plots into lag choice would render the generalization claim circular.
  2. [Experimental setup and validation] Experimental design (seven-plot validation): With only seven plots and no description of nested cross-validation or lag-sensitivity analysis, it remains possible that the chosen lags capture plot-specific irrigation schedules or soil heterogeneity rather than generalizable physical response times. A leave-one-plot-out protocol with fixed lags would be required to substantiate the claim that delay modeling drives the observed improvements.
minor comments (1)
  1. [Abstract] Abstract and results: The seven-patch average R² = 0.535 is presented as an improvement of +1.00 over the satellite-only baseline; the baseline value itself should be reported explicitly for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and indicate the revisions that will be made to the manuscript.

read point-by-point responses
  1. Referee: [Methods (CCF methodology)] Methods (CCF lag selection): The manuscript does not state whether the cross-correlation functions used to select the 0-30-day meteorological lags and 0-15-day inter-depth lags were computed exclusively on the training subset of the seven plots or on the full dataset. Because the reported performance lift (e.g., +1.00 R² for the per-pixel CNN) is attributed to these lags, any leakage from held-out plots into lag choice would render the generalization claim circular.

    Authors: We agree that this procedural detail is not explicitly stated and should be clarified. The CCF lag selections were performed exclusively on the training subsets of each plot to avoid any leakage from held-out data. We will add a clear statement to this effect in the Methods section of the revised manuscript. revision: yes

  2. Referee: [Experimental setup and validation] Experimental design (seven-plot validation): With only seven plots and no description of nested cross-validation or lag-sensitivity analysis, it remains possible that the chosen lags capture plot-specific irrigation schedules or soil heterogeneity rather than generalizable physical response times. A leave-one-plot-out protocol with fixed lags would be required to substantiate the claim that delay modeling drives the observed improvements.

    Authors: Our current validation uses held-out data portions within the seven plots to evaluate generalization. We acknowledge that the small number of plots and lack of explicit leave-one-plot-out or lag-sensitivity analysis leaves room for plot-specific effects. To address this, we will add a leave-one-plot-out evaluation (with lags fixed from the original training procedure) and a brief lag-sensitivity analysis in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: lag selection and model evaluation follow standard non-circular supervised ML pipeline on held-out data.

full rationale

The paper selects lags via CCF on training plots then trains DL models to predict soil moisture from the lagged features, evaluating on held-out plots. This is ordinary feature engineering followed by supervised training and generalization testing; the reported R² values are not equivalent to the CCF correlations by construction, nor do any equations reduce the target prediction to a fitted parameter or self-citation. No self-definitional steps, no uniqueness theorems, and no load-bearing self-citations appear in the provided text. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that cross-correlation reliably identifies physically meaningful lags and that standard DL training on the described feature sets produces generalizable improvements; no new entities are postulated and no additional free parameters beyond standard model training are introduced in the abstract.

axioms (1)
  • domain assumption Cross-correlation function applied to the plot data identifies lags that reflect genuine atmospheric and vertical propagation delays rather than spurious correlations.
    Invoked to justify the 0-30 day meteorological and 0-15 day inter-depth lag selections used in all feature configurations.

pith-pipeline@v0.9.1-grok · 5828 in / 1513 out tokens · 39030 ms · 2026-06-26T14:32:08.579506+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references

  1. [1]

    Manrique-Alba, S

    À. Manrique-Alba, S. Ruiz-Yanetti, H. Moutahir, K. Novak, M. De Luis, J. Bellot, Soil moisture and its role in growth-climate relationships across an aridity gradient in semiarid pinus halepensis forests, Science of the Total Environment 574 (2017) 982–990

  2. [2]

    A. V. Ines, N. N. Das, J. W. Hansen, E. G. Njoku, Assimilation of remotely sensed soil moisture and vegetation with a crop simulation model for maize yield prediction, Remote Sensing of Environment 138 (2013) 149–164

  3. [3]

    R. S. Ayers, D. W. Westcot, et al., Water quality for agriculture, volume 29, Food and agriculture organization of the United Nations Rome, 1985

  4. [4]

    D. A. Robinson, S. B. Jones, J. M. Wraith, D. Or, S. P. Friedman, A review of advances in dielectric and electrical conductivity measure- ment in soils using time domain reflectometry, Vadose zone journal 2 (2003) 444–475. A. Canovas-Rodriguez et al.:Preprint submitted to ElsevierPage 14 of 15 Deep Learning for Soil Moisture Estimation

  5. [5]

    M. S. Farooq, S. Riaz, A. Abid, K. Abid, M. A. Naeem, A survey on theroleofiotinagriculturefortheimplementationofsmartfarming, IEEE access 7 (2019) 156237–156271

  6. [6]

    A. Garg, V. Sreshta, N. Mehta, Application of soil moisture sensors in agriculture: A review, International Journal of Research in Engi- neering and Applied Sciences 6 (2016) 55–64

  7. [7]

    Y. H. Kerr, P. Waldteufel, J.-P. Wigneron, S. Delwart, F. Cabot, J. Boutin, M.-J. Escorihuela, J. Font, N. Reul, C. Gruhier, et al., The smos mission: New tool for monitoring key elements ofthe global water cycle, Proceedings of the IEEE 98 (2010) 666–687

  8. [8]

    Entekhabi, E

    D. Entekhabi, E. G. Njoku, P. E. O’neill, K. H. Kellogg, W. T. Crow, W.N.Edelstein,J.K.Entin,S.D.Goodman,T.J.Jackson,J.Johnson, et al., The soil moisture active passive (smap) mission, Proceedings of the IEEE 98 (2010) 704–716

  9. [9]

    Drusch, U

    M. Drusch, U. Del Bello, S. Carlier, O. Colin, V. Fernandez, F. Gas- con,B.Hoersch,C.Isola,P.Laberinti,P.Martimort,etal., Sentinel-2: Esa’s optical high-resolution mission for gmes operational services, Remote sensing of Environment 120 (2012) 25–36

  10. [10]

    E.G.Njoku,T.J.Jackson,V.Lakshmi,T.K.Chan,S.V.Nghiem,Soil moistureretrievalfromamsr-e, IEEEtransactionsonGeoscienceand remote sensing 41 (2003) 215–229

  11. [11]

    Y. Wang, W. Wang, Z. Ma, M. Zhao, W. Li, X. Hou, J. Li, F. Ye, W. Ma, A deep learning approach based on physical constraints for predicting soil moisture in unsaturated zones, Water Resources Research 59 (2023) e2023WR035194

  12. [12]

    E. H. Hegazi, A. A. Samak, L. Yang, R. Huang, J. Huang, Prediction of soil moisture content from sentinel-2 images using convolutional neural network (cnn), Agronomy 13 (2023) 656

  13. [13]

    Q. Geng, S. Yan, Q. Li, C. Zhang, Enhancing data-driven soil moisture modeling with physically-guided lstm networks, Frontiers in Forests and Global Change 7 (2024) 1353011

  14. [14]

    J. Yu, X. Zhang, L. Xu, J. Dong, L. Zhangzhong, A hybrid cnn-gru model for predicting soil moisture in maize root zone, Agricultural Water Management 245 (2021) 106649

  15. [15]

    A. Rani, N. Kumar, J. Kumar, N. K. Sinha, Machine learning for soil moisture assessment, in: Deep learning for sustainable agriculture, Elsevier, 2022, pp. 143–168

  16. [16]

    Ahmad, A

    S. Ahmad, A. Kalra, H. Stephen, Estimating soil moisture using remotesensingdata:Amachinelearningapproach,Advancesinwater resources 33 (2010) 69–80

  17. [17]

    C.S.Lee,E.Sohn,J.D.Park,J.-D.Jang, Estimationofsoilmoisture using deep learning based on satellite data: A case study of south korea, GIScience & Remote Sensing 56 (2019) 43–67

  18. [18]

    J. Wei, R. Song, Spatiotemporal characteristics of soil moisture memory: an integrated analysis using multiple metrics and datasets, Climate Dynamics 63 (2025) 228

  19. [19]

    Z. Bai, S. Jia, G. Wang, M. Huang, W. Zhang, Near real-time reconstruction of 0–200 cm soil moisture profiles in croplands using shallow-layer monitoring and multi-day meteorological accumula- tions, Agronomy 15 (2025) 2864

  20. [20]

    M.Rahmati,W.Amelung,C.Brogi,J.Dari,A.Flammini,H.Bogena, L.Brocca,H.Chen,J.Groh,R.D.Koster,etal., Soilmoisturemem- ory:State-of-the-art andtheway forward, ReviewsofGeophysics 62 (2024) e2023RG000828

  21. [21]

    T. Wu, L. Xu, Y. Lv, R. Cai, Z. Pan, X. Zhang, X. Zhang, N. Chen, Integratingcausalinferencewithconvlstmnetworksforspatiotempo- ral forecasting of root zone soil moisture, Journal of Hydrology 659 (2025) 133246

  22. [22]

    Kapoor, A

    S. Kapoor, A. Narayanan, Leakage and the reproducibility crisis in machine-learning-based science, Patterns 4 (2023)

  23. [23]

    D. R. Roberts, V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera- Arroita,S.Hauenstein,J.J.Lahoz-Monfort,B.Schröder,W.Thuiller, et al., Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure, Ecography 40 (2017) 913– 929

  24. [24]

    Le Rest, D

    K. Le Rest, D. Pinaud, P. Monestiez, J. Chadoeuf, V. Bretagnolle, Spatial leave-one-out cross-validation for variable selection in the presenceofspatialautocorrelation, Globalecologyandbiogeography 23 (2014) 811–820

  25. [25]

    M. Shah, M. S. Raval, S. Divakaran, A systematic review on deep learning for atmospheric correction of satellite images, Archives of Computational Methods in Engineering (2025) 1–31

  26. [26]

    Hochreiter, J

    S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780

  27. [27]

    D.F.Kandamali,E.Porter,W.M.Porter,A.McLemore,D.O.Kiobia, A.P.Tavandashti,G.C.Rains, Hybridlstmmethodformultistepsoil moisture prediction using historical soil moisture and weather data, AgriEngineering 7 (2025) 260

  28. [28]

    J.Li,D.Hong,L.Gao,J.Yao,K.Zheng,B.Zhang,J.Chanussot,Deep learninginmultimodalremotesensingdatafusion:Acomprehensive review, International Journal of Applied Earth Observation and Geoinformation 112 (2022) 102926

  29. [29]

    J. Liu, Z. Hao, J. Ding, Y. Zhang, Z. Miao, Y. Zheng, A. Alimu, H. Cheng, X. Li, Ensemble machine-learning-based framework for estimatingsurfacesoilmoistureusingSentinel-1/2data:Acasestudy of an arid oasis in China, Land 13 (2024) 1635

  30. [30]

    Chatenoux, J.-P

    B. Chatenoux, J.-P. Richard, D. Small, C. Roeoesli, V. Wingate, C. Poussin, D. Rodila, P. Peduzzi, C. Steinmeier, C. Ginzler, A. Pso- mas, M. E. Schaepman, G. Giuliani, The Swiss data cube, analysis readydataarchiveusingearthobservationsofSwitzerland, Scientific Data 8 (2021) 295

  31. [31]

    Australia, Digital earth australia (2024)

    G. Australia, Digital earth australia (2024). A. Canovas-Rodriguez et al.:Preprint submitted to ElsevierPage 15 of 15