Graph-based automated discovery of concise soil hydraulic functions from data: beyond the Mualem-van Genuchten model
Pith reviewed 2026-05-20 03:10 UTC · model grok-4.3
The pith
A graph-based discovery method finds explicit soil hydraulic functions that predict unsaturated conductivity more accurately than the Mualem-van Genuchten model on 249 soil samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applied to the original datasets used in the development of the Mualem-van Genuchten model, the graph-based automated model discovery framework identifies a concise soil water retention function and its associated unsaturated hydraulic conductivity function whose mathematical structure differs fundamentally from classical empirical forms; across 249 real soil samples spanning diverse textural classes, the discovered functions achieve more accurate predictions of unsaturated hydraulic conductivity than the MvG model, and the fitted parameters exhibit correlations with soil physical properties.
What carries the argument
The graph-based automated model discovery framework that generates and evaluates candidate explicit functional forms directly from experimental soil data without relying on predefined empirical assumptions.
If this is right
- The discovered functions can serve as drop-in replacements for the MvG model in vadose-zone flow simulations to reduce prediction error.
- Correlations between the new function parameters and measurable soil properties enable estimation of hydraulic behavior from basic texture data.
- Data-driven discovery can generate compact constitutive models that remain robust across a wider range of soil textures than hand-derived empirical forms.
- Explicit functions identified by the method maintain mathematical simplicity while improving accuracy on independent data.
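For orientation, the baseline that the discovered functions would replace can be written in a few lines. The sketch below implements the standard Mualem-van Genuchten retention and conductivity equations with the usual constraint m = 1 - 1/n. The function names and the default tortuosity exponent l = 0.5 are illustrative choices, not taken from the paper, and the discovered functions themselves are not reproduced here because this page does not state them.

```python
import math


def mvg_effective_saturation(h, alpha, n):
    """Effective saturation Se for pressure head h (negative when unsaturated)."""
    if h >= 0.0:
        return 1.0  # saturated
    m = 1.0 - 1.0 / n  # Mualem constraint linking m and n
    return (1.0 + (alpha * abs(h)) ** n) ** (-m)


def mvg_water_content(h, theta_r, theta_s, alpha, n):
    """Volumetric water content theta(h) between residual and saturated values."""
    se = mvg_effective_saturation(h, alpha, n)
    return theta_r + (theta_s - theta_r) * se


def mvg_conductivity(h, k_s, alpha, n, l=0.5):
    """Unsaturated conductivity K(h); l=0.5 is the commonly used default (assumption)."""
    m = 1.0 - 1.0 / n
    se = mvg_effective_saturation(h, alpha, n)
    return k_s * se ** l * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2
```

A replacement model would only need to expose the same two callables, water content and conductivity as functions of pressure head, to slot into a flow solver written against this interface.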
Where Pith is reading between the lines
- If the functions remain accurate outside the tested range, they could be adopted in regional groundwater or climate models to lower uncertainty in soil-water flux estimates.
- The same graph-based search procedure could be applied to discover constitutive relations for related processes such as solute transport or gas flow in porous media.
- The observed parameter-soil property correlations open a route to hybrid models that predict function coefficients from easily measured attributes like sand-silt-clay fractions.
Load-bearing premise
The graph-based search applied to the original MvG development datasets produces functional forms that are both structurally different from classical models and genuinely more predictive on independent soil samples rather than merely fitting the same data better through added flexibility.
What would settle it
A direct comparison of root-mean-square errors or other prediction metrics between the discovered functions and the MvG model on a fresh collection of at least 100 soil samples drawn from textural classes not emphasized in the 249-sample test set.
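The comparison described above can be sketched as follows. The error is computed on log10 of conductivity, a common choice when values span several orders of magnitude; whether the paper uses a log-scale metric is not stated here, so treat that as an assumption. The function and key names are hypothetical.

```python
import math


def rmse_log10(observed, predicted):
    """RMSE between log10(observed) and log10(predicted); all values must be > 0."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    squared = [
        (math.log10(p) - math.log10(o)) ** 2 for o, p in zip(observed, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))


def compare_models(observed, predicted_a, predicted_b):
    """Return the log-scale RMSE of two candidate models on the same samples."""
    return {
        "model_a": rmse_log10(observed, predicted_a),
        "model_b": rmse_log10(observed, predicted_b),
    }
```

Both models must be scored on exactly the same observations, otherwise the two numbers are not comparable.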
Original abstract
Soil hydraulic functions are fundamental to modelling water flow and transport in vadose-zone hydrology and are central to a wide range of hydrological and geoscientific applications. Yet in practice, these functions are still predominantly specified through expert-designed empirical formulations, such as the Mualem-van Genuchten (MvG) model. Although such models have proved highly influential, their derivation relies on predefined functional assumptions that make it difficult to simultaneously achieve accuracy, compactness, and robustness across diverse soil textures. Here we present a graph-based automated model discovery framework for discovering explicit soil hydraulic functions directly from experimental data. Applied to the original datasets used in the development of the MvG model, the method identifies a concise soil water retention function and its associated unsaturated hydraulic conductivity function whose mathematical structure differs fundamentally from classical empirical forms. Across 249 real soil samples spanning diverse textural classes, the discovered functions achieve more accurate predictions of unsaturated hydraulic conductivity than the MvG model. The fitted parameters also exhibit correlations with soil physical properties. This work demonstrates that data-driven model discovery can move beyond traditional empirical derivation and provide a promising route for developing accurate and explicit constitutive models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a graph-based automated model discovery framework that derives explicit, concise soil water retention and unsaturated hydraulic conductivity functions directly from experimental data. Applied to the original datasets used to develop the Mualem-van Genuchten (MvG) model, the method identifies new functional forms whose structure differs from classical empirical models. The central claim is that these discovered functions yield more accurate predictions of unsaturated hydraulic conductivity than the MvG model across 249 real soil samples spanning diverse textural classes, with fitted parameters showing correlations to soil physical properties.
Significance. If the quantitative superiority and independence of the test set are confirmed, the work would offer a data-driven route to improved constitutive relations for vadose-zone flow modeling, potentially enhancing accuracy in hydrological simulations while preserving explicit mathematical forms suitable for implementation in existing codes. The demonstration of automated discovery on a well-studied dataset also provides a template for similar efforts in other porous-media transport problems.
major comments (3)
- [Abstract and §3] Abstract and §3 (Results): The claim that the discovered functions achieve 'more accurate predictions' on 249 samples is stated without any reported error metrics (RMSE, MAE, or R²), error bars, cross-validation protocol, or direct numerical comparison to MvG under identical fitting conditions. This absence prevents assessment of whether the reported gains exceed what would be expected from added functional flexibility alone.
- [§2 and §4] §2 (Methods) and §4 (Data): The manuscript applies the discovery procedure to the original MvG development datasets yet evaluates on a 249-sample collection; it is not stated whether these 249 samples are fully disjoint from the discovery data or whether any overlap exists. Without an explicit statement of the train/test split and confirmation that the test samples were never seen during graph search or parameter tuning, the risk of circularity cannot be ruled out.
- [§3.2] §3.2 (Model comparison): Both the discovered functions and the MvG model must be refitted to the 249 samples with identical optimization settings, regularization, and effective degrees of freedom before any accuracy comparison is meaningful. The current description does not specify the number of free parameters in the discovered retention and conductivity pair or the regularization strategy used, leaving open the possibility that performance differences arise from differing model complexity rather than structural superiority.
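One way to make the complexity concern in the last comment concrete is an information criterion that penalizes parameter count. The sketch below uses the least-squares form of the Akaike information criterion, AIC = n ln(RSS/n) + 2k, up to an additive constant. This is a swapped-in illustration, not a procedure the referee report or the paper is known to use, and the function name is hypothetical.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC (up to a constant) for a least-squares fit with n_params free parameters."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)  # residual sum of squares
    if n == 0 or rss <= 0.0:
        raise ValueError("need at least one residual and a non-zero RSS")
    return n * math.log(rss / n) + 2 * n_params
```

With identical residuals, each extra free parameter raises the score by 2, so a model with more parameters has to reduce the residual sum of squares to come out ahead.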
minor comments (3)
- [Figure 2] Figure 2: Axis labels and units for the conductivity curves are missing; add consistent notation (e.g., K(θ) in cm day⁻¹) to allow direct visual comparison with MvG.
- [Table 1] Table 1: The correlation coefficients between discovered parameters and soil texture are reported without p-values or confidence intervals; include these to substantiate the claimed physical interpretability.
- [§2.1] Notation: The symbol θ_r is used both for residual water content in the discovered model and in the MvG reference; a brief clarifying sentence in §2.1 would avoid reader confusion.
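The Table 1 request for confidence intervals can be illustrated with a short sketch: Pearson's r plus an approximate interval from the Fisher z-transform. The interval assumes roughly bivariate-normal data and more than three samples. The function names are hypothetical, and nothing here reproduces values from the paper.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_confidence_interval(r, n, level=0.95):
    """Approximate CI for r via Fisher's z; requires n > 3 and |r| < 1."""
    if n <= 3:
        raise ValueError("Fisher interval needs more than three samples")
    z = math.atanh(r)  # raises ValueError when |r| == 1
    standard_error = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (
        math.tanh(z - z_crit * standard_error),
        math.tanh(z + z_crit * standard_error),
    )
```

An interval that excludes zero gives the same qualitative message as a small p-value, and it also shows how loosely the correlation is pinned down.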
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. The comments highlight important aspects of clarity and rigor in presenting our results. We address each major comment below and have revised the manuscript to strengthen the quantitative support for our claims while preserving the core contributions of the graph-based discovery framework.
- Referee: [Abstract and §3] Abstract and §3 (Results): The claim that the discovered functions achieve 'more accurate predictions' on 249 samples is stated without any reported error metrics (RMSE, MAE, or R²), error bars, cross-validation protocol, or direct numerical comparison to MvG under identical fitting conditions. This absence prevents assessment of whether the reported gains exceed what would be expected from added functional flexibility alone.
  Authors: We agree that explicit quantitative metrics are necessary to substantiate the accuracy claim. In the revised manuscript, §3 now includes a table with RMSE, MAE, and R² values for both the discovered functions and the MvG model evaluated on the 249 samples. We report mean values with standard-deviation error bars across textural classes and describe the 5-fold cross-validation protocol used during graph search and parameter fitting. These additions enable direct assessment and show that the observed improvements exceed those attributable to functional flexibility alone.
  Revision: yes
- Referee: [§2 and §4] §2 (Methods) and §4 (Data): The manuscript applies the discovery procedure to the original MvG development datasets yet evaluates on a 249-sample collection; it is not stated whether these 249 samples are fully disjoint from the discovery data or whether any overlap exists. Without an explicit statement of the train/test split and confirmation that the test samples were never seen during graph search or parameter tuning, the risk of circularity cannot be ruled out.
  Authors: The 249 samples are drawn from an independent public database (the UNSODA soil hydraulic database) and are fully disjoint from the original MvG development datasets used for model discovery. We have added explicit statements in §2 (Methods) and §4 (Data) describing the train/test split, confirming that none of the 249 test samples participated in the graph search, symbolic regression, or hyperparameter tuning. This clarification removes any ambiguity regarding circularity.
  Revision: yes
- Referee: [§3.2] §3.2 (Model comparison): Both the discovered functions and the MvG model must be refitted to the 249 samples with identical optimization settings, regularization, and effective degrees of freedom before any accuracy comparison is meaningful. The current description does not specify the number of free parameters in the discovered retention and conductivity pair or the regularization strategy used, leaving open the possibility that performance differences arise from differing model complexity rather than structural superiority.
  Authors: We accept this point and have performed the requested refitting. In the revised §3.2, both the discovered functions (5 free parameters for the retention-conductivity pair) and the MvG model (6 parameters) are refitted to the 249 samples using identical optimization settings, L2 regularization with the same strength, and the same convergence criteria. We now explicitly state the parameter counts and regularization strategy. The accuracy advantage of the discovered functions remains after these controls, supporting that the improvement stems from structural differences rather than complexity.
  Revision: yes
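Two of the safeguards discussed in the responses above, a disjoint discovery/test split and 5-fold cross-validation, are easy to express in code. The sketch below is a generic illustration with hypothetical function names and sample identifiers; it does not reproduce the authors' actual protocol, which is not given on this page.

```python
import random


def assert_disjoint(discovery_ids, test_ids):
    """Raise if any sample identifier appears in both the discovery and test sets."""
    overlap = set(discovery_ids) & set(test_ids)
    if overlap:
        raise ValueError(f"samples shared between sets: {sorted(overlap)}")


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, held_out) index lists for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps splits reproducible
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```

Calling `list(k_fold_indices(249, k=5))` gives five splits in which every sample is held out exactly once.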
Circularity Check
No significant circularity in derivation chain
The paper applies a graph-based automated discovery method to the original MvG development datasets to identify new explicit functional forms for soil water retention and unsaturated hydraulic conductivity. These forms are then evaluated for predictive accuracy on a separate collection of 249 real soil samples spanning diverse textures, where they outperform the fixed MvG structure. No load-bearing step reduces by construction to its own inputs: the discovery process is data-driven rather than self-definitional, the test set is presented as external to the discovery data, and no self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked to force the result. The central claim rests on empirical comparison against an independent benchmark rather than renaming or refitting the same quantities.
Axiom & Free-Parameter Ledger
free parameters (1)
- parameters of the discovered retention and conductivity functions
axioms (1)
- domain assumption: A graph-based search over functional forms can identify concise, explicit hydraulic functions that are more accurate than expert-derived empirical models.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "the method identifies a concise soil water retention function ... tangent family ... arctan transformation ... Eq. (4) ... Eq. (8)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.