pith.

arXiv: 2605.17581 · v1 · pith:PV36GWPC · new · submitted 2026-05-17 · ❄️ cond-mat.soft · cs.LG

Topological Data Analysis combined with Machine Learning for Predicting Permeability of Porous Media

Pith reviewed 2026-05-19 22:13 UTC · model grok-4.3

classification: ❄️ cond-mat.soft · cs.LG
keywords: topological data analysis · machine learning · permeability prediction · porous media · synthetic structures · fluid flow · connectivity features · network modeling

The pith

Topological data analysis supplies effective features for machine learning models that predict permeability in porous media from structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Flow through porous media resists direct analytical or numerical treatment because of its geometric complexity. Synthetic samples, however, are straightforward to generate, and their exact permeability can be computed separately to serve as training targets. The paper extracts geometric structure measures, topological descriptions of connectivity, and simplified pore-network properties from these samples and uses them as inputs to standard machine learning algorithms. Trained against the separately computed true permeability, the models learn to predict permeability from the input features. Topological data analysis (TDA) measures stand out among the feature sets: they combine readily with standard machine learning and yield meaningful predictions. This matters because it lets researchers probe which aspects of the pore geometry most strongly govern fluid transport without solving the full flow problem for every new structure.
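As a concrete illustration of the kind of geometric and connectivity inputs described above, the following Python sketch computes three simple descriptors of a 2D binary image (1 = void, 0 = solid): porosity, a discrete void/solid perimeter per unit area, and whether the void space connects the left edge to the right edge. The function name `structural_features`, the choice of descriptors, and the 4-neighbour connectivity are illustrative assumptions, not the paper's actual feature set.

```python
from collections import deque


def structural_features(grid):
    """Simple geometric descriptors of a 2D binary image (1 = void, 0 = solid)."""
    rows, cols = len(grid), len(grid[0])
    area = rows * cols
    porosity = sum(map(sum, grid)) / area

    # Discrete perimeter: number of void/solid faces between 4-neighbours.
    interfaces = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols and grid[i][j] != grid[i][j + 1]:
                interfaces += 1
            if i + 1 < rows and grid[i][j] != grid[i + 1][j]:
                interfaces += 1

    # Breadth-first search from void pixels on the left edge: does the void space
    # connect the left (inlet) edge to the right (outlet) edge?
    start = [(i, 0) for i in range(rows) if grid[i][0] == 1]
    seen, queue, spans = set(start), deque(start), False
    while queue:
        i, j = queue.popleft()
        if j == cols - 1:
            spans = True
            break
        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= a < rows and 0 <= b < cols and grid[a][b] == 1 and (a, b) not in seen:
                seen.add((a, b))
                queue.append((a, b))

    return {"porosity": porosity, "specific_perimeter": interfaces / area, "spans": spans}
```

For the 3×4 image `[[1, 1, 0, 0], [0, 1, 1, 1], [0, 0, 0, 0]]` this gives porosity 5/12, eight void/solid faces (8/12 per unit area), and `spans = True`, because a 4-connected void path runs from the left column to the right column.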

Core claim

Features that describe the geometry of synthetic porous media, their topological connectivity, and their representation as pore networks can serve as inputs to machine learning algorithms trained against exact ground-truth permeability; topological data analysis features in particular form a useful set that yields meaningful permeability predictions.
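For the pore-network representation, one simple family of measures is coordination-number statistics (how many throats meet at each pore). The sketch below assumes the network has already been extracted (Fig. 7 of the paper uses the SNOW2 algorithm for this step) and is supplied as a hypothetical edge list of throats between pores numbered `0 .. n_pores - 1`. The function name and the chosen statistics are illustrative assumptions, not the paper's network features.

```python
from collections import Counter


def network_features(n_pores, throats):
    """Coordination-number statistics of a pore network.

    n_pores: number of pores (assumed >= 1), labelled 0 .. n_pores - 1.
    throats: iterable of (pore_a, pore_b) pairs, one per throat.
    """
    degree = Counter()
    for a, b in throats:
        degree[a] += 1
        degree[b] += 1
    coordination = [degree[p] for p in range(n_pores)]
    return {
        "mean_coordination": sum(coordination) / n_pores,
        "max_coordination": max(coordination),
        "isolated_fraction": coordination.count(0) / n_pores,  # pores with no throat
        "dead_end_fraction": coordination.count(1) / n_pores,  # pores with one throat
    }
```

For five pores joined by throats `(0, 1), (1, 2), (1, 3)`, pore 1 has coordination 3, pores 0, 2 and 3 are dead ends, and pore 4 is isolated. The mean coordination is therefore 6/5 = 1.2.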

What carries the argument

Topological data analysis features that capture connectivity of the pore space, supplied as inputs to machine learning models trained on exact permeability values.
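The TDA features can be made concrete with a small example in the spirit of Figs. 5 and 6, which compute persistence diagrams from a Euclidean Distance Transform (EDT) of the pore space. The Python sketch below is a stand-in built on stated assumptions, not the authors' pipeline. It uses a brute-force EDT measured to the nearest solid pixel, a sublevel-set filtration of the negated EDT restricted to void pixels, and 4-neighbour connectivity. It computes only dimension-0 pairs, using union-find and the elder rule. Libraries such as GUDHI compute cubical persistence, including dimension 1, far more efficiently.

```python
import math


def edt(grid):
    """Brute-force Euclidean distance from each void pixel (1) to the nearest solid pixel (0)."""
    rows, cols = len(grid), len(grid[0])
    solids = [(a, b) for a in range(rows) for b in range(cols) if grid[a][b] == 0]
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] == 1:
                # math.inf if the image contains no solid pixel at all
                dist[i][j] = min((math.hypot(i - a, j - b) for a, b in solids), default=math.inf)
    return dist


def persistence_dim0(grid):
    """Dimension-0 (birth, death) pairs of the sublevel-set filtration of -EDT on void pixels."""
    rows, cols = len(grid), len(grid[0])
    dist = edt(grid)
    # Deepest pore centres (largest distance to solid) enter the filtration first.
    cells = sorted((-dist[i][j], i, j) for i in range(rows) for j in range(cols) if grid[i][j] == 1)
    parent, birth, pairs = {}, {}, []

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for value, i, j in cells:
        p = (i, j)
        parent[p], birth[p] = p, value
        for q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if q not in parent:
                continue  # solid, outside the image, or not yet in the filtration
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            # Elder rule: the component born later dies when the two merge.
            old, young = (rp, rq) if birth[rp] <= birth[rq] else (rq, rp)
            if value > birth[young]:
                pairs.append((birth[young], value))  # zero-persistence pairs are dropped
            parent[young] = old
    # One essential (never-dying) class per connected void component.
    pairs.extend((birth[r], math.inf) for r in {find(p) for p in parent})
    return pairs


if __name__ == "__main__":
    # Two 3x3 pores joined by a one-pixel throat, surrounded by solid.
    sample = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
    ]
    print(sorted(persistence_dim0(sample)))  # [(-2.0, -1.0), (-2.0, inf)]
```

In the example, each pore centre lies at distance 2 from the solid, while the throat lies at distance 1. The finite pair therefore has persistence 2 - 1 = 1, the gap between pore size and throat size. This is one way such diagrams can encode geometry relevant to flow.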

If this is right

  • Different combinations of structural, topological, and network features can be compared to identify which structural aspects most influence predicted permeability.
  • Topological data analysis features integrate easily into existing machine learning pipelines for this prediction task.
  • The trained models can estimate permeability for new porous structures without repeating full flow simulations.
  • Results clarify the relative utility of geometric versus connectivity measures for flow properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feature pipeline could be applied to real experimental samples to test whether predictions track physical measurements outside the synthetic training distribution.
  • Inverting the trained model might allow design of pore geometries that achieve a target permeability without exhaustive trial simulations.
  • The approach could extend to forecasting other transport or mechanical properties in complex media once suitable ground-truth data become available.

Load-bearing premise

Features extracted from synthetic porous media are sufficient for a machine learning model to learn the physical relationships that determine permeability rather than fitting only to patterns in the training examples.

What would settle it

Train the model on one collection of synthetic porous media, then test its permeability predictions against direct measurements made on laboratory samples that share comparable structural statistics.

Figures

Figures reproduced from arXiv: 2605.17581 by Aakash Karlekar, Catherin Neena Lalu, Ebru Dagdelen, Jonathan Jaquette, Linda Cummings, Lou Kondic, Manav Arora, Matthew Illingworth.

Figure 1. (a) 3D visualization of a typical synthetic porous medium dataset of 50 …
Figure 2. (a) A void-rich 20 …
Figure 3.
Figure 4. Persistence diagrams (PDs) resulting from …
Figure 5. (a) The 2D sample of Fig. 2(a); and (b) the corresponding Euclidean Distance Transform, where each void …
Figure 6. Persistence diagrams (PDs) resulting from EDT: (a) dimension 0 and (b) dimension 1. The …
Figure 7. (a) The 2D sample of Fig. 2(a); and (b) the corresponding network obtained using the SNOW2 algorithm in …
Figure 8. (a) 3D visualization of a typical synthetic porous medium dataset of 50 …
Figure 9. Predicted permeability values log …
Figure 10. Model Test Error (%) vs. Feature Extraction Time (s). Note that ‘persistence alpha’ and ‘all persistence …
Original abstract

Flow in porous media is difficult to address using standard analytical or numerical methods due to its complexity. However, since synthetic representations of porous media are easy to produce and data from physical experiments are becoming more widely available, the problem is well-suited to studies that include machine learning (ML) techniques. We discuss a number of features that can be extracted from such data, and their utility as input variables into a standard ML algorithm. These features include structural measures describing the geometry of the porous media, topological measures describing the connectivity, and network measures obtained by modeling the porous media as simplified pore networks. These features enable the prediction of the permeability of the considered (synthetic) porous materials using ML techniques that also leverage the separately computed exact permeability (ground truth). Comparing results obtained using different input variables helps develop a better understanding of the utility of various measures for predicting permeability based on the porous media structure. We show, in particular, that topological data analysis (TDA) provides a useful set of features that can be easily combined with ML to yield meaningful results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes using topological data analysis (TDA) in combination with machine learning to predict permeability in synthetic porous media. Three kinds of features are extracted from the media: structural geometry measures, topological descriptors obtained via TDA (such as persistence diagrams and Betti numbers), and network measures from pore-network models. These are fed into standard supervised regressors trained against separately computed exact permeability values as ground truth. The central claim is that TDA features are particularly useful and yield meaningful predictive results when combined with ML.

Significance. If validated, the work could offer a practical data-driven route to permeability estimation for complex porous structures where direct simulation is costly. Credit is due for the reproducible synthetic data generation pipeline and for the explicit use of exact permeability as the supervised target, which avoids circularity in the prediction task itself. However, the significance is limited by the absence of evidence that TDA features improve generalization, or reflect flow physics, beyond in-sample correlations on the synthetic ensemble.

major comments (2)
  1. §4.3 (Results): the reported cross-validated R² and MAE values for models including TDA features are presented without an ablation that removes the TDA descriptors while retaining structural and network features; this omission prevents assessment of whether TDA adds incremental predictive power or merely correlates with the synthetic generation process.
  2. §5 (Discussion): the claim that the approach 'captures the underlying physics' is not supported by any comparison against direct numerical simulation of Stokes flow on the identical geometries or by testing on experimental porous samples; all metrics remain within the synthetic distribution used for training.
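The ablation requested in major comment 1 can be sketched as follows, assuming a feature matrix whose columns are known to be structural, network, or TDA features. Closed-form ridge regression with k-fold cross-validation is used here purely as a stand-in for "a standard ML algorithm". The function names, the toy data, and the regularisation strength are hypothetical, and the numbers the sketch produces say nothing about the paper's results.

```python
import numpy as np


def cv_r2(X, y, k=5, alpha=1e-3, seed=0):
    """Mean k-fold cross-validated R^2 of closed-form ridge regression."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    scores = []
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Standardise with training-fold statistics only, then prepend an intercept column.
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0) + 1e-12
        A = np.c_[np.ones(len(train)), (X[train] - mu) / sd]
        B = np.c_[np.ones(len(test)), (X[test] - mu) / sd]
        penalty = alpha * np.eye(A.shape[1])
        penalty[0, 0] = 0.0  # leave the intercept unpenalised
        w = np.linalg.solve(A.T @ A + penalty, A.T @ y[train])
        resid = y[test] - B @ w
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1.0 - resid @ resid / ss_tot)
    return float(np.mean(scores))


def ablation(X, y, tda_cols):
    """Cross-validated R^2 with all features versus with the TDA columns removed."""
    drop = set(tda_cols)
    keep = [c for c in range(X.shape[1]) if c not in drop]
    return {"all_features": cv_r2(X, y), "without_tda": cv_r2(X[:, keep], y)}


if __name__ == "__main__":
    # Toy data only: the target is constructed to depend mostly on the single "TDA" column.
    rng = np.random.default_rng(1)
    n = 200
    structural = rng.normal(size=(n, 2))
    tda = rng.normal(size=(n, 1))
    y = 0.2 * structural[:, 0] + 2.0 * tda[:, 0] + 0.1 * rng.normal(size=n)
    X = np.hstack([structural, tda])
    print(ablation(X, y, tda_cols=[2]))
```

The key design point is that only the TDA columns are removed while everything else (folds, model, preprocessing) stays fixed. Any drop in cross-validated R² can then be attributed to those columns. MAE could be reported alongside R² in the same loop.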
minor comments (2)
  1. Figure 2: axis labels and color scales for the persistence diagrams are not defined in the caption, making it difficult to interpret the topological features being extracted.
  2. §3.2: the vectorization procedure that converts persistence diagrams into fixed-length feature vectors for the ML input should be stated explicitly (e.g., which summary statistics or kernel are used).
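Minor comment 2 concerns vectorization. A persistence diagram contains a variable number of points, so it has to be mapped to a fixed-length vector before it can enter a standard regressor. The sketch below shows one simple, commonly used option, assuming diagrams are given as lists of (birth, death) pairs. It combines four summary statistics of the finite lifetimes with a Betti curve sampled on a fixed grid of filtration values. The function name and feature choice are illustrative assumptions. Persistence landscapes, persistence images, and kernel methods are common alternatives, and this page does not state which procedure the paper uses.

```python
import math


def vectorize_diagram(pairs, grid):
    """Map a persistence diagram to a fixed-length feature vector.

    pairs: list of (birth, death) tuples; death may be math.inf for essential classes.
    grid:  filtration values at which the Betti curve is sampled (fixes the length).
    """
    finite = [(b, d) for b, d in pairs if math.isfinite(d)]
    lifetimes = [d - b for b, d in finite]
    n = len(lifetimes)
    total = float(sum(lifetimes))
    summary = [
        float(n),                     # number of finite features
        total,                        # total persistence
        max(lifetimes, default=0.0),  # longest finite lifetime
        total / n if n else 0.0,      # mean finite lifetime
    ]
    # Betti curve: number of classes (including essential ones) alive at each t,
    # using the half-open convention birth <= t < death.
    betti = [float(sum(1 for b, d in pairs if b <= t < d)) for t in grid]
    return summary + betti
```

Every diagram, including an empty one, maps to a vector of length `4 + len(grid)`. Diagrams from different samples can therefore be stacked directly into a feature matrix.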

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We provide point-by-point responses to the major comments below, indicating the revisions made to address them.

Point-by-point responses
  1. Referee: §4.3 (Results): the reported cross-validated R² and MAE values for models including TDA features are presented without an ablation that removes the TDA descriptors while retaining structural and network features; this omission prevents assessment of whether TDA adds incremental predictive power or merely correlates with the synthetic generation process.

    Authors: We agree with the referee that an ablation study is required to properly assess the contribution of the TDA features. Accordingly, we have performed and included an ablation analysis in the revised Section 4.3. We report the cross-validated R² and MAE for models using structural and network features with and without the TDA descriptors. The results demonstrate that TDA features do provide incremental predictive power on top of the other features. revision: yes

  2. Referee: §5 (Discussion): the claim that the approach 'captures the underlying physics' is not supported by any comparison against direct numerical simulation of Stokes flow on the identical geometries or by testing on experimental porous samples; all metrics remain within the synthetic distribution used for training.

    Authors: We thank the referee for this important point. The ground-truth permeability is computed using direct numerical simulation, but we accept that the original discussion may have overstated the physical interpretation. In the revised manuscript, we have modified the discussion in Section 5 to avoid claiming that the approach 'captures the underlying physics'. Instead, we note that the topological features are selected for their relevance to pore-space connectivity, which is physically linked to permeability. We have also added a statement that validation on experimental samples is left for future work. revision: partial

Circularity Check

0 steps flagged

No circularity: supervised ML uses independently computed ground-truth permeability

full rationale

The paper extracts geometric, topological (including TDA), and network features from synthetic porous media and trains standard regressors to predict permeability, with the target obtained via separate exact computation serving as ground truth. This is a conventional supervised learning pipeline with no self-definitional reduction, no fitted parameter renamed as a prediction, and no load-bearing self-citation chain. The claim that TDA features are useful rests on comparative performance across feature sets rather than any equation that equates the output to the inputs by construction. The setup remains falsifiable against external exact solvers and held-out data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract states no explicit free parameters, axioms, or invented entities; the work relies on the standard assumptions of supervised machine learning and of topological data analysis applied to image or voxel data of porous media.

axioms (1)
  • Domain assumption: synthetic porous media generated by the authors' procedure are representative enough for ML training to capture permeability trends.
    Implicit in the use of synthetic data as training set with exact permeability as ground truth.

pith-pipeline@v0.9.0 · 5745 in / 1264 out tokens · 38764 ms · 2026-05-19T22:13:12.186692+00:00 · methodology
