pith.

arxiv: 2606.09094 · v1 · pith:7ZK72KYO · submitted 2026-06-08 · ⚛️ physics.geo-ph

CIG-Bench: A Comprehensive Survey and Benchmark for AI-Driven Subsurface Imaging Understanding

Pith reviewed 2026-06-27 14:21 UTC · model grok-4.3

classification ⚛️ physics.geo-ph
keywords subsurface imaging · seismic interpretation · deep learning benchmark · fault segmentation · geobody identification · property estimation · AI in geophysics · relative geologic time

The pith

A proposed community benchmark unifies evaluation for AI models on four subsurface imaging tasks using mixed synthetic and real data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews 652 publications from 2015 to 2025 on deep learning for subsurface imaging and groups them into structural interpretation, geobody identification, seismic facies analysis, and property estimation. It argues that these tasks face distinct difficulties including ambiguous signals and scarce annotations, which produce three main challenges: handling complex geology, generalizing across surveys, and the lack of shared benchmarks. The central response is CIG-Bench, which supplies datasets, pretrained models, and protocols for fault segmentation, relative geologic time estimation, geobody segmentation, and property modeling. A reader would care because the absence of such standards has made it hard to compare methods or reproduce results in applications like resource exploration and hazard assessment.

Core claim

The survey organizes the literature around four tasks and concludes that the lack of a unified benchmark has been a particularly consequential barrier. CIG-Bench therefore supplies synthetic volumes for quantitative scoring and real surveys for qualitative checks, together with standardized protocols and initial pretrained models, across fault segmentation, relative geologic time estimation, geobody segmentation, and property modeling.

What carries the argument

CIG-Bench, the benchmark that supplies unified protocols, pretrained models, and datasets mixing synthetic data for metrics with real surveys for realism across the four tasks.

If this is right

  • Researchers can compare methods on the same datasets and metrics instead of each creating private test sets.
  • Pretrained models released with the benchmark become starting points that reduce the need for large new annotation efforts.
  • Uncertainty outputs can be required as part of standard model evaluation rather than added later.
  • Physical and geological constraints can be tested inside a common evaluation framework rather than in isolated studies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The benchmark may push the field toward models that treat uncertainty as a required output rather than an optional extra.
  • Shared real-survey portions could make cross-survey generalization a measurable target instead of an informal claim.
  • Similar benchmark designs in other sparsely labeled domains, such as medical imaging, may offer a precedent for combining quantitative and qualitative checks.

Load-bearing premise

That combining synthetic data for numbers with real surveys for realism, plus shared protocols, will produce meaningful progress on interpretation under complex conditions and cross-survey generalization.

What would settle it

Release of the benchmark followed by independent tests showing one of two outcomes. Either models ranked highest on its synthetic metrics perform no better than prior methods on entirely new real seismic surveys, or results on the four tasks remain incomparable across studies despite the shared protocols.
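The first outcome is essentially a rank-transfer check. The sketch below assumes each model receives one scalar score on the synthetic split and one on a held-out real survey. The paper assesses real surveys qualitatively, so the real-survey score here is a hypothetical rubric or proxy value. The sketch computes Spearman's rank correlation between the two rankings. A value near 1 would suggest that synthetic rankings transfer, while a value near 0 or below would support the negative outcome. The function names and scores are illustrative and are not part of CIG-Bench.

```python
import numpy as np


def ranks_no_ties(scores):
    """Return 1-based ranks (1 = lowest score); assumes no tied scores."""
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty(len(scores), dtype=float)
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    return ranks


def spearman_rho(synthetic_scores, real_scores):
    """Spearman rank correlation via the closed form 1 - 6*sum(d^2) / (n(n^2 - 1)).

    The closed form is exact only when neither list contains ties.
    """
    d = ranks_no_ties(synthetic_scores) - ranks_no_ties(real_scores)
    n = len(d)
    return 1.0 - 6.0 * float(np.sum(d ** 2)) / (n * (n ** 2 - 1))


# Hypothetical scores for four models A-D.
synthetic = [0.91, 0.88, 0.85, 0.80]  # e.g. a segmentation score on synthetic volumes
real = [0.62, 0.70, 0.55, 0.50]       # e.g. a mean expert-rubric score on a new survey
print(round(spearman_rho(synthetic, real), 3))  # 0.8
```

With tied scores, `scipy.stats.spearmanr` assigns average ranks and is the safer choice.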

Figures

Figures reproduced from arXiv: 2606.09094 by Haibin Di, Hui Gao, Long Han, Mingliang Liu, Min Jun Park, Robert G. Clapp, Sergey Fomel, Tao Zhao, Xinming Wu, Yimin Dou, Zhixiang Guo, Zhi Zhong.

Figure 1: A structured overview of AI for geophysics-based subsurface imaging understanding. The framework identifies two categories of core challenges: inherent imaging limitations (blurriness, ambiguity, and nonlinear mapping) and data characteristics (massive scale, heterogeneity, and multi-scale features). Against these challenges, AI-driven solutions including deep learning-based perception, probabilistic model…
Figure 2: Adapted from Dou et al. (2025). (a, b) A comparison of task difficulty between orange detection in natural images and subsurface interpretation. Natural images offer rich visual cues, whereas subsurface targets must be inferred from weak and ambiguous seismic responses. (c) A comparison of natural, medical, and subsurface imaging. In contrast to natural and medical images, subsurface imaging has sparse sem…
Figure 3: Citation-network view of the curated corpus. From the 652 papers compiled in this work, panel (a) shows the paper-level citation network with cross-category linkages and a directional citation flow legend that aggregates inter-cluster citations under the manual category labels assigned during curation. Panels (b–f) summarise within-corpus directed citations between categories: (b,c) in-degree and out-degre…
Figure 4: Machine learning in subsurface imaging interpretation (2015–2025). (a) Stacked area chart of 478 ML papers across four targets (Structure, Geobody, Facies, Property), showing approximately tenfold growth from 2015 to a peak in 2023, with Property and Structure as leading categories. (b) Sankey diagram linking interpretation targets to tasks, methods, and publication years, revealing a decisive shift from t…
Figure 5: Representative samples from the CIG-Bench synthetic dataset. Rows (a) and (b) show seismic samples with structural annotations, including raw seismic amplitude volumes, fault labels with diverse fault styles, relative geologic time volumes, and stratigraphic surfaces describing the continuous depositional framework. Rows (c) and (d) show seismic samples with geobody annotations, including channel systems a…
Figure 6: The figure summarizes the technological evolution and paradigm shifts in structural interpretation over the past decade, driven by advances in machine learning and deep learning. Using stage-specific keywords and representative milestones, it delineates the research focus of each phase and its central transitions.
Figure 7: Results of the CIG-Bench-fault model with skip connections. The raw seismic data shown are sourced from Nlog and the USGS.
Figure 8: Results of the CIG-Bench-RGT model. We used the training strategy of RGT-Est (Dou et al., 2026). The raw seismic data shown are sourced from Nlog and the USGS. The left column shows the seismic data, the middle column shows the RGT volume predicted by the model, and the right column displays the horizons extracted from the RGT volume and overlaid on the seismic data.
Figure 9: The figure summarizes the technological evolution and paradigm shifts in geobody interpretation over the past decade, driven by advances in machine learning and deep learning. Using stage-specific keywords and representative milestones, it delineates the research focus of each phase and its central transitions.
Figure 10: Geobody segmentation results of the CIG-Bench-Channel and CIG-Bench-Karst models on three field seismic datasets. Panels (a), (b), and (c) correspond to the New Zealand Parihaka, New Zealand Romney, and USGS G3D datasets, respectively. In each row, the left column shows the original seismic volume, the middle column shows the predicted segmentation results overlaid on seismic data, and the ri…
Figure 11: The figure summarizes the technological evolution and paradigm shifts in facies interpretation over the past decade, driven by advances in machine learning and deep learning. Using stage-specific keywords and representative milestones, it delineates the research focus of each phase and its central transitions.
Figure 12: The figure summarizes the technological evolution and paradigm shifts in property interpretation over the past decade, driven by advances in machine learning and deep learning. Using stage-specific keywords and representative milestones, it delineates the research focus of each phase and its central transitions. A central emphasis was the integration of rock-physics models to link elastic and petrophysica…
Figure 13: Results of the CIG-Bench-Property model. The model adopts a promptable conditional training strategy (Dou et al., 2025), with seismic data and well logs used as inputs. (a) Netherlands F3 seismic data. (b) Acoustic impedance property modeling result. (c) Gamma-ray property modeling result. (d) Lithology property modeling result. (e) Sonic property modeling result. (f) Vp property modeling result. The prop…
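Figure 8 describes horizons extracted from a predicted RGT volume. One generic approach treats each horizon as an iso-surface of RGT. It assumes RGT increases monotonically with depth along every trace and locates, trace by trace, the depth at which RGT equals a chosen value. The sketch below does this for a 2D section using linear interpolation. The toy RGT section (dipping layers cut by a single vertical fault) and the function name `extract_horizon` are hypothetical. This is not necessarily the procedure used with RGT-Est.

```python
import numpy as np


def extract_horizon(rgt, value):
    """Depth (in samples) at which each trace of a 2D RGT section equals `value`.

    rgt: array of shape (nz, nx), assumed to increase monotonically with depth
    in every column, which np.interp requires of its xp argument.
    Traces whose RGT range does not contain `value` return NaN.
    """
    nz, nx = rgt.shape
    z = np.arange(nz, dtype=float)
    depths = np.empty(nx)
    for ix in range(nx):
        depths[ix] = np.interp(value, rgt[:, ix], z, left=np.nan, right=np.nan)
    return depths


# Toy RGT section: layers dipping 0.1 samples per trace, plus a 5-sample
# throw across a vertical fault at trace 10 (all values hypothetical).
nz, nx, fault_x, throw = 50, 20, 10, 5.0
zz, xx = np.meshgrid(np.arange(nz), np.arange(nx), indexing="ij")
rgt = zz - 0.1 * xx - throw * (xx >= fault_x)

horizon = extract_horizon(rgt, value=10.0)
print(horizon[[0, 9, 10, 19]])
```

The fault appears in the extracted horizon as a jump of `throw` samples between traces 9 and 10. Real RGT predictions are rarely perfectly monotonic, so a practical version would also need to handle local inversions.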
Original abstract

Subsurface imaging understanding bridges observed geophysical data and quantitative geological models, supporting hydrocarbon exploration, CO2 storage site assessment, and geohazard monitoring. Over the past decade, deep learning has substantially reshaped interpretation workflows. To take stock of this progress, we systematically analyze 652 publications from 2015 to 2025 and organize the field into four major tasks: structural interpretation, geobody identification, seismic facies analysis, and property estimation. Yet subsurface imaging interpretation differs fundamentally from other AI-driven tasks, facing ambiguous signals, pronounced interpretive non-uniqueness, sparse semantics, unfixed target locations, and scarce reliable annotations. Building on the reviewed literature, we summarize three interrelated challenges that define its frontier: interpretation under complex geological conditions, cross-survey semantic generalization under low information density, and the absence of reliable benchmarks. Addressing them will hinge on integrating human expertise, physical constraints, and geological priors into model training or inference, and on treating uncertainty quantification as an intrinsic model output. Among these, the lack of unified benchmarks has been particularly consequential, making fair comparison difficult, hindering reproducibility, and fragmenting community efforts. We therefore propose a community benchmark spanning fault segmentation, relative geologic time estimation, geobody segmentation, and property modeling. It integrates unified evaluation protocols, pretrained models, and datasets that combine synthetic data for quantitative evaluation with real surveys for qualitative assessment. By coupling a decade-spanning review with an evolving benchmark, this work offers a timely reference and a reproducible foundation to accelerate future research and deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript surveys 652 publications (2015–2025) on deep learning for subsurface imaging interpretation, organizing them into four tasks (structural interpretation, geobody identification, seismic facies analysis, property estimation). It identifies three challenges—interpretation under complex conditions, cross-survey semantic generalization under low information density, and absence of reliable benchmarks—and proposes CIG-Bench, a community benchmark spanning fault segmentation, relative geologic time estimation, geobody segmentation, and property modeling. The benchmark combines synthetic data for quantitative metrics with real surveys for qualitative assessment, together with unified protocols and pretrained models.

Significance. If the survey methodology proves reproducible and the benchmark protocols demonstrably improve cross-survey generalization and uncertainty handling, the work would consolidate a fragmented literature and supply a practical, evolving reference that could accelerate standardized evaluation and deployment in geophysical AI. The explicit call for community participation and integration of physical priors is a constructive strength.

major comments (3)
  1. [Abstract / Survey methodology] Abstract and survey section: the claim of having 'systematically analyze[d] 652 publications' provides no search strategy, database(s), inclusion/exclusion criteria, or quality-assessment protocol. Without these, the completeness, representativeness, and potential selection bias of the review cannot be evaluated, directly undermining the three challenges derived from it.
  2. [Benchmark proposal] Benchmark proposal (final paragraph): the description states that the benchmark 'integrates unified evaluation protocols' and 'datasets that combine synthetic data for quantitative evaluation with real surveys for qualitative assessment,' yet supplies no concrete definition of those protocols, no specification of how synthetic data will reproduce relevant distribution shifts or non-uniqueness regimes, and no reproducible scoring rubric for the qualitative real-data component. These omissions make the central claim that the benchmark will address cross-survey generalization untestable in its current form.
  3. [Abstract] Abstract: the assertion that 'the lack of unified benchmarks has been particularly consequential' is presented as a conclusion from the 652-paper review, but the text contains no quantitative supporting evidence (e.g., fraction of papers using incompatible datasets or metrics, or citation patterns showing fragmentation). This weakens the justification for prioritizing benchmark development over the other two challenges.
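Comment 3 asks for quantitative evidence of fragmentation. Suppose each surveyed paper were tagged during curation with whether its evaluation data are public and which metrics it reports. These are hypothetical fields, since the paper does not describe such a table. Under that assumption, the requested statistics reduce to simple counting, as in this sketch:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperRecord:
    """Hypothetical per-paper tags a curator might record."""
    task: str
    dataset_public: bool
    metrics: tuple


def fragmentation_summary(records):
    """Share of papers evaluated on non-public data and number of distinct metric sets."""
    n = len(records)
    nonpublic = sum(1 for r in records if not r.dataset_public)
    # frozenset makes ("IoU", "F1") and ("F1", "IoU") count as the same metric set
    metric_sets = {frozenset(r.metrics) for r in records}
    return {
        "n_papers": n,
        "frac_nonpublic": nonpublic / n,
        "n_distinct_metric_sets": len(metric_sets),
    }


# Four made-up records, for illustration only.
corpus = [
    PaperRecord("fault", False, ("IoU",)),
    PaperRecord("fault", True, ("IoU", "F1")),
    PaperRecord("geobody", False, ("F1", "IoU")),
    PaperRecord("property", False, ("MAE",)),
]
print(fragmentation_summary(corpus))
# {'n_papers': 4, 'frac_nonpublic': 0.75, 'n_distinct_metric_sets': 3}
```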
minor comments (2)
  1. [Task organization] The four task categories are introduced without explicit decision rules or overlap statistics, which could be clarified with a small table or flowchart.
  2. [Related work] The manuscript would benefit from citing at least two existing geophysical benchmarks (e.g., in seismic inversion or facies classification) to situate CIG-Bench relative to prior efforts.
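Comment 2 asks for a reproducible scoring rubric on real data. A common way to meet this is to have two or more experts score the same outputs independently and report their agreement. Below is a minimal sketch of unweighted Cohen's kappa for two raters. The three-level rubric (1 = poor, 2 = acceptable, 3 = good) and the ratings are hypothetical.

```python
import numpy as np


def cohens_kappa(rater_a, rater_b, categories):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    confusion = np.zeros((k, k))
    for a, b in zip(rater_a, rater_b):
        confusion[index[a], index[b]] += 1
    n = confusion.sum()
    p_observed = np.trace(confusion) / n
    # chance agreement from each rater's marginal category frequencies
    p_expected = np.sum(confusion.sum(axis=1) * confusion.sum(axis=0)) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return float((p_observed - p_expected) / (1.0 - p_expected))


# Hypothetical scores from two interpreters on six real-survey predictions.
expert_1 = [3, 3, 2, 1, 2, 3]
expert_2 = [3, 2, 2, 1, 2, 3]
print(round(cohens_kappa(expert_1, expert_2, categories=[1, 2, 3]), 3))  # 0.739
```

For ordinal rubrics, a weighted kappa penalizes near-misses less than distant disagreements. One example is `sklearn.metrics.cohen_kappa_score` with `weights="quadratic"`.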

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will implement.

Point-by-point responses
  1. Referee: [Abstract / Survey methodology] Abstract and survey section: the claim of having 'systematically analyze[d] 652 publications' provides no search strategy, database(s), inclusion/exclusion criteria, or quality-assessment protocol. Without these, the completeness, representativeness, and potential selection bias of the review cannot be evaluated, directly undermining the three challenges derived from it.

    Authors: We agree that the absence of explicit survey methodology details is a significant omission that prevents readers from assessing the review's rigor and potential biases. In the revised manuscript we will add a dedicated 'Survey Methodology' subsection that specifies the search strategy (keywords, Boolean queries, and time filters), databases used (Google Scholar, Web of Science, arXiv, and major geophysics journals), inclusion/exclusion criteria (peer-reviewed or preprint works on deep learning for subsurface imaging 2015–2025, English language, focus on interpretation tasks), and any quality-assessment steps applied. We will also note limitations and how the three challenges were synthesized from the corpus. revision: yes

  2. Referee: [Benchmark proposal] Benchmark proposal (final paragraph): the description states that the benchmark 'integrates unified evaluation protocols' and 'datasets that combine synthetic data for quantitative evaluation with real surveys for qualitative assessment,' yet supplies no concrete definition of those protocols, no specification of how synthetic data will reproduce relevant distribution shifts or non-uniqueness regimes, and no reproducible scoring rubric for the qualitative real-data component. These omissions make the central claim that the benchmark will address cross-survey generalization untestable in its current form.

    Authors: The referee correctly identifies that the benchmark description remains too high-level to be immediately actionable or verifiable. We will expand the CIG-Bench section with: (i) precise evaluation protocols and metrics for each task (e.g., IoU, Dice, and boundary F1 for segmentation; MAE and correlation for property modeling, with reference implementations), (ii) explicit synthetic-data generation procedures designed to induce distribution shifts and non-uniqueness (varying noise, acquisition parameters, and geological complexity), and (iii) a reproducible qualitative rubric for real-data assessment (standardized expert scoring criteria with inter-rater reliability measures). These additions will make the benchmark's contribution to cross-survey generalization testable. revision: yes

  3. Referee: [Abstract] Abstract: the assertion that 'the lack of unified benchmarks has been particularly consequential' is presented as a conclusion from the 652-paper review, but the text contains no quantitative supporting evidence (e.g., fraction of papers using incompatible datasets or metrics, or citation patterns showing fragmentation). This weakens the justification for prioritizing benchmark development over the other two challenges.

    Authors: We accept that the abstract statement would be stronger with quantitative backing. Our review process identified fragmentation qualitatively through the diversity of datasets and metrics, but we did not tabulate exact fractions. In revision we will either extract and report such statistics from the surveyed corpus (e.g., percentage of papers using non-public or mutually incompatible datasets) or rephrase the claim to reflect a qualitative synthesis while still highlighting benchmark development as a high-priority need. This will clarify the relative emphasis among the three challenges. revision: partial
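Response 2 names IoU and Dice for segmentation, and MAE and correlation for property modeling. Below is a minimal NumPy sketch of those four metrics, applied to binary masks and to 1D property traces. It is not CIG-Bench's reference implementation. Scoring two empty masks as 1.0 is an assumed convention. Boundary F1 is omitted because it needs a tolerance-based boundary-matching step.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(pred, target).sum() / union)


def dice(pred, target):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # same assumed convention as iou()
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def mae(pred, target):
    """Mean absolute error between predicted and reference property values."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(target, float))))


def pearson_r(pred, target):
    """Pearson correlation coefficient between two 1D property traces."""
    return float(np.corrcoef(pred, target)[0, 1])


# Toy fault masks (1 = fault) and toy property traces; values are hypothetical.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
print(iou(pred_mask, true_mask), round(dice(pred_mask, true_mask), 3))  # 0.5 0.667
print(round(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 3))  # 0.667
```

For a single mask pair, Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)). Reporting both therefore mainly aids comparison with prior papers.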

Circularity Check

0 steps flagged

No circularity: survey and benchmark proposal is self-contained

full rationale

The paper is a descriptive literature review of 652 publications plus a new benchmark proposal. It contains no mathematical derivations, equations, fitted parameters, or predictions that reduce to quantities defined by the authors' own choices. The central claims (lack of unified benchmarks as a barrier; proposal of tasks like fault segmentation and RGT estimation) are presented as observations and suggestions rather than derived results. No self-citation chains, ansatzes, or uniqueness theorems are invoked to support load-bearing steps. This matches the default expectation for non-circular survey/benchmark papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the assumption that the 652-paper corpus is representative of the field and that the proposed benchmark structure will resolve the stated challenges; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Subsurface imaging interpretation can be partitioned into the four tasks of structural interpretation, geobody identification, seismic facies analysis, and property estimation.
    Explicitly stated in the abstract as the organizing framework for the reviewed literature.

pith-pipeline@v0.9.1-grok · 5840 in / 1231 out tokens · 24196 ms · 2026-06-27T14:21:54.537252+00:00 · methodology
