CIG-Bench: A Comprehensive Survey and Benchmark for AI-Driven Subsurface Imaging Understanding
Pith reviewed 2026-06-27 14:21 UTC · model grok-4.3
The pith
A proposed community benchmark unifies evaluation for AI models on four subsurface imaging tasks using mixed synthetic and real data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The survey organizes the literature around four tasks and concludes that the missing unified benchmark has been a particularly consequential barrier. CIG-Bench therefore supplies synthetic volumes for quantitative scoring and real surveys for qualitative checks, together with standardized protocols and initial pretrained models, across four benchmark tasks: fault segmentation, relative geologic time estimation, geobody segmentation, and property modeling.
What carries the argument
CIG-Bench, the benchmark that supplies unified protocols, pretrained models, and datasets mixing synthetic data for metrics with real surveys for realism across the four tasks.
If this is right
- Researchers can compare methods on the same datasets and metrics instead of each creating private test sets.
- Pretrained models released with the benchmark become starting points that reduce the need for large new annotations.
- Uncertainty outputs can be required as part of standard model evaluation rather than added later.
- Physical and geological constraints can be tested inside a common evaluation framework rather than in isolated studies.
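One way the uncertainty requirement in the list above could be made scoreable is a calibration check on predicted probabilities. The sketch below computes a binned expected calibration error for per-voxel fault probabilities. The function name, the ten-bin default, and the choice of this particular score are illustrative assumptions; the source does not say which uncertainty measure CIG-Bench would use.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned calibration error for binary predictions (illustrative only).

    probs  : predicted probability of the positive class (e.g. "fault"), in [0, 1]
    labels : ground-truth 0/1 labels with the same shape as probs
    """
    p = np.asarray(probs, dtype=float).ravel()
    y = np.asarray(labels, dtype=float).ravel()
    if p.shape != y.shape or p.size == 0:
        raise ValueError("probs and labels must be non-empty and the same shape")

    # Equal-width bins on [0, 1]; a probability of exactly 1.0 lands in the last bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_index = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_index == b
        if not in_bin.any():
            continue
        # Gap between mean predicted probability and observed positive rate,
        # weighted by the fraction of samples that fall in this bin.
        gap = abs(p[in_bin].mean() - y[in_bin].mean())
        ece += in_bin.mean() * gap
    return float(ece)
```

A score near zero means predicted probabilities match observed frequencies within each bin. It says nothing about segmentation accuracy, so it would complement rather than replace the task metrics.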
Where Pith is reading between the lines
- The benchmark may push the field toward models that treat uncertainty as a required output rather than an optional extra.
- Shared real-survey portions could make cross-survey generalization a measurable target instead of an informal claim.
- Benchmark designs in other sparse-label domains, such as medical imaging, may offer a precedent for the same mix of quantitative and qualitative checks.
Load-bearing premise
That combining synthetic data for numbers with real surveys for realism, plus shared protocols, will produce meaningful progress on interpretation under complex conditions and cross-survey generalization.
What would settle it
Release of the benchmark followed by independent tests showing that models ranked highest on its synthetic metrics perform no better than prior methods on entirely new real seismic surveys or that the four tasks remain incomparable despite the shared protocols.
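The ranking test described above could be run as a rank correlation between each model's synthetic-metric score and its score on new real surveys. The following is a minimal sketch, assuming one score per model on each side. The function names are hypothetical, and any numbers passed in would have to come from actual independent evaluations.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

A correlation near zero or below between the synthetic ranking and the real-survey ranking would be the kind of outcome that counts against the premise. A clearly positive value would support it, subject to how many models and surveys were tested.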
Original abstract
Subsurface imaging understanding bridges observed geophysical data and quantitative geological models, supporting hydrocarbon exploration, CO2 storage site assessment, and geohazard monitoring. Over the past decade, deep learning has substantially reshaped interpretation workflows. To take stock of this progress, we systematically analyze 652 publications from 2015 to 2025 and organize the field into four major tasks: structural interpretation, geobody identification, seismic facies analysis, and property estimation. Yet subsurface imaging interpretation differs fundamentally from other AI-driven tasks, facing ambiguous signals, pronounced interpretive non-uniqueness, sparse semantics, unfixed target locations, and scarce reliable annotations. Building on the reviewed literature, we summarize three interrelated challenges that define its frontier: interpretation under complex geological conditions, cross-survey semantic generalization under low information density, and the absence of reliable benchmarks. Addressing them will hinge on integrating human expertise, physical constraints, and geological priors into model training or inference, and on treating uncertainty quantification as an intrinsic model output. Among these, the lack of unified benchmarks has been particularly consequential, making fair comparison difficult, hindering reproducibility, and fragmenting community efforts. We therefore propose a community benchmark spanning fault segmentation, relative geologic time estimation, geobody segmentation, and property modeling. It integrates unified evaluation protocols, pretrained models, and datasets that combine synthetic data for quantitative evaluation with real surveys for qualitative assessment. By coupling a decade-spanning review with an evolving benchmark, this work offers a timely reference and a reproducible foundation to accelerate future research and deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript surveys 652 publications (2015–2025) on deep learning for subsurface imaging interpretation, organizing them into four tasks (structural interpretation, geobody identification, seismic facies analysis, property estimation). It identifies three challenges—interpretation under complex conditions, cross-survey semantic generalization under low information density, and absence of reliable benchmarks—and proposes CIG-Bench, a community benchmark spanning fault segmentation, relative geologic time estimation, geobody segmentation, and property modeling. The benchmark combines synthetic data for quantitative metrics with real surveys for qualitative assessment, together with unified protocols and pretrained models.
Significance. If the survey methodology proves reproducible and the benchmark protocols demonstrably improve cross-survey generalization and uncertainty handling, the work would consolidate a fragmented literature and supply a practical, evolving reference that could accelerate standardized evaluation and deployment in geophysical AI. The explicit call for community participation and integration of physical priors is a constructive strength.
major comments (3)
- [Abstract / Survey methodology] Abstract and survey section: the claim of having 'systematically analyze[d] 652 publications' provides no search strategy, database(s), inclusion/exclusion criteria, or quality-assessment protocol. Without these, the completeness, representativeness, and potential selection bias of the review cannot be evaluated, directly undermining the three challenges derived from it.
- [Benchmark proposal] Benchmark proposal (final paragraph): the description states that the benchmark 'integrates unified evaluation protocols' and 'datasets that combine synthetic data for quantitative evaluation with real surveys for qualitative assessment,' yet supplies no concrete definition of those protocols, no specification of how synthetic data will reproduce relevant distribution shifts or non-uniqueness regimes, and no reproducible scoring rubric for the qualitative real-data component. These omissions make the central claim that the benchmark will address cross-survey generalization untestable in its current form.
- [Abstract] Abstract: the assertion that 'the lack of unified benchmarks has been particularly consequential' is presented as a conclusion from the 652-paper review, but the text contains no quantitative supporting evidence (e.g., fraction of papers using incompatible datasets or metrics, or citation patterns showing fragmentation). This weakens the justification for prioritizing benchmark development over the other two challenges.
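The quantitative evidence requested in the last comment could be tabulated from per-paper records. The sketch below shows one possible tabulation. `PaperRecord`, its fields, and the three summary statistics are hypothetical choices made for illustration and are not taken from the manuscript.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperRecord:
    """Hypothetical per-paper annotation for a surveyed publication."""
    paper_id: str
    dataset: str             # name of the evaluation dataset
    dataset_public: bool     # whether that dataset is publicly available
    metrics: frozenset       # metric names reported, e.g. frozenset({"iou"})


def fragmentation_summary(records):
    """Summarize how comparable the surveyed papers are to one another."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")

    dataset_counts = Counter(r.dataset for r in records)
    non_public = sum(1 for r in records if not r.dataset_public)
    # Papers whose evaluation dataset is used by no other paper in the corpus.
    unique_dataset = sum(1 for r in records if dataset_counts[r.dataset] == 1)

    # A pair is "comparable" if both papers share a dataset and at least one metric.
    comparable = 0
    total_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total_pairs += 1
            a, b = records[i], records[j]
            if a.dataset == b.dataset and a.metrics & b.metrics:
                comparable += 1

    return {
        "non_public_fraction": non_public / n,
        "unique_dataset_fraction": unique_dataset / n,
        "comparable_pair_fraction": comparable / total_pairs if total_pairs else 0.0,
    }
```

The pairwise loop is quadratic in corpus size, which is not a concern for a few hundred papers. The harder part in practice would be normalizing dataset and metric names consistently across the corpus.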
minor comments (2)
- [Task organization] The four task categories are introduced without explicit decision rules or overlap statistics, which could be clarified with a small table or flowchart.
- [Related work] The manuscript would benefit from citing at least two existing geophysical benchmarks (e.g., in seismic inversion or facies classification) to situate CIG-Bench relative to prior efforts.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will implement.
- Referee: [Abstract / Survey methodology] Abstract and survey section: the claim of having 'systematically analyze[d] 652 publications' provides no search strategy, database(s), inclusion/exclusion criteria, or quality-assessment protocol. Without these, the completeness, representativeness, and potential selection bias of the review cannot be evaluated, directly undermining the three challenges derived from it.
  Authors: We agree that the absence of explicit survey methodology details is a significant omission that prevents readers from assessing the review's rigor and potential biases. In the revised manuscript we will add a dedicated 'Survey Methodology' subsection that specifies the search strategy (keywords, Boolean queries, and time filters), databases used (Google Scholar, Web of Science, arXiv, and major geophysics journals), inclusion/exclusion criteria (peer-reviewed or preprint works on deep learning for subsurface imaging 2015–2025, English language, focus on interpretation tasks), and any quality-assessment steps applied. We will also note limitations and how the three challenges were synthesized from the corpus.
  revision: yes
- Referee: [Benchmark proposal] Benchmark proposal (final paragraph): the description states that the benchmark 'integrates unified evaluation protocols' and 'datasets that combine synthetic data for quantitative evaluation with real surveys for qualitative assessment,' yet supplies no concrete definition of those protocols, no specification of how synthetic data will reproduce relevant distribution shifts or non-uniqueness regimes, and no reproducible scoring rubric for the qualitative real-data component. These omissions make the central claim that the benchmark will address cross-survey generalization untestable in its current form.
  Authors: The referee correctly identifies that the benchmark description remains too high-level to be immediately actionable or verifiable. We will expand the CIG-Bench section with: (i) precise evaluation protocols and metrics for each task (e.g., IoU, Dice, and boundary F1 for segmentation; MAE and correlation for property modeling, with reference implementations), (ii) explicit synthetic-data generation procedures designed to induce distribution shifts and non-uniqueness (varying noise, acquisition parameters, and geological complexity), and (iii) a reproducible qualitative rubric for real-data assessment (standardized expert scoring criteria with inter-rater reliability measures). These additions will make the benchmark's contribution to cross-survey generalization testable.
  revision: yes
- Referee: [Abstract] Abstract: the assertion that 'the lack of unified benchmarks has been particularly consequential' is presented as a conclusion from the 652-paper review, but the text contains no quantitative supporting evidence (e.g., fraction of papers using incompatible datasets or metrics, or citation patterns showing fragmentation). This weakens the justification for prioritizing benchmark development over the other two challenges.
  Authors: We accept that the abstract statement would be stronger with quantitative backing. Our review process identified fragmentation qualitatively through the diversity of datasets and metrics, but we did not tabulate exact fractions. In revision we will either extract and report such statistics from the surveyed corpus (e.g., percentage of papers using non-public or mutually incompatible datasets) or rephrase the claim to reflect a qualitative synthesis while still highlighting benchmark development as a high-priority need. This will clarify the relative emphasis among the three challenges.
  revision: partial
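Several of the metrics named in the simulated response on evaluation protocols (IoU and Dice for segmentation, MAE and correlation for property modeling) have standard definitions that can be sketched briefly. The code below is an illustrative sketch using NumPy, not the reference implementation the response promises. The function names and the convention of returning 1.0 when both masks are empty are assumptions, and boundary F1 is omitted because it depends on a tolerance distance that the source does not specify.

```python
import numpy as np


def _binary_pair(pred, target):
    """Convert both inputs to boolean arrays and check that shapes match."""
    p = np.asarray(pred).astype(bool)
    t = np.asarray(target).astype(bool)
    if p.shape != t.shape:
        raise ValueError("pred and target must have the same shape")
    return p, t


def iou(pred, target):
    """Intersection over union for binary masks (e.g. fault vs. non-fault)."""
    p, t = _binary_pair(pred, target)
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(p, t).sum() / union)


def dice(pred, target):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    p, t = _binary_pair(pred, target)
    total = p.sum() + t.sum()
    if total == 0:
        return 1.0  # same assumed convention as in iou()
    return float(2.0 * np.logical_and(p, t).sum() / total)


def mae(pred, target):
    """Mean absolute error between predicted and reference property values."""
    p = np.asarray(pred, dtype=float)
    t = np.asarray(target, dtype=float)
    if p.shape != t.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.mean(np.abs(p - t)))


def pearson(pred, target):
    """Pearson correlation over all flattened samples."""
    p = np.asarray(pred, dtype=float).ravel()
    t = np.asarray(target, dtype=float).ravel()
    if p.shape != t.shape:
        raise ValueError("pred and target must have the same number of samples")
    pc, tc = p - p.mean(), t - t.mean()
    denom = np.sqrt((pc ** 2).sum() * (tc ** 2).sum())
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    return float((pc * tc).sum() / denom)
```

All four functions accept arrays of any dimensionality, so the same calls would apply to 2D sections and 3D volumes. How scores are aggregated across volumes or surveys is a protocol decision that the source leaves open.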
Circularity Check
No circularity: survey and benchmark proposal is self-contained
The paper is a descriptive literature review of 652 publications plus a new benchmark proposal. It contains no mathematical derivations, equations, fitted parameters, or predictions that reduce to quantities defined by the authors' own choices. The central claims (lack of unified benchmarks as a barrier; proposal of tasks like fault segmentation and relative geologic time (RGT) estimation) are presented as observations and suggestions rather than derived results. No self-citation chains, ansatzes, or uniqueness theorems are invoked to support load-bearing steps. This matches the default expectation for non-circular survey/benchmark papers.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Subsurface imaging interpretation can be partitioned into the four tasks of structural interpretation, geobody identification, seismic facies analysis, and property estimation.