GeoVolDiff: Taming 3D Geological Volumes with Latent Diffusion

Hongling Chen; Jinghuai Gao; Qi Pang

arXiv: 2606.03572 · v1 · pith:TP3STZYKnew · submitted 2026-06-02 · ⚛️ physics.geo-ph


Pith reviewed 2026-06-28 07:26 UTC · model grok-4.3

classification ⚛️ physics.geo-ph

keywords latent diffusion models · 3D geological volumes · seismic impedance inversion · generative models · physics-based simulation · geophysical data synthesis · surrogate training data


The pith

A latent diffusion model trained on physics-simulated 3D geological volumes produces surrogate data that trains inversion networks to competitive performance on both synthetic and real field datasets without added priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles the lack of labeled 3D geophysical data by building a corpus through physics-based forward simulation, training a latent diffusion model on that corpus to learn the distribution of geological structures, and then using the model to generate new volumes at scale. These generated volumes are fed to a downstream seismic impedance inversion network. The resulting network reaches competitive accuracy on both held-out synthetic cases and actual field data even though no physical or geological constraints were added during inversion training. A reader would care because real field labels are expensive and often unavailable, so the generated volumes could act as a practical stand-in for training data.

Core claim

Without incorporating any additional physical or geological prior, inversion networks pre-trained exclusively on synthesized data attain competitive performance on both synthetic and field datasets, indicating that data synthesised by the generative model can serve as an effective surrogate for costly field-acquired labels.

What carries the argument

Latent Diffusion Model (LDM) that learns the statistical distribution of 3D geological structures from a physics-based forward-simulated corpus and then generates new structurally plausible volumes.
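
As a point of reference, the standard latent-diffusion training objective from the LDM and DDPM papers listed as references [20] and [8] can be written as follows. This is the textbook formulation, offered as a sketch; this page does not state the loss GeoVolDiff actually uses. The symbols $\bar{\alpha}_t$ (cumulative noise schedule), $\epsilon_\theta$ (denoising network) and $c$ (optional conditioning, such as a fault mask) are assumed notation, and $\mathcal{E}$ denotes the VAE encoder.

```latex
z_0 = \mathcal{E}(x), \qquad
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\quad \epsilon \sim \mathcal{N}(0, I),
\qquad
\mathcal{L}_{\mathrm{LDM}} =
\mathbb{E}_{x,\, \epsilon,\, t}
\left[ \left\lVert \epsilon - \epsilon_\theta(z_t, t, c) \right\rVert_2^2 \right]
```

Under this formulation the network only ever sees noised latents $z_t$, which is why the quality of the encoder and of the simulated corpus bounds what the generator can learn.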

If this is right

Inversion networks can be pre-trained solely on generated volumes and still reach competitive accuracy on field data.
No extra physical or geological priors need to be injected into the inversion stage for the performance to hold.
The generative pipeline supplies training data at a scale that would be prohibitive to acquire directly in the field.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same synthesis pipeline could be tested on other geophysical tasks such as velocity model building or fault detection where labeled volumes are also scarce.
Performance gaps between synthetic and field results would more likely point to mismatches in the forward simulation than to the diffusion model itself.
Increasing the resolution or diversity of the initial physics-simulated corpus would likely improve the quality of the generated surrogate volumes.

Load-bearing premise

The physics-based forward simulation produces a training corpus whose statistical distribution of 3D geological structures is sufficiently representative of real field conditions for the latent diffusion model to generate useful surrogate data.

What would settle it

If inversion networks trained only on the synthesized volumes show markedly lower accuracy than networks trained on real labeled field data when both are evaluated on the same held-out field dataset, the surrogate-data claim would be falsified.
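
The comparison described above could be scored along the following lines. This is a minimal sketch, not the paper's evaluation protocol: the choice of RMSE and Pearson correlation as metrics, and the names `rmse`, `pearson`, and `compare_on_blind_well`, are assumptions made for illustration.

```python
import numpy as np

def rmse(pred, truth):
    """Root-mean-square error between a predicted and a logged impedance trace."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def pearson(pred, truth):
    """Pearson correlation coefficient between two 1D traces."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    return float(np.corrcoef(pred, truth)[0, 1])

def compare_on_blind_well(pred_synthetic_only, pred_field_trained, well_log):
    """Score two inversion results against the same held-out well log.

    Both predictions must be extracted at the blind-well location and
    sampled on the same depth/time grid as the log (assumed here).
    """
    return {
        name: {"rmse": rmse(pred, well_log), "pcc": pearson(pred, well_log)}
        for name, pred in (
            ("synthetic_only", pred_synthetic_only),
            ("field_trained", pred_field_trained),
        )
    }
```

A markedly worse score in the `synthetic_only` entry than in the `field_trained` entry, on the same blind well, would be the kind of evidence that counts against the surrogate-data claim.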

Figures

Figures reproduced from arXiv: 2606.03572 by Hongling Chen, Jinghuai Gao, Qi Pang.

**Figure 1.** Overview of the GeoVolDiff framework. Three sequential stages: forward simulation of 3D geological volumes with paired condition labels, training of a 3D latent diffusion model, and large-scale data synthesis for downstream geophysical tasks. … subsurface exists, because the earth cannot be excavated for verification. The network is therefore left to generalise across most of the volume with no direct superv…

**Figure 2.** Parameterised forward-simulation workflow. Stratigraphic modelling, RGT volume construction, attribute interpolation along the RGT scaffold, and fault-network embedding, yielding the geological volume M together with paired labels. 2.1 3D Forward Simulation Framework. An ideal training corpus for 3D geological volume generation should satisfy three requirements simultaneously: (i) geophysical plausibility, t…

**Figure 3.** Architecture of the 3D-VAE. 3D-convolutional encoder–decoder with axial-attention modules at the bottleneck. 3D Variational Autoencoder. A high-fidelity VAE is essential within the LDM framework: it compresses high-resolution volumetric data into a low-dimensional latent representation while regularising the latent distribution toward an isotropic Gaussian prior, thereby supporting stable training of the do…

**Figure 4.** 3D conditional latent diffusion model. Denoising network operating in the VAE latent space, with a ControlNet branch injecting fault-mask conditioning through trainable residual connections. 3D Conditional Latent Diffusion Model. With the VAE parameters frozen, the diffusion model operates entirely in the latent space z = E(x) produced by the encoder E(·) [20]. The denoising network is a UNet built primarily…

**Figure 5.** Pretrain–finetune pipeline for downstream seismic impedance inversion. Impedance synthesis with GeoVolDiff, paired-data construction via 1D convolutional forward modelling, pretraining on synthetic data, and fine-tuning with field well logs. … without any real field data. In Stage 3, the pre-trained network is fine-tuned using a small number of real well-log labels together with the corresponding field seis…

**Figure 6.** Representative 3D geological volumes produced by the forward-simulation workflow. 3.2 3D Latent Diffusion Model. 3D VAE Reconstruction. The forward-simulated volumes are first randomly cropped into 128³ sub-volumes and then augmented by depth-axis flipping, in-plane rotation within the inline–crossline plane, and amplitude scaling and shifting, yielding a final VAE training set of 5,000 sub-volumes at 128³ …

**Figure 7.** 3D-VAE reconstruction on out-of-training-set volumes. Each pair shows the ground-truth volume and its encode–decode reconstruction. Reconstruction fidelity is evaluated on volumes generated independently by forward simulation and then passed through the trained encoder–decoder pipeline (…

**Figure 8.** Unconditional generation results. Synthesized 3D volumes with inline, crossline, and time-slice cross-sections. To increase training diversity, 100 acoustic-impedance volumes of size 128³ are synthesized by unconditional sampling. The corresponding synthetic seismic data are obtained by trace-wise convolution with a 25 Hz Ricker wavelet, with coherent noise added across a range of signal-to-noise ratios to…

**Figure 9.** Unconditional and fault-conditioned generation results of GeoVolDiff. The first three rows show unconditional samples, exhibiting diverse stratigraphic configurations and strong lateral continuity. The bottom row presents fault-conditioned generation: the leftmost volume displays the input fault mask, and the remaining three volumes show the corresponding generated results. The red dashed box highlights vo…

**Figure 10.** Synthetic test case. (a) Ground-truth impedance model with well locations (dashed). (b) Synthetic seismic data at 10 dB SNR. (c) Low-frequency background impedance used as initial model for USTNet.

**Figure 11.** Inversion results on the synthetic case at 10 dB SNR. (a) Pre-trained network applied directly without fine-tuning. (b) Proposed pretrain–finetune framework. (c) USTNet baseline. Arrows mark the far-well region. … thin-layer sequences. USTNet serves as the comparison baseline on Field dataset 1, and the inversion result provided by the dataset originator is taken as the reference on Field dataset 2. Field d…

**Figure 12.** Inversion results under stronger noise. (a) USTNet at 5 dB. (b) Proposed framework at 5 dB. (c) USTNet at 0 dB. (d) Proposed framework at 0 dB.

**Figure 13.** Field dataset 1. (a) Observed seismic profile with well locations; Well-2 (red) is the validation well, Well-1/3/4 (black) are used for fine-tuning. (b) Low-frequency background impedance from well-log interpolation (used by USTNet only). Field dataset 2. To assess the robustness and transferability of the GeoVolDiff-generated pre-training data under a larger synthetic-to-field distribution gap, we experi…

**Figure 14.** Inversion results on Field dataset 1. (a) Pre-trained network without fine-tuning. (b) Proposed pretrain–finetune framework after well-log fine-tuning. (c) USTNet baseline with low-frequency background and field-estimated wavelet. Dashed box marks the vicinity of the blind well Well-2. The field seismic wavelet differs substantially from the Ricker wavelet used at pre-training in phase, side-lobe structur…

**Figure 15.** Field dataset 2 (F3 inter-well profile). (a) Observed seismic profile. (b) Reference inversion result published with the dataset. (c) Associated low-frequency background impedance. …els, data synthesis, to expand the geological training corpus. Unlike forward simulation, which requires considerable domain expertise and careful parameter selection, the trained diffusion model produces diverse, structurally co…

**Figure 16.** Inversion on the F3 profile with Ricker-wavelet pre-training. (a) Pre-trained network without fine-tuning. (b) After fine-tuning with Well-1 to Well-4. … can be deployed as a data-augmentation module under resource-constrained conditions and, as conditioning information becomes richer, can plausibly be extended into a full geological modelling methodology. We further emphasise that no real-field information…

**Figure 17.** Inversion on the F3 profile with wavelet-adapted pre-training. (a) Wavelet-adapted pre-trained network without fine-tuning. (b) After fine-tuning with Well-1 to Well-4. (c) Blind-well test with Well-3 (red) held out. In summary, GeoVolDiff shows encouraging potential as a generative pipeline for 3D geological volumes. Non-trivial challenges remain on the path to real-world deployment, most notably training…

**Figure 18.** Distributional comparison between pre-training data and F3 field observations. Top row: histogram distributions of acoustic impedance (left) and seismic amplitude (right) on a logarithmic density scale. Bottom row: corresponding Q–Q plots.
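
The paired-data step named in the captions of Figures 5 and 8 (1D convolutional forward modelling with a 25 Hz Ricker wavelet, then noise at a chosen SNR) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the 2 ms sample interval, the wavelet half-length, the normal-incidence reflectivity formula, and the function names are assumptions, and the noise here is white Gaussian noise standing in for the coherent noise the paper mentions.

```python
import numpy as np

def ricker(f0=25.0, dt=0.002, half_len=32):
    """Zero-phase Ricker wavelet with peak frequency f0 (Hz), 2*half_len+1 samples."""
    t = np.arange(-half_len, half_len + 1) * dt
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def reflectivity(impedance):
    """Normal-incidence reflection coefficients along the last (time) axis."""
    z0, z1 = impedance[..., :-1], impedance[..., 1:]
    return (z1 - z0) / (z1 + z0)

def forward_model(impedance, wavelet, snr_db=None, rng=None):
    """Trace-wise convolution of reflectivity with a wavelet, plus optional noise.

    Assumes each trace has more samples than the wavelet, so that
    mode="same" returns one output sample per reflectivity sample.
    """
    r = reflectivity(np.asarray(impedance, dtype=float))
    seis = np.apply_along_axis(
        lambda trace: np.convolve(trace, wavelet, mode="same"), -1, r
    )
    if snr_db is not None:
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.standard_normal(seis.shape)
        # Scale the noise so that signal power / noise power = 10^(snr_db / 10).
        target_noise_power = np.mean(seis**2) / 10 ** (snr_db / 10)
        seis = seis + noise * np.sqrt(target_noise_power / np.mean(noise**2))
    return seis
```

Applied to a generated impedance volume of shape `(inline, crossline, time)`, `forward_model(volume, ricker(), snr_db=10.0)` would return a seismic volume with one fewer time sample, which can then be paired with the impedance as a training example.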

Original abstract

Deep learning has become a prevailing paradigm across a wide range of geophysical applications. Yet most existing studies concentrate on methodological refinements -- novel network architectures, physics-informed constraints, or task-specific loss functions -- while paying comparatively little attention to a more fundamental challenge of any data-driven approach: the availability and representativeness of high-quality training data. This limitation is especially pronounced in geophysics. Unlike computer vision, which benefits from large-scale, well-curated benchmarks such as ImageNet, comparably abundant and reliably labelled geophysical data are prohibitively expensive to acquire and, in most field settings, lack accessible ground-truth supervision. To alleviate this data deficiency, we propose GeoVolDiff, a generative framework for three-dimensional geological volumes. It comprises three coupled stages: (i) constructing a foundational training corpus through physics-based forward simulation; (ii) training a Latent Diffusion Model (LDM) to capture the statistical distribution of 3D geological structures; and (iii) synthesizing diverse, structurally plausible volumes at scale for downstream geophysical tasks. We examine the utility of the synthesized data on a representative downstream task, seismic impedance inversion. Without incorporating any additional physical or geological prior, inversion networks pre-trained exclusively on synthesized data attain competitive performance on both synthetic and field datasets, indicating that data synthesised by the generative model can serve as an effective surrogate for costly field-acquired labels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's claim that LDM-generated volumes from physics simulations can serve as effective surrogate labels for field-data inversion rests on an unverified assumption about simulation representativeness, with no metrics shown in the abstract.


The main thing to know is that this work trains a latent diffusion model on physics-simulated 3D geological volumes and reports that inversion networks pre-trained only on the generated data reach competitive results on both synthetic and real field datasets for impedance inversion. The pipeline has three steps: build a simulation corpus, train the LDM, then use the generated volumes downstream without added priors.

What is new is the direct application of an LDM to 3D geo-volume generation in this specific setting and the test on a real geophysical task. The framing of the data-scarcity problem is clear and the three-stage structure makes practical sense for anyone already running forward simulations.

The soft spots are the missing evidence. The abstract states competitive performance but supplies no numbers, baselines, error bars, or dataset descriptions. There is also no reported check that the initial simulated volumes match the statistical properties of real geology, such as variogram ranges, facies proportions, or spectral content. If that match is poor, the field-data transfer could be an artifact rather than a genuine surrogate benefit. The stress-test note correctly flags this as the load-bearing assumption.

This paper is aimed at applied geophysicists who want to use data-driven inversion but lack labeled field examples. A reader already working on generative models for scientific volumes might pick up the domain adaptation angle. It deserves a serious referee because the underlying problem is genuine and the approach is straightforward enough to evaluate once the quantitative results and validation steps are supplied.

Referee Report

2 major / 1 minor

Summary. The paper proposes GeoVolDiff, a three-stage generative framework for 3D geological volumes: (i) physics-based forward simulation to construct a foundational training corpus, (ii) training a Latent Diffusion Model (LDM) to capture the statistical distribution of geological structures, and (iii) synthesizing diverse volumes at scale. The central empirical claim is that inversion networks pre-trained exclusively on the LDM-synthesized data attain competitive performance on seismic impedance inversion for both synthetic and field datasets, without any additional physical or geological priors, indicating that the generated data can serve as an effective surrogate for costly field-acquired labels.

Significance. If the transfer results hold under proper validation, the work addresses a core practical bottleneck in geophysical machine learning by demonstrating scalable surrogate data generation. A strength is the explicit focus on downstream field-data transfer using only synthesized volumes rather than architectural innovations alone.

major comments (2)

[Abstract] The statement that inversion networks 'attain competitive performance' on field datasets supplies no quantitative metrics, baselines, error bars, or dataset details, which is load-bearing for assessing whether the surrogate-data claim is supported.
[Training corpus construction] The description of the training corpus (stage i) provides no quantitative comparison (e.g., variogram, facies proportions, or spectral statistics) between the physics-simulated volumes and real field data, nor any sensitivity analysis on simulation parameters; this directly undermines the representativeness assumption required for the field-data transfer result.
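
One of the checks requested above, a variogram comparison between simulated and field volumes, could look roughly like this. It is a minimal sketch assuming regularly gridded volumes and a single lag direction; the function names and the variance-normalised relative L2 mismatch are illustrative choices, not something the paper or the referee specifies.

```python
import numpy as np

def semivariogram(volume, max_lag, axis=0):
    """Empirical semivariogram gamma(h) along one axis for lags h = 1..max_lag.

    gamma(h) = 0.5 * mean((v[i + h] - v[i])^2), averaged over all sample pairs.
    """
    v = np.moveaxis(np.asarray(volume, dtype=float), axis, 0)
    gammas = []
    for h in range(1, max_lag + 1):
        diff = v[h:] - v[:-h]
        gammas.append(0.5 * np.mean(diff**2))
    return np.array(gammas)

def variogram_mismatch(simulated, field, max_lag, axis=0):
    """Relative L2 distance between variance-normalised semivariograms.

    Normalising by the variance removes differences in overall amplitude,
    so the score reflects spatial correlation structure only.
    """
    g_sim = semivariogram(simulated, max_lag, axis) / np.var(simulated)
    g_fld = semivariogram(field, max_lag, axis) / np.var(field)
    return float(np.linalg.norm(g_sim - g_fld) / np.linalg.norm(g_fld))
```

A score near zero along each axis would support the representativeness assumption for second-order spatial statistics; it says nothing about facies proportions or higher-order structure, which would need separate checks.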

minor comments (1)

The abstract could include a short statement of the LDM architecture, conditioning mechanism, or loss used in stage (ii) to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our claims. We address each major point below and will revise the manuscript to strengthen the supporting evidence.


Referee: [Abstract] The statement that inversion networks 'attain competitive performance' on field datasets supplies no quantitative metrics, baselines, error bars, or dataset details, which is load-bearing for assessing whether the surrogate-data claim is supported.

Authors: We agree that the abstract would benefit from explicit quantitative support. The body of the manuscript reports the relevant metrics (including error values, baselines, and dataset descriptions) for the field-data experiments. In revision we will condense and incorporate key quantitative results and dataset details into the abstract while remaining within length limits.
Revision: yes
Referee: [Training corpus construction] The description of the training corpus (stage i) provides no quantitative comparison (e.g., variogram, facies proportions, or spectral statistics) between the physics-simulated volumes and real field data, nor any sensitivity analysis on simulation parameters; this directly undermines the representativeness assumption required for the field-data transfer result.

Authors: The corpus is generated from standard physics-based forward modeling. The manuscript relies on downstream transfer performance as indirect evidence of utility rather than direct statistical matching. We accept that explicit comparisons would strengthen the argument and will add variogram, facies-proportion, and spectral analyses between the simulated volumes and available field data, together with a sensitivity study on the main simulation parameters.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation stands independent of training distribution


The manuscript describes a three-stage pipeline (physics simulation corpus → LDM training → synthesis) whose utility is asserted solely via downstream empirical performance of inversion networks on held-out synthetic and field datasets. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the abstract or described framework; the performance numbers are external measurements rather than algebraic identities or re-labeled fits. The representativeness assumption is an empirical premise subject to falsification by the field-data results themselves, not a definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only abstract available; ledger is therefore minimal and provisional. The central assumption is the representativeness of simulated data.

axioms (1)

Domain assumption: Physics-based forward simulation produces volumes whose statistical distribution matches real 3D geological structures well enough for downstream utility.
Invoked to justify training the LDM on simulated data as a surrogate for field data.

pith-pipeline@v0.9.1-grok · 5773 in / 1228 out tokens · 24448 ms · 2026-06-28T07:26:13.031750+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multi-Condition Guided Diffusion Model for Controllable Elastic Parameter Synthesis
physics.geo-ph · 2026-06 · unverdicted · novelty 6.0

A diffusion model framework with iterative refinement, adapter conditioning, and DPS-projection guidance generates elastic parameters consistent with multi-source conditions and improves seismic inversion predictions ...

Reference graph

Works this paper leans on

32 extracted references · 5 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1] Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
[2] Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. Align your latents: High-resolution video synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22563–22575, 2023.
[3] Hongling Chen, Jie Chen, Mauricio D Sacchi, Jinghuai Gao, and Ping Yang. Unsupervised seismic acoustic impedance inversion based on generative diffusion model. Geophysics, 90(4):M109–M121, 2025.
[4] dGB Earth Sciences. F3 Demo Dataset. Open Seismic Repository, 2009. URL https://terranubis.com/datainfo/F3-Demo-2020. Accessed: 2024.
[5] Hang Gao, Xinming Wu, and Guofeng Liu. ChannelSeg3D: Channel simulation and deep learning for channel interpretation in 3D seismic images. Geophysics, 86(4):IM73–IM83, 2021.
[6] Kai Gao, Lianjie Huang, and Yingcai Zheng. Fault detection on seismic structural images using a nested residual U-Net. IEEE Transactions on Geoscience and Remote Sensing, 60:1–15, 2021.
[7] Yu Gao, Haoyuan Guo, Tuyen Hoang, Weilin Huang, Lu Jiang, Fangyuan Kong, Huixia Li, Jiashi Li, Liang Li, Xiaojie Li, et al. Seedance 1.0: Exploring the boundaries of video generation models. arXiv preprint arXiv:2506.09113, 2025.
[8] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[9] Jintao Li, Xinming Wu, Xianwen Zhang, Xin Du, Xiaoming Sun, Bao Deng, and Guangyu Wang. High-fidelity seismic super-resolution using prior-informed deep learning with 3D awareness. IEEE Transactions on Image Processing, 2026.
[10] Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. GLIGEN: Open-set grounded text-to-image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22511–22521,
[11] Dawei Liu, Wenli Niu, Xiaokai Wang, Mauricio D Sacchi, Wenchao Chen, and Cheng Wang. Improving vertical resolution of vintage seismic data by a weakly supervised method based on cycle generative adversarial network. Geophysics, 88(6):V445–V458, 2023.
[12] Chuangji Meng, Jinghuai Gao, Yajun Tian, and Zhiqiang Wang. Seismic random noise attenuation based on non-IID pixel-wise Gaussian noise modeling. IEEE Transactions on Geoscience and Remote Sensing, 60:1–16, 2022.
[13] Chuangji Meng, Jinghuai Gao, Wenting Shang, and Yajun Tian. A self-supervised method for attenuating seismic random and tracewise coherent noise under the nonpixelwise independence assumption. IEEE Transactions on Geoscience and Remote Sensing, 63:1–12, 2025. doi: 10.1109/TGRS.2025.3571390.
[14] Chuangji Meng, Jinghuai Gao, Baohai Wu, Hongling Chen, and Yajun Tian. Posterior sampling for random noise attenuation via score-based generative models. Geophysics, 90(2):V83–V95,
[15] Tom P Merrifield, Donald P Griffith, S Ahmad Zamanian, Stephane Gesbert, Satyakee Sen, Jorge De La Torre Guzman, R David Potter, and Henning Kuehl. Synthetic seismic data for training deep learning networks. Interpretation, 10(3):SE31–SE39, 2022.
[16] Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 4296–4304, 2024.
[17] Q Pang, H Chen, J Gao, and X Sun. Scaling the subsurface: Deep generative synthesis of 3D seismic properties. In 87th EAGE Annual Conference & Exhibition, volume 2026, pages 1–5. European Association of Geoscientists & Engineers, 2026.
[18] Qi Pang, Hongling Chen, Jinghuai Gao, Zhiqiang Wang, and Ping Yang. Iterative gradient corrected semi-supervised seismic impedance inversion via Swin Transformer. IEEE Transactions on Geoscience and Remote Sensing, 2025.
[19] Ken Perlin. An image synthesizer. ACM SIGGRAPH Computer Graphics, 19(3):287–296, 1985.
[20] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[21] Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314, 2025.
[22] Yuqing Wang, Qiang Ge, Wenkai Lu, and Xinfei Yan. Well-logging constrained seismic inversion based on closed-loop convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing, 58(8):5564–5574, 2020.
[23] Xinming Wu, Luming Liang, Yunzhi Shi, and Sergey Fomel. FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural network for 3D seismic fault segmentation. Geophysics, 84(3):IM35–IM45, 2019.
[24] Xinming Wu, Zhicheng Geng, Yunzhi Shi, Nam Pham, Sergey Fomel, and Guillaume Caumon. Building realistic structure models to train convolutional neural networks for seismic structural interpretation. Geophysics, 85(4):WA27–WA39, 2020.
[25] Xinming Wu, Shangsheng Yan, Zhengfa Bi, Sibo Zhang, and Hongjie Si. Deep learning for multidimensional seismic impedance inversion. Geophysics, 86(5):R735–R745, 2021.
[26] Yang Yang, Zhuo Wang, Naihao Liu, Yuxin Zhang, Rongchang Liu, and Jinghuai Gao. Seismic resolution enhancement using physics-assisted seismic deconvolution network and domain adaptation. Geophysics, 90(3):R113–R125, 2025.
[27] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
[28] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
[29] Wei Zhang and Jinghuai Gao. Deep-learning full-waveform inversion using seismic migration images. IEEE Transactions on Geoscience and Remote Sensing, 60:1–18, 2021.
[30] Wei Zhang, Jinghuai Gao, Zhaoqi Gao, and Hongling Chen. Adjoint-driven deep-learning seismic full-waveform inversion. IEEE Transactions on Geoscience and Remote Sensing, 59(10):8913–8932, 2020.
[31] Zhendong Zhang and Tariq Alkhalifah. Regularized elastic full-waveform inversion using deep learning. In Advances in Subsurface Data Analytics, pages 219–250. Elsevier, 2022.
[32] Lin Zhou, Jinghuai Gao, Jihao Yang, Hongling Chen, and Chuangji Meng. CA-DiffSeg: Cross-attention guided diffusion model for seismic facies segmentation. IEEE Transactions on Geoscience and Remote Sensing, 64:1–15, 2026. doi: 10.1109/TGRS.2025.3612494.
