Accelerating Nonlinear Time-History Analysis with Complex Constitutive Laws via Heterogeneous Memory Management: From 3D Seismic Simulation to Neural Network Training

Hideaki Ito; Kohei Fujita; Lalith Maddegedara; Muneo Hori; Tsuyoshi Ichimura

arXiv: 2604.02755 · v1 · submitted 2026-04-03 · 💻 cs.DC

Pith reviewed 2026-05-13 18:57 UTC · model grok-4.3

classification 💻 cs.DC

keywords heterogeneous memory management · GPU acceleration · nonlinear time-history analysis · constitutive laws · seismic simulation · neural network surrogate · memory wall · ensemble simulation

The pith

A heterogeneous memory management framework lets GPUs run memory-intensive nonlinear time-history simulations by actively using host CPU memory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that moves state variables between GPU and large host CPU memory during nonlinear time-history evolution with complex constitutive laws. By exploiting high-bandwidth CPU-GPU links, the method keeps the GPU busy while storing the bulk of the data on the CPU, removing the hard GPU-memory ceiling that previously blocked large ensemble runs. Performance tests show clear gains in both wall-clock time and energy use compared with standard GPU-only implementations. The same framework is then used to generate the massive datasets needed to train a neural-network surrogate for the original high-fidelity model.

Core claim

The central claim is that a heterogeneous memory management scheme, which keeps only active working sets on the GPU while paging the remainder to host memory, overcomes the GPU memory wall for general nonlinear time-history problems. When the CPU-GPU interconnect bandwidth is sufficient, this approach delivers both faster time-to-solution and lower energy-to-solution than conventional GPU-resident codes, and the generated data volumes are large enough to train accurate neural-network surrogates for 3D seismic problems.

What carries the argument

A heterogeneous memory management framework that pages state variables between GPU and host CPU memory while maximizing GPU throughput.
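The paging idea can be illustrated with a minimal CPU-only Python sketch. It is not the authors' implementation, and all names are hypothetical: the arrays `host_stress` and `host_plastic` stand in for state kept in host memory, a per-chunk `.copy()` stands in for a host-to-device transfer, and a toy 1D elastic-perfectly-plastic return mapping (`update_chunk`) stands in for a complex constitutive law. A single worker thread fetches the next chunk while the current one is updated. This is the double-buffering pattern that, on a real GPU, would typically be built from asynchronous copies on separate CUDA streams.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def update_chunk(stress, plastic_strain, d_strain, E=1.0, yield_stress=1.0):
    """Toy 1D elastic-perfectly-plastic return mapping, updated in place.

    Hypothetical stand-in for a complex constitutive law with state variables.
    """
    trial = stress + E * d_strain                  # elastic predictor
    excess = np.abs(trial) - yield_stress
    yielding = excess > 0
    direction = np.sign(trial[yielding])
    # Plastic corrector: move the excess into plastic strain and clip
    # the stress back onto the yield surface.
    plastic_strain[yielding] += direction * excess[yielding] / E
    trial[yielding] = direction * yield_stress
    stress[:] = trial


def step_streamed(host_stress, host_plastic, d_strain, chunk):
    """One time step with all state in 'host' arrays and only `chunk`
    elements staged in a working buffer at a time. The next chunk is
    fetched on a worker thread while the current one is updated."""
    n = host_stress.size
    if n == 0:
        return
    starts = list(range(0, n, chunk))

    def fetch(s):
        e = min(s + chunk, n)
        # The copy stands in for a host-to-device transfer.
        return s, e, host_stress[s:e].copy(), host_plastic[s:e].copy()

    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, starts[0])
        for i in range(len(starts)):
            s, e, stress, plastic = pending.result()
            if i + 1 < len(starts):
                pending = pool.submit(fetch, starts[i + 1])  # prefetch next chunk
            update_chunk(stress, plastic, d_strain[s:e])
            # Write-back stands in for a device-to-host transfer.
            host_stress[s:e] = stress
            host_plastic[s:e] = plastic
```

Because the chunks are disjoint, the streamed step gives the same result as updating the whole array at once. The chunk size trades device working-set size against per-transfer overhead.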

If this is right

Larger numbers of state variables per element become feasible without shrinking the spatial domain.
Ensemble sizes can be increased until the limiting factor is host memory rather than GPU memory.
Generated datasets become large enough to train surrogate models that replace the full physics solver for many queries.
The same memory strategy applies directly to other time-stepping problems that carry large per-step state vectors.
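The second point comes down to capacity arithmetic. Below is a minimal sketch. Every size in it is hypothetical and chosen only for illustration: 8 GB of state per ensemble member (10 million elements × 100 float64 variables), a 96 GB device with 16 GB reserved for mesh data and working buffers, and 480 GB of host memory. None of these figures come from the paper, and `max_ensemble_members` is a hypothetical helper.

```python
GB = 10**9


def max_ensemble_members(per_member_bytes, device_bytes, host_bytes=0,
                         device_reserve_bytes=0):
    """How many ensemble members' state fits in the available memory.

    device_reserve_bytes covers mesh data and working buffers that must stay
    on the device; host_bytes is 0 for a GPU-resident code.
    """
    usable = device_bytes - device_reserve_bytes + host_bytes
    return max(0, usable // per_member_bytes)


# Hypothetical: 10 million elements x 100 float64 state variables = 8 GB/member.
per_member = 10_000_000 * 100 * 8
print(max_ensemble_members(per_member, 96 * GB,
                           device_reserve_bytes=16 * GB))   # -> 10
print(max_ensemble_members(per_member, 96 * GB, host_bytes=480 * GB,
                           device_reserve_bytes=16 * GB))   # -> 70
```

Under these assumed sizes, keeping state on the host raises the ceiling from 10 to 70 members. In practice, usable host capacity is lower once the operating system, pinned staging buffers, and I/O are accounted for.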

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The technique could be combined with multi-GPU or distributed-memory systems to scale even further in both space and ensemble size.
Similar paging logic might reduce energy costs in other latency-tolerant HPC workloads that currently hit device-memory limits.
Once trained, the neural surrogate could be used inside optimization loops that would otherwise be too expensive with the full 3D model.

Load-bearing premise

High-bandwidth CPU-GPU interconnects can supply data from host memory fast enough that the added transfers do not erase the performance gains from larger problem sizes.
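The premise can be checked with back-of-envelope arithmetic. With perfect overlap, a step costs max(transfer, compute), so the link stays sub-dominant only while the per-step state transfer time is below the per-step compute time. The sketch makes these assumptions:

- a full-duplex link
- float64 state
- every state variable read and written once per step
- nominal per-direction bandwidths of roughly 64 GB/s for PCIe 5.0 x16 and roughly 450 GB/s for NVLink-C2C

The problem size is hypothetical, and the bandwidths should be checked against [1] and [2].

```python
def state_transfer_seconds(n_elements, n_state_vars, link_gb_per_s,
                           bytes_per_value=8):
    """Seconds per step to stream all state to the GPU and write it back.

    Assumes a full-duplex link: the host-to-device read and the
    device-to-host write-back run concurrently, each at link_gb_per_s.
    """
    one_way_bytes = n_elements * n_state_vars * bytes_per_value
    return one_way_bytes / (link_gb_per_s * 1e9)


def step_seconds(transfer_s, compute_s, overlapped=True):
    """Per-step wall time: max() with perfect overlap, sum() with none."""
    return max(transfer_s, compute_s) if overlapped else transfer_s + compute_s


# Hypothetical size: 10 million elements x 100 float64 state variables.
for name, bw in [("PCIe 5.0 x16, ~64 GB/s per direction", 64.0),
                 ("NVLink-C2C, ~450 GB/s per direction", 450.0)]:
    t = state_transfer_seconds(10_000_000, 100, bw)
    print(f"{name}: {t * 1e3:.1f} ms of transfer per step")
```

With these assumptions, the PCIe case needs 125.0 ms of transfer per step and the NVLink-C2C case about 17.8 ms. Whether either is hidden depends on the per-step compute time, which this summary does not report.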

What would settle it

Run the identical nonlinear time-history ensemble once with the heterogeneous scheme and once with a pure GPU-resident code of the same size; if the heterogeneous version shows no reduction in time-to-solution or energy-to-solution, the performance claim is false.
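This test needs energy-to-solution as well as wall time. Below is a minimal sketch, assuming power is sampled at known timestamps. Which power meter to use is left open. Because the heterogeneous scheme relies on host memory, the samples should cover CPU and memory power as well as GPU power. Energy is the trapezoidal integral of power over time. `compare_runs`, a hypothetical helper, reports the speedup, the energy ratio, and whether both improved, which is what the core claim asserts.

```python
def energy_to_solution(timestamps_s, power_w):
    """Energy in joules: trapezoidal integral of sampled power (W) over time (s)."""
    if len(timestamps_s) != len(power_w) or len(power_w) < 2:
        raise ValueError("need at least two matched (time, power) samples")
    return sum(
        0.5 * (power_w[i] + power_w[i + 1]) * (timestamps_s[i + 1] - timestamps_s[i])
        for i in range(len(power_w) - 1)
    )


def compare_runs(base_time_s, base_energy_j, hetero_time_s, hetero_energy_j):
    """Return (speedup, energy ratio, both_improved) for the same ensemble.

    The core claim asserts both faster time-to-solution and lower
    energy-to-solution, so both must improve for it to hold.
    """
    speedup = base_time_s / hetero_time_s
    energy_ratio = hetero_energy_j / base_energy_j
    return speedup, energy_ratio, speedup > 1.0 and energy_ratio < 1.0
```

Since the baseline must be GPU-resident, the ensemble used for this comparison has to be small enough to fit in device memory.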

Figures

Figures reproduced from arXiv: 2604.02755 by Hideaki Ito, Kohei Fujita, Lalith Maddegedara, Muneo Hori, Tsuyoshi Ichimura.

**Figure 1.** (a) 3D ground structure model with line A-B and point C. The x and y coordinates of A, B, and C are (848, 1400), (848, 1900), and (848, 1648) m, respectively. (b) Close-up view of the region around line A-B. (c) Material properties of the soil structure.

**Figure 2.** Elapsed time per case.

**Figure 3.** Maximum velocity norm distribution at the surface for Kobe wave input.

**Figure 4.** (a) Cross section of the ground structure along line A-B. "Surface" indicates the ground surface, while "interface" indicates the interface between the first layer and the bedrock layer. C indicates the observation point. (b) Maximum velocity response in the x direction along line A-B. The black dot indicates the response estimated by the NNs.

**Figure 5.** Response at point C for Kobe wave input. (a) and (b) show the responses from 3D and 1D dynamic nonlinear analyses, respectively, while (c) shows the response estimated by the NNs. (d) shows the velocity response spectra (h = 0.05) for the waves in (a), (b), and (c).
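Panel (d) of Figure 5 uses velocity response spectra with damping ratio h = 0.05. One common way to compute such a spectrum is sketched below. It makes these assumptions:

- a uniformly sampled acceleration record `ag` at step `dt` (differentiate a velocity record first if that is what is available)
- a linear single-degree-of-freedom oscillator for each period
- Newmark's average-acceleration scheme
- peak relative velocity as the spectral ordinate

The paper may use a different definition (for example pseudo-velocity, ω times peak displacement) or a different integrator. The function names are hypothetical.

```python
import math


def sdof_peak_velocity(ag, dt, period, h=0.05):
    """Peak |relative velocity| of a unit-mass linear SDOF oscillator
    driven by base acceleration ag, using Newmark's average-acceleration
    method (gamma = 1/2, beta = 1/4, unconditionally stable)."""
    gamma, beta = 0.5, 0.25
    w = 2.0 * math.pi / period
    c, k = 2.0 * h * w, w * w                 # damping and stiffness per unit mass
    a1 = 1.0 / (beta * dt * dt)
    a2 = 1.0 / (beta * dt)
    a3 = 0.5 / beta - 1.0
    k_hat = k + gamma * a2 * c + a1           # effective stiffness
    u = v = 0.0
    a = -ag[0]                                # equation of motion at t = 0 with u = v = 0
    peak = 0.0
    for n in range(len(ag) - 1):
        # Effective load at step n + 1 (Newmark direct form).
        p_hat = (-ag[n + 1]
                 + a1 * u + a2 * v + a3 * a
                 + c * (gamma * a2 * u + (gamma / beta - 1.0) * v
                        + dt * (gamma / (2.0 * beta) - 1.0) * a))
        u_new = p_hat / k_hat
        a_new = a1 * (u_new - u) - a2 * v - a3 * a
        v += dt * ((1.0 - gamma) * a + gamma * a_new)
        u, a = u_new, a_new
        peak = max(peak, abs(v))
    return peak


def velocity_response_spectrum(ag, dt, periods, h=0.05):
    """Peak relative velocity for each oscillator period."""
    return [sdof_peak_velocity(ag, dt, T, h) for T in periods]
```

As a sanity check on the integrator, under resonant sinusoidal forcing of unit amplitude the peak velocity approaches the steady-state value 1/(2hω).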

Original abstract

Nonlinear time-history evolution problems employing high-fidelity physical models are essential in numerous scientific domains. However, these problems face a critical dual bottleneck: the immense computational cost of time-stepping and the massive memory requirements for maintaining a vast array of state variables. To address these challenges, we propose a novel framework based on heterogeneous memory management for massive ensemble simulations of general nonlinear time-history problems with complex constitutive laws. Taking advantage of recent advancements in CPU-GPU interconnect bandwidth, our approach actively leverages the large capacity of host CPU memory while simultaneously maximizing the throughput of the GPU. This strategy effectively overcomes the GPU memory wall, enabling memory-intensive simulations. We evaluate the performance of the proposed method through comparisons with conventional implementations, demonstrating significant improvements in time-to-solution and energy-to-solution. Furthermore, we demonstrate the practical utility of this framework by developing a Neural Network-based surrogate model using the generated massive datasets. The results highlight the effectiveness of our approach in enabling high-fidelity 3D evaluations and its potential for broader applications in data-driven scientific discovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper shows a workable heterogeneous memory scheme to run big nonlinear time-history ensembles on GPU+host setups for seismic work and then feed the output into a neural surrogate, but the performance numbers and transfer details stay too high-level to judge the gains.

The main thing to know is that they tackle the memory wall in nonlinear time-stepping by keeping most state variables in CPU memory and streaming what the GPU needs over modern interconnects, which lets them run larger 3D ensembles with complex constitutive laws than a pure GPU approach would allow. They also show the generated data being used to train a surrogate model, which is a reasonable downstream use case. That combination is the practical contribution here.

What they do well is lay out the dual problem of compute cost and state memory in these simulations and give a clear motivation for why active host-GPU management makes sense when interconnect bandwidth has improved. The seismic application and the surrogate step both feel grounded in real workflow needs rather than abstract benchmarks.

The soft spots are in the evidence. The abstract and description give no transfer volume figures, no overlap ratios between data movement and kernel time, and no roofline or bandwidth measurements to show that the interconnect traffic stays sub-dominant during the repeated state updates required by nonlinear steps. The stress-test point about possible latency or effective bandwidth shortfalls is therefore still open; without those numbers the claimed improvements in time-to-solution and energy-to-solution are hard to evaluate. The memory-management idea itself is not new in HPC, so the novelty sits mostly in the specific constitutive-law application and the ensemble scale they target.

This is for computational seismologists and HPC groups who already run large finite-element time-history codes and are hitting GPU memory limits, or for people who need big physics datasets for surrogate training. A reader in that space would get usable ideas even if the performance claims need tightening.
It deserves peer review because the problem is concrete, the proposed direction is sensible, and the work is coherent on its own terms, but any referee will want the missing quantitative checks on data movement before accepting the speed and energy results.
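The overlap ratio the letter asks for can be computed from a profiler timeline. Below is a minimal sketch, assuming the transfer and kernel intervals have already been exported as (start, end) pairs in seconds. The export step and the function names are hypothetical. The sketch merges each set of intervals and measures how much of the transfer time runs concurrently with some kernel.

```python
def merge_intervals(intervals):
    """Sort (start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_ratio(transfers, kernels):
    """Fraction of total transfer time that runs concurrently with a kernel."""
    t = merge_intervals(transfers)
    k = merge_intervals(kernels)
    total = sum(end - start for start, end in t)
    if total == 0:
        return 1.0  # no transfer time, so nothing is exposed
    hidden = 0.0
    i = j = 0
    while i < len(t) and j < len(k):
        # Intersection of the current transfer and kernel intervals.
        lo = max(t[i][0], k[j][0])
        hi = min(t[i][1], k[j][1])
        if hi > lo:
            hidden += hi - lo
        # Advance whichever interval finishes first.
        if t[i][1] < k[j][1]:
            i += 1
        else:
            j += 1
    return hidden / total
```

A ratio near 1 means the traffic is hidden behind computation. A low ratio, combined with transfer time comparable to kernel time, is the warning sign the letter describes.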

Referee Report

2 major / 0 minor

Summary. The paper proposes a heterogeneous memory management framework for massive ensemble simulations of nonlinear time-history problems with complex constitutive laws. By actively leveraging host CPU memory through high-bandwidth CPU-GPU interconnects while maximizing GPU throughput, the approach aims to overcome GPU memory limitations, yielding improvements in time-to-solution and energy-to-solution over conventional methods. The framework is further applied to generate large datasets for training neural network surrogate models, with claimed utility for high-fidelity 3D seismic evaluations and data-driven discovery.

Significance. If the performance claims hold with supporting measurements, the work could enable previously intractable large-scale nonlinear simulations in computational seismology and mechanics by relaxing GPU memory constraints, while also providing a pathway to data-driven surrogates; the emphasis on interconnect-aware heterogeneous management aligns with emerging hardware trends but requires concrete validation to establish impact.

major comments (2)

[Abstract] Abstract: the central claim of significant improvements in time-to-solution and energy-to-solution from comparisons with conventional implementations is unsupported, as no benchmark numbers, transfer-volume measurements, overlap ratios, or roofline analysis are supplied to demonstrate that CPU-GPU interconnect traffic for repeated per-element state-variable updates remains sub-dominant during nonlinear time-stepping.
[Abstract] Abstract: the assumption that recent CPU-GPU interconnects (e.g., NVLink or PCIe) allow active host-memory use without negating gains is load-bearing for the framework, yet no analysis addresses potential latency or effective-bandwidth bottlenecks when loading/updating large state arrays each time step; this directly engages the skeptic concern and leaves the memory-wall-overcoming claim unevaluated.
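The latency concern in the second comment can be framed with the standard latency-bandwidth (alpha-beta) model, t(S) = α + S/β, for a transfer of S bytes. Requiring an effective bandwidth of at least fβ gives S ≥ f/(1 - f) · αβ, so chunks smaller than that waste the link. The 10 µs latency and 64 GB/s peak in the example are hypothetical placeholders, not measurements.

```python
def effective_bandwidth(size_bytes, latency_s, peak_bytes_per_s):
    """Latency-bandwidth model: a transfer of S bytes takes alpha + S / beta."""
    return size_bytes / (latency_s + size_bytes / peak_bytes_per_s)


def min_chunk_for_fraction(fraction, latency_s, peak_bytes_per_s):
    """Smallest transfer size whose effective bandwidth reaches fraction * peak.

    S / (alpha + S/beta) >= f * beta  <=>  S >= f / (1 - f) * alpha * beta
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return fraction / (1.0 - fraction) * latency_s * peak_bytes_per_s


# Hypothetical placeholders: 10 microseconds latency, 64 GB/s peak.
s_min = min_chunk_for_fraction(0.9, 10e-6, 64e9)
print(f"{s_min / 1e6:.2f} MB")  # about 5.76 MB
```

With these placeholders, reaching 90% of peak requires chunks of about 5.76 MB. Real values should come from measurements on the target hardware.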

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract needs to more explicitly support the performance claims with quantitative data and analysis of interconnect behavior. We will make revisions to address these points.

Referee: [Abstract] Abstract: the central claim of significant improvements in time-to-solution and energy-to-solution from comparisons with conventional implementations is unsupported, as no benchmark numbers, transfer-volume measurements, overlap ratios, or roofline analysis are supplied to demonstrate that CPU-GPU interconnect traffic for repeated per-element state-variable updates remains sub-dominant during nonlinear time-stepping.

Authors: The full paper includes detailed benchmark results in the evaluation section showing specific improvements (e.g., reduced time-to-solution by factors depending on ensemble size). However, to make the abstract self-supporting, we will revise it to include key numbers such as measured transfer volumes per time step, overlap ratios achieved through asynchronous transfers, and a summary of roofline analysis confirming that interconnect traffic is sub-dominant. This revision will be made. revision: yes
Referee: [Abstract] Abstract: the assumption that recent CPU-GPU interconnects (e.g., NVLink or PCIe) allow active host-memory use without negating gains is load-bearing for the framework, yet no analysis addresses potential latency or effective-bandwidth bottlenecks when loading/updating large state arrays each time step; this directly engages the skeptic concern and leaves the memory-wall-overcoming claim unevaluated.

Authors: We recognize the importance of addressing potential bottlenecks explicitly. In the revised manuscript, we will incorporate an analysis of effective bandwidth and latency for state array updates, including measurements on the target hardware (e.g., NVLink bandwidth utilization rates) and discussion of how the heterogeneous management strategy mitigates these issues. This will provide the requested evaluation of the memory-wall claim. revision: yes
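A roofline summary of the kind promised in the first response can be sketched as follows. Attainable throughput is min(peak, I × B) for arithmetic intensity I (FLOP per byte moved) and bandwidth B. Treating the CPU-GPU link as the memory roof for state streamed from the host shows how much higher I must be before the link stops being the limit. The peak-FLOP, device-memory, and link figures below are hypothetical placeholders.

```python
def attainable_flops(intensity, peak_flops, bandwidth_bytes_per_s):
    """Roofline: throughput is capped by peak compute or by intensity x bandwidth.

    `intensity` is FLOP per byte moved over the memory level being modelled.
    """
    return min(peak_flops, intensity * bandwidth_bytes_per_s)


def ridge_point(peak_flops, bandwidth_bytes_per_s):
    """Intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_flops / bandwidth_bytes_per_s


# Hypothetical placeholders, not measurements of any specific device.
PEAK = 60e12       # FLOP/s
HBM = 4e12         # bytes/s, device memory
LINK = 450e9       # bytes/s, CPU-GPU link, one direction
print(ridge_point(PEAK, HBM))             # 15.0
print(round(ridge_point(PEAK, LINK), 1))  # 133.3
```

With these placeholders, a kernel becomes compute-bound at 15 FLOP/byte against device memory but only at about 133 FLOP/byte against the link. This is why the measured traffic per step matters when judging the claim.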

Circularity Check

0 steps flagged

No circularity: claims rest on empirical comparisons without self-referential derivations

The manuscript proposes a heterogeneous memory framework and evaluates it via direct performance comparisons against conventional implementations, plus a downstream NN surrogate demonstration. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the text. The central claims (overcoming GPU memory wall, time/energy improvements) are presented as outcomes of the proposed method rather than reductions to prior inputs or self-citations by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5508 in / 979 out tokens · 43225 ms · 2026-05-13T18:57:07.055302+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] PCIe 5.0 specification. https://pcisig.com/ (last accessed 28 January 2026).

[2] NVIDIA GH200 Grace Hopper Superchip Architecture. https://docs.nvidia.com/gh200-superchip-benchmark-guide.pdf (last accessed 28 January 2026).

[3] Tsuyoshi Ichimura, Kohei Fujita, Muneo Hori, Lalith Maddegedara, Jack Wells, Alan Gray, Ian Karlin, John Linford. Heterogeneous computing in a strongly-connected CPU-GPU environment: fast multiple time-evolution equation-based modeling accelerated using data-driven approach. 2024. SC24-W: Workshops of the International Conference for High Performance Co... https://doi.org/10.1109/scw63240.2024.00246

[4] Tsuyoshi Ichimura, Kohei Fujita, Muneo Hori, Takashi Sakanoue, Ryo Hamanaka. Three-Dimensional Nonlinear Seismic Ground Response Analysis of Local Site Effects for Estimating Seismic Behavior of Buried Pipelines. 2014. J. Pressure Vessel Technol., ASME, 136-4, 041702 (8 pages). https://doi.org/10.1115/1.4026208

[5] Susumu Iai. Three dimensional formulation and objectivity of a strain space multiple mechanism model for sand. 1993. Soils and Foundations, 33-1, 192–199. https://doi.org/10.3208/sandf1972.33.192

[6] Izzat M. Idriss, Ricardo Dobry, Ram D. Singh. Nonlinear Behavior of Soft Clays During Cyclic Loading. 1978. Journal of the Geotechnical Engineering Division, 104, 1427–1447. https://doi.org/10.1061/AJGEB6.0000727

[7] G. Masing. Eigenspannungen und Verfestigung beim Messing [Residual stresses and hardening in brass]. 1926. Proceedings of the 2nd International Congress of Applied Mechanics, 332–335.

[8] James M. Winget, Thomas J.R. Hughes. Solution algorithms for nonlinear transient heat conduction analysis employing element-by-element iterative strategies. 1985. Computer Methods in Applied Mechanics and Engineering, 52, 711–815.

[9] Tsuyoshi Ichimura, Kohei Fujita, Seizo Tanaka, Muneo Hori, Maddegedara Lalith, Yoshihisa Shizawa, Hiroshi Kobayashi. Physics-based urban earthquake simulation enhanced by 10.7 BlnDOF x 30 K time-step unstructured FE non-linear seismic wave simulation. 2014. SC '14: Proceedings of the International Conference for High Performance Computing, Networking, St...

[10] Tsuyoshi Ichimura, Muneo Hori, Jacobo Bielak. A Hybrid Multiresolution Meshing Technique for Finite Element Three-Dimensional Earthquake Ground Motion Modeling in Basins Including Topography. 2009. Geophysical Journal International, 177, 1221–1232. https://doi.org/10.1111/j.1365-246X.2009.04154.x

[11] Kohei Fujita, Tsuyoshi Ichimura, Muneo Hori, M.L.L. Wijerathne, Seizo Tanaka. A Quick Earthquake Disaster Estimation System with Fast Urban Earthquake Simulation and Interactive Visualization. 2014. Procedia Computer Science, 29, 866–876. https://doi.org/10.1016/j.procs.2014.05.078

[12] Japan Meteorological Agency. https://www.data.jma.go.jp/eqev/data/kyoshin/jishin/ (last accessed 28 January 2026).

[13] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama. Optuna: A Next-generation Hyperparameter Optimization Framework. 2019. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2623–2631. https://doi.org/10.1145/3292500.3330701
