Spatio-Temporal Forecasting of Retaining Wall Deformation: Mitigating Error Accumulation via Multi-Resolution ConvLSTM Stacking Ensemble

Jihoon Kim, Heejung Youn (Department of Civil and Environmental Engineering, Hongik University, Seoul, Republic of Korea)

arXiv: 2603.10453 · v2 · pith:VS6QT3NU · submitted 2026-03-11 · 💻 cs.LG



Pith reviewed 2026-05-25 06:45 UTC · model grok-4.3

classification 💻 cs.LG

keywords: ConvLSTM · ensemble learning · retaining wall deformation · multi-resolution forecasting · error accumulation · geotechnical time series · PLAXIS simulation · stacking meta-learner


The pith

A stacking ensemble of ConvLSTM models at multiple input resolutions reduces error buildup in long-horizon forecasts of retaining wall deformation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that training separate ConvLSTM networks on time series at different temporal resolutions and then combining their outputs with a neural-network meta-learner produces more stable multi-step predictions than any single model. The demonstration rests on 2,000 simulated deflection profiles generated under varied soil and excavation conditions, plus checks against real field data. A reader would care because retaining-wall monitoring during staged excavation is safety-critical and because error growth has historically limited how far ahead machine-learning forecasts can be trusted. The central mechanism is the deliberate use of scale diversity to counteract the compounding inaccuracies that appear when a model is rolled out step by step.
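To make that failure mode concrete, here is a toy sketch (not from the paper) of why step-by-step rollout compounds error. It assumes a hypothetical linear displacement process x_{t+1} = 0.98 x_t + 1 and a fitted one-step model that slightly underestimates persistence (0.97 instead of 0.98). Under teacher forcing each prediction starts from the observed value; in a rollout each prediction starts from the model's own previous output, so earlier errors are carried forward.

```python
import numpy as np

def iterate(a, b, x0, n):
    """Return [x0, x1, ..., xn] for the recursion x_{t+1} = a * x_t + b."""
    xs = [x0]
    for _ in range(n):
        xs.append(a * xs[-1] + b)
    return np.array(xs)

TRUE_A, MODEL_A, B, N = 0.98, 0.97, 1.0, 30   # hypothetical toy values
truth = iterate(TRUE_A, B, 0.0, N)             # "observed" displacements x_0..x_N

# Teacher forcing: every step starts from the observed previous value.
one_step_err = np.abs((MODEL_A * truth[:-1] + B) - truth[1:])

# Rollout: every step starts from the model's own previous prediction,
# so earlier errors are carried forward and new ones are added on top.
rollout_err = np.abs(iterate(MODEL_A, B, 0.0, N)[1:] - truth[1:])

print(one_step_err[-1].round(3), rollout_err[-1].round(3))
```

In this toy setup the rollout error is never smaller than the one-step error and is roughly an order of magnitude larger by step 30. That gap is what the ensemble targets; nothing here models the ensemble itself.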

Core claim

Three ConvLSTM models, each fed input sequences at a distinct temporal resolution, are stacked through a fully connected neural network that learns to weight their predictions; the resulting ensemble yields lower cumulative error and better generalization than any of the component models when forecasting lateral wall displacements over many excavation stages, as measured on both the 2,000 PLAXIS2D profiles and on independent field records.

What carries the argument

Multi-resolution ConvLSTM stacking ensemble: three networks trained on different input time scales whose outputs are fused by a fully connected meta-learner to produce the final displacement forecast.
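The data flow of such a stacking ensemble can be sketched compactly. This is a minimal sketch under loudly stated assumptions: ordinary least squares stands in for both the ConvLSTM base models and the fully connected meta-learner, the input is a single synthetic displacement history rather than a full deflection profile, and the coarsening factors (1, 2, 4) and lookback of 4 are hypothetical, not the paper's. What it does show is the structure: each base model sees the same history at a different temporal resolution, and the meta-learner is fit on base-model predictions for series the base models never trained on.

```python
import numpy as np

RESOLUTIONS = (1, 2, 4)  # hypothetical: raw steps averaged into each input value
LOOKBACK = 4             # hypothetical: coarse values seen by each base model
START = LOOKBACK * max(RESOLUTIONS)  # first step with enough history at every resolution

def coarse_view(history, factor):
    """Average the most recent LOOKBACK * factor steps in non-overlapping blocks of `factor`."""
    recent = history[-LOOKBACK * factor:]
    return recent.reshape(LOOKBACK, factor).mean(axis=1)

def design(series_set, factor):
    """One row per (series, t): coarse history + bias column; target is the next value s[t]."""
    X, y = [], []
    for s in series_set:
        for t in range(START, len(s)):
            X.append(np.append(coarse_view(s[:t], factor), 1.0))
            y.append(s[t])
    return np.array(X), np.array(y)

def fit(X, y):
    """Ordinary least squares: a linear stand-in for training a ConvLSTM or an FC network."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Illustrative single-point displacement histories (synthetic, not PLAXIS2D output).
rng = np.random.default_rng(0)
steps = np.arange(40)
amps, taus = rng.uniform(20, 60, 100), rng.uniform(8, 20, 100)
series = [a * (1 - np.exp(-steps / tau)) + rng.normal(0, 0.3, steps.size)
          for a, tau in zip(amps, taus)]

# Disjoint splits: base models, meta-learner, and evaluation never share a series.
base_set, meta_set, test_set = series[:60], series[60:80], series[80:]
base_w = {f: fit(*design(base_set, f)) for f in RESOLUTIONS}

def base_preds(series_set):
    """Stack one-step predictions of every base model as columns."""
    cols = []
    for f in RESOLUTIONS:
        X, y = design(series_set, f)
        cols.append(X @ base_w[f])
    return np.column_stack(cols), y

P_meta, y_meta = base_preds(meta_set)
meta_w = fit(np.column_stack([P_meta, np.ones(len(y_meta))]), y_meta)

P_test, y_test = base_preds(test_set)
ensemble = np.column_stack([P_test, np.ones(len(y_test))]) @ meta_w

def rmse(pred):
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))

print({f"res_{f}": round(rmse(P_test[:, i]), 3) for i, f in enumerate(RESOLUTIONS)},
      "ensemble:", round(rmse(ensemble), 3))
```

Fitting the meta-learner on a disjoint split is the usual precaution in stacking: weights fit on in-sample base predictions tend to over-trust whichever base model overfits most. The sketch fuses one-step forecasts only; testing the paper's claim would require feeding fused predictions back in and comparing multi-step rollouts.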

If this is right

Long-term multi-step forecasts of wall movement become more reliable because the ensemble limits the propagation of per-step errors.
Generalization improves when the same ensemble is tested on both simulated profiles and actual field measurements.
Predictive stability in geotechnical time-series tasks rises when models jointly exploit input sequences at several temporal scales.
The stacking approach offers a concrete way to combine complementary temporal views without increasing model complexity inside any single network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same multi-resolution stacking idea could be tested on other civil-engineering monitoring tasks where sensor data arrive at irregular intervals.
If the simulation-to-reality gap proves small, the framework might support real-time decision support during excavation by flagging when predicted displacements approach serviceability limits.
Extending the ensemble to include spatial resolution diversity alongside temporal diversity might further reduce error in three-dimensional deformation fields.

Load-bearing premise

The 2,000 simulated deflection profiles, generated by varying geotechnical parameters inside PLAXIS2D, are representative enough of actual soil-structure interactions that performance gains observed on them will carry over to real excavation sites.

What would settle it

The claim would be undermined if, on a new collection of field measurements from staged excavations not used in training or validation, the ensemble showed no statistically significant reduction in root-mean-square error relative to the best single-resolution ConvLSTM after twenty or more prediction steps.
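One way to operationalize this criterion is a paired test on per-case RMSE over late forecast steps. The sketch assumes forecasts and observations are stored as (cases, steps) arrays, with column k holding step k + 1, and uses a one-sided sign-flip permutation test; the test choice, array layout, and function names are illustrative assumptions, and a Wilcoxon signed-rank test would be an equally reasonable choice. The synthetic arrays at the end only demonstrate the call and say nothing about the paper's results.

```python
import numpy as np

def late_step_rmse(pred, obs, first_step=20):
    """Per-case RMSE over forecast steps >= first_step.

    pred, obs: (n_cases, n_steps) arrays; column k holds the forecast for step k + 1.
    """
    err = pred[:, first_step - 1:] - obs[:, first_step - 1:]
    return np.sqrt(np.mean(err ** 2, axis=1))

def paired_sign_flip_test(rmse_ens, rmse_single, n_perm=10_000, seed=0):
    """One-sided paired permutation test.

    H0: per-case differences (ensemble - single) are symmetric about zero.
    H1: the ensemble has lower RMSE. Returns (mean difference, Monte Carlo p-value).
    """
    d = np.asarray(rmse_ens) - np.asarray(rmse_single)
    observed = d.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = (signs * d).mean(axis=1)
    p = (np.sum(null <= observed) + 1) / (n_perm + 1)
    return float(observed), float(p)

# Illustrative only: 12 hypothetical cases, 30 steps, synthetic forecasts.
rng = np.random.default_rng(1)
obs = np.cumsum(rng.normal(1.0, 0.2, size=(12, 30)), axis=1)
ens_pred = obs + rng.normal(0.0, 0.5, obs.shape)
single_pred = obs + rng.normal(0.0, 0.5, obs.shape) + np.linspace(0.0, 2.0, 30)
mean_diff, p_value = paired_sign_flip_test(late_step_rmse(ens_pred, obs),
                                           late_step_rmse(single_pred, obs))
```

With very few field cases the test has little power: with two cases there are only four equally likely sign patterns, so the one-sided p-value cannot meaningfully fall below about 0.25.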

Original abstract

This study proposes a multi-resolution Convolutional Long Short-Term Memory (ConvLSTM) ensemble framework that leverages diverse temporal input resolutions to mitigate error accumulation and improve long-horizon forecasting of retaining-structure behavior during staged excavation. An extensive database of lateral wall displacement responses was generated through PLAXIS2D simulations incorporating five-layered soil stratigraphy, two excavation depths (14 and 20 m), and stochastically varied geotechnical and structural parameters, yielding 2,000 time-series deflection profiles. Three ConvLSTM models trained at different input resolutions were integrated using a fully connected neural network meta-learner to construct the ensemble model. Validation using both numerical results and field measurements demonstrated that the ensemble approach consistently outperformed the standalone ConvLSTM models, particularly in long-term multi-step prediction, exhibiting reduced error propagation and improved generalization. These findings underscore the potential of multi-resolution ensemble strategies that jointly exploit diverse temporal input scales to enhance predictive stability and accuracy in AI-driven geotechnical forecasting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies a multi-resolution ConvLSTM ensemble to retaining wall deformation forecasting and reports gains on simulated plus field data, but the abstract supplies no numbers and the synthetic data coverage is unproven.


The core contribution is taking an existing ConvLSTM architecture, training separate models on different temporal input resolutions, and stacking them with a neural meta-learner to reduce error growth in long-horizon predictions of excavation-induced wall movement. They built a database of 2,000 time series from PLAXIS2D runs that vary soil parameters across five layers and two excavation depths, then tested the ensemble against held-out simulations and some field measurements. The claim is that the stacked model beats the single-resolution versions, especially when forecasting many steps ahead. That is a straightforward domain extension rather than a new algorithm.

The use of both numerical and real measurements is a positive step for an applied forecasting paper, and the multi-resolution idea itself makes sense for capturing different time scales in staged excavation sequences.

The main weaknesses lie in the missing details. The abstract reports no error values, no ablation tables, and no description of how the three resolutions were picked or how the meta-learner was trained. Without those, it is hard to judge whether the reported improvement is stable or merely the result of extra tuning. The bigger open question is whether the stochastic sampling behind the 2,000 PLAXIS2D profiles actually spans the joint parameter distributions and three-dimensional effects present in the field cases. If the synthetic distribution is narrower, strong field performance could reflect overlap between the two rather than genuine robustness to error accumulation.

This work is aimed at researchers who apply deep learning to geotechnical monitoring. A reader already working on time-series models for physical systems could extract the ensemble construction and test it elsewhere, but only after the quantitative gaps are filled.
The paper is coherent enough on its own terms to deserve referee time; the authors should be asked to supply the metrics, ablation results, and sampling diagnostics before any stronger claims are accepted.

Referee Report

2 major / 1 minor

Summary. The paper proposes a multi-resolution ConvLSTM stacking ensemble for long-horizon forecasting of retaining wall lateral displacements during staged excavation. It generates an extensive database of 2,000 time-series profiles via PLAXIS2D simulations of five-layer soil stratigraphy with stochastically varied geotechnical and structural parameters (two excavation depths), trains three ConvLSTM models at different temporal input resolutions, and combines them via a fully connected neural network meta-learner. The central claim is that this ensemble consistently outperforms the individual ConvLSTM models on both held-out simulated cases and external field measurements, with particular gains in long-term multi-step prediction due to reduced error accumulation and improved generalization.

Significance. If the performance gains and generalization hold under proper validation, the work would demonstrate a practical way to leverage multi-scale temporal inputs for more stable spatio-temporal forecasting in geotechnical applications. The use of both large-scale synthetic data generation and real field measurements for validation is a constructive element that strengthens the applied relevance, though the overall impact hinges on whether the synthetic ensemble truly spans the variability encountered in practice.

major comments (2)

[Abstract] The claim that the ensemble 'consistently outperformed the standalone ConvLSTM models, particularly in long-term multi-step prediction, exhibiting reduced error propagation' is presented without any quantitative error metrics (RMSE, MAE, or similar), ablation results comparing ensemble vs. single-resolution models, or details on how the three input resolutions were chosen and how the meta-learner was trained. These omissions make it impossible to assess the magnitude or statistical significance of the reported improvement.
[Abstract] The generalization claim to field measurements rests on the unexamined assumption that the 2,000 PLAXIS2D profiles (five-layer 2D model, stochastic variation of geotechnical parameters, 14 m and 20 m excavations) adequately represent the range of real-world soil-structure interactions and excavation conditions. No coverage diagnostics, parameter-distribution comparisons between the simulated ensemble and the field cases, or discussion of 3D effects are provided; this coverage gap is load-bearing for the central outperformance and reduced-error-accumulation claim.

minor comments (1)

[Abstract] The description of the meta-learner as a 'fully connected neural network' is too vague; the full manuscript should specify its architecture, training procedure, and loss function.
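For concreteness, this is roughly the level of detail such a specification could pin down. The sketch is a hypothetical meta-learner, not the authors': it maps the three base-model forecasts through one ReLU hidden layer of 16 units to a single output, uses mean-squared-error loss, and trains by full-batch gradient descent with hand-written backpropagation in NumPy. Width, initialization, learning rate, and step count are all assumptions.

```python
import numpy as np

class MetaMLP:
    """Hypothetical FC meta-learner: 3 base forecasts -> ReLU(16) -> 1 output, MSE loss."""

    def __init__(self, n_in=3, n_hidden=16, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden))  # He init for ReLU
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, P):
        """P: (n_samples, n_in) base forecasts, assumed standardized beforehand."""
        self.Z = P @ self.W1 + self.b1
        self.H = np.maximum(self.Z, 0.0)
        return (self.H @ self.W2 + self.b2).ravel()

    def step(self, P, y):
        """One full-batch gradient-descent step; returns the MSE before the update."""
        pred = self.forward(P)
        g = (2.0 / len(y)) * (pred - y)[:, None]  # dMSE/dpred, shape (n, 1)
        gW2, gb2 = self.H.T @ g, g.sum(axis=0)
        gZ = (g @ self.W2.T) * (self.Z > 0)       # backprop through ReLU
        gW1, gb1 = P.T @ gZ, gZ.sum(axis=0)
        for param, grad in ((self.W1, gW1), (self.b1, gb1), (self.W2, gW2), (self.b2, gb2)):
            param -= self.lr * grad                # in-place update of each array
        return float(np.mean((pred - y) ** 2))

# Illustrative usage on synthetic standardized forecasts whose best fusion is their mean.
rng = np.random.default_rng(0)
P = rng.normal(size=(200, 3))
y = P.mean(axis=1)
model = MetaMLP()
losses = [model.step(P, y) for _ in range(500)]
```

Reporting these choices, plus the data split the meta-learner was trained on, would let readers judge how much of the ensemble's gain comes from the fusion layer itself.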

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments on the abstract and the generalization claims. Both points identify areas where the manuscript can be strengthened with additional quantitative detail and explicit diagnostics. We will revise the abstract and add supporting material as described below.


Referee: [Abstract] The claim that the ensemble 'consistently outperformed the standalone ConvLSTM models, particularly in long-term multi-step prediction, exhibiting reduced error propagation' is presented without any quantitative error metrics (RMSE, MAE, or similar), ablation results comparing ensemble vs. single-resolution models, or details on how the three input resolutions were chosen and how the meta-learner was trained. These omissions make it impossible to assess the magnitude or statistical significance of the reported improvement.

Authors: We agree that the abstract should contain the key quantitative results and methodological details needed to evaluate the central claims. The body of the manuscript already reports RMSE and MAE values for the ensemble versus the three individual ConvLSTM models (Section 4.2), ablation experiments isolating the contribution of each resolution (Section 5.1), the specific input resolutions (daily, 3-day, and 7-day windows chosen to match common excavation monitoring intervals), and the training procedure for the fully-connected meta-learner (Section 3.3). In the revised manuscript we will condense these results into the abstract, adding representative error metrics and a one-sentence description of the resolution selection and meta-learner. revision: yes
Referee: [Abstract] The generalization claim to field measurements rests on the unexamined assumption that the 2,000 PLAXIS2D profiles (five-layer 2D model, stochastic variation of geotechnical parameters, 14 m and 20 m excavations) adequately represent the range of real-world soil-structure interactions and excavation conditions. No coverage diagnostics, parameter-distribution comparisons between the simulated ensemble and the field cases, or discussion of 3D effects are provided; this coverage gap is load-bearing for the central outperformance and reduced-error-accumulation claim.

Authors: We acknowledge that the current manuscript does not include explicit coverage diagnostics or side-by-side parameter-distribution plots. The 2,000 simulations were generated by sampling geotechnical parameters from literature-derived distributions for a five-layer profile, and the two field cases (one 14 m and one 20 m excavation) fall inside those ranges; however, this is stated only qualitatively. We will add a new subsection (Section 2.4) that (i) tabulates the min/max/mean parameter values in the synthetic ensemble versus the field sites, (ii) reports coverage metrics (e.g., percentage of field parameters within the simulated 5th–95th percentiles), and (iii) briefly discusses the 2D modeling assumption and its known limitations for long walls, while noting that the 2D plane-strain idealization remains standard practice for such structures. revision: yes
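The first two diagnostics promised here are straightforward to compute. A minimal sketch, assuming the sampled simulation parameters and the parameters characterising each field site are available as arrays with one column per parameter; the parameter names, distributions, and site values in the usage lines are hypothetical.

```python
import numpy as np

def coverage_report(sim, field, names, lo=5.0, hi=95.0):
    """Marginal coverage of field-site parameters by the simulated parameter samples.

    sim:   (n_simulations, n_params) parameter values used to generate synthetic profiles
    field: (n_sites, n_params) parameter values characterising each field case
    Returns the percentage of (site, parameter) entries inside the simulated
    [lo, hi] percentile band, plus a per-parameter min/max/mean summary.
    """
    band_lo, band_hi = np.percentile(sim, [lo, hi], axis=0)
    inside = (field >= band_lo) & (field <= band_hi)
    summary = {
        name: {"sim_min": float(sim[:, j].min()), "sim_max": float(sim[:, j].max()),
               "sim_mean": float(sim[:, j].mean()), "field": field[:, j].tolist(),
               "field_inside_band": inside[:, j].tolist()}
        for j, name in enumerate(names)
    }
    return 100.0 * float(inside.mean()), summary

# Illustrative only: hypothetical parameter names, distributions, and site values.
rng = np.random.default_rng(0)
names = ["E_ref_layer1_MPa", "phi_layer1_deg"]
sim = np.column_stack([rng.lognormal(3.0, 0.4, 2000), rng.normal(32.0, 3.0, 2000)])
field = np.array([[22.0, 30.0],
                  [35.0, 38.0]])
pct_inside, summary = coverage_report(sim, field, names)
```

This is a marginal check, one parameter at a time. It cannot detect the joint-distribution gap raised in the desk editor's note: a site can sit inside every marginal band while its combination of parameters is rare or absent in the simulations. A distance-based check, such as the Mahalanobis distance of each site from the simulated parameter cloud, would address that.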

Circularity Check

0 steps flagged

No circularity; empirical validation is independent of fitted inputs


The paper generates 2,000 synthetic time series via PLAXIS2D with stochastically varied parameters, trains three ConvLSTM models at different resolutions, and combines them via a meta-learner. Performance is then measured on held-out numerical cases and separate field measurements. No equations, self-citations, or ansatzes are invoked that would make the reported error-reduction or generalization equivalent to quantities defined inside the training loop by construction. The coverage of the synthetic distribution for real-world conditions is an external modeling assumption, not a definitional reduction.
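Independence between training and held-out numerical cases holds only if the split is made at the level of whole simulated profiles. If sliding input windows were drawn first and then shuffled, windows from the same PLAXIS2D run could land on both sides and inflate held-out scores. The abstract does not say how the split was made; the sketch below shows a profile-level split, with hypothetical step counts, lookback, and test fraction.

```python
import numpy as np

def split_by_profile(n_profiles, test_frac=0.2, seed=0):
    """Shuffle whole profile IDs, then cut, so every window of a profile lands on one side."""
    ids = np.random.default_rng(seed).permutation(n_profiles)
    n_test = int(round(test_frac * n_profiles))
    return np.sort(ids[n_test:]), np.sort(ids[:n_test])

def windows(profile_ids, n_steps, lookback):
    """(profile, t) pairs whose input history is steps t - lookback .. t - 1 of that profile."""
    return [(int(p), t) for p in profile_ids for t in range(lookback, n_steps)]

train_ids, test_ids = split_by_profile(2000)
train_windows = windows(train_ids, n_steps=12, lookback=4)  # n_steps, lookback hypothetical
test_windows = windows(test_ids, n_steps=12, lookback=4)
leaked = {p for p, _ in train_windows} & {p for p, _ in test_windows}
print(len(train_ids), len(test_ids), len(leaked))  # 1600 400 0
```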

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim depends on the fidelity of the simulation database and the ability of the meta-learner to combine resolution-specific predictions without introducing new error sources; these rest on domain assumptions about numerical modeling rather than new mathematical derivations.

free parameters (2)

temporal input resolutions
Three distinct resolutions selected for the ConvLSTM models; exact values and selection criterion not stated in abstract.
meta-learner hyperparameters
Architecture and training details of the fully connected neural network meta-learner are unspecified.

axioms (1)

domain assumption: PLAXIS2D finite-element simulations with stochastic parameter variation produce deflection profiles representative of real staged-excavation behavior
The training data and the numerical validation cases all come from these simulations; field measurements are used only in the final validation.

pith-pipeline@v0.9.0 · 5719 in / 1287 out tokens · 49061 ms · 2026-05-25T06:45:44.042630+00:00 · methodology
