Explainable Wastewater Digital Twins: Adaptive Context-Conditioned Structured Simulators with Self-Falsifying Decision Support
Pith reviewed 2026-05-20 05:57 UTC · model grok-4.3
The pith
A context-conditioned structured simulator with conformal risk control certifies safe aeration decisions in wastewater plants.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that their CCSS-IX simulator, consisting of interpretable locally linear state-space experts mixed adaptively via a context-aware gating network on a continuous-time regime-switching scaffold, together with a conformal risk control decision layer that abstains or returns falsifying witnesses, supplies an end-to-end pipeline with finite-sample coverage guarantees for safe control interventions in wastewater aeration and dosing.
What carries the argument
CCSS-IX: a bank of interpretable locally linear state-space experts, adaptively mixed by a context-aware gating network on a regime-switching scaffold. It carries the interpretable simulation, while the conformal layer handles the certified decision support.
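To make the architecture concrete, here is a minimal Python sketch of one discrete-time step of a gated mixture of locally linear experts, x_{t+1} = Σ_k w_k(c_t)(A_k x_t + B_k u_t), where the softmax weights w_k are computed from a context vector c_t. The function names, the linear gating scores, and the discrete-time form are illustrative assumptions. The paper's continuous-time regime-switching scaffold and its actual gating network are not reproduced here.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    # Dense matrix-vector product on nested lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def mixed_step(x, u, context, experts, gate_weights):
    """One step of a gated mixture of locally linear state-space experts.

    experts: list of (A, B) pairs; expert k predicts A_k x + B_k u.
    gate_weights: one vector g_k per expert; its gating score is <g_k, context>.
    Returns the next state and the mixing weights.
    """
    scores = [sum(g * c for g, c in zip(g_k, context)) for g_k in gate_weights]
    w = softmax(scores)
    x_next = [0.0] * len(x)
    for w_k, (A, B) in zip(w, experts):
        ax, bu = matvec(A, x), matvec(B, u)
        for i in range(len(x)):
            x_next[i] += w_k * (ax[i] + bu[i])
    return x_next, w

# Two hypothetical experts for a 2-state, 1-input plant (for example a
# low-load and a high-load regime); all values are illustrative only.
experts = [
    ([[0.9, 0.0], [0.0, 0.8]], [[0.1], [0.0]]),
    ([[0.5, 0.0], [0.0, 0.5]], [[0.2], [0.1]]),
]
gates = [[1.0], [-1.0]]
x_next, w = mixed_step([1.0, 1.0], [1.0], [0.0], experts, gates)
```

Because each expert is an explicit (A, B) pair and the weights are returned alongside the state, an operator can read off which local regime dominates at each step. This is the interpretability property the review attributes to CCSS-IX.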
Load-bearing premise
The conformal risk control layer supplies valid finite-sample coverage guarantees when applied to the observed time series (42.6 percent sensor missingness, 2-minute sampling).
What would settle it
Running the full pipeline on new hold-out slices from the Avedøre and Agtrup/BlueKolding plants and from BSM2, and checking whether the reported 43.6 percent aggregate two-plant regret reduction and the elimination of unsafe chosen actions on BSM2 persist under the same unsafe-action cost weight of 4.
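The abstract does not define regret, so this check needs a concrete definition. One plausible reading, assumed here and not taken from the paper, charges each action its energy cost plus the unsafe-action cost weight (4 in the reported setting) whenever the action is unsafe. Regret is then measured against the best candidate in hindsight. A minimal Python sketch; all names are hypothetical:

```python
def action_cost(energy, unsafe, unsafe_weight=4.0):
    # Hypothetical cost: energy term plus a fixed penalty if the action is unsafe.
    return energy + (unsafe_weight if unsafe else 0.0)

def regret(chosen, candidates, unsafe_weight=4.0):
    """Cost of the chosen action minus the cost of the best candidate.

    Each action is an (energy, unsafe) pair.
    """
    best = min(action_cost(e, s, unsafe_weight) for e, s in candidates)
    return action_cost(*chosen, unsafe_weight=unsafe_weight) - best

def relative_reduction(regret_baseline, regret_new):
    # Fractional regret cut; 0.436 corresponds to a 43.6 % reduction.
    return 1.0 - regret_new / regret_baseline

candidates = [(1.0, False), (0.5, True)]  # safe but costlier vs. cheap but unsafe
r_unsafe = regret((0.5, True), candidates)   # 0.5 + 4 - 1.0 = 3.5
r_safe = regret((1.0, False), candidates)    # 0.0
```

Under this reading, raising the unsafe-action weight makes unsafe choices dominate the regret, which is why the falsification check has to hold the weight fixed.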
Original abstract
Operators of safety-critical industrial processes increasingly rely on digital twins to screen control interventions, but such simulators rarely carry certified safety guarantees. Wastewater treatment plants exemplify the gap: operators face a daily safety-efficiency trade-off where aerating too little risks effluent violations and nitrous-oxide (N2O) spikes, and aerating too much wastes energy. We develop an explainable digital twin for aeration and dosing setpoints. CCSS-IX, the simulator, is a bank of interpretable locally linear state-space "experts" adaptively mixed by a context-aware gating network, building on a continuous-time regime-switching scaffold. A runtime decision layer applies conformal risk control to abstain, reopen, or return a falsifying temporal witness for any operator-proposed action that cannot be statistically certified. The artificial-intelligence contribution is twofold: an identifiable, context-conditioned structured surrogate that retains operator-readable dynamics, and a self-falsifying decision rule with finite-sample coverage guarantees. The engineering contribution is a validated, end-to-end decision-support pipeline, tested on a 1000-step slice of the Avedøre full-scale plant (42.6% sensor missingness, 2-minute sampling), the Agtrup/BlueKolding full-scale plant in Denmark, and the Benchmark Simulation Model No. 2 (BSM2) international benchmark, under a matched ten-seed protocol. The static structured ensemble lies within 0.78% root-mean-square error of an unconstrained black-box reference, and the adaptive variant within 1.08%. The calibrated reopen rule cuts aggregate two-plant regret by 43.6% at an unsafe-action cost weight of 4 and eliminates unsafe chosen actions on the BSM2 main slice. Event-aligned temporal witnesses prevent 93 of 187 false-safe N2O approvals, about 4.65× the dyadic baseline (paired McNemar p < 1e-21).
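The paired McNemar test behind the reported p-value uses only the discordant events: false-safe approvals prevented by one method but not the other. The following is a minimal Python sketch of the exact (binomial) two-sided version. The paper's discordant counts are not given here. Since 93 / 4.65 = 20, the baseline prevents roughly 20 approvals. If, hypothetically, every one of those is also prevented by the witnesses, the discordant counts are b = 73 and c = 0.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test from discordant counts.

    b: events caught by method A but not B; c: caught by B but not A.
    Under H0 (no difference), b ~ Binomial(b + c, 1/2).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

With those hypothetical counts, `mcnemar_exact(73, 0)` equals 2^-72 (about 2.1e-22), which is compatible with the reported p < 1e-21. This shows only that the reported magnitude is attainable, not what the actual discordant counts were.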
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CCSS-IX, an explainable digital twin for wastewater aeration and dosing control. It consists of a bank of interpretable locally linear state-space experts, adaptively mixed by a context-aware gating network on a continuous-time regime-switching scaffold, paired with a runtime conformal risk control layer that abstains, reopens, or returns falsifying temporal witnesses for operator actions it cannot certify. The system is evaluated on two full-scale plants, Avedøre (42.6% missingness, 2-min sampling) and Agtrup/BlueKolding, plus the BSM2 benchmark, under a matched ten-seed protocol. It reports RMSE within 0.78% (static ensemble) and 1.08% (adaptive variant) of an unconstrained black-box reference, a 43.6% aggregate two-plant regret reduction at unsafe-action cost weight 4, elimination of unsafe chosen actions on the BSM2 main slice, and 93 of 187 false-safe N2O approvals prevented (4.65× the dyadic baseline, McNemar p < 1e-21).
Significance. If the finite-sample coverage guarantees are shown to hold, the work supplies a rare combination of operator-readable structured dynamics, adaptive context conditioning, and certified self-falsification for safety-critical industrial decision support. The use of independent full-scale recordings and the public BSM2 benchmark, together with the matched multi-seed protocol and concrete regret and event-level metrics, strengthens the empirical contribution over purely synthetic or single-plant studies.
major comments (1)
- [Abstract / conformal risk control section] Abstract and the conformal risk control section: the central safety claim is that the runtime layer supplies finite-sample coverage for abstain/reopen decisions and temporal witnesses. Standard split conformal prediction relies on exchangeability between calibration and test points, yet the data are strongly autocorrelated 2-minute time series with 42.6% sensor missingness that must be imputed. No explicit adaptation (blocking, martingale, or time-series conformal variant) or proof that imputation preserves the required exchangeability is visible; if the proof remains the vanilla one, the nominal coverage does not transfer to the observed regime and undermines the self-falsifying guarantee.
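For reference, the vanilla split-conformal guarantee this comment refers to can be sketched in a few lines of Python. This is plain split conformal prediction on absolute residuals, not the paper's conformal risk control procedure. The `certify` rule and its safety limit are hypothetical illustrations. The docstring marks where exchangeability enters.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score.

    Guarantees P(test score <= threshold) >= 1 - alpha only if calibration and
    test scores are exchangeable.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points to certify anything
    return sorted(cal_scores)[k - 1]

def certify(predicted_peak, threshold, safety_limit):
    # Hypothetical rule: approve only if the conformal upper bound on the
    # predicted peak stays within the safety limit; otherwise abstain.
    upper = predicted_peak + threshold
    return "approve" if upper <= safety_limit else "abstain"

# Calibration scores: absolute residuals |y - y_hat| on held-out data.
cal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
q = conformal_threshold(cal, alpha=0.25)  # k = ceil(8 * 0.75) = 6, so q = 6.0
```

If consecutive 2-minute residuals are strongly correlated, the calibration scores are not exchangeable with a future test score. The rank ceil((n + 1)(1 - alpha)) then no longer implies the stated coverage, which is the gap this comment identifies.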
minor comments (2)
- [Abstract] The RMSE values are reported as percentages relative to an unconstrained black-box reference; absolute error scales or units (e.g., mg/L for N2O or kWh for energy) would improve interpretability for operators.
- [Model description] The gating-network mixing weights are listed as the only free parameters; confirm that all other parameters in the locally linear experts are either fixed from first principles or identified in a fully specified procedure.
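The RMSE comment turns on the difference between a relative gap and an absolute error. The abstract does not spell out what "within 0.78%" means. Assuming it is a relative RMSE gap to the black-box reference, the quantity is unitless, as the Python sketch below shows. Reporting `rmse(y, y_struct)` itself, in mg/L or kWh, is what the referee asks for. Function names are illustrative.

```python
import math

def rmse(y, y_hat):
    # Root-mean-square error, in the units of y.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def relative_rmse_gap(y, y_struct, y_ref):
    # Percent by which the structured model's RMSE exceeds the reference's.
    r_ref = rmse(y, y_ref)
    return 100.0 * (rmse(y, y_struct) - r_ref) / r_ref
```

Because the gap is normalized by the reference RMSE, the same percentage is compatible with very different absolute errors.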
Simulated Author's Rebuttal
We thank the referee for the positive overall assessment and for the constructive comment on the conformal risk control layer. We address the concern point by point below and will revise the manuscript to strengthen the theoretical and methodological presentation of the finite-sample guarantees.
Point-by-point responses
Referee: [Abstract / conformal risk control section] Abstract and the conformal risk control section: the central safety claim is that the runtime layer supplies finite-sample coverage for abstain/reopen decisions and temporal witnesses. Standard split conformal prediction relies on exchangeability between calibration and test points, yet the data are strongly autocorrelated 2-minute time series with 42.6% sensor missingness that must be imputed. No explicit adaptation (blocking, martingale, or time-series conformal variant) or proof that imputation preserves the required exchangeability is visible; if the proof remains the vanilla one, the nominal coverage does not transfer to the observed regime and undermines the self-falsifying guarantee.
Authors: We agree that the manuscript does not currently make the adaptation for temporal dependence and imputation explicit enough. Our implementation employs a blocked split-conformal procedure with block length chosen from the empirical autocorrelation function of the 2-minute series (approximately 15–20 steps) together with a forward-fill imputation that preserves the required conditional exchangeability within blocks. In the revised version we will (1) add a dedicated subsection detailing the blocking scheme and the imputation operator, (2) include a short proof sketch establishing that the coverage guarantee transfers under the observed dependence (citing standard results on conformal prediction for weakly dependent processes), and (3) report an empirical coverage check on temporally held-out folds from both full-scale plants. These additions will be placed in the conformal risk control section and referenced from the abstract; the empirical results and regret numbers remain unchanged.
Revision: yes.
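The blocking and imputation steps described in this response could look roughly like the Python sketch below. The authors do not say how block length is read from the ACF. The white-noise-band rule used here is one common heuristic, and the block-maximum score is a further assumption. The sketch makes no claim that forward fill preserves exchangeability.

```python
import math

def forward_fill(series):
    """Replace None (missing) entries with the last observed value.

    Leading gaps stay None because there is nothing to carry forward.
    """
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def autocorr(x, lag):
    # Sample autocorrelation at the given lag (biased estimator).
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0  # constant series: treat as uncorrelated
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def block_length(x, max_lag):
    # Smallest lag whose |ACF| falls inside the approximate 95 % white-noise
    # band 1.96 / sqrt(n); falls back to max_lag if none does.
    cutoff = 1.96 / math.sqrt(len(x))
    for lag in range(1, max_lag + 1):
        if abs(autocorr(x, lag)) < cutoff:
            return lag
    return max_lag

def block_scores(residuals, length):
    # One score per non-overlapping block (max |residual|); a trailing partial
    # block is dropped. The conformal quantile is then taken over these scores.
    return [max(abs(r) for r in residuals[i:i + length])
            for i in range(0, len(residuals) - length + 1, length)]
```

Calibrating on one score per block trades sample size for approximate independence. With blocks of 15–20 steps, a 1000-step slice yields only about 50–65 calibration scores.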
Circularity Check
No derivation circularity; performance metrics and coverage claims rest on external benchmarks rather than self-referential fits
Full rationale
The reported results (43.6% regret reduction, 93 prevented false approvals, RMSE within 1.08%) are obtained from validation on independent full-scale recordings (Avedøre, Agtrup/BlueKolding) and the public BSM2 benchmark under a ten-seed protocol. No equation in the abstract or described pipeline reduces these figures to quantities defined by the model's own fitted parameters. The conformal risk control layer is invoked for finite-sample coverage, but the manuscript does not present a self-definitional reduction or load-bearing self-citation that forces the safety guarantees by construction. The central claims therefore retain independent empirical content against external data.
Axiom & Free-Parameter Ledger
free parameters (1)
- gating-network mixing weights
axioms (1)
- Domain assumption: conformal risk control supplies finite-sample coverage guarantees for the abstain/reopen decision rule even under sensor missingness and temporal correlation.