Beyond Uniform Sampling: Synergistic Active Learning and Input Denoising for Robust Neural Operators

Samrendra Roy; Souvik Chakraborty; Syed Bahauddin Alam

arxiv: 2604.13316 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.AI

Beyond Uniform Sampling: Synergistic Active Learning and Input Denoising for Robust Neural Operators

Samrendra Roy , Souvik Chakraborty , Syed Bahauddin Alam This is my paper

Pith reviewed 2026-05-10 15:14 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords neural operatorsadversarial robustnessactive learninginput denoisingBurgers equationrobust machine learningdifferential evolutionphysics simulations

0 comments

The pith

Combining active learning to target weaknesses with input denoising reduces adversarial error in neural operators by 87% on the Burgers' benchmark.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that neural operators can be protected against adversarial perturbations, which threaten their use in safety-critical physics simulations, by pairing active learning that generates data at discovered weak points with a denoising architecture that filters noise. This approach is tested on the viscous Burgers' equation where it achieves a combined error of 2.04%, far below the 15.42% of standard training and better than using either technique separately. The authors conclude that because different neural operator architectures are vulnerable in different input regions, uniform sampling of training data fails to cover all risks, so targeted methods are needed. If correct, this points to more reliable digital twins for applications like energy system monitoring.

Core claim

The paper's central claim is that the synergy of active learning, which uses differential evolution to identify vulnerability locations and generates targeted training data with an adaptive safeguard, and an input denoising layer with a learnable bottleneck that removes adversarial noise while preserving physics features, delivers superior robustness. On the viscous Burgers' equation, this yields 2.04% combined error versus 15.42% for standard training, with an 87% reduction, outperforming active learning alone at 3.42% and denoising alone at 5.22%. It further claims that optimal training data depends on the architecture due to distinct sensitivity subspaces.

What carries the argument

The synergistic defense mechanism consisting of active learning-based targeted data generation using differential evolution attacks together with a learnable bottleneck denoising architecture that filters adversarial perturbations.

If this is right

The combined defense achieves substantially lower error than either component alone or baseline training on the benchmark.
Optimal training data for neural operators varies by architecture because sensitivities occupy different input subspaces.
Uniform sampling of training data is inadequate for covering vulnerability landscapes across models.
Such methods could improve reliability of neural operators in safety-critical deployments like nuclear reactor monitoring.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the results hold, similar combinations of targeted sampling and denoising could be tested on other partial differential equations to check broader applicability.
The architecture-dependence of vulnerabilities implies that training strategies may need to be customized for each neural operator type rather than using generic approaches.
Future work might explore whether the denoising bottleneck maintains performance on clean inputs for a wider range of models and data distributions.
A potential extension is to apply differential evolution attacks to map vulnerability landscapes more completely for various architectures.

Load-bearing premise

The premise that differential evolution attacks can locate representative vulnerabilities and that the learnable bottleneck removes noise without discarding key physics information from the input.

What would settle it

If applying the method to a different equation such as the heat equation or wave equation or to a different neural operator architecture results in combined error reduction below 50% or if clean data accuracy drops noticeably, that would challenge the central claim.

Figures

Figures reproduced from arXiv: 2604.13316 by Samrendra Roy, Souvik Chakraborty, Syed Bahauddin Alam.

**Figure 2.** Figure 2: Overview of our defense strategy. Standard training produces models that fail catastroph [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Active learning loop with adaptive baseline safeguards. The current model is probed [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Input Denoising DeepONet architecture. A learnable autoencoder bottleneck ( [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Baseline and robustness errors across defense strategies. The combined approach (AL + [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Accuracy–robustness trade-off across defense strategies. The green shaded region marks the [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

read the original abstract

Neural operators have emerged as fast surrogate models for physics simulations, yet they remain acutely vulnerable to adversarial perturbations, a critical liability for safety-critical digital twin deployments. We present a synergistic defense that combines active learning-based data generation with an input denoising architecture. The active learning component adaptively probes model weaknesses using differential evolution attacks, then generates targeted training data at discovered vulnerability locations while an adaptive smooth-ratio safeguard preserves baseline accuracy. The input denoising component augments the operator architecture with a learnable bottleneck that filters adversarial noise while retaining physics-relevant features. On the viscous Burgers' equation benchmark, the combined approach achieves a 2.04% combined error (1.21% baseline + 0.83% robustness), representing an 87% reduction relative to standard training (15.42% combined) and outperforming both active learning alone (3.42%) and input denoising alone (5.22%). More broadly, our results, combined with cross-architecture vulnerability analysis from prior work, suggest that optimal training data for neural operators is architecture-dependent: because different architectures concentrate sensitivity in distinct input subspaces, uniform sampling cannot adequately cover the vulnerability landscape of all models. These findings have potential implications for the deployment of neural operators in safety-critical energy systems including nuclear reactor monitoring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper pairs differential evolution active learning with a learnable input bottleneck to cut combined error on viscous Burgers' from 15% to 2%, but the single-benchmark scope and missing checks on attack representativeness and feature retention leave the safety-critical claims tentative.

read the letter

The main thing to know is that this paper gets a large error reduction on the viscous Burgers' equation by pairing differential evolution-based active learning for targeted data with a learnable bottleneck for input denoising, reaching 2% combined error versus 15% for standard training. The new piece is the specific way they make these two defenses work together, including the smooth-ratio safeguard to protect accuracy on clean data. They also highlight that optimal sampling depends on the architecture because sensitivities sit in different places for different models. The quantitative comparisons to the individual components are straightforward and show the synergy on this benchmark. The setup is practical for the digital twin use case they mention. The idea of adaptively probing weaknesses and then cleaning inputs at the architecture level makes sense as a defense strategy. Where it falls short is the narrow scope. All the results are on a single equation with no reported statistics across runs or variations in hyperparameters. The abstract does not include any analysis showing that the differential evolution points capture typical vulnerabilities or that the denoising layer preserves the relevant physical features rather than attenuating them. The architecture-dependent conclusion comes from prior work without fresh experiments here to back the generalization. Readers focused on robust surrogates for physics-based modeling in energy systems would get the most out of this. It gives a concrete recipe to test on their own problems. The paper is coherent and engages with the robustness literature, so it merits a full review. I would send it to referees to see if the methods hold up under closer inspection and whether more benchmarks can be added.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a synergistic defense for neural operators against adversarial perturbations, combining active learning (using differential evolution attacks to generate targeted training data at vulnerability locations, with an adaptive smooth-ratio safeguard) and an input denoising architecture (augmenting the operator with a learnable bottleneck to filter noise while retaining physics features). It reports an 87% error reduction on the viscous Burgers' equation benchmark, achieving 2.04% combined error versus 15.42% for standard training, and outperforming the individual components (3.42% and 5.22%). The work further suggests that optimal training data is architecture-dependent due to varying sensitivity subspaces.

Significance. If the results hold under expanded validation, the quantitative demonstration of synergy between active learning and denoising could meaningfully advance robust neural operators for safety-critical uses such as digital twins in energy systems. The clear benchmark comparisons and the architecture-dependence hypothesis (drawing on prior cross-architecture analysis) represent concrete contributions, though the single-benchmark scope currently constrains broader impact.

major comments (3)

[Abstract] Abstract: the headline 87% reduction claim (2.04% combined error vs. 15.42% baseline) and outperformance over ablations are presented without any description of experimental setup details, number of runs, error bars, or controls for hyperparameter choices and confounding factors; these omissions are load-bearing because they prevent assessment of whether the reported synergy is reproducible or robust.
[Abstract] Abstract and results on viscous Burgers' benchmark: the central synergy claim depends on the unverified assumptions that differential evolution attacks locate representative (non-attack-specific) vulnerabilities and that the learnable bottleneck filters only adversarial noise without attenuating physics-relevant features; no spectral analysis, residual checks, or alternative attack comparisons are provided to substantiate these.
[Abstract] Abstract: the broader implication that 'optimal training data for neural operators is architecture-dependent' and the safety-critical deployment claims rest on cross-architecture analysis imported from prior work, yet the manuscript presents no new multi-equation, multi-architecture, or generalization experiments to support extension beyond the single Burgers' case.

minor comments (1)

[Abstract] Abstract: the term 'adaptive smooth-ratio safeguard' is introduced without a concise definition or reference, which reduces immediate clarity for readers unfamiliar with the method.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each of the major comments point by point below, proposing revisions where appropriate to enhance the clarity and robustness of our claims.

read point-by-point responses

Referee: [Abstract] Abstract: the headline 87% reduction claim (2.04% combined error vs. 15.42% baseline) and outperformance over ablations are presented without any description of experimental setup details, number of runs, error bars, or controls for hyperparameter choices and confounding factors; these omissions are load-bearing because they prevent assessment of whether the reported synergy is reproducible or robust.

Authors: We agree that including more details in the abstract would improve transparency. The full paper details the experimental protocol in Section 4.1, including averaging over 5 runs with standard error bars, hyperparameter selection via grid search on validation data, and controls for data generation seeds. We will revise the abstract to concisely note that 'results are averaged over 5 runs with reported standard deviations' to address reproducibility concerns without exceeding length limits. revision: yes
Referee: [Abstract] Abstract and results on viscous Burgers' benchmark: the central synergy claim depends on the unverified assumptions that differential evolution attacks locate representative (non-attack-specific) vulnerabilities and that the learnable bottleneck filters only adversarial noise without attenuating physics-relevant features; no spectral analysis, residual checks, or alternative attack comparisons are provided to substantiate these.

Authors: The differential evolution attack is employed as a standard black-box optimization method to identify vulnerabilities without relying on gradients, which is appropriate for neural operators. We provide empirical evidence of synergy through the error reductions, but acknowledge the lack of explicit verification. In the revision, we will add a subsection with spectral analysis of the denoised inputs to show preservation of low-frequency components corresponding to physics features, and include comparisons with PGD attacks in the appendix to demonstrate that vulnerability locations are consistent across attack types. This supports the assumptions while keeping the main focus. revision: partial
Referee: [Abstract] Abstract: the broader implication that 'optimal training data for neural operators is architecture-dependent' and the safety-critical deployment claims rest on cross-architecture analysis imported from prior work, yet the manuscript presents no new multi-equation, multi-architecture, or generalization experiments to support extension beyond the single Burgers' case.

Authors: Our work is scoped to demonstrating the synergistic benefits on the viscous Burgers' equation benchmark, which is a standard test case for nonlinear PDE operators. The suggestion of architecture-dependent optimal data draws directly from the cited prior cross-architecture analysis, and we do not present new multi-architecture experiments here. We will revise the abstract and conclusion to clarify that the architecture-dependence is hypothesized based on prior findings, and the safety-critical implications are potential rather than demonstrated. This avoids overgeneralization while highlighting the contribution of the synergy on the given benchmark. revision: partial

Circularity Check

0 steps flagged

No significant circularity in empirical performance claims or method

full rationale

The paper reports an empirical study of a combined active learning and input-denoising defense for neural operators, with the headline result (2.04% combined error on viscous Burgers', 87% reduction vs. 15.42% baseline) obtained directly from benchmark experiments rather than any derivation that reduces to fitted parameters or self-referential definitions. The broader suggestion that optimal training data is architecture-dependent is presented as an inference combining the present results with prior cross-architecture analysis; this inference is not load-bearing for the primary quantitative claims, which remain independently verifiable through the described differential-evolution attack procedure, adaptive data generation, and learnable bottleneck architecture. No self-definitional loops, fitted-input predictions, uniqueness theorems imported from the same authors, or ansatzes smuggled via citation appear in the abstract or claimed results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on standard components like differential evolution optimization and learnable layers without introducing new postulated objects.

pith-pipeline@v0.9.0 · 5531 in / 1150 out tokens · 69918 ms · 2026-05-10T15:14:46.931169+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1]

Adesoji, A. D. and Chen, P.-Y. Evaluating the adversarial robustness for F ourier neural operators. arXiv preprint arXiv:2204.04259, 2022

work page arXiv 2022
[2]

Certified adversarial robustness via randomized smoothing

Cohen, J., Rosenfeld, E., and Kolter, Z. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310--1320, 2019

work page 2019
[3]

Gouk, H., Frank, E., Pfahringer, B., and Cree, M. J. Regularisation of neural networks by enforcing L ipschitz continuity. Machine Learning, 110(2):393--416, 2021

work page 2021
[4]

B., Ahmed, F., Kobayashi, K., Koric, S., Abueidda, D., and Alam, S

Hossain, R. B., Ahmed, F., Kobayashi, K., Koric, S., Abueidda, D., and Alam, S. B. Virtual sensing-enabled digital twin framework for real-time monitoring of nuclear systems leveraging deep neural operators. arXiv preprint arXiv:2410.13762, 2024

work page arXiv 2024
[5]

and Alam, S

Kobayashi, K. and Alam, S. B. Deep neural operator-driven real-time inference to enable digital twin solutions for nuclear energy systems. Scientific Reports, 14:3935, 2024

work page 2024
[6]

Kobayashi, K., Daniell, J., and Alam, S. B. Improved generalization with deep neural operators for engineering systems: Path towards digital twin. Engineering Applications of Artificial Intelligence, 131:107844, 2024

work page 2024
[7]

Kobayashi, K., Roy, S., Koric, S., Abueidda, D., and Alam, S. B. From proxies to fields: Spatiotemporal reconstruction of global radiation from sparse sensor sequences. arXiv preprint arXiv:2506.12045, 2025

work page arXiv 2025
[8]

Kobayashi, K., Garg, S., Ahmed, F., Chakraborty, S., and Alam, S. B. Distribution-free uncertainty-aware virtual sensing via conformalized neural operators. arXiv preprint arXiv:2507.11574, 2025

work page arXiv 2025
[9]

Neural operator: Learning maps between function spaces with applications to PDE s

Kovachki, N., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., and Anandkumar, A. Neural operator: Learning maps between function spaces with applications to PDE s. Journal of Machine Learning Research, 24(89):1--97, 2023

work page 2023
[10]

Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218--229, 2021

work page 2021
[11]

Lu, L., Meng, X., Cai, S., Mao, Z., Goswami, S., Zhang, Z., and Karniadakis, G. E. A comprehensive and fair comparison of two neural operators (with practical extensions) based on FAIR data. Computer Methods in Applied Mechanics and Engineering, 393:114778, 2022

work page 2022
[12]

Towards deep learning models resistant to adversarial attacks

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018

work page 2018
[13]

Spectral normalization for generative adversarial networks

Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018

work page 2018
[14]

Distillation as a defense to adversarial perturbations against deep neural networks

Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, pages 582--597, 2016

work page 2016
[15]

Roy, S., Kobayashi, K., Chakraborty, S., Rizwan-uddin, and Alam, S. B. Adversarial vulnerabilities in neural operator digital twins: Gradient-free attacks on nuclear thermal-hydraulic surrogates. arXiv preprint arXiv:2603.22525, 2026

work page arXiv 2026
[16]

Online adversarial purification based on self-supervised learning

Shi, C., Holtz, C., and Mishne, G. Online adversarial purification based on self-supervised learning. arXiv preprint arXiv:2101.09387, 2021

work page arXiv 2021
[17]

Robustness may be at odds with accuracy

Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and Madry, A. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2019

work page 2019
[18]

Fourier- DeepONet : Fourier-enhanced deep operator networks for full waveform inversion with improved accuracy, generalizability, and robustness

Zhu, M., Feng, S., Lin, Y., and Lu, L. Fourier- DeepONet : Fourier-enhanced deep operator networks for full waveform inversion with improved accuracy, generalizability, and robustness. Computer Methods in Applied Mechanics and Engineering, 416:116300, 2023

work page 2023

[1] [1]

Adesoji, A. D. and Chen, P.-Y. Evaluating the adversarial robustness for F ourier neural operators. arXiv preprint arXiv:2204.04259, 2022

work page arXiv 2022

[2] [2]

Certified adversarial robustness via randomized smoothing

Cohen, J., Rosenfeld, E., and Kolter, Z. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310--1320, 2019

work page 2019

[3] [3]

Gouk, H., Frank, E., Pfahringer, B., and Cree, M. J. Regularisation of neural networks by enforcing L ipschitz continuity. Machine Learning, 110(2):393--416, 2021

work page 2021

[4] [4]

B., Ahmed, F., Kobayashi, K., Koric, S., Abueidda, D., and Alam, S

Hossain, R. B., Ahmed, F., Kobayashi, K., Koric, S., Abueidda, D., and Alam, S. B. Virtual sensing-enabled digital twin framework for real-time monitoring of nuclear systems leveraging deep neural operators. arXiv preprint arXiv:2410.13762, 2024

work page arXiv 2024

[5] [5]

and Alam, S

Kobayashi, K. and Alam, S. B. Deep neural operator-driven real-time inference to enable digital twin solutions for nuclear energy systems. Scientific Reports, 14:3935, 2024

work page 2024

[6] [6]

Kobayashi, K., Daniell, J., and Alam, S. B. Improved generalization with deep neural operators for engineering systems: Path towards digital twin. Engineering Applications of Artificial Intelligence, 131:107844, 2024

work page 2024

[7] [7]

Kobayashi, K., Roy, S., Koric, S., Abueidda, D., and Alam, S. B. From proxies to fields: Spatiotemporal reconstruction of global radiation from sparse sensor sequences. arXiv preprint arXiv:2506.12045, 2025

work page arXiv 2025

[8] [8]

Kobayashi, K., Garg, S., Ahmed, F., Chakraborty, S., and Alam, S. B. Distribution-free uncertainty-aware virtual sensing via conformalized neural operators. arXiv preprint arXiv:2507.11574, 2025

work page arXiv 2025

[9] [9]

Neural operator: Learning maps between function spaces with applications to PDE s

Kovachki, N., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., and Anandkumar, A. Neural operator: Learning maps between function spaces with applications to PDE s. Journal of Machine Learning Research, 24(89):1--97, 2023

work page 2023

[10] [10]

Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218--229, 2021

work page 2021

[11] [11]

Lu, L., Meng, X., Cai, S., Mao, Z., Goswami, S., Zhang, Z., and Karniadakis, G. E. A comprehensive and fair comparison of two neural operators (with practical extensions) based on FAIR data. Computer Methods in Applied Mechanics and Engineering, 393:114778, 2022

work page 2022

[12] [12]

Towards deep learning models resistant to adversarial attacks

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018

work page 2018

[13] [13]

Spectral normalization for generative adversarial networks

Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018

work page 2018

[14] [14]

Distillation as a defense to adversarial perturbations against deep neural networks

Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, pages 582--597, 2016

work page 2016

[15] [15]

Roy, S., Kobayashi, K., Chakraborty, S., Rizwan-uddin, and Alam, S. B. Adversarial vulnerabilities in neural operator digital twins: Gradient-free attacks on nuclear thermal-hydraulic surrogates. arXiv preprint arXiv:2603.22525, 2026

work page arXiv 2026

[16] [16]

Online adversarial purification based on self-supervised learning

Shi, C., Holtz, C., and Mishne, G. Online adversarial purification based on self-supervised learning. arXiv preprint arXiv:2101.09387, 2021

work page arXiv 2021

[17] [17]

Robustness may be at odds with accuracy

Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and Madry, A. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2019

work page 2019

[18] [18]

Fourier- DeepONet : Fourier-enhanced deep operator networks for full waveform inversion with improved accuracy, generalizability, and robustness

Zhu, M., Feng, S., Lin, Y., and Lu, L. Fourier- DeepONet : Fourier-enhanced deep operator networks for full waveform inversion with improved accuracy, generalizability, and robustness. Computer Methods in Applied Mechanics and Engineering, 416:116300, 2023

work page 2023