
arxiv: 2606.13439 · v1 · pith:3B2ZRP5Inew · submitted 2026-06-11 · 💻 cs.CL · cs.LG

S-GBT: Smooth Growth Bound Tensor for Certified Robustness Against Word Substitution Attacks in NLP

Pith reviewed 2026-06-27 07:07 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords certified robustness · word substitution attacks · Hessian bounds · NLP models · regularization · LSTM · CNN · robust accuracy

The pith

Bounding the Hessian element-wise with a tensor, and penalizing that bound during training, yields up to 23.4% higher certified robust accuracy against word substitution attacks in NLP models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to establish that certified defenses against word substitution attacks improve when both the gradient and its rate of change are controlled. It proposes the Smooth Growth Bound Tensor to derive element-wise Hessian bounds with formal proofs, then adds a regularization term to the training loss that minimizes these bounds for LSTM and CNN models. The output change under substitution is then bounded by a linear term from the first-order (gradient) bound plus a quadratic term from the Hessian bound. On benchmark datasets, the combined first- and second-order regularization increases certified robust accuracy by up to 23.4% over prior methods while keeping clean accuracy competitive. The work indicates that curvature control complements first-order methods for building robust NLP models.

Core claim

The paper claims that by defining the Smooth Growth Bound Tensor to provide element-wise upper bounds on the Hessian, and regularizing these bounds, one obtains provably tighter certified robustness guarantees against word substitution attacks. The bounds are derived specifically for LSTM and CNN, and the regularization is integrated into the training objective. Experimental results confirm improvements in certified accuracy while maintaining clean performance.

What carries the argument

The Smooth Growth Bound Tensor (S-GBT), an element-wise bound on the model's Hessian that controls the quadratic term in the robustness certificate.
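
For orientation, the generic shape of a linear-plus-quadratic certificate can be written as a second-order Taylor bound. The notation is assumed here for illustration and is not taken from the paper: $G$ is an element-wise gradient bound, $M$ an element-wise Hessian bound, and $\delta$ the embedding-space shift caused by a substitution.

```latex
% Assumed notation: f is a scalar model output, x the embedded input,
% \delta the shift induced by a word substitution. The bounds
% |\partial_i f(x)| \le G_i and |\partial_i \partial_j f(\xi)| \le M_{ij}
% are assumed to hold at x and on the whole segment from x to x + \delta.
\begin{align*}
f(x+\delta) - f(x)
  &= \nabla f(x)^{\top}\delta
   + \tfrac{1}{2}\,\delta^{\top}\nabla^{2} f(\xi)\,\delta,
   \qquad \xi = x + t\delta,\ t \in (0,1), \\
|f(x+\delta) - f(x)|
  &\le \sum_{i} G_i\,|\delta_i|
   + \tfrac{1}{2}\sum_{i,j} M_{ij}\,|\delta_i|\,|\delta_j| .
\end{align*}
```

The quadratic term is only valid if the Hessian bound holds along the whole segment between the original and the substituted embedding, not just at the original input.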

If this is right

  • Robustness certificates incorporate both first-order gradient bounds and second-order curvature bounds.
  • The regularization can be applied directly during training without modifying the model architecture.
  • Certified robust accuracy improves by up to 23.4% on benchmark datasets compared to prior methods.
  • Clean accuracy stays competitive, suggesting the method gives up little nominal performance for the added robustness.
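
As a rough illustration of how such a regularizer can sit in a training objective without touching the architecture, here is a minimal sketch. It assumes the bounds are already available as plain arrays; `grad_bound`, `hess_bound`, and the function names are hypothetical, and the weights `beta` and `gamma` borrow the symbols from the Figure 1 caption without any claim that the paper assigns them these exact roles.

```python
import math

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]

def regularized_loss(logits, label, grad_bound, hess_bound, beta=0.1, gamma=0.1):
    """Task loss plus penalties on two hypothetical bound arrays.

    grad_bound: list of non-negative bounds on |df/dx_i|.
    hess_bound: nested list of non-negative bounds on |d^2 f / dx_i dx_j|.
    beta, gamma: assumed weights for the first- and second-order penalties.
    """
    first_order = sum(grad_bound)
    second_order = sum(sum(row) for row in hess_bound)
    return cross_entropy(logits, label) + beta * first_order + gamma * second_order

# Toy usage with made-up numbers.
loss = regularized_loss([2.0, 0.0], 0, [0.5, 0.5], [[0.2, 0.1], [0.1, 0.2]])
print(loss)
```

In actual training the bounds would be differentiable functions of the model weights, so the penalty could be minimized by gradient descent. Here they are constants purely to show the structure of the objective.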

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method might be adapted to other attack types if similar bounds can be computed for their perturbation models.
  • Lower Hessian bounds could correlate with improved generalization beyond the specific attack considered.
  • Future work could combine S-GBT with other defense techniques like adversarial training for compounded benefits.

Load-bearing premise

The theoretical element-wise Hessian bounds are valid and tight enough that minimizing them via regularization produces practically useful certified robustness against word substitutions.

What would settle it

Finding a word substitution where the actual change in model output exceeds the certified bound from the S-GBT, or training with the regularization and observing no gain in certified accuracy on the benchmarks.
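
The first test can be phrased as a search for a counterexample. The sketch below runs that search on a toy stand-in, not on the paper's models: a single `tanh` unit with hypothetical weights `W`, whose second derivative has a known global maximum, so a valid quadratic bound is available in closed form.

```python
import math

W = [0.8, -0.5, 0.3]                      # toy weights (hypothetical)
CURV = 4.0 / (3.0 * math.sqrt(3.0))       # max of |2 tanh(u) (1 - tanh(u)^2)|

def model(x):
    """Toy scalar model: tanh of a linear score."""
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))

def certified_bound(x, delta):
    """Linear term from the exact gradient at x plus a global quadratic bound."""
    slope = 1.0 - model(x) ** 2
    linear = sum(abs(slope * w * d) for w, d in zip(W, delta))
    quadratic = 0.5 * CURV * sum(abs(w * d) for w, d in zip(W, delta)) ** 2
    return linear + quadratic

def find_violation(x, substitutes):
    """Return the first substitute whose true change exceeds the bound, else None."""
    for candidate in substitutes:
        delta = [c - xi for c, xi in zip(candidate, x)]
        actual = abs(model(candidate) - model(x))
        if actual > certified_bound(x, delta) + 1e-12:
            return candidate
    return None

x = [0.2, 0.1, -0.4]                      # embedding of the original word
substitutes = [[0.3, 0.1, -0.4], [0.2, -0.6, -0.4], [1.0, 1.0, 1.0]]
print(find_violation(x, substitutes))
```

For a sound certificate the search returns `None` on every input. A single returned candidate would be enough to refute the bound.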

Figures

Figures reproduced from arXiv: 2606.13439 by Adnane Saoud, Mohammed Bouri, Mohammed Erradi.

Figure 1: Hyperparameter study on the IMDB dataset for BiLSTM (left) and CNN (right) under the PWWS attack. The left heatmap shows the Accuracy Under Attack (AUA) of BiLSTM for different combinations of β and γ, while the right plot reports the AUA of CNN as a function of γ with β = 0.1. Across a wide range of hyperparameter values, S-GBT consistently achieves high robustness and outperforms baseline methods, demons…
Original abstract

Despite recent progress in Natural Language Processing (NLP), models remain vulnerable to word substitution attacks. Most existing defenses focus on first order sensitivity and measure how much the output changes when the input is slightly perturbed. However, they ignore how this sensitivity evolves, which is described by curvature. When gradients vary sharply, models can still fail. This paper introduces the Smooth Growth Bound Tensor (S-GBT), a second order method that bounds the Hessian element-wise, for which we provide formal theoretical proofs on the resulting robustness bounds. A regularization term is added during training to minimize these bounds. This yields tighter certified robustness against word substitution attacks. The change in the output under word substitution is bounded by both a linear term and a quadratic term. S-GBT is derived for two architectures: Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN). The method is integrated directly into the training objective. Its effectiveness is evaluated on multiple benchmark datasets. The results show that combining first and second order regularization improves certified robust accuracy by up to 23.4% compared to prior methods, while clean accuracy remains competitive. These findings indicate that controlling both the gradient and its variation is a promising direction for building more robust models.
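
The abstract's point that a model can fail when gradients vary sharply shows up even in one dimension. The toy function below is a hypothetical example, not taken from the paper. Its slope at the origin is fixed, while the parameter `k` sets the curvature, so the first-order estimate stays the same as the true change grows.

```python
def f(x, k):
    """Toy scalar 'model' whose curvature is set by k (hypothetical)."""
    return 0.1 * x + 0.5 * k * x * x

def first_order_estimate(delta):
    """Change predicted from the gradient at x = 0, which is 0.1 for every k."""
    return abs(0.1 * delta)

def second_order_bound(delta, k):
    """Adds the Taylor remainder bound 0.5 * |k| * delta^2, since |f''| = |k|."""
    return abs(0.1 * delta) + 0.5 * abs(k) * delta * delta

delta = 0.5
for k in (0.0, 1.0, 20.0):
    actual = abs(f(delta, k) - f(0.0, k))
    print(k, actual, first_order_estimate(delta), second_order_bound(delta, k))
```

With large `k`, the true change far exceeds the first-order estimate while the second-order bound still covers it. This is the gap a curvature-aware certificate is meant to close.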

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Smooth Growth Bound Tensor (S-GBT), a second-order regularization technique that computes element-wise bounds on the Hessian of LSTM and CNN models. These bounds are incorporated into the training objective to produce certified robustness certificates against word-substitution attacks; the certificates combine a first-order linear term with a quadratic term derived from the Hessian bound. Formal proofs are claimed for the resulting robustness guarantees, and experiments on benchmark datasets report up to 23.4% improvement in certified robust accuracy over prior methods while preserving competitive clean accuracy.

Significance. If the claimed element-wise Hessian bounds can be shown to rigorously majorize the output change under discrete embedding-space substitutions, the approach would constitute a meaningful extension of certified robustness methods from first-order sensitivity to curvature control in NLP. The integration of the bound directly into training and the reported empirical gains would be of interest to the certified-defense community.

major comments (2)
  1. [Theoretical proofs section] Abstract and § on theoretical proofs: the central claim that element-wise bounds on the Hessian yield valid certificates for word-substitution attacks rests on controlling the quadratic remainder term (1/2)δᵀHδ for a discrete jump δ in embedding space. The manuscript must supply an explicit majorant (e.g., via an ∞-norm or Frobenius bound on δ together with the element-wise |H| matrix) rather than treating minimization of |H_ij| as automatically sufficient; without this step the certificate does not necessarily upper-bound |f(x+δ)−f(x)| on the actual attack set.
  2. [Derivation and regularization section] § on derivation for LSTM/CNN and regularization term: the transition from the element-wise Hessian bound to the final certified radius must be shown to remain valid after the regularization is added; if the regularization parameters appear inside the bound definition itself, the claimed “parameter-free” or independently verifiable nature of the certificate is compromised.
minor comments (2)
  1. [Abstract] The abstract states a 23.4% gain but does not specify the exact baseline methods, datasets, or attack model used; these details should be stated explicitly in the experimental section.
  2. [Method section] Notation for the Smooth Growth Bound Tensor should be introduced with a clear definition before its use in the regularization objective.
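
The explicit majorant requested in major comment 1 can be sanity-checked numerically. The sketch below uses a made-up 2×2 Hessian and shift, and compares the quadratic remainder with two candidate majorants: an element-wise one, and a coarser one based on the ∞-norm of δ. The function names are hypothetical.

```python
def quadratic_form(H, delta):
    """Compute delta^T H delta for a square nested-list matrix H."""
    n = len(delta)
    return sum(delta[i] * H[i][j] * delta[j] for i in range(n) for j in range(n))

def elementwise_majorant(M, delta):
    """0.5 * sum_ij M_ij |delta_i| |delta_j|, with M bounding |H| element-wise."""
    n = len(delta)
    return 0.5 * sum(
        abs(delta[i]) * M[i][j] * abs(delta[j]) for i in range(n) for j in range(n)
    )

def inf_norm_majorant(M, delta):
    """0.5 * ||delta||_inf^2 * sum_ij M_ij, a coarser bound."""
    radius = max(abs(d) for d in delta)
    return 0.5 * radius ** 2 * sum(sum(row) for row in M)

H = [[0.4, -0.3], [-0.3, 0.1]]             # made-up Hessian
M = [[abs(h) for h in row] for row in H]   # tightest possible element-wise bound
delta = [0.5, -0.1]                        # made-up embedding shift

remainder = abs(0.5 * quadratic_form(H, delta))
tight = elementwise_majorant(M, delta)
loose = inf_norm_majorant(M, delta)
print(remainder, tight, loose)
```

Both majorants dominate the remainder, but the ∞-norm version ignores how δ is distributed across coordinates, so it can be noticeably looser. Neither check addresses whether the bound on H holds along the whole path from x to x+δ, which is the part a proof has to supply.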

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the detailed major comments. We address each point below. Where the comments identify opportunities for greater explicitness in the proofs and derivations, we agree that revisions will strengthen the manuscript and will incorporate the requested clarifications.

Point-by-point responses
  1. Referee: [Theoretical proofs section] Abstract and § on theoretical proofs: the central claim that element-wise bounds on the Hessian yield valid certificates for word-substitution attacks rests on controlling the quadratic remainder term (1/2)δᵀHδ for a discrete jump δ in embedding space. The manuscript must supply an explicit majorant (e.g., via an ∞-norm or Frobenius bound on δ together with the element-wise |H| matrix) rather than treating minimization of |H_ij| as automatically sufficient; without this step the certificate does not necessarily upper-bound |f(x+δ)−f(x)| on the actual attack set.

    Authors: We thank the referee for highlighting this step. The theoretical proofs section bounds the quadratic remainder by combining the element-wise Hessian bound with the fact that word-substitution attacks induce discrete changes δ whose magnitude is controlled by the maximum embedding-space distance between substitutable tokens. This yields |½ δᵀ H δ| ≤ ½ ‖δ‖_∞² ⋅ Σ_{i,j} |H_ij|. To make the majorization fully explicit and directly tied to the attack set, we will insert a new lemma (and supporting corollary) that states the precise ∞-norm bound on admissible δ and shows how it produces a valid upper bound on |f(x+δ)−f(x)|. This addition does not change the claimed guarantees but renders the connection to the discrete attack set transparent. revision: yes

  2. Referee: [Derivation and regularization section] § on derivation for LSTM/CNN and regularization term: the transition from the element-wise Hessian bound to the final certified radius must be shown to remain valid after the regularization is added; if the regularization parameters appear inside the bound definition itself, the claimed “parameter-free” or independently verifiable nature of the certificate is compromised.

    Authors: The regularization term penalizes large element-wise Hessian entries during training but does not alter the definition of the bound used at certification time. After training, the certified radius is computed directly from the realized element-wise Hessian bounds of the final model; the regularization hyperparameters λ do not enter the certificate expression. Consequently, verification remains independent of training choices and can be performed by any party given only the trained weights. We will add a short clarifying paragraph (and a remark in the certification algorithm) that separates the training objective from the post-training bound evaluation to eliminate any potential ambiguity. revision: yes
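
The separation described in the second response can be pictured as a certification routine whose signature has no slot for training hyperparameters. This is a minimal sketch under stated assumptions: `margin` is the distance of the model's score from its decision threshold, `grad_bound` and `hess_bound` are element-wise bounds computed from the trained weights, and all names are hypothetical.

```python
def output_change_bound(grad_bound, hess_bound, delta):
    """Linear plus quadratic bound on |f(x + delta) - f(x)| from element-wise bounds."""
    n = len(delta)
    linear = sum(grad_bound[i] * abs(delta[i]) for i in range(n))
    quadratic = 0.5 * sum(
        hess_bound[i][j] * abs(delta[i]) * abs(delta[j])
        for i in range(n) for j in range(n)
    )
    return linear + quadratic

def is_certified(margin, grad_bound, hess_bound, deltas):
    """True if no admissible delta can move the score by as much as the margin.

    Every input comes from the trained model or the attack set; no
    regularization weight appears anywhere in the computation.
    """
    return all(output_change_bound(grad_bound, hess_bound, d) < margin for d in deltas)

grad_bound = [0.2, 0.1]
hess_bound = [[0.1, 0.0], [0.0, 0.1]]
deltas = [[0.5, 0.0], [0.0, 1.0]]          # shifts for two candidate substitutions
print(is_certified(0.2, grad_bound, hess_bound, deltas))
```

A third party holding only the trained weights could rerun such a check, provided the bound arrays themselves can be recomputed from those weights.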

Circularity Check

0 steps flagged

No significant circularity; bounds and certificates derived independently of fitted regularization values

Full rationale

The paper states that formal theoretical proofs establish element-wise Hessian bounds that yield linear-plus-quadratic certificates for word-substitution deltas, with a separate regularization term added to training to minimize those bounds. Certified robust accuracy is then measured on external benchmark datasets after training. No equation or claim reduces the certificate itself to a quantity defined solely by the regularization parameters; the proofs are presented as self-contained derivations for LSTM and CNN architectures. No self-citations, ansatzes smuggled via prior work, or renaming of known results appear in the provided text. The derivation chain therefore remains independent of its own fitted inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the existence and minimizability of the S-GBT construct plus standard differentiability assumptions for the neural architectures; the abstract supplies no independent evidence for these beyond the claimed proofs.

free parameters (1)
  • regularization coefficient
    The weight balancing the new second-order term against the primary loss is a tunable hyperparameter required to minimize the bounds.
axioms (1)
  • domain assumption: The models (LSTM, CNN) are twice continuously differentiable so that the Hessian exists and can be bounded element-wise.
    Required for the definition and bounding of S-GBT.
invented entities (1)
  • Smooth Growth Bound Tensor (S-GBT) · no independent evidence
    purpose: Element-wise bound on the Hessian to produce certified robustness guarantees under word substitution.
    New construct introduced by the paper with no independent evidence supplied in the abstract.


