Vulnerability of Natural Language Classifiers to Evolutionary Generated Adversarial Text
Pith reviewed 2026-06-26 04:37 UTC · model grok-4.3
The pith
GAversary uses a genetic algorithm and GloVe embeddings to create black-box adversarial text that drops NLP classifier accuracy below levels reached by BAE or A2T.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GAversary is a hybrid genetic algorithm for generating adversarial attacks on natural language models. It treats the target model as a black box, using only logit outputs to guide the search, and employs GloVe embeddings to propose semantically similar word replacements during mutation. When tested on several benchmark datasets and well-known models, GAversary reduces the target model's test accuracy substantially more than the BAE and A2T attacks do. In the best case, accuracy falls from 76.8% to 5.8%, versus 27.6% under BAE.
What carries the argument
The hybrid genetic algorithm employing a GloVe embedding-based mutation operator to generate word replacements while optimizing for adversarial effect using model logits.
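The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `NEIGHBOURS` table is a hypothetical stand-in for GloVe nearest neighbours, `toy_logit` is a hypothetical black-box classifier that returns the logit of the original class, and the population size, mutation rate, one-point crossover, elitism, and the "logit below zero means the label flipped" stopping rule are all assumed for illustration. The "hybrid" component of GAversary is not described in this summary and is omitted.

```python
import random

# Hypothetical stand-in for GloVe nearest neighbours (assumption: a real
# attack would retrieve these by similarity in the embedding space).
NEIGHBOURS = {
    "good": ["fine", "decent", "nice"],
    "great": ["big", "grand", "fine"],
    "film": ["movie", "picture"],
}

# Hypothetical black-box target: word weights are hidden from the attacker.
POSITIVE_WEIGHTS = {"good": 2.0, "great": 3.0, "nice": 1.0, "decent": 0.5}


def toy_logit(tokens):
    """Logit of the original class; the only signal the search observes."""
    return sum(POSITIVE_WEIGHTS.get(t, 0.0) for t in tokens) - 1.0


def mutate(tokens, original, rng, rate=0.3):
    """Swap a position for a neighbour of the ORIGINAL word with probability rate."""
    out = list(tokens)
    for i, source_word in enumerate(original):
        options = NEIGHBOURS.get(source_word)
        if options and rng.random() < rate:
            out[i] = rng.choice(options)
    return out


def crossover(a, b, rng):
    """One-point crossover; lengths match because only substitutions occur."""
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def attack(original, logit_fn, pop_size=8, generations=20, seed=0):
    """Minimise the original-class logit using only black-box queries."""
    rng = random.Random(seed)
    population = [mutate(original, original, rng) for _ in range(pop_size)]
    best = min(population, key=logit_fn)
    for _ in range(generations):
        if logit_fn(best) < 0:  # assumed criterion for a flipped prediction
            break
        ranked = sorted(population, key=logit_fn)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        for _ in range(pop_size - 1):
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            children.append(mutate(child, original, rng))
        population = [best] + children  # elitism: keep the best so far
        best = min(population, key=logit_fn)
    return best
```

Calling `attack(["a", "good", "and", "great", "film"], toy_logit)` returns a token list of the same length whose logit is no higher than the original's. Drawing replacements from the neighbours of the original word at each position, rather than of the current word, is one simple way to stop repeated mutations from drifting away from the source text.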
If this is right
- The genetic search finds adversarial examples that reduce accuracy more effectively than BAE and A2T.
- Only logit values are needed, allowing attacks without gradient or internal model access.
- The generated examples have slightly lower semantic similarity but still succeed in fooling the models.
- Run time increases by approximately 5 percent over the compared methods.
- Nearly twice as many words are perturbed compared to the baseline attacks.
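To make the headline numbers concrete, the short calculation below works out the absolute and relative accuracy drops for the best case. Only the three percentages come from the source. The function names are illustrative, and "relative drop" is a convenience measure introduced here, not necessarily a metric the paper reports.

```python
# Best-case figures quoted in the abstract (percent accuracy).
clean_acc = 76.8       # target model before any attack
gaversary_acc = 5.8    # after the GAversary attack
bae_acc = 27.6         # after the BAE attack


def absolute_drop(before, after):
    """Accuracy lost, in percentage points."""
    return round(before - after, 1)


def relative_drop(before, after):
    """Accuracy lost as a percentage of the starting accuracy."""
    return round(100 * (before - after) / before, 1)


gaversary_abs = absolute_drop(clean_acc, gaversary_acc)   # 71.0 points
bae_abs = absolute_drop(clean_acc, bae_acc)               # 49.2 points
gaversary_rel = relative_drop(clean_acc, gaversary_acc)   # 92.4 percent
bae_rel = relative_drop(clean_acc, bae_acc)               # 64.1 percent
```

In this best case GAversary removes about 21.8 more percentage points of accuracy than BAE, while perturbing nearly twice as many words.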
Where Pith is reading between the lines
- Models exposed through APIs providing only predictions could be more vulnerable to such evolutionary attacks than previously thought.
- Improving semantic similarity in mutation operators might allow even stronger attacks with fewer perturbations.
- The approach highlights the need for robustness testing that includes population-based search methods rather than just local perturbations.
- Extending GAversary to other modalities or tasks could reveal similar vulnerabilities in different AI systems.
Load-bearing premise
GloVe embeddings produce word replacements that maintain sufficient semantic similarity while permitting the genetic algorithm to locate adversarial examples based solely on logit outputs.
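A small sketch shows what this premise depends on. It assumes candidates are ranked by cosine similarity, which the summary does not confirm. The three-dimensional vectors are invented toy values rather than real GloVe vectors, and the `nearest` helper and its `min_sim` threshold are hypothetical.

```python
import math

# Toy vectors for illustration only (assumption: real GloVe vectors have
# 50 to 300 dimensions and are loaded from a pretrained file).
VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "nice": [0.8, 0.2, 0.1],
    "bad": [0.7, -0.6, 0.0],
    "table": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, k=2, min_sim=0.0):
    """Return up to k most similar words whose similarity is at least min_sim."""
    scored = [
        (cosine(VECTORS[word], vec), other)
        for other, vec in VECTORS.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for sim, other in scored[:k] if sim >= min_sim]
```

With these toy values, `nearest("good")` returns `["nice", "bad"]`, while `nearest("good", min_sim=0.9)` returns only `["nice"]`. The antonym appearing as a neighbour is deliberate: distributional embeddings are known to place antonyms close together, so neighbour quality and any similarity threshold bear directly on whether replacements preserve meaning.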
What would settle it
Running GAversary on the evaluated models and datasets and observing that the accuracy reduction is no greater than that achieved by BAE or A2T, or that semantic similarity scores drop below acceptable thresholds.
Original abstract
Deep learning models have achieved impressive performance across various fields but remain vulnerable to adversarial inputs, particularly in NLP, where such attacks can have significant real-world consequences. Adversarial attacks often involve small, semantically similar token replacements to fool NLP models, and recent methods have become more precise by targeting specific vulnerable words, often by exploiting some level of access to the model's internal structure. This paper proposes GAversary, a hybrid Genetic Algorithm (GA) to generate adversarial attacks on natural language models. The GA is able to treat the target model as a black box, requiring only the logit value output by the model to guide the search. GAversary differs from GAs previously proposed for this problem by using GloVe embeddings to propose word replacements (the mutation operator) to improve the semantic similarity of the adversarial examples. GAversary is applied to several benchmark data sets and well-known target models. GAversary is able to substantially reduce the target model's accuracy on test data compared to the BAE and A2T attacks compared against (in the best case, reducing a 76.8% accuracy to 5.8%, compared to BAE's 27.6%). The trade-off is that GAversary perturbs just under twice as many words as the other two methods, with a slightly lower semantic similarity to the original text and around a 5% increase in run-time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes GAversary, a genetic algorithm for black-box adversarial attacks on NLP classifiers. It employs GloVe embeddings for the mutation operator to generate semantically similar word replacements, guided solely by model logits. Evaluated on benchmark datasets and models, it claims superior attack performance over BAE and A2T, reducing accuracy in the best case from 76.8% to 5.8% (versus 27.6% under BAE), while perturbing nearly twice as many words.
Significance. Should the superiority be confirmed under equivalent perturbation constraints, this would establish that GA-based search with embedding-guided mutations can yield stronger black-box attacks than existing methods, emphasizing the need for robust defenses in NLP systems.
major comments (2)
- [Abstract] The central claim of substantially better attack success (76.8% to 5.8% accuracy reduction vs. BAE at 27.6%) is presented without evidence that the BAE and A2T baselines were evaluated under the same word perturbation budget; since GAversary perturbs just under twice as many words, the performance gap may be explained by the relaxed constraint rather than the GA or GloVe design.
- [Abstract] No details are provided on the specific benchmark datasets, target models, number of runs, variance across runs, or statistical tests supporting the reported accuracy figures, rendering the empirical superiority claim unverifiable from the given text.
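The budget concern in the first major comment can be made concrete with a sketch of a budget-matched evaluation. Everything here is assumed for illustration: the `(original, adversarial, fooled)` result format, the word-level budget expressed as a fraction of tokens, and the simplification that every original example was classified correctly before the attack. It is not the protocol used in the paper or in any attack library.

```python
def perturbed_fraction(original, adversarial):
    """Fraction of token positions that differ (substitution-only attacks)."""
    if len(original) != len(adversarial):
        raise ValueError("substitution attacks must preserve length")
    changed = sum(o != a for o, a in zip(original, adversarial))
    return changed / len(original)


def accuracy_under_attack(results, budget):
    """Accuracy when an attack counts only if it stays within the budget.

    results: list of (original_tokens, adversarial_tokens, fooled) tuples.
    budget: maximum allowed fraction of perturbed words, between 0 and 1.
    """
    still_correct = 0
    for original, adversarial, fooled in results:
        within_budget = perturbed_fraction(original, adversarial) <= budget
        if not (fooled and within_budget):
            still_correct += 1
    return still_correct / len(results)


# Toy results: three successful attacks that change 25%, 50%, and 75% of
# the words, plus one failed attack that changes nothing.
base = ["a", "b", "c", "d"]
toy_results = [
    (base, ["a", "x", "c", "d"], True),
    (base, ["x", "y", "c", "d"], True),
    (base, ["a", "b", "c", "d"], False),
    (base, ["x", "y", "z", "d"], True),
]
```

On the toy data, `accuracy_under_attack(toy_results, 1.0)` is 0.25 but `accuracy_under_attack(toy_results, 0.25)` is 0.75. Applying one such budget to every method before comparing them would show how much of an accuracy gap survives once the perturbation allowance is equal.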
minor comments (1)
- [Abstract] The run-time increase is stated as "around a 5% increase" without specifying the baseline or the measurement method.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments point by point below, agreeing where the concerns are valid and outlining planned revisions.
- Referee: [Abstract] The central claim of substantially better attack success (76.8% to 5.8% accuracy reduction vs. BAE at 27.6%) is presented without evidence that the BAE and A2T baselines were evaluated under the same word perturbation budget; since GAversary perturbs just under twice as many words, the performance gap may be explained by the relaxed constraint rather than the GA or GloVe design.
  Authors: We agree this is a valid concern. The abstract already states the trade-off that GAversary perturbs nearly twice as many words, and the reported results use each method's standard configuration from the original papers. To strengthen the comparison and isolate the contribution of the GA and GloVe mutation, we will add new experiments in the revised manuscript that constrain all methods to identical perturbation budgets.
  Revision: yes
- Referee: [Abstract] No details are provided on the specific benchmark datasets, target models, number of runs, variance across runs, or statistical tests supporting the reported accuracy figures, rendering the empirical superiority claim unverifiable from the given text.
  Authors: The abstract is intentionally concise, but the full manuscript's Experiments section specifies the datasets, models, run counts, variance, and any statistical tests. We will revise the abstract to include the key dataset and model names plus a brief note on experimental repetition, and we will ensure the results section explicitly reports variance and significance tests.
  Revision: yes
Circularity Check
No circularity: purely empirical algorithmic contribution
The paper proposes GAversary, a genetic algorithm for generating adversarial text using GloVe-based mutations, and reports empirical accuracy reductions on benchmark datasets compared to BAE and A2T. No derivation chain, equations, fitted parameters, or first-principles results are present that could reduce to inputs by construction. Comparisons are experimental; the noted difference in perturbation count is a methodological detail, not a circular reduction. No self-citations or ansatzes are load-bearing for any claimed result.
Axiom & Free-Parameter Ledger
free parameters (1)
- Genetic algorithm hyperparameters
axioms (1)
- Domain assumption: GloVe embeddings can be used to propose word replacements that preserve semantic similarity better than random or other substitution methods.