Vulnerability of Natural Language Classifiers to Evolutionary Generated Adversarial Text

Alexander E. I. Brownlee; Manjinder Singh; Mohamed Elawady

arxiv: 2606.27215 · v1 · pith:MAGTIFKJnew · submitted 2026-06-25 · 💻 cs.AI

Pith reviewed 2026-06-26 04:37 UTC · model grok-4.3

classification 💻 cs.AI

keywords: adversarial text, genetic algorithm, black-box attack, natural language processing, text classification, GloVe, model robustness

The pith

GAversary uses a genetic algorithm and GloVe embeddings to create black-box adversarial text that drops NLP classifier accuracy below levels reached by BAE or A2T.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GAversary, a method for generating adversarial examples against natural language classifiers with a genetic algorithm that requires only the model's output logits. By using GloVe embeddings in the mutation operator, the approach aims to produce word replacements that stay semantically closer to the original than those of earlier genetic-algorithm attacks. Experiments on benchmark datasets show it reduces model accuracy further than the BAE and A2T attacks: in the best case accuracy falls from 76.8 percent to 5.8 percent, where BAE reaches 27.6 percent. This suggests that evolutionary search can expose vulnerabilities in text models even without internal access. The cost is that GAversary perturbs just under twice as many words, with slightly lower semantic similarity and around 5 percent more run time.

Core claim

GAversary is a hybrid genetic algorithm for generating adversarial attacks on natural language models. It treats the target model as a black box, using only logit outputs to guide the search, and employs GloVe embeddings to propose semantically similar word replacements during mutation. Tested on several benchmark datasets and well-known models, GAversary reduces the target model's test accuracy substantially further than the BAE and A2T attacks; in the best case accuracy drops from 76.8% to 5.8%, against 27.6% under BAE.
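
To put the headline figures on a common scale, the short calculation below converts the reported best-case accuracies into absolute and relative drops. The three accuracy values come from the abstract; the function name `accuracy_drop` is only an illustrative helper.

```python
def accuracy_drop(before: float, after: float) -> tuple[float, float]:
    """Return (drop in percentage points, drop as a fraction of `before`)."""
    absolute = before - after
    return absolute, absolute / before


# Best-case figures as reported in the abstract (percent accuracy).
CLEAN_ACC = 76.8
GAVERSARY_ACC = 5.8
BAE_ACC = 27.6

ga_abs, ga_rel = accuracy_drop(CLEAN_ACC, GAVERSARY_ACC)
bae_abs, bae_rel = accuracy_drop(CLEAN_ACC, BAE_ACC)

print(f"GAversary: {ga_abs:.1f} points ({ga_rel:.1%} relative)")
print(f"BAE:       {bae_abs:.1f} points ({bae_rel:.1%} relative)")
```

On these figures GAversary removes roughly 92% of the clean accuracy, against roughly 64% for BAE.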

What carries the argument

The hybrid genetic algorithm employing a GloVe embedding-based mutation operator to generate word replacements while optimizing for adversarial effect using model logits.
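
The mechanism can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain genetic algorithm, and the tiny embedding table `EMBED`, the toy scorer `toy_logits` with its `WEIGHTS`, and all hyperparameter values are assumptions standing in for GloVe vectors and a real target model. Mutation swaps a word for one of its nearest neighbours by cosine similarity, and fitness is the true-class logit margin, so the search uses only model outputs.

```python
import math
import random

# Hypothetical 2-D vectors standing in for GloVe embeddings.
EMBED = {
    "good": (1.0, 0.1), "great": (0.9, 0.2), "fine": (0.7, 0.4),
    "okay": (0.5, 0.6), "bad": (-1.0, 0.1), "movie": (0.0, 1.0),
    "film": (0.05, 0.95),
}
# Hypothetical per-word weights of a toy sentiment model (the "black box").
WEIGHTS = {"good": 1.0, "great": 1.2, "fine": -0.3, "okay": -0.6,
           "bad": -1.5, "movie": 0.1, "film": -0.1}


def cosine(a, b):
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))


def neighbours(word, k=3):
    """The k vocabulary words closest to `word` by cosine similarity."""
    if word not in EMBED:
        return []  # out-of-vocabulary words are left untouched
    ranked = sorted(
        ((cosine(EMBED[word], vec), other)
         for other, vec in EMBED.items() if other != word),
        reverse=True,
    )
    return [other for _, other in ranked[:k]]


def toy_logits(words):
    """Black-box stand-in: returns (negative_logit, positive_logit)."""
    score = sum(WEIGHTS.get(w, 0.0) for w in words)
    return (-score, score)


def fitness(words, true_label):
    """True-class margin; a negative value means the label has flipped."""
    logits = toy_logits(words)
    return logits[true_label] - logits[1 - true_label]


def mutate(words, rng):
    """Replace one randomly chosen word with an embedding neighbour."""
    child = list(words)
    i = rng.randrange(len(child))
    options = neighbours(child[i])
    if options:
        child[i] = rng.choice(options)
    return child


def crossover(a, b, rng):
    """Uniform crossover: each position is inherited from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def attack(original, true_label, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    # Keep the original in the population so the best margin never gets worse.
    population = [list(original)] + [mutate(original, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, true_label))
        if fitness(population[0], true_label) < 0:
            break  # the predicted label has flipped
        parents = population[: pop_size // 2]  # elitist selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda w: fitness(w, true_label))


original = ["the", "movie", "good"]
adversarial = attack(original, true_label=1)
print(adversarial, fitness(adversarial, 1))
```

The toy model is deliberately inconsistent across near-synonyms, which is the property an attack of this kind exploits; with real GloVe vectors the neighbour lookup would run over a far larger vocabulary.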

If this is right

The genetic search finds adversarial examples that reduce accuracy more effectively than BAE and A2T.
Only logit values are needed, allowing attacks without gradient or internal model access.
The generated examples have slightly lower semantic similarity to the original text than those of the baselines but still succeed in fooling the models.
Run time increases by approximately 5 percent over the compared methods.
Nearly twice as many words are perturbed compared to the baseline attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Models exposed through APIs that return logits or class scores could be more vulnerable to such evolutionary attacks than previously thought.
Improving semantic similarity in mutation operators might allow even stronger attacks with fewer perturbations.
The approach highlights the need for robustness testing that includes population-based search methods rather than just local perturbations.
Extending GAversary to other modalities or tasks could reveal similar vulnerabilities in different AI systems.

Load-bearing premise

GloVe embeddings produce word replacements that maintain sufficient semantic similarity while permitting the genetic algorithm to locate adversarial examples based solely on logit outputs.

What would settle it

Running GAversary on the evaluated models and datasets and observing that the accuracy reduction is no greater than that achieved by BAE or A2T, or that semantic similarity scores drop below acceptable thresholds.
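
The test described above can be written as a simple predicate. This is a sketch under stated assumptions: the `AttackResult` type, the `claim_survives` function, the similarity values, and the 0.8 threshold are all hypothetical placeholders; only the two post-attack accuracies are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackResult:
    name: str
    accuracy_after: float   # target model accuracy under attack, in percent
    mean_similarity: float  # mean semantic similarity to the originals, 0..1


def claim_survives(candidate, baselines, min_similarity):
    """True if the candidate lowers accuracy below every baseline
    while keeping similarity at or above the chosen threshold."""
    stronger = all(candidate.accuracy_after < b.accuracy_after for b in baselines)
    faithful = candidate.mean_similarity >= min_similarity
    return stronger and faithful


# Accuracies are from the abstract; similarity values are placeholders.
gaversary = AttackResult("GAversary", accuracy_after=5.8, mean_similarity=0.85)
bae = AttackResult("BAE", accuracy_after=27.6, mean_similarity=0.88)

print(claim_survives(gaversary, [bae], min_similarity=0.8))
```

What counts as an acceptable similarity threshold is itself a choice that a replication would need to justify.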

Figures

Figures reproduced from arXiv: 2606.27215 by Alexander E. I. Brownlee, Manjinder Singh, Mohamed Elawady.

**Figure 2.** Four solutions using the compact representation. Each solution is … (caption truncated; image: figures/full_fig_p008_2.png)

**Figure 3.** Masking is used in the mutation operator, so the GloVe can iden… (caption truncated; image: figures/full_fig_p010_3.png)

Original abstract

Deep learning models have achieved impressive performance across various fields but remain vulnerable to adversarial inputs, particularly in NLP, where such attacks can have significant real-world consequences. Adversarial attacks often involve small, semantically similar token replacements to fool NLP models, and recent methods have become more precise by targeting specific vulnerable words, often by exploiting some level of access to the model's internal structure. This paper proposes GAversary, a hybrid Genetic Algorithm (GA) to generate adversarial attacks on natural language models. The GA is able to treat the target model as a black box, requiring only the logit value output by the model to guide the search. GAversary differs from GAs previously proposed for this problem by using GloVe embeddings to propose word replacements (the mutation operator) to improve the semantic similarity of the adversarial examples. GAversary is applied to several benchmark data sets and well-known target models. GAversary is able to substantially reduce the target model's accuracy on test data compared to the BAE and A2T attacks compared against (in the best case, reducing a 76.8% accuracy to 5.8%, compared to BAE's 27.6%). The trade-off is that GAversary perturbs just under twice as many words as the other two methods, with a slightly lower semantic similarity to the original text and around a 5% increase in run-time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GAversary gets stronger attacks mainly by changing almost twice as many words as BAE and A2T, so the GA plus GloVe design may not be the real driver.

The central point is that GAversary reduces target accuracy more than the baselines, but the abstract is clear that it perturbs just under twice as many words while accepting slightly lower semantic similarity. That trade-off is stated up front, which means the headline numbers (76.8% down to 5.8% versus BAE at 27.6%) could be explained by the relaxed edit budget rather than the genetic search or the embedding-guided mutations.

What is actually new is the specific combination: a black-box genetic algorithm that uses GloVe to propose replacements in the mutation step. Earlier GA attacks are referenced, and this version adds the embedding lookup to keep replacements more plausible while still relying only on logit outputs. That is a modest but concrete extension.

The work does a few things cleanly. It stays strictly black-box, reports the runtime and similarity costs alongside the attack success, and applies the method to standard benchmarks and models. No fitted parameters or circular derivations are involved; it is an empirical algorithmic tweak.

The soft spot is the comparison. The abstract does not indicate that BAE or A2T were re-run with the same number of allowed changes, so it is hard to isolate how much the GA and GloVe operator actually add. If the gap shrinks under a matched perturbation cap, the main result weakens. The lack of reported run counts, error bars, or statistical tests in the abstract also leaves the empirical claim harder to assess until the full experimental section is checked.
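
One way to run a matched-budget check is to count an attack as successful only when it stays within a fixed share of changed words. The sketch below assumes word-level substitution attacks that preserve sentence length; `perturbed_fraction`, `success_rate_under_cap`, and the example sentences are illustrative, not taken from the paper.

```python
from typing import Sequence


def perturbed_fraction(original: Sequence[str], adversarial: Sequence[str]) -> float:
    """Share of word positions that differ between the two token lists."""
    if len(original) != len(adversarial):
        raise ValueError("substitution attacks must preserve length")
    if not original:
        return 0.0
    changed = sum(o != a for o, a in zip(original, adversarial))
    return changed / len(original)


def success_rate_under_cap(attempts, max_fraction: float) -> float:
    """attempts: iterable of (original, adversarial, label_flipped) triples.
    An attempt counts as a success only if the label flipped AND the
    edit stayed within the cap."""
    attempts = list(attempts)
    if not attempts:
        return 0.0
    wins = sum(
        flipped and perturbed_fraction(orig, adv) <= max_fraction
        for orig, adv, flipped in attempts
    )
    return wins / len(attempts)


source = ["a", "dull", "slow", "film"]
attempts = [
    (source, ["a", "dull", "slow", "movie"], True),   # 1 of 4 words changed
    (source, ["a", "flat", "calm", "movie"], True),   # 3 of 4 words changed
    (source, ["a", "dull", "slow", "film"], False),   # unchanged, no flip
]
tight = success_rate_under_cap(attempts, max_fraction=0.25)
loose = success_rate_under_cap(attempts, max_fraction=1.0)
print(tight, loose)
```

Reporting success rates for every method at the same cap would show how much of the gap survives once the edit budget is equalized.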

This paper is for researchers who test or build NLP robustness tools and want another attack variant to try. It is not a broad theoretical advance, but the black-box framing and the explicit trade-off reporting make it usable for that audience.

I would send it to peer review. The core method is straightforward and the authors acknowledge the costs; a referee can ask for the controlled budget experiment and the missing experimental details without starting from scratch.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes GAversary, a genetic algorithm for black-box adversarial attacks on NLP classifiers. It employs GloVe embeddings for the mutation operator to generate semantically similar word replacements, guided solely by model logits. Evaluated on benchmark datasets and models, it claims superior attack performance over BAE and A2T, reducing accuracy from 76.8% to 5.8% compared to BAE's 27.6%, while perturbing nearly twice as many words.

Significance. Should the superiority be confirmed under equivalent perturbation constraints, this would establish that GA-based search with embedding-guided mutations can yield stronger black-box attacks than existing methods, emphasizing the need for robust defenses in NLP systems.

major comments (2)

[Abstract] The central claim of substantially better attack success (76.8% to 5.8% accuracy reduction vs. BAE at 27.6%) is presented without evidence that the BAE and A2T baselines were evaluated under the same word perturbation budget; since GAversary perturbs just under twice as many words, the performance gap may be explained by the relaxed constraint rather than the GA or GloVe design.
[Abstract] No details are provided on the specific benchmark datasets, target models, number of runs, variance across runs, or statistical tests supporting the reported accuracy figures, rendering the empirical superiority claim unverifiable from the given text.

minor comments (1)

[Abstract] The run-time increase is stated as 'around a 5%' without specifying the baseline or measurement method.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below, agreeing where the concerns are valid and outlining planned revisions.

Referee: [Abstract] The central claim of substantially better attack success (76.8% to 5.8% accuracy reduction vs. BAE at 27.6%) is presented without evidence that the BAE and A2T baselines were evaluated under the same word perturbation budget; since GAversary perturbs just under twice as many words, the performance gap may be explained by the relaxed constraint rather than the GA or GloVe design.

Authors: We agree this is a valid concern. The abstract already states the trade-off that GAversary perturbs nearly twice as many words, and the reported results use each method's standard configuration from the original papers. To strengthen the comparison and isolate the contribution of the GA and GloVe mutation, we will add new experiments in the revised manuscript that constrain all methods to identical perturbation budgets.

revision: yes

Referee: [Abstract] No details are provided on the specific benchmark datasets, target models, number of runs, variance across runs, or statistical tests supporting the reported accuracy figures, rendering the empirical superiority claim unverifiable from the given text.

Authors: The abstract is intentionally concise, but the full manuscript's Experiments section specifies the datasets, models, run counts, variance, and any statistical tests. We will revise the abstract to include the key dataset and model names plus a brief note on experimental repetition, and we will ensure the results section explicitly reports variance and significance tests.

revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical algorithmic contribution

The paper proposes GAversary, a genetic algorithm for generating adversarial text using GloVe-based mutations, and reports empirical accuracy reductions on benchmark datasets compared to BAE and A2T. No derivation chain, equations, fitted parameters, or first-principles results are present that could reduce to inputs by construction. Comparisons are experimental; the noted difference in perturbation count is a methodological detail, not a circular reduction. No self-citations or ansatzes are load-bearing for any claimed result.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach relies on standard assumptions from genetic algorithms and word embeddings without introducing new free parameters or entities beyond the algorithm itself.

free parameters (1)

Genetic algorithm hyperparameters
Population size, generations, mutation and crossover rates are required for the method but not specified in the abstract.
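
Because these settings are not given in the abstract, any reproduction has to state them explicitly. A small configuration object makes that visible; the class name `GAConfig` and every default value below are placeholders chosen for illustration, not the paper's settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAConfig:
    population_size: int = 20    # placeholder value
    generations: int = 50        # placeholder value
    mutation_rate: float = 0.1   # placeholder, probability per offspring
    crossover_rate: float = 0.7  # placeholder, probability per offspring

    def __post_init__(self):
        if self.population_size < 2:
            raise ValueError("population_size must be at least 2")
        if self.generations < 1:
            raise ValueError("generations must be at least 1")
        for name in ("mutation_rate", "crossover_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1]")

    def max_queries(self) -> int:
        """Upper bound on model queries if every individual is scored
        once per generation."""
        return self.population_size * self.generations


config = GAConfig()
print(config, config.max_queries())
```

The query bound matters for a black-box attack, since each fitness evaluation is one call to the target model.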

axioms (1)

domain assumption: GloVe embeddings can be used to propose word replacements that preserve semantic similarity better than random or other substitution methods.
Invoked directly in the mutation operator description.

pith-pipeline@v0.9.1-grok · 5784 in / 1204 out tokens · 45498 ms · 2026-06-26T04:37:55.016951+00:00 · methodology


[1] [1]

Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, and Li Fei-Fei. Using deep learning and google street view to estimate the demographic makeup of neighborhoods across the united states.Proceedings of the National Academy of Sci- ences, 114(50):13108–13113, 2017

2017

[2] [2]

A study and comparison of human and deep learning recognition performance under visual distortions

Samuel Dodge and Lina Karam. A study and comparison of human and deep learning recognition performance under visual distortions. In 2017 26th international conference on computer communication and networks (ICCCN), pages 1–7. IEEE, 2017

2017

[3] [3]

No NLP task 19 should be an island: Multi-disciplinarity for diversity in news recom- mender systems

Myrthe Reuver, Antske Fokkens, and Suzan Verberne. No NLP task 19 should be an island: Multi-disciplinarity for diversity in news recom- mender systems. InProceedings of the EACL Hackashop on News Me- dia Content Analysis and Automated Report Generation, pages 45–55, Online, April 2021. Association for Computational Linguistics

2021

[4] [4]

Investor sentiment in the theoretical field of behavioural finance.Economic research-Ekonomska istraˇ zivanja, 33(1):2101–2119, 2020

M ´Angeles L´ opez-Cabarcos, Ada M P´ erez-Pico, Maria Luisa L´ opez Perez, et al. Investor sentiment in the theoretical field of behavioural finance.Economic research-Ekonomska istraˇ zivanja, 33(1):2101–2119, 2020

2020

[5] [5]

The role of feelings in investor decision-making.Journal of economic surveys, 19(2):211–237, 2005

Brian M Lucey and Michael Dowling. The role of feelings in investor decision-making.Journal of economic surveys, 19(2):211–237, 2005

2005

[6] [6]

Token-modification adversarial attacks for natural language pro- cessing: A survey.arXiv preprint arXiv:2103.00676, 2021

Tom Roth, Yansong Gao, Alsharif Abuadbba, Surya Nepal, and Wei Liu. Token-modification adversarial attacks for natural language pro- cessing: A survey.arXiv preprint arXiv:2103.00676, 2021

arXiv 2021

[7] [7]

Textbug- ger: Generating adversarial text against real-world applications.arXiv preprint arXiv:1812.05271, 2018

Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, and Ting Wang. Textbug- ger: Generating adversarial text against real-world applications.arXiv preprint arXiv:1812.05271, 2018

Pith/arXiv arXiv 2018

[8] [8]

Black-box generation of adversarial text sequences to evade deep learning clas- sifiers

Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. Black-box generation of adversarial text sequences to evade deep learning clas- sifiers. In2018 IEEE Security and Privacy Workshops (SPW), pages 50–56. IEEE, 2018

2018

[9] [9]

Bert-attack: Adversarial attack against bert using bert

Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, and Xipeng Qiu. Bert-attack: Adversarial attack against bert using bert. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Pro- cessing (EMNLP), pages 6193–6202, 2020

2020

[10] [10]

Contextualized perturbation for textual adversarial attack.arXiv preprint arXiv:2009.07502, 2020

Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett, Ming- Ting Sun, and Bill Dolan. Contextualized perturbation for textual adversarial attack.arXiv preprint arXiv:2009.07502, 2020

arXiv 2009

[11] [11]

Genetic algorithms for com- binatorial optimization: the assemble line balancing problem.ORSA Journal on Computing, 6(2):161–173, 1994

Edward J Anderson and Michael C Ferris. Genetic algorithms for com- binatorial optimization: the assemble line balancing problem.ORSA Journal on Computing, 6(2):161–173, 1994

1994

[12] [12]

Springer, 2020

Fouad Bennis and Rajib Kumar Bhattacharjya.Nature-Inspired Meth- ods for Metaheuristics Optimization: Algorithms and Applications in Science and Engineering, volume 16. Springer, 2020. 20

2020

[13] Shijun Ye, Pengcheng Zhang, Hai Dong, and Shunhui Ji. Heuristic-word-selection genetic algorithm for generating natural language adversarial examples. In 2021 IEEE International Conference on Artificial Intelligence Testing (AITest), pages 39–40. IEEE, 2021

[14] Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896, 2018

[16] Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005, 2013

[17] Robin Jia, Aditi Raghunathan, Kerem Göksel, and Percy Liang. Certified robustness to adversarial word substitutions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4129–4142, 2019

[18] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018

[19] Di Jin, Zhijing Jin, Joey Tianyi Zhou, and Peter Szolovits. Is bert really robust? a strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 8018–8025, 2020

[20] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

[21] Siddhant Garg and Goutham Ramakrishnan. Bae: Bert-based adversarial examples for text classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6174–6181, 2020

[22] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019

[23] Jin Yong Yoo and Yanjun Qi. Towards improving adversarial training of nlp models. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 945–956, 2021

[24] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019

[25] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, D. Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013

[26] Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, and Douwe Kiela. Gradient-based adversarial attacks against text transformers. arXiv preprint arXiv:2104.13733, 2021

[27] Lifan Yuan, Yichi Zhang, Yangyi Chen, and Wei Wei. Bridge the gap between cv and nlp! a gradient-based textual adversarial attack framework. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7132–7146, 2023

[28] Boxin Wang, Chejian Xu, Xiangyu Liu, Yu Cheng, and Bo Li. Semattack: Natural textual attacks via different semantic spaces. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 176–205, 2022

[29] Muchao Ye, Chenglin Miao, Ting Wang, and Fenglong Ma. Texthoaxer: Budgeted hard-label adversarial attacks on text. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 3877–3884, 2022

[30] Zhen Yu, Xiaosen Wang, Wanxiang Che, and Kun He. Texthacker: Learning based hybrid local search algorithm for text hard-label adversarial attack. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 622–637, 2022

[31] Guoyi Li, Bingkang Shi, Zongzhen Liu, Dehan Kong, Yulei Wu, Xiaodan Zhang, Longtao Huang, and Honglei Lyu. Adversarial text generation by search and learning. In The 2023 Conference on Empirical Methods in Natural Language Processing, 2023

[32] Xiaosen Wang, Jin Hao, and Kun He. Natural language adversarial attacks and defenses in word level. arXiv preprint arXiv:1909.06723v1, 2019

[33] Alec Go, Richa Bhayani, and Lei Huang. Twitter sentiment classification using distant supervision. CS224N project report, Stanford, 1(12):2009, 2009

[34] Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892, 2016

[35] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. Deep text classification can be fooled. arXiv preprint arXiv:1704.08006, 2017

[36] Zhengli Zhao, Dheeru Dua, and Sameer Singh. Generating natural adversarial examples. arXiv preprint arXiv:1710.11342, 2017

[37] Alexander E. I. Brownlee and M. Singh. Data and processing scripts for the paper “Vulnerability of natural language classifiers to evolutionary generated adversarial text”, 2025. URL - TBC on publication [Online; accessed 7-March-2025]

[38] Sean Luke. Essentials of Metaheuristics. Lulu, second edition, 2013. Available for free at http://cs.gmu.edu/~sean/book/metaheuristics/

[39] Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014

[40] John Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, and Yanjun Qi. Textattack: A framework for adversarial attacks, data augmentation, and adversarial training in nlp. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 119–126, 2020

[41] Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, 2005

[42] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. Advances in neural information processing systems, 28, 2015

[43] Yoon Kim. Convolutional neural networks for sentence classification. In EMNLP, 2014

[44] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997

[45] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018

[46] Zara Nasar, Syed Waqar Jaffry, and Muhammad Kamran Malik. Named entity recognition and relation extraction: State-of-the-art. ACM Computing Surveys (CSUR), 54(1):1–39, 2021

[47] Pablo Moscato and Michael G Norman. A memetic approach for the traveling salesman problem implementation of a computational ecology for combinatorial optimization on message-passing systems. Parallel computing and transputer applications, 1:177–186, 1992

[48] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

[49] Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024