Adversarial SQL Injection Generation with LLM-Based Architectures
Pith reviewed 2026-05-13 02:07 UTC · model grok-4.3
The pith
LLM-based systems generate SQL injection payloads that bypass web application firewalls at overall rates up to 22.73 percent, and at over 90 percent against some AI/ML-based WAFs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce RADAGAS, a retrieval-augmented generation system, and RefleXQLi, a reflective chain-of-thought approach, for producing adversarial SQLi payloads. Across 240 experiments generating 240,000 payloads, RADAGAS-GPT4o achieves a 22.73 percent overall bypass rate and outperforms the other baselines. The methods reach high success rates on AI/ML-based WAFs yet remain largely ineffective against rule-based ones, and less diverse payload sets sometimes produce more bypasses, provided the starting payload succeeds.
What carries the argument
RADAGAS, a retrieval-augmented generation system that pulls relevant examples to steer large language models toward creating effective adversarial SQL injection payloads.
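The retrieval step can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the paper's reference list points to embedding-based retrieval (Sentence-BERT, FAISS), whereas this sketch substitutes a cheap character-trigram Jaccard similarity, and `build_prompt` is a hypothetical stand-in for the unpublished prompt template.

```python
# Illustrative sketch of a RADAGAS-style retrieval-augmented generation step.
# Assumptions: trigram Jaccard stands in for the paper's embedding retriever,
# and build_prompt is a hypothetical helper, not the authors' template.

def trigrams(s):
    """Character trigrams of a payload, a cheap similarity proxy."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    """Jaccard similarity over character trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve(seed, corpus, k=3):
    """Return the k corpus payloads most similar to the seed."""
    return sorted(corpus, key=lambda p: similarity(seed, p), reverse=True)[:k]

def build_prompt(seed, examples):
    """Hypothetical prompt assembly; the real template is not public."""
    shots = "\n".join(f"- {e}" for e in examples)
    return (f"Known payloads similar to the target:\n{shots}\n"
            f"Mutate this payload to evade a WAF: {seed}")

corpus = [
    "' OR 1=1 --",
    "' UNION SELECT username, password FROM users --",
    "admin'/**/OR/**/1=1#",
    "1; DROP TABLE users",
]
examples = retrieve("' OR 2=2 --", corpus)
prompt = build_prompt("' OR 2=2 --", examples)
```

The retrieved near-neighbors steer the model toward payload families that have worked before, which is the mechanism the bypass-rate claims rest on.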
If this is right
- RADAGAS-DeepSeek reaches 92.49 percent bypass on the AI-based WAF Brain.
- RADAGAS-Claude reaches 80.48 percent bypass on the AI-based CNN-WAF.
- Bypass rates on rule-based WAFs such as ModSecurity and Coraza remain between 0 and 5.70 percent.
- Less diverse payload sets can increase bypass counts but perform poorly when the initial payload fails.
Where Pith is reading between the lines
- Security teams could add LLM generators to routine testing suites to simulate advanced attacks against current defenses.
- Rule-based WAF developers may need to add dynamic pattern matching to counter the outputs of these LLM methods.
- Hybrid systems that combine LLM generation with traditional fuzzing could reduce the risk that a single weak starting payload blocks all further attempts.
Load-bearing premise
The 240 experiments with chosen prompts and WAF configurations accurately reflect real-world adversarial conditions without introducing bias from those choices.
What would settle it
Applying the generated payloads against additional live web applications protected by the tested WAFs and measuring whether the reported bypass rates hold under actual traffic.
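A minimal harness for that live replay might look like the following. The HTTP layer is stubbed out, and the classification rule is an assumption on our part (a blocked request returning HTTP 403 is typical for default ModSecurity-style deployments, but not stated by the paper).

```python
# Sketch of replaying generated payloads against a WAF-protected endpoint
# and measuring the bypass rate. Assumption: HTTP 403 means the WAF blocked
# the request; any other status means the payload reached the application.

def is_bypass(status_code):
    """Treat any non-403 response as a bypass (assumed blocking status)."""
    return status_code != 403

def bypass_rate(results):
    """results: list of (payload, status_code) pairs from live replay."""
    if not results:
        return 0.0
    bypasses = sum(1 for _, status in results if is_bypass(status))
    return 100.0 * bypasses / len(results)

# Example with recorded responses in place of live traffic:
recorded = [
    ("' OR 1=1 --", 403),           # blocked by the WAF
    ("admin'/**/OR/**/1=1#", 200),  # passed through to the application
    ("1; DROP TABLE users", 403),   # blocked by the WAF
]
rate = bypass_rate(recorded)  # 33.33... percent
```

Running this against production-like traffic, rather than a bare WAF instance, is what would test whether the reported rates generalize.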
Original abstract
SQL injection (SQLi) attacks are still one of the serious attacks ranked in the Open Worldwide Application Security Project (OWASP) Top 10 threats. Today, with advances in Artificial Intelligence (AI), especially in Large Language Models (LLMs), an opportunity has been created for automating adversarial attack tests to measure the defense mechanisms. In this paper, we aim to create a comprehensive evaluation of use cases that utilize LLMs for adversarial SQL injection generation. We introduce two novel LLM-based systems, Retrieval Augmented Generation for Adversarial SQLi (RADAGAS) and Reflective Chain-of-Thought SQLi (RefleXQLi), and compare them with existing baselines against 10 Web Application Firewalls (WAFs) and one execution-based MySQL validator. To perform a comprehensive test, we used six rule-based open-source WAFs (ModSecurity PL1–3, Coraza PL1–3), 2 AI/ML-based WAFs (WAF Brain, CNN-WAF), and 2 commercial WAFs (AWS WAF and Cloudflare WAF). For the LLM models, we used GPT-4o, Claude 3.7 Sonnet, and DeepSeek R1. Our tests consist of 240 experiments that generate 240,000 payloads and perform 2.2 million tests against WAFs. Our comprehensive evaluation reveals that RADAGAS-GPT4o outperforms other baseline models with a 22.73% bypass rate. The proposed RADAGAS variants are highly successful on AI/ML-based WAFs (92.49% on WAF-Brain by RADAGAS-DeepSeek, 80.48% on CNN-WAF by RADAGAS-Claude), but struggle to bypass rule-based WAFs (0–5.70% on ModSecurity and Coraza). In addition to these findings, another observation is that creating less diverse payloads achieves more bypasses, however they show poor results if the initially chosen payload is not successful. We observe that our findings provide a comprehensive view on using LLM-based approaches in security testing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces two novel LLM-based systems, RADAGAS (Retrieval Augmented Generation for Adversarial SQLi) and RefleXQLi (Reflective Chain-of-Thought SQLi), for generating adversarial SQL injection payloads. It evaluates these against baselines (using GPT-4o, Claude 3.7 Sonnet, and DeepSeek R1) on 10 WAFs (six rule-based open-source, two AI/ML-based, and two commercial) plus a MySQL validator via 240 experiments that produce 240,000 payloads and 2.2 million tests. The central claim is that RADAGAS-GPT4o achieves the highest bypass rate of 22.73%, with strong results on AI/ML WAFs (e.g., 92.49% on WAF-Brain) but low success on rule-based WAFs (0-5.70%), plus the observation that less diverse payloads yield higher bypass rates when the initial payload succeeds.
Significance. If the bypass rates hold, the work supplies a large-scale empirical benchmark on LLM-driven adversarial SQLi generation, quantifying the relative strengths of retrieval-augmented and reflective techniques against different WAF classes. The 2.2 million test scale and coverage of open-source, ML-based, and commercial defenses provide concrete data that could inform both offensive security tooling and defensive WAF hardening, particularly highlighting the comparative weakness of rule-based systems.
major comments (3)
- [Evaluation] Evaluation section: The exact prompt templates, retrieval corpus, and chain-of-thought instructions for RADAGAS and RefleXQLi are not specified. Because the 22.73% bypass claim for RADAGAS-GPT4o and the outperformance over baselines rest on these custom generation procedures, and the skeptic note indicates sensitivity to prompt variations, the absence of reproducible prompt details prevents verification that the reported advantage derives from the architectures rather than hyperparameter or prompt tuning.
- [Results] Results section: Bypass rates (e.g., 22.73% overall for RADAGAS-GPT4o, 92.49% on WAF-Brain) are presented as point estimates without error bars, confidence intervals, or statistical significance tests across the 240 experiments. This omission is load-bearing for the comparative claim, as variance from payload sampling or WAF response stochasticity could alter whether the observed outperformance is reliable.
- [WAF testing setup] WAF testing setup: The paper does not state whether the 10 WAFs were evaluated under default configurations or with any custom rules, thresholds, or versions. Given that bypass performance is known to be sensitive to WAF rule sets (especially for ModSecurity PL1-3 and Coraza), this detail is required to interpret the low rule-based bypass rates (0-5.70%) versus high AI/ML rates as generalizable rather than configuration-specific.
minor comments (2)
- [Discussion] The note that less diverse payloads achieve more bypasses would be strengthened by quantitative diversity metrics (e.g., number of unique payloads or lexical entropy) rather than a qualitative observation.
- [Results] A table summarizing per-WAF and per-model bypass rates with exact experiment counts would improve readability of the 2.2 million test results.
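The diversity metrics the first minor comment asks for could be computed with two simple stdlib measures: the unique-payload ratio and the character-level Shannon entropy of a payload set. The metric choice here is ours; the paper does not specify how diversity was assessed.

```python
# Two candidate diversity metrics for a generated payload set. These are
# illustrative stand-ins, not the paper's (unspecified) diversity measure.
import math
from collections import Counter

def unique_ratio(payloads):
    """Fraction of distinct payloads in the generated set."""
    return len(set(payloads)) / len(payloads)

def char_entropy(payloads):
    """Shannon entropy (bits/char) of the pooled character distribution."""
    counts = Counter("".join(payloads))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

low_diversity = ["' OR 1=1 --"] * 4
high_diversity = ["' OR 1=1 --", "1; DROP TABLE t", "' UNION SELECT 1 --", "x'#"]
assert unique_ratio(low_diversity) < unique_ratio(high_diversity)
assert char_entropy(low_diversity) < char_entropy(high_diversity)
```

Reporting either metric alongside bypass counts would turn the "less diverse payloads bypass more" observation into a testable correlation.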
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which highlights important aspects of reproducibility, statistical rigor, and experimental clarity. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims or findings.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: The exact prompt templates, retrieval corpus, and chain-of-thought instructions for RADAGAS and RefleXQLi are not specified. Because the 22.73% bypass claim for RADAGAS-GPT4o and the outperformance over baselines rest on these custom generation procedures, and the skeptic note indicates sensitivity to prompt variations, the absence of reproducible prompt details prevents verification that the reported advantage derives from the architectures rather than hyperparameter or prompt tuning.
  Authors: We agree that full reproducibility requires the exact prompt templates, retrieval corpus composition, and chain-of-thought instructions. The manuscript describes the high-level architectures of RADAGAS and RefleXQLi in the methods section, but does not include the verbatim prompts or corpus details. In the revised version, we will add a dedicated appendix containing the complete prompt templates for each LLM, the structure and size of the retrieval corpus, and the precise reflective CoT instructions. This will enable independent verification that performance differences arise from the proposed techniques. revision: yes
- Referee: [Results] Results section: Bypass rates (e.g., 22.73% overall for RADAGAS-GPT4o, 92.49% on WAF-Brain) are presented as point estimates without error bars, confidence intervals, or statistical significance tests across the 240 experiments. This omission is load-bearing for the comparative claim, as variance from payload sampling or WAF response stochasticity could alter whether the observed outperformance is reliable.
  Authors: The referee correctly notes that only point estimates are reported. Although the scale of 240 experiments and 2.2 million tests provides substantial empirical support, we did not include error bars, confidence intervals, or formal statistical tests. In the revision, we will compute and report 95% confidence intervals for the bypass rates and add a brief discussion of observed variance across runs. Where direct comparisons are made, we will include appropriate statistical significance indicators to substantiate the outperformance claims. revision: yes
- Referee: [WAF testing setup] WAF testing setup: The paper does not state whether the 10 WAFs were evaluated under default configurations or with any custom rules, thresholds, or versions. Given that bypass performance is known to be sensitive to WAF rule sets (especially for ModSecurity PL1–3 and Coraza), this detail is required to interpret the low rule-based bypass rates (0–5.70%) versus high AI/ML rates as generalizable rather than configuration-specific.
  Authors: We used the default configurations for every WAF: standard ModSecurity and Coraza installations at the stated paranoia levels (PL1–3) with no custom rules added, and the AI/ML-based and commercial WAFs in their out-of-the-box settings. This information was omitted from the manuscript. In the revised evaluation section, we will explicitly state that all tests used default configurations and versions, thereby clarifying that the reported differences between rule-based and AI/ML WAFs reflect standard deployments. revision: yes
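The 95% confidence intervals the authors commit to in the statistics response can be obtained with a Wilson score interval, which stays well-behaved even at bypass rates near 0 percent (the rule-based WAF regime). The counts below are illustrative, not figures from the paper.

```python
# Wilson score interval for a binomial bypass rate. A sketch of the
# statistical fix promised in the rebuttal; counts here are illustrative.
import math

def wilson_ci(successes, trials, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    if trials == 0:
        return (0.0, 0.0)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

lo, hi = wilson_ci(227, 1000)  # e.g., a 22.7% observed bypass rate
```

Unlike the normal-approximation interval, the Wilson interval never dips below zero, which matters for the 0–5.70% rule-based results.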
Circularity Check
Empirical benchmarking with no derivation chain or fitted predictions
full rationale
The paper conducts a direct empirical evaluation of LLM-based SQLi payload generators (RADAGAS and RefleXQLi) against 10 WAFs and a MySQL validator using 240 experiments that produce 240,000 payloads and 2.2 million tests. No equations, parameters fitted to subsets then renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the reported methodology or results. Bypass rates (e.g., 22.73% for RADAGAS-GPT4o) are measured outcomes from external system interactions, not derived quantities that reduce to the inputs by construction. The study is self-contained against its chosen benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] OWASP Foundation: OWASP Top 10:2021: The Ten Most Critical Web Application Security Risks (2021). https://owasp.org/Top10/
- [2] Damele, B., Stampar, M.: SQLMAP: Automatic SQL Injection and Database Takeover Tool. Open-source penetration testing tool (2006). https://sqlmap.org/
- [3] OpenAI: GPT-4 Technical Report (2024). https://arxiv.org/abs/2303.08774
- [4] Anthropic-AI: The Claude 3 model family: Opus, Sonnet, Haiku. Claude-3 Model Card 1(1), 4 (2024)
- [5] DeepSeek-AI: DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence (2024). https://arxiv.org/abs/2406.11931
- [6] Manes, V.J., Han, H., Han, C., Cha, S.K., Egele, M., Schwartz, E.J., Woo, M.: The art, science, and engineering of fuzzing: A survey. IEEE Transactions on Software Engineering 47(11), 2312–2331 (2019)
- [7] OWASP: ModSecurity: Open Source Web Application Firewall (2024). https://modsecurity.org/
- [8] Babaey, V., Ravindran, A.: GenSQLi: A generative artificial intelligence framework for automatically securing web application firewalls against structured query language injection attacks. Future Internet 17(1) (2025). https://doi.org/10.3390/fi17010008
- [9] Dong, Q., Li, L., Dai, D., Zheng, C., Ma, J., Li, R., Xia, H., Xu, J., Wu, Z., Chang, B., Sun, X., Li, L., Sui, Z.: A survey on in-context learning. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1107–. Association for Computational Linguistics, Miami, Florida, USA (2024). https://doi.org/10.18653/v1/2024.emnlp-main.64
- [11] Kindy, D.A., Pathan, A.-S.K.: A survey on SQL injection: Vulnerabilities, attacks, and prevention techniques. In: IEEE International Symposium on Consumer Electronics (ISCE), pp. 468–471 (2011). https://doi.org/10.1109/ISCI.2011.5973873
- [12] Clarke, J.: SQL Injection Attacks and Defense. Elsevier, Waltham, MA (2012)
- [13] Appelt, D., Nguyen, C.D., Briand, L.: Behind an application firewall, are we safe from SQL injection attacks? In: IEEE International Conference on Software Testing, Verification and Validation (ICST), pp. 1–10 (2015)
- [14] Pan, Y., Sun, F., Teng, Z., White, J., Schmidt, D.C., Staples, J., Krause, L.: Detecting web attacks with end-to-end deep learning. Journal of Internet Services and Applications 10(1), 1–22 (2019)
- [15] BBVA-Labs: WAF-Brain: Machine Learning Based Web Application Firewall (2018). https://github.com/BBVA/waf-brain
- [16] Gui, Z., Wang, E., Deng, B., Zhang, M., Chen, Y., Wei, S., Xie, W., Wang, B.: SqliGPT: Evaluating and utilizing large language models for automated SQL injection black-box detection. Applied Sciences 14(16) (2024). https://doi.org/10.3390/app14166929
- [17] Yang, T., Jiang, Z., Wang, Y.: LLMSQLi: A black-box web SQLi detection tool based on large language model. In: IEEE International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), pp. 629–633 (2024)
- [18] Sun, Y., Wu, D., Xue, Y., Liu, H., Ma, W., Zhang, L., Liu, Y., Li, Y.: LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning (2025). https://arxiv.org/abs/2401.16185
- [19] Deng, G., Liu, Y., Mayoral-Vilches, V., Liu, P., Li, Y., Xu, Y., Zhang, T., Liu, Y., Pinzger, M., Rass, S.: PentestGPT: Evaluating and harnessing large language models for automated penetration testing. In: USENIX Security Symposium (USENIX Security 24), pp. 847–864 (2024). https://www.usenix.org/conference/usenixsecurity24/presentation/deng
- [20] Liu, Y., Deng, G., Li, Y., Wang, K., Wang, Z., Wang, X., Zhang, T., Liu, Y., Wang, H., Zheng, Y., Zhang, L.Y., Liu, Y.: Prompt Injection attack against LLM-integrated Applications (2025). https://arxiv.org/abs/2306.05499
- [21] Holtzman, A., Buys, J., Du, L., Forbes, M., Choi, Y.: The Curious Case of Neural Text Degeneration (2020). https://arxiv.org/abs/1904.09751
- [22] Meister, C., Pimentel, T., Wiher, G., Cotterell, R.: Locally typical sampling. Transactions of the Association for Computational Linguistics 11, 102–121 (2023). https://doi.org/10.1162/tacl_a_00536
- [23] Renze, M.: The effect of sampling temperature on problem solving in large language models. In: Findings of the Association for Computational Linguistics (EMNLP), pp. 7346–7356. Association for Computational Linguistics, Miami, Florida, USA (2024). https://doi.org/10.18653/v1/2024.findings-emnlp.432
- [24] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A.... In: Advances in Neural Information Processing Systems
- [25] Wahaibi, S.A., Foley, M., Maffeis, S.: SQIRL: Grey-Box detection of SQL injection vulnerabilities using reinforcement learning. In: USENIX Security Symposium (USENIX Security), pp. 6097–6114. USENIX Association, Anaheim, CA (2023). https://www.usenix.org/conference/usenixsecurity23/presentation/al-wahaibi
- [26] Hu, Z., Beuran, R., Tan, Y.: Automated penetration testing using deep reinforcement learning. In: IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), pp. 2–10 (2020)
- [27] Zhong, R., Chen, Y., Hu, H., Zhang, H., Lee, W., Wu, D.: SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 955–970 (2020). https://doi.org/10.1145/3372297.3417260
- [28] Zalewski, M.: American fuzzy lop (AFL) fuzzer (2017). https://lcamtuf.coredump.cx/afl
- [29] Lemieux, C., Sen, K.: Fairfuzz: a targeted mutation strategy for increasing greybox fuzz testing coverage. In: Proceedings of the ACM/IEEE International Conference on Automated Software Engineering, New York, NY, USA, pp. 475–485 (2018). https://doi.org/10.1145/3238147.3238176
- [30] Böhme, M., Pham, V.-T., Roychoudhury, A.: Coverage-based greybox fuzzing as markov chain. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 1032–1043 (2016). https://doi.org/10.1145/2976749.2978428
- [31] Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 335–336. Association for Computing Machinery, New York, NY, USA (1998). https://doi.org/10.1145/290941.291025
- [32] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, vol. 27 (2014). https://proceedings.neurips.cc/paper_files/paper/2014/file/f033ed80deb0234979a61f95710dbe25-Paper.pdf
- [33] Wei, J., Wang, X., Schuurmans, D., Bosma, M., ichter, b., Xia, F., Chi, E., Le, Q.V., Zhou, D.: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In: Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837 (2022). https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-C...
- [34] PortSwigger: Web Security Academy: SQL Injection (2024). https://portswigger.net/web-security/sql-injection
- [35] swisskyrepo, contributors: PayloadsAllTheThings: A List of Useful Payloads and Bypass for Web Application Security. Community-maintained security payload repository (2024). https://github.com/swisskyrepo/PayloadsAllTheThings
- [36] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3982–3992 (2019). https://doi.org/10.18653/v1/D19-1410
- [37] Muennighoff, N., Tazi, N., Magne, L., Reimers, N.: MTEB: Massive Text Embedding Benchmark. In: Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, pp. 2014–2037 (2023). https://doi.org/10.18653/v1/2023.eacl-main.148
- [38] Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Transactions on Big Data 7(3), 535–547 (2019)
- [39] Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady, vol. 10, pp. 707–710 (1966)
- [40] Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: Evaluating Text Generation with BERT (2020). https://arxiv.org/abs/1904.09675
- [41]
- [42] Pearson, K.: Notes on the History of Correlation. Biometrika 13(1), 25–45 (1920)
- [43] Fisher, R.A.: Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10(4), 507–521 (1915)
- [44] Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. Routledge, New York (2013)
- [45] Razzaq, A., Latif, K., Ahmad, H.F., Hur, A., Anwar, Z., Bloodsworth, P.C.: Semantic security against web application attacks. Information Sciences 254, 19–38 (2014). https://doi.org/10.1016/j.ins.2013.08.007