pith. machine review for the scientific record.

arxiv: 2605.11188 · v1 · submitted 2026-05-11 · 💻 cs.CR · cs.AI · cs.ET

Recognition: no theorem link

Adversarial SQL Injection Generation with LLM-Based Architectures


Pith reviewed 2026-05-13 02:07 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.ET

keywords SQL injection · adversarial attacks · large language models · web application firewalls · bypass evaluation · retrieval augmented generation · security testing

The pith

LLM-based systems generate SQL injection payloads that bypass web application firewalls at overall rates up to 22.73 percent, and at far higher rates against some AI/ML-based defenses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops and tests two new LLM-based methods for automatically creating adversarial SQL injection payloads meant to evade detection by web application firewalls. The authors run 240 experiments that produce 240,000 payloads and execute 2.2 million checks against six rule-based, two AI/ML-based, and two commercial WAFs using models such as GPT-4o, Claude 3.7 Sonnet, and DeepSeek R1. A sympathetic reader would care because automated tools that reliably find bypasses could let developers test and harden their applications against real attacks before they occur. The results show that the RADAGAS variant paired with GPT-4o reaches the highest overall bypass rate while performing especially well against machine-learning defenses.

Core claim

The authors introduce RADAGAS, a retrieval-augmented generation system, and RefleXQLi, a reflective chain-of-thought approach, for producing adversarial SQLi payloads. Across extensive tests, RADAGAS-GPT4o achieves a 22.73 percent bypass rate and outperforms other baselines. The methods reach high success rates on AI/ML-based WAFs yet remain largely ineffective against rule-based ones, and less diverse payload sets sometimes produce more bypasses provided the starting payload succeeds.

What carries the argument

RADAGAS, a retrieval-augmented generation system that pulls relevant examples to steer large language models toward creating effective adversarial SQL injection payloads.
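As a rough illustration of the retrieval step such a system needs (not the paper's actual retriever, which would typically use learned sentence embeddings and a vector index), here is a dependency-free sketch using character-trigram cosine similarity over a toy payload corpus:

```python
# Sketch of RAG-style retrieval: given a seed payload, pull the most
# similar known payloads from a corpus to use as in-context examples.
from collections import Counter
from math import sqrt

def trigrams(s: str) -> Counter:
    s = s.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = trigrams(query)
    return sorted(corpus, key=lambda p: cosine(q, trigrams(p)), reverse=True)[:k]

corpus = [
    "' OR 1=1--",
    "' UNION SELECT NULL--",
    "admin'--",
    "1; DROP TABLE users",
]
examples = retrieve("' OR 2=2--", corpus)
# The retrieved examples would then be spliced into the generation prompt.
```

The design point being illustrated: retrieval steers the model toward payload families that have worked before, which is also why the method's diversity can drop when the corpus is narrow.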

If this is right

  • RADAGAS-DeepSeek reaches 92.49 percent bypass on the AI-based WAF Brain.
  • RADAGAS-Claude reaches 80.48 percent bypass on the AI-based CNN-WAF.
  • Bypass rates on rule-based WAFs such as ModSecurity and Coraza remain between 0 and 5.70 percent.
  • Less diverse payload sets can increase bypass counts but perform poorly when the initial payload fails.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Security teams could add LLM generators to routine testing suites to simulate advanced attacks against current defenses.
  • Rule-based WAF developers may need to add dynamic pattern matching to counter the outputs of these LLM methods.
  • Hybrid systems that combine LLM generation with traditional fuzzing could reduce the risk that a single weak starting payload blocks all further attempts.

Load-bearing premise

The 240 experiments with chosen prompts and WAF configurations accurately reflect real-world adversarial conditions without introducing bias from those choices.

What would settle it

Applying the generated payloads against additional live web applications protected by the tested WAFs and measuring whether the reported bypass rates hold under actual traffic.
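A harness for that settling experiment needs little more than a response classifier and a rate counter. The block-detection heuristics below (explicit deny codes, a block-page string) are assumptions about typical WAF behavior, not details from the paper, and the outcomes are simulated rather than fetched over HTTP.

```python
# Sketch of the measurement side of a live replay experiment: classify
# each HTTP response as blocked or passed, then compute the bypass rate.

def is_blocked(status_code: int, body: str = "") -> bool:
    # Assumed WAF block signals: explicit deny codes or a block page.
    return status_code in (403, 406) or "request blocked" in body.lower()

def bypass_rate(outcomes: list[tuple[int, str]]) -> float:
    bypasses = sum(1 for code, body in outcomes if not is_blocked(code, body))
    return bypasses / len(outcomes)

# Simulated (status, body) pairs standing in for real responses:
outcomes = [(200, ""), (403, ""), (200, "Request blocked by WAF"), (500, "")]
print(bypass_rate(outcomes))  # → 0.5
```

Comparing rates measured this way against the paper's reported figures, per WAF and per model, is the test the section above proposes.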

read the original abstract

SQL injection (SQLi) attacks are still one of the serious attacks ranked in the Open Worldwide Application Security Project (OWASP) Top 10 threats. Today, with advances in Artificial Intelligence (AI), especially in Large Language Models (LLMs), an opportunity has been created for automating adversarial attack tests to measure the defense mechanisms. In this paper, we aim to create a comprehensive evaluation of use cases that utilize LLMs for adversarial SQL injection generation. We introduce two novel LLM-based systems, Retrieval Augmented Generation for Adversarial SQLi (RADAGAS) and Reflective Chain-of-Thought SQLi (RefleXQLi), and compare them with existing baselines against 10 Web Application Firewalls (WAFs) and one execution-based MySQL validator. To perform a comprehensive test, we used six rule-based open-source WAFs (ModSecurity PL1–3, Coraza PL1–3), 2 AI/ML-based WAFs (WAF Brain, CNN-WAF), and 2 commercial WAFs (AWS WAF and Cloudflare WAF). For the LLM models, we used GPT-4o, Claude 3.7 Sonnet, and DeepSeek R1. Our tests consist of 240 experiments that generate 240,000 payloads and perform 2.2 million tests against WAFs. Our comprehensive evaluation reveals that RADAGAS-GPT4o outperforms other baseline models with a 22.73% bypass rate. The proposed RADAGAS variants are highly successful on AI/ML-based WAFs (92.49% on WAF-Brain by RADAGAS-DeepSeek, 80.48% on CNN-WAF by RADAGAS-Claude), but struggle to bypass rule-based WAFs (0–5.70% on ModSecurity and Coraza). In addition to these findings, another observation is that creating less diverse payloads achieves more bypasses, however they show poor results if the initially chosen payload is not successful. We observe that our findings provide a comprehensive view on using LLM-based approaches in security testing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces two novel LLM-based systems, RADAGAS (Retrieval Augmented Generation for Adversarial SQLi) and RefleXQLi (Reflective Chain-of-Thought SQLi), for generating adversarial SQL injection payloads. It evaluates these against baselines (using GPT-4o, Claude 3.7 Sonnet, and DeepSeek R1) on 10 WAFs (six rule-based open-source, two AI/ML-based, and two commercial) plus a MySQL validator via 240 experiments that produce 240,000 payloads and 2.2 million tests. The central claim is that RADAGAS-GPT4o achieves the highest bypass rate of 22.73%, with strong results on AI/ML WAFs (e.g., 92.49% on WAF-Brain) but low success on rule-based WAFs (0-5.70%), plus the observation that less diverse payloads yield higher bypass rates when the initial payload succeeds.

Significance. If the bypass rates hold, the work supplies a large-scale empirical benchmark on LLM-driven adversarial SQLi generation, quantifying the relative strengths of retrieval-augmented and reflective techniques against different WAF classes. The 2.2 million test scale and coverage of open-source, ML-based, and commercial defenses provide concrete data that could inform both offensive security tooling and defensive WAF hardening, particularly highlighting the comparative weakness of rule-based systems.

major comments (3)
  1. [Evaluation] Evaluation section: The exact prompt templates, retrieval corpus, and chain-of-thought instructions for RADAGAS and RefleXQLi are not specified. Because the 22.73% bypass claim for RADAGAS-GPT4o and the outperformance over baselines rest on these custom generation procedures, and the skeptic note indicates sensitivity to prompt variations, the absence of reproducible prompt details prevents verification that the reported advantage derives from the architectures rather than hyperparameter or prompt tuning.
  2. [Results] Results section: Bypass rates (e.g., 22.73% overall for RADAGAS-GPT4o, 92.49% on WAF-Brain) are presented as point estimates without error bars, confidence intervals, or statistical significance tests across the 240 experiments. This omission is load-bearing for the comparative claim, as variance from payload sampling or WAF response stochasticity could alter whether the observed outperformance is reliable.
  3. [WAF testing setup] WAF testing setup: The paper does not state whether the 10 WAFs were evaluated under default configurations or with any custom rules, thresholds, or versions. Given that bypass performance is known to be sensitive to WAF rule sets (especially for ModSecurity PL1-3 and Coraza), this detail is required to interpret the low rule-based bypass rates (0-5.70%) versus high AI/ML rates as generalizable rather than configuration-specific.
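The interval computation the referee asks for in comment 2 is standard. A Wilson score interval for each bypass rate, treated as a binomial proportion, is one reasonable choice; the counts below are illustrative, chosen only to match the 22.73% headline figure, not taken from the paper.

```python
# 95% Wilson score interval for a binomial proportion (bypasses / tests).
from math import sqrt

def wilson_ci(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    if trials == 0:
        return (0.0, 0.0)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (center - half, center + half)

# Illustrative: 2273 bypasses out of 10000 attempts (22.73%).
lo, hi = wilson_ci(2273, 10000)
print(f"22.73% bypass rate, 95% CI [{lo:.4f}, {hi:.4f}]")
```

At this sample size the interval is narrow (roughly ±0.8 percentage points), so the comparative claims may well survive; the point is that the paper should report it.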
minor comments (2)
  1. [Discussion] The note that less diverse payloads achieve more bypasses would be strengthened by quantitative diversity metrics (e.g., number of unique payloads or lexical entropy) rather than a qualitative observation.
  2. [Results] A table summarizing per-WAF and per-model bypass rates with exact experiment counts would improve readability of the 2.2 million test results.
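Two of the diversity metrics minor comment 1 suggests can be sketched directly. The payload sets below are toy data for illustration, not the paper's outputs.

```python
# Quantitative diversity metrics for a payload set: unique-payload ratio
# and Shannon entropy (bits) of the pooled character distribution.
from collections import Counter
from math import log2

def unique_ratio(payloads: list[str]) -> float:
    return len(set(payloads)) / len(payloads)

def char_entropy(payloads: list[str]) -> float:
    counts = Counter("".join(payloads))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Toy payload sets, not the paper's data:
low_div = ["' OR 1=1--"] * 9 + ["' OR 2=2--"]
high_div = ["' OR 1=1--", "admin'--", "1; DROP TABLE t", "' UNION SELECT NULL--"]

print(unique_ratio(low_div), unique_ratio(high_div))   # 0.2 vs 1.0
print(char_entropy(low_div) < char_entropy(high_div))  # True
```

With metrics like these, the paper's "less diverse payloads achieve more bypasses" observation could be restated as a measured correlation rather than a qualitative impression.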

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which highlights important aspects of reproducibility, statistical rigor, and experimental clarity. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims or findings.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: The exact prompt templates, retrieval corpus, and chain-of-thought instructions for RADAGAS and RefleXQLi are not specified. Because the 22.73% bypass claim for RADAGAS-GPT4o and the outperformance over baselines rest on these custom generation procedures, and the skeptic note indicates sensitivity to prompt variations, the absence of reproducible prompt details prevents verification that the reported advantage derives from the architectures rather than hyperparameter or prompt tuning.

    Authors: We agree that full reproducibility requires the exact prompt templates, retrieval corpus composition, and chain-of-thought instructions. The manuscript describes the high-level architectures of RADAGAS and RefleXQLi in the methods section, but does not include the verbatim prompts or corpus details. In the revised version, we will add a dedicated appendix containing the complete prompt templates for each LLM, the structure and size of the retrieval corpus, and the precise reflective CoT instructions. This will enable independent verification that performance differences arise from the proposed techniques. revision: yes

  2. Referee: [Results] Results section: Bypass rates (e.g., 22.73% overall for RADAGAS-GPT4o, 92.49% on WAF-Brain) are presented as point estimates without error bars, confidence intervals, or statistical significance tests across the 240 experiments. This omission is load-bearing for the comparative claim, as variance from payload sampling or WAF response stochasticity could alter whether the observed outperformance is reliable.

    Authors: The referee correctly notes that only point estimates are reported. Although the scale of 240 experiments and 2.2 million tests provides substantial empirical support, we did not include error bars, confidence intervals, or formal statistical tests. In the revision, we will compute and report 95% confidence intervals for the bypass rates and add a brief discussion of observed variance across runs. Where direct comparisons are made, we will include appropriate statistical significance indicators to substantiate the outperformance claims. revision: yes

  3. Referee: [WAF testing setup] WAF testing setup: The paper does not state whether the 10 WAFs were evaluated under default configurations or with any custom rules, thresholds, or versions. Given that bypass performance is known to be sensitive to WAF rule sets (especially for ModSecurity PL1-3 and Coraza), this detail is required to interpret the low rule-based bypass rates (0-5.70%) versus high AI/ML rates as generalizable rather than configuration-specific.

    Authors: We used the default configurations for every WAF: standard ModSecurity and Coraza installations at the stated paranoia levels (PL1–3) with no custom rules added, and the AI/ML-based and commercial WAFs in their out-of-the-box settings. This information was omitted from the manuscript. In the revised evaluation section, we will explicitly state that all tests used default configurations and versions, thereby clarifying that the reported differences between rule-based and AI/ML WAFs reflect standard deployments. revision: yes

Circularity Check

0 steps flagged

Empirical benchmarking with no derivation chain or fitted predictions

full rationale

The paper conducts a direct empirical evaluation of LLM-based SQLi payload generators (RADAGAS and RefleXQLi) against 10 WAFs and a MySQL validator using 240 experiments that produce 240,000 payloads and 2.2 million tests. No equations, parameters fitted to subsets then renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the reported methodology or results. Bypass rates (e.g., 22.73% for RADAGAS-GPT4o) are measured outcomes from external system interactions, not derived quantities that reduce to the inputs by construction. The study is self-contained against its chosen benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The contribution is an empirical evaluation of two new LLM prompting architectures for security testing. No free parameters, mathematical axioms, or invented theoretical entities are involved.

pith-pipeline@v0.9.0 · 5693 in / 1212 out tokens · 66623 ms · 2026-05-13T02:07:13.821333+00:00 · methodology

discussion (0)

