Towards Automated Pentesting with Large Language Models
Pith reviewed 2026-05-10 16:07 UTC · model grok-4.3
The pith
RedShell fine-tunes large language models on malicious PowerShell samples to generate valid offensive code for Windows pentesting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RedShell is a privacy-preserving, hardware-efficient framework that leverages fine-tuned LLMs to assist pentesters in generating offensive PowerShell code targeting Microsoft Windows vulnerabilities. Trained on a malicious PowerShell dataset from the literature, enhanced with manually curated code samples, the framework achieves over 90% syntactic validity in generated samples and strong semantic alignment with reference pentesting snippets, outperforming state-of-the-art counterparts on distance metrics such as edit distance, with above 50% average code similarity. Functional experiments emphasize the execution reliability of the snippets produced by RedShell in a testing scenario that mirrors real-world settings.
What carries the argument
RedShell: a framework that fine-tunes LLMs on an enhanced malicious PowerShell dataset to output offensive code for vulnerability testing.
Load-bearing premise
That strong results on syntactic validity, code similarity, and execution in mirrored test settings will translate to safe, useful performance during actual live pentesting without introducing new risks or needing heavy human correction.
What would settle it
A controlled test in which RedShell-generated scripts are run against live but isolated Windows systems and frequently fail to execute, fail to detect the targeted vulnerabilities, or require major manual fixes would show the claims do not hold.
Original abstract
Large Language Models (LLMs) are redefining offensive cybersecurity by allowing the generation of harmful machine code with minimal human intervention. While attackers take advantage of dark LLMs such as XXXGPT and WolfGPT to produce malicious code, ethical hackers can follow similar approaches to automate traditional pentesting workflows. In this work, we present RedShell, a privacy-preserving, hardware-efficient framework that leverages fine-tuned LLMs to assist pentesters in generating offensive PowerShell code targeting Microsoft Windows vulnerabilities. RedShell was trained on a malicious PowerShell dataset from the literature, which we further enhanced with manually curated code samples. Experiments show that our framework achieves over 90% syntactic validity in generated samples and strong semantic alignment with reference pentesting snippets, outperforming state-of-the-art counterparts in distance metrics such as edit distance (above 50% average code similarity). Additionally, functional experiments emphasize the execution reliability of the snippets produced by RedShell in a testing scenario that mirrors real-world settings. This work sheds light on the state-of-the-art research in the field of Generative AI applied to malicious code generation and automated testing, acknowledging the potential benefits that LLMs hold within controlled environments such as pentesting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents RedShell, a privacy-preserving and hardware-efficient framework that fine-tunes LLMs on an enhanced malicious PowerShell dataset to generate offensive code for automating pentesting of Microsoft Windows vulnerabilities. It reports experimental outcomes of over 90% syntactic validity in generated samples, strong semantic alignment with reference pentesting snippets, outperformance of state-of-the-art models on edit-distance similarity (above 50% average code similarity), and reliable execution of produced snippets in a testing scenario that mirrors real-world settings.
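The "edit distance (above 50% average code similarity)" figure is typically a normalized Levenshtein score between a generated snippet and its reference. A minimal sketch of that metric (our own illustration, assuming normalization by the longer string; the paper's exact implementation may differ):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (two-row variant).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution
            ))
        prev = cur
    return prev[-1]

def edit_similarity(generated: str, reference: str) -> float:
    # Normalized similarity in [0, 1]: 1 - distance / max length.
    if not generated and not reference:
        return 1.0
    return 1.0 - levenshtein(generated, reference) / max(len(generated), len(reference))
```

Averaging `edit_similarity` over all generated/reference pairs would yield the kind of "average code similarity" figure the abstract reports.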
Significance. If the performance claims are supported by complete experimental details, baselines, and statistical validation, the work would offer a concrete contribution to the application of generative AI for ethical hacking tools in controlled settings. The focus on privacy preservation and hardware efficiency, combined with the acknowledgment of controlled-environment benefits, strengthens its potential relevance if the utility and safety assertions can be substantiated.
major comments (2)
- [Abstract] Abstract: the central performance claims (>90% syntactic validity, >50% average code similarity, outperformance of SOTA) are stated without any reference to dataset size, train/test split, exact measurement procedures for syntactic validity or semantic alignment, baseline models, or statistical significance testing; these omissions directly undermine evaluation of the reported results.
- [Functional experiments] Functional experiments (as described in the abstract): the claim of 'execution reliability ... in a testing scenario that mirrors real-world settings' is load-bearing for the utility of assisting pentesters, yet no specifics are provided on which real-world elements are mirrored (dynamic networks, EDR evasion, multi-stage chains) or how safety, containment, and avoidance of new vulnerabilities were assessed.
minor comments (1)
- [Abstract] Abstract: the phrasing 'dark LLMs such as XXXGPT and WolfGPT' would benefit from citations or brief definitions to allow readers to contextualize the comparison to ethical use cases.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications from the full paper and indicating where revisions will strengthen the presentation without altering the core contributions or experimental outcomes.
Point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claims (>90% syntactic validity, >50% average code similarity, outperformance of SOTA) are stated without any reference to dataset size, train/test split, exact measurement procedures for syntactic validity or semantic alignment, baseline models, or statistical significance testing; these omissions directly undermine evaluation of the reported results.
Authors: We agree that the abstract's conciseness omits explicit references to supporting details, which can hinder immediate assessment. The full manuscript (Sections 3 and 4) specifies the enhanced dataset size and composition, the 80/20 train/test split, syntactic validity measured via PowerShell parser success rates on generated samples, semantic alignment assessed through edit-distance similarity and embedding-based metrics, the specific baseline models (including prior SOTA approaches), and statistical validation via repeated trials with reported variances. To improve accessibility, we will revise the abstract to include brief, high-level references to these elements (e.g., 'on an enhanced dataset of X samples with 80/20 split, using parser-based validity checks and edit-distance metrics against baselines'). This is a targeted addition that preserves abstract length constraints. revision: yes
-
Referee: [Functional experiments] Functional experiments (as described in the abstract): the claim of 'execution reliability ... in a testing scenario that mirrors real-world settings' is load-bearing for the utility of assisting pentesters, yet no specifics are provided on which real-world elements are mirrored (dynamic networks, EDR evasion, multi-stage chains) or how safety, containment, and avoidance of new vulnerabilities were assessed.
Authors: The manuscript's functional experiments section describes execution in isolated virtual Windows environments configured with standard vulnerability setups and basic security postures to simulate typical pentesting conditions. We acknowledge that the abstract and section could more explicitly delineate the mirrored elements and safety protocols. We will expand the description to clarify the actual scope: static vulnerability targets in contained VMs, basic multi-stage chaining where relevant to the generated snippets, and safety measures including full sandbox isolation, no external network access, and post-execution monitoring to confirm no unintended side effects or new vulnerabilities introduced. Advanced elements such as dynamic network simulation or specific EDR evasion were outside the current experimental focus (which prioritized code validity and basic executability); we will explicitly note this scope to prevent overgeneralization. This is a partial revision focused on elaboration rather than new experiments. revision: partial
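The parser-based validity check the authors describe can be approximated by asking PowerShell's own parser to count parse errors for each generated snippet. A hedged sketch (assumes `pwsh` 7+ on PATH; the invocation is illustrative, not the authors' harness):

```python
import shutil
import subprocess

# Ask PowerShell's parser to report the number of parse errors for the
# snippet supplied on stdin; zero errors means syntactically valid.
PARSE_CMD = (
    "$errs = $null; "
    "[System.Management.Automation.Language.Parser]::ParseInput("
    "[Console]::In.ReadToEnd(), [ref]$null, [ref]$errs) > $null; "
    "$errs.Count"
)

def is_valid_count(count_str: str) -> bool:
    # The parser prints the error count; '0' means the snippet parsed cleanly.
    return count_str.strip() == "0"

def is_valid_powershell(snippet: str):
    # Returns True/False, or None when pwsh is not installed.
    if shutil.which("pwsh") is None:
        return None
    result = subprocess.run(
        ["pwsh", "-NoProfile", "-NonInteractive", "-Command", PARSE_CMD],
        input=snippet, capture_output=True, text=True, timeout=30,
    )
    return is_valid_count(result.stdout)

def validity_rate(snippets):
    # Fraction of samples that parse, as in the >90% syntactic-validity claim.
    checks = [is_valid_powershell(s) for s in snippets]
    checks = [c for c in checks if c is not None]
    return sum(checks) / len(checks) if checks else float("nan")
```

Note that the reference list also cites PSScriptAnalyzer, which layers style and static-analysis rules on top of raw parse validity; a parse-error count like the one above is the weaker of the two checks.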
Circularity Check
No circularity in empirical framework and evaluation
Full rationale
The paper presents RedShell as a fine-tuned LLM framework trained on an external malicious PowerShell dataset from the literature plus manually curated samples. Performance claims (>90% syntactic validity, edit-distance similarity, execution reliability) are reported as outcomes of separate experiments in a mirrored test scenario rather than being derived from or defined into the training process. No equations, self-referential definitions, or load-bearing self-citations appear in the provided text that would reduce results to inputs by construction. The derivation chain is therefore self-contained against external benchmarks.
Reference graph
Works this paper leans on
- [1] Alotaibi, L., Seher, S., Mohammad, N.: Cyberattacks using ChatGPT: Exploring malicious content generation through prompt engineering. In: 2024 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS). pp. 1304–1311. IEEE (2024), https://doi.org/10.1109/ICETSIS61505.2024.10459698
- [2] Atomic Red Team: Adversary emulation for cybersecurity (2024), https://www.atomicredteam.io/ (Accessed: 2025-11-03)
- [3] Bahdanau, D., Cho, K.H., Bengio, Y.: Neural machine translation by jointly learning to align and translate. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings pp. 1–15 (2015), https://arxiv.org/abs/1409.0473
- [4] Bianou, S.G., Batogna, R.G.: Pentest-AI, an LLM-powered multi-agents framework for penetration testing automation leveraging MITRE ATT&CK. In: 2024 IEEE International Conference on Cyber Security and Resilience (CSR). pp. 763–770. IEEE (2024), https://doi.org/10.1109/CSR61664.2024.10679480
- [5] Canary, R.: Top ATT&CK® Techniques | Threat Detection Report. https://redcanary.com/threat-detection-report/techniques/ (Accessed: 2025-11-03)
- [6] Chowdhary, A., Jha, K., Zhao, M.: Generative adversarial network (GAN)-based autonomous penetration testing for web applications. Sensors 23(18), 1–18 (2023), https://doi.org/10.3390/s23188014
- [7] Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning. pp. 160–167. Association for Computing Machinery (2008), https://doi.org/10.1145/1390156.1390177
- [8] Corporation, M.: AzureAD module (2025), https://learn.microsoft.com/en-us/powershell/module/azuread/?view=azureadps-2.0 (Accessed: 2025-11-03)
- [9] Corporation, M.: MITRE ATT&CK framework (2024), https://attack.mitre.org/ (Accessed: 2025-11-03)
- [10] DeepSeek-AI: DeepSeek chat platform (2025), https://chat.deepseek.com/ (Accessed: 2025-11-03)
- [11] Delpy, B.: Mimikatz (2011), https://github.com/gentilkiwi/mimikatz (Accessed: 2025-11-03)
- [12] Deng, G., Liu, Y., Mayoral-Vilches, V., Liu, P., Li, Y., Xu, Y., Zhang, T., Liu, Y., Pinzger, M., Rass, S.: PentestGPT: Evaluating and harnessing large language models for automated penetration testing. In: 33rd USENIX Security Symposium (USENIX Security 24). pp. 847–864. USENIX Association (2024), https://www.usenix.org/conference/usenixsecurity24/prese...
- [13] Deng, G., Zhang, Z., Li, Y., Liu, Y., Zhang, T., Liu, Y., Yu, G., Wang, D.: NAUTILUS: Automated RESTful API vulnerability detection. In: 32nd USENIX Security Symposium (USENIX Security 23). pp. 5593–5609. USENIX Association (2023), https://www.usenix.org/conference/usenixsecurity23/presentation/deng-gelei
- [14] Eze, C.S., Shamir, L.: Analysis and prevention of AI-based phishing email attacks. Electronics (Switzerland) 13(10) (2024), https://doi.org/10.3390/electronics13101839
- [15] Face, H.: Open LLM leaderboard (2025), https://huggingface.co/open-llm-leaderboard (Accessed: 2025-11-03)
- [16] Falade, P.V.: Decoding the threat landscape: ChatGPT, FraudGPT, and WormGPT in social engineering attacks. International Journal of Scientific Research in Computer Science, Engineering and Information Technology pp. 185–198 (2023), https://doi.org/10.32628/cseit2390533
- [17] Fan, A., Gokkaya, B., Harman, M., Lyubarskiy, M., Sengupta, S., Yoo, S., Zhang, J.M.: Large language models for software engineering: Survey and open problems. In: 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE). pp. 31–53. IEEE (2023), https://doi.org/10.1109/ICSE-FoSE59343.2023.00008
- [18] Fire, M., Elbazis, Y., Wasenstein, A., Rokach, L.: Dark LLMs: The growing threat of unaligned AI models. arXiv preprint arXiv:2505.10066 (2025), https://arxiv.org/abs/2505.10066
- [19] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016), http://www.deeplearningbook.org
- [20] Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A.: The Llama 3 herd of models (2024), https://arxiv.org/abs/2407.21783 (Accessed: 2025-11-03)
- [21] HackTricks: HackTricks (2025), https://book.hacktricks.xyz/ (Accessed: 2025-11-03)
- [22] Hanif, H., Maffeis, S.: VulBERTa: Simplified source code pre-training for vulnerability detection. In: 2022 International Joint Conference on Neural Networks (IJCNN). pp. 1–8. IEEE (2022), https://doi.org/10.1109/IJCNN55064.2022.9892280
- [23] Hugging Face: evaluate: A Python library for model evaluation and comparison (2025), https://pypi.org/project/evaluate/ (Accessed: 2025-11-03)
- [24] Hugging Face Datasets: dessertlab/offensive-powershell (2025), https://huggingface.co/datasets/dessertlab/offensive-powershell (Accessed: 2025-11-03)
- [25] Hui, B., Yang, J., Cui, Z., Yang, J., Liu, D., Zhang, L., Liu, T., Zhang, J., Yu, B., Dang, K., Yang, A., Men, R., Huang, F., Ren, X., Ren, X., Zhou, J., Lin, J.: Qwen2.5-Coder technical report (2024), https://arxiv.org/abs/2409.12186 (Accessed: 2025-11-03)
- [26] Khandelwal, A., Yun, T., Nayak, N.V., Merullo, J., Bach, S.H., Sun, C., Pavlick, E.: $100K or 100 days: Trade-offs when pre-training with academic resources. arXiv preprint arXiv:2410.23261 (2025), https://arxiv.org/abs/2410.23261
- [27] kuangzh: pylcs: A super fast C++ implementation of classic LCS problems using dynamic programming (2023), https://pypi.org/project/pylcs/ (Accessed: 2025-11-03)
- [28] Liguori, P., Al-Hossami, E., Cotroneo, D., Natella, R., Cukic, B., Shaikh, S.: Can we generate shellcodes via natural language? An empirical study, vol. 29. Springer US (2022), https://doi.org/10.1007/s10515-022-00331-3
- [29] Liguori, P., Al-Hossami, E., Orbinato, V., Natella, R., Shaikh, S., Cotroneo, D., Cukic, B.: EVIL: Exploiting software via natural language. In: 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). pp. 321–. IEEE (2021), https://doi.org/10.1109/ISSRE52982.2021.00042
- [31] Liguori, P., Improta, C., Natella, R., Cukic, B., Cotroneo, D.: Who evaluates the evaluators? On automatic metrics for assessing AI-based offensive code generators. Expert Systems with Applications 225 (2023), https://doi.org/10.1016/j.eswa.2023.120073
- [32] Liguori, P., Marescalco, C., Natella, R., Orbinato, V., Pianese, L.: The power of words: Generating PowerShell attacks from natural language. In: 18th USENIX WOOT Conference on Offensive Technologies (WOOT 24). pp. 27–43. USENIX Association (2024), https://www.usenix.org/conference/woot24/presentation/liguori
- [33] Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., Clement, C.B., Drain, D., Jiang, D., Tang, D., Li, G., Zhou, L., Shou, L., Zhou, L., Tufano, M., Gong, M., Zhou, M., Duan, N., Sundaresan, N., Deng, S.K., Fu, S., Liu, S.: CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. CoRR abs/2102.04664 (2021), https...
- [34] Matter, I.: PowerShell commands for pentesters (2025), https://www.infosecmatter.com/powershell-commands-for-pentesters/ (Accessed: 2025-11-03)
- [35] Microsoft Corporation: PSScriptAnalyzer (2025), https://github.com/PowerShell/PSScriptAnalyzer (Accessed: 2025-11-03)
- [36] Mittal, N.: Nishang - Offensive PowerShell for red teams (2018), https://github.com/samratashok/nishang (Accessed: 2025-11-03)
- [37] Mohamed Firdhous, M.F., Elbreiki, W., Abdullahi, I., Sudantha, B.H., Budiarto, R.: WormGPT: A large language model chatbot for criminals. In: 2023 24th International Arab Conference on Information Technology (ACIT). pp. 1–6. IEEE (2023), https://doi.org/10.1109/ACIT58888.2023.10453752
- [38] Mora, S.: rouge: A pure Python implementation of the ROUGE metric (2019), https://pypi.org/project/rouge/ (Accessed: 2025-11-03)
- [39] Motlagh, F.N., Hajizadeh, M., Majd, M., Najafi, P., Cheng, F., Meinel, C.: Large language models in cybersecurity: State-of-the-art. arXiv preprint arXiv:2402.00891 (2024), https://arxiv.org/pdf/2402.00891
- [40] Nasrabadi, N.M.: Pattern Recognition and Machine Learning. Journal of Electronic Imaging 16(4), 049901 (2007), https://doi.org/10.1117/1.2819119
- [41] Natella, R., Liguori, P., Improta, C., Cukic, B., Cotroneo, D.: AI code generators for security: Friend or foe? IEEE Security and Privacy 22(5), 73–81 (2024), https://doi.org/10.1109/MSEC.2024.3355713
- [42] NetSPI: MicroBurst (2025), https://github.com/NetSPI/MicroBurst (Accessed: 2025-11-03)
- [43] NetSPI: PowerUpSQL - A PowerShell toolkit for SQL Server (2025), https://github.com/NetSPI/PowerUpSQL/wiki/PowerUpSQL-Cheat-Sheet (Accessed: 2025-11-03)
- [44] OpenAI: ChatGPT: Overview and features (2025), https://openai.com/chatgpt/overview/ (Accessed: 2025-11-03)
- [45] Project, E.: Empire (2025), https://github.com/EmpireProject/Empire (Accessed: 2025-11-03)
- [46] Rustam, F., Ranaweera, P., Jurcut, A.D.: AI on the defensive and offensive: Securing multi-environment networks from AI agents. In: ICC 2024 - IEEE International Conference on Communications. pp. 4287–4292. IEEE (2024), https://doi.org/10.1109/ICC51166.2024.10622943
- [47] Sladić, M., Valeros, V., Catania, C., Garcia, S.: LLM in the shell: Generative honeypots. In: 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). pp. 430–435. IEEE (2024), https://doi.org/10.1109/EuroSPW61312.2024.00054
- [48] Strom, B.E., Miller, D.P., Nickels, K.C., Pennington, A.G., Thomas, C.B.: MITRE ATT&CK: Design and Philosophy. Technical report. The MITRE Corporation (2018), https://attack.mitre.org/docs/ATTACK_Design_and_Philosophy_March_2020.pdf
- [49] Team, Q.: Qwen2.5: A party of foundation models (2024), https://qwenlm.github.io/blog/qwen2.5/ (Accessed: 2025-11-03)
- [50] TryHackMe: TryHackMe - Learn cybersecurity, penetration testing, and ethical hacking (2025), https://tryhackme.com/ (Accessed: 2025-11-03)
- [51] Unsloth AI: Unsloth: Open source fine-tuning for LLMs (2025), https://unsloth.ai/ (Accessed: 2025-11-03)
- [52] Unsloth Team: LoRA hyperparameters guide - Unsloth docs (2024), https://docs.unsloth.ai/get-started/fine-tuning-guide/lora-hyperparameters-guide (Accessed: 2025-11-03)
- [53] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems 30, 5999–6009 (2017), https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
- [54] Vats, P., Mandot, M., Gosain, A.: A comprehensive literature review of penetration testing and its applications. In: 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). pp. 674–680. IEEE (2020), https://doi.org/10.1109/ICRITO48877.2020.9197961
- [55] Weidman, G.: Penetration Testing: A Hands-On Introduction to Hacking. No Starch Press (2014)
- [56] Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T., Gugger, S., Rush, A.: Transformers: State-of-the-art natural language processing. In: EMNLP 2020 - Conference on Empirical Methods in Natural Language Pr...
- [57] Yang, G., Chen, X., Zhou, Y., Yu, C.: DualSC: Automatic generation and summarization of shellcode via Transformer and dual learning. In: 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). pp. 361–372 (2022), https://doi.org/10.1109/SANER53432.2022.00052
- [58] Yang, G., Zhou, Y., Chen, X., Zhang, X., Han, T., Chen, T.: ExploitGen: Template-augmented exploit code generation based on CodeBERT. Journal of Systems and Software 197 (2023), https://doi.org/10.1016/j.jss.2022.111577
- [59] Yigit, Y., Buchanan, W.J., Tehrani, M.G., Maglaras, L.: Review of generative AI methods in cybersecurity. arXiv preprint arXiv:2403.08701 (2024), https://arxiv.org/abs/2403.08701
- [60] Yoo, H., Yang, Y., Lee, H.: Code-switching red-teaming: LLM evaluation for safety and multilingual understanding. OpenReview preprint (2024), https://openreview.net/forum?id=d1PtojR26j