Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition
Pith reviewed 2026-05-10 04:52 UTC · model grok-4.3
The pith
Framing data generation as an attacker-defender competition produces diverse multi-turn conversations that improve secure code generation after fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By framing data generation as an adversarial task between attacker bots creating prompts and defender bots generating responses in a competitive arena with multiple teams, the approach naturally produces diverse and complex multi-turn conversations. A competition involving 10 teams generated 19,683 such conversations focused on safety alignment of LLMs in cybersecurity. Fine-tuning an open-source model on this dataset resulted in an 18.47% improvement on CyberSecEval-Instruct and 29.42% on CyberSecEval-MITRE.
What carries the argument
Adversarial Arena: an interactive competition framework where attackers generate prompts and defenders produce responses, which drives the creation of high-quality data through team competition.
Load-bearing premise
The benchmark improvements result specifically from the quality and diversity induced by the attacker-defender competition rather than from other factors like the total amount of data collected or the choice of participants.
What would settle it
A control experiment collecting a similar number of conversations without the adversarial competition structure and showing no comparable improvements upon fine-tuning would falsify the claim that the arena method is responsible for the gains.
Original abstract
Post-training Large Language Models requires diverse, high-quality data which is rare and costly to obtain, especially in low resource domains and for multi-turn conversations. Common solutions are crowdsourcing or synthetic generation, but both often yield low-quality or low-diversity data. We introduce Adversarial Arena for building high quality conversational datasets by framing data generation as an adversarial task: attackers create prompts, and defenders generate responses. This interactive competition between multiple teams naturally produces diverse and complex data. We validated this approach by conducting a competition with 10 academic teams from top US and European universities, each building attacker or defender bots. The competition, focused on safety alignment of LLMs in cybersecurity, generated 19,683 multi-turn conversations. Fine-tuning an open-source model on this dataset produced an 18.47% improvement in secure code generation on CyberSecEval-Instruct and 29.42% improvement on CyberSecEval-MITRE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Adversarial Arena, a framework that frames conversational data generation as an interactive competition between attacker and defender bots built by multiple independent teams. A competition involving 10 academic teams generated 19,683 multi-turn conversations focused on cybersecurity safety alignment; fine-tuning an open-source model on this dataset is reported to produce an 18.47% improvement on CyberSecEval-Instruct and a 29.42% improvement on CyberSecEval-MITRE for secure code generation.
Significance. If the reported gains can be causally linked to the adversarial competition format rather than data volume or participant expertise, the approach offers a scalable way to produce diverse, high-complexity multi-turn data in low-resource domains such as safety alignment. The concrete benchmark deltas and the scale of the crowdsourced dataset (19k conversations) suggest practical utility for post-training, provided the method is shown to outperform simpler collection strategies.
Major comments (2)
- [Abstract] Abstract: The central empirical claim attributes the 18.47% and 29.42% benchmark improvements to the quality and diversity arising from the attacker-defender competition, yet the abstract (and, from context, the results section) provides no baseline model details, no matched-volume non-adversarial control dataset, and no description of how the 19,683 conversations were filtered or split. Without these, the incremental benefit of the arena format over simply collecting an equivalent volume of cybersecurity multi-turn data cannot be assessed.
- [Results] Results / Evaluation: The fine-tuning experiments report percentage improvements on CyberSecEval-Instruct and CyberSecEval-MITRE but omit the base model name, training hyperparameters, number of epochs, and any statistical significance tests or variance across runs. This leaves open whether the deltas exceed what would be obtained from a comparable volume of data generated by non-competitive prompting or from existing cybersecurity corpora.
Minor comments (2)
- [Method] The manuscript does not specify how the 10 teams were assigned to attacker versus defender roles or whether any teams participated in both, which affects reproducibility of the data-generation protocol.
- [Data] Figure or table captions describing the generated conversations should include basic statistics (average turns per conversation, topic distribution) to allow readers to gauge diversity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below, indicating the revisions we will make to improve clarity and reproducibility while honestly noting limitations of the current study.
Point-by-point responses
Referee: [Abstract] Abstract: The central empirical claim attributes the 18.47% and 29.42% benchmark improvements to the quality and diversity arising from the attacker-defender competition, yet the abstract (and, from context, the results section) provides no baseline model details, no matched-volume non-adversarial control dataset, and no description of how the 19,683 conversations were filtered or split. Without these, the incremental benefit of the arena format over simply collecting an equivalent volume of cybersecurity multi-turn data cannot be assessed.
Authors: We agree that the abstract is too concise and that the results section requires expansion for proper evaluation of the claims. In the revised manuscript we will update the abstract to name the base model and briefly note the data filtering and train/test split procedure. We will add a dedicated subsection in Results describing the full data processing pipeline, including any quality filters applied to the 19,683 conversations. A matched-volume non-adversarial control was not collected in this work; we will explicitly discuss this as a limitation, explain the design rationale for focusing on the adversarial format (iterative attack-defense dynamics and cross-team diversity), and suggest such a control as valuable future work. revision: partial
Referee: [Results] Results / Evaluation: The fine-tuning experiments report percentage improvements on CyberSecEval-Instruct and CyberSecEval-MITRE but omit the base model name, training hyperparameters, number of epochs, and any statistical significance tests or variance across runs. This leaves open whether the deltas exceed what would be obtained from a comparable volume of data generated by non-competitive prompting or from existing cybersecurity corpora.
Authors: We will revise the Results section to specify the exact base model, all training hyperparameters, the number of epochs, and any available run-to-run variance or statistical tests. These additions will make the experimental protocol fully reproducible and allow readers to assess the magnitude of the reported gains relative to the base model. We maintain that the multi-team adversarial setting produces data characteristics (complexity, diversity of attack vectors) that are difficult to replicate with simple non-competitive prompting, but we will add explicit language acknowledging that a direct volume-matched comparison remains an open question. revision: yes
Not addressed in revision: reporting results from a matched-volume non-adversarial control dataset, which was outside the scope of the original competition-based study.
Circularity Check
Empirical benchmark gains measured on external datasets exhibit no circularity
Full rationale
The paper's central result is an empirical measurement: 10 teams generated 19,683 conversations via an attacker-defender competition, an open-source model was fine-tuned on the resulting dataset, and performance deltas of 18.47% and 29.42% were observed on the independent CyberSecEval-Instruct and CyberSecEval-MITRE benchmarks. No equations, fitted parameters, self-definitions, or self-citation chains are present that would reduce these measured outcomes to the inputs by construction. The derivation chain consists of a described data-generation procedure followed by standard fine-tuning and external evaluation; the reported improvements are falsifiable observations rather than tautological restatements of the competition setup.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Interactive competition between attacker teams creating prompts and defender teams generating responses naturally produces diverse, complex, and high-quality multi-turn conversational data suitable for LLM safety alignment.