SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned

Andrew Chin; Cen Zhang; David J. Musliner; Dongkwan Kim; Fabian Fleischer; Hanqing Zhao; Isaac Goldthwaite; Jefferson Casavant; Jeff Huang; Jiho Kim

arxiv: 2602.07666 · v3 · pith:7E2SU5KMnew · submitted 2026-02-07 · 💻 cs.CR · cs.AI

Cen Zhang, Younggi Park, Fabian Fleischer, Yu-Fu Fu, Jiho Kim, Dongkwan Kim, Youngjoon Kim, Qingxiao Xu, Andrew Chin, Ze Sheng, Hanqing Zhao, Michael Pelican, David J. Musliner, Jeff Huang, Jon Silliman, Mikel Mcdaniel, Jefferson Casavant, Isaac Goldthwaite, Nicholas Vidovich, Matthew Lehman, Taesoo Kim

classification: cs.CR, cs.AI

keywords: competition, aixcc, crss, cyber, design, advances, analysis, autonomous

DARPA's AI Cyber Challenge (AIxCC, 2023–2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CRSs) that leverage recent advances in AI, particularly large language models (LLMs), to discover and remediate vulnerabilities in real-world open-source software. This paper presents the first systematic analysis of AIxCC. Drawing on design documents, source code, execution traces, and discussions with organizers and competing teams, we examine the competition's structure and key design decisions, characterize the architectural approaches of finalist CRSs, and analyze competition results beyond the final scoreboard. Our analysis reveals the factors that truly drove CRS performance, identifies genuine technical advances achieved by teams, and exposes limitations that remain open for future research. We conclude with lessons for organizing future competitions and broader insights toward deploying autonomous CRSs in practice.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OverrideFuzz: Semantic-Aware Grammar Fuzzing for Script-Runtime Vulnerabilities
cs.CR 2026-05 conditional novelty 7.0

OverrideFuzz uses semantic-aware grammar fuzzing with reflection to model override hooks and dynamic rebinding, producing coverage growth and inputs that match known vulnerability patterns on CPython, Lua, and QuickJS...

Quality-Assured Fuzz Harness Generation via the Four Principles Framework
cs.CR 2026-05 unverdicted novelty 6.0

QuartetFuzz introduces the Four Principles framework for harness correctness and deploys an autonomous LLM agent that produces verified harnesses, yielding 29 confirmed bugs across 23 projects and identifying violatio...