TitanCA: Lessons from Orchestrating LLM Agents to Discover 100+ CVEs
Pith reviewed 2026-05-10 04:44 UTC · model grok-4.3
The pith
An orchestrated team of LLM agents discovers 203 confirmed zero-day vulnerabilities in open-source code, yielding 118 CVEs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TitanCA orchestrates LLM-powered agents through a sequence of matching, filtering, inspection, and adaptation modules to scan open-source software, resulting in the discovery of 203 confirmed zero-day vulnerabilities and the assignment of 118 CVEs.
What carries the argument
The four-module architecture of matching, filtering, inspection, and adaptation that coordinates multiple LLM agents for targeted vulnerability detection.
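The abstract names the four modules but not their interfaces. To make the staged hand-off concrete, here is a minimal sketch of how such an orchestration could be wired together, assuming hypothetical module signatures, prompts, and an ask_llm helper; none of this is TitanCA's actual design.

# Minimal sketch of a matching -> filtering -> inspection -> adaptation
# pipeline. Every name, prompt, and the ask_llm helper below are
# hypothetical illustrations, not TitanCA's published interfaces.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    file: str
    snippet: str
    notes: list[str] = field(default_factory=list)

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM backend."""
    raise NotImplementedError

def matching(repo_files: list[tuple[str, str]]) -> list[Candidate]:
    """Flag code locations that resemble known vulnerable patterns."""
    hits = []
    for path, code in repo_files:
        verdict = ask_llm(f"Does this code match a known vulnerable pattern? Answer yes or no.\n{code}")
        if verdict.strip().lower().startswith("yes"):
            hits.append(Candidate(file=path, snippet=code))
    return hits

def filtering(cands: list[Candidate]) -> list[Candidate]:
    """Cheaply discard likely false positives before deep inspection."""
    return [c for c in cands
            if ask_llm(f"Is this a plausible vulnerability? Answer yes or no.\n{c.snippet}")
               .strip().lower().startswith("yes")]

def inspection(cands: list[Candidate]) -> list[Candidate]:
    """Per-candidate deep analysis by a dedicated agent."""
    for c in cands:
        c.notes.append(ask_llm(f"Describe the exploit path, if any:\n{c.snippet}"))
    return cands

def adaptation(cands: list[Candidate], context: str) -> list[Candidate]:
    """Re-validate findings against project-specific context."""
    return [c for c in cands
            if ask_llm(f"Given {context}, is this finding still valid? Answer yes or no.\n{c.snippet}")
               .strip().lower().startswith("yes")]

def run_pipeline(repo_files, context):
    return adaptation(inspection(filtering(matching(repo_files))), context)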
If this is right
- Traditional static analysis can be supplemented by LLM orchestration to lower false-positive rates in vulnerability scanning.
- Open-source projects gain a practical method for surfacing previously undetected security issues at the scale of hundreds of confirmed cases.
- Deployment experience highlights concrete design choices for the filtering and adaptation steps that improve reliability.
- The pipeline produces actionable results that translate directly into CVE assignments and fixes.
Where Pith is reading between the lines
- Similar agent orchestration could extend to other quality assurance tasks such as detecting logic bugs or performance issues.
- The approach may lower the barrier for smaller teams to perform thorough security audits without large manual review efforts.
- Lessons on module coordination could guide the construction of multi-agent systems for other domains that require sequential analysis of complex artifacts.
Load-bearing premise
The flagged issues are genuine zero-day vulnerabilities that independent verification can confirm with low false-positive rates, and the orchestration pattern applies beyond the specific projects tested.
What would settle it
Independent security researchers attempting to reproduce the 203 reported vulnerabilities or running the same four-module pipeline on a fresh set of open-source projects and measuring the number of new verified CVEs.
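The settling experiment reduces to a measurement loop: run the same pipeline on fresh projects and count findings that survive independent verification. The sketch below assumes a run_pipeline like the hypothetical one above and an independently_verified check supplied by outside researchers; both are placeholders.

# Sketch of the proposed replication: fresh projects in, verified yield out.
# run_pipeline and independently_verified are hypothetical stand-ins for
# the deployed system and an external verification step.

def reproduction_study(projects, run_pipeline, independently_verified):
    reported = verified = 0
    for project in projects:
        findings = run_pipeline(project.files, project.context)
        reported += len(findings)
        verified += sum(1 for f in findings if independently_verified(f))
    # The verified count and the verified/reported ratio are the
    # quantities that would settle the core claim.
    return {"reported": reported,
            "verified": verified,
            "precision": verified / reported if reported else 0.0}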
Original abstract
Software vulnerabilities remain one of the most persistent threats to modern digital infrastructure. While static application security testing (SAST) tools have long served as the first line of defense, they suffer from high false-positive rates. This article presents TitanCA, a collaborative project between Singapore Management University and GovTech Singapore that orchestrates multiple large language model (LLM)-powered agents into a unified vulnerability discovery pipeline. Applied to open-source software, TitanCA has discovered 203 confirmed zero-day vulnerabilities and yielded 118 CVEs. We describe the four-module architecture, i.e., matching, filtering, inspection, and adaptation, and share key lessons from building and deploying an LLM-based vulnerability discovery solution in practice.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents TitanCA, a system orchestrating multiple LLM-powered agents into a four-module pipeline (matching, filtering, inspection, and adaptation) for vulnerability discovery in open-source software. It reports discovering 203 confirmed zero-day vulnerabilities that yielded 118 CVEs and shares practical lessons from building and deploying the system.
Significance. If the reported vulnerabilities are independently validated with demonstrably low false-positive rates, the work would be significant for showing that multi-agent LLM orchestration can surface real, previously unknown security issues in production OSS at scale, providing a practical complement to traditional SAST tools. The emphasis on deployment lessons adds immediate value for the security-engineering community.
Major comments (2)
- [Abstract and Results] The central claim (203 confirmed zero-days and 118 CVEs) is presented in the abstract and results without any quantitative breakdown of total candidates generated by the pipeline, acceptance/rejection rates at each module, or the explicit confirmation workflow (e.g., maintainer review statistics, CVE assignment timeline, or cross-checks against NVD/other databases). This information is load-bearing for establishing that the outputs are true zero-days rather than unconfirmed reports.
- [Evaluation / Results] No precision, recall, or false-positive metrics are supplied for the individual modules or the end-to-end system, nor is there a comparison against baseline SAST tools or alternative LLM prompting strategies. Without these, the contribution of the four-module orchestration cannot be isolated from the headline numbers.
Minor comments (2)
- [Title and Abstract] The title's '100+ CVEs' is consistent with the abstract's 118 but less precise; consider stating the exact figure and ensuring numerical consistency across title, abstract, and body.
- [Architecture] The description of the four modules would benefit from a single summary table listing input/output interfaces and key LLM prompts used in each stage.
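For concreteness, the summary table the referee requests might be shaped like the schematic below; the cell contents are illustrative guesses, not details from the paper.

Module       Input                    Output                   Illustrative agent role
Matching     repository files         candidate locations      pattern recognition
Filtering    candidate locations      plausible candidates     false-positive triage
Inspection   plausible candidates     analyzed findings        exploit-path reasoning
Adaptation   analyzed findings        validated reports        project-context validation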
Simulated Author's Rebuttal
Thank you for the referee's insightful comments. We appreciate the opportunity to clarify and strengthen our manuscript. Below, we provide point-by-point responses to the major comments.
Point-by-point responses
- Referee: [Abstract and Results] The central claim (203 confirmed zero-days and 118 CVEs) is presented in the abstract and results without any quantitative breakdown of total candidates generated by the pipeline, acceptance/rejection rates at each module, or the explicit confirmation workflow (e.g., maintainer review statistics, CVE assignment timeline, or cross-checks against NVD/other databases). This information is load-bearing for establishing that the outputs are true zero-days rather than unconfirmed reports.
Authors: We agree with this observation and will revise the manuscript to include the requested quantitative details. Specifically, we will add a table and accompanying text in the Results section detailing the number of candidates entering and exiting each of the four modules, along with acceptance rates. Additionally, we will describe the confirmation workflow, including statistics on maintainer reviews, the timeline for CVE assignments, and how we verified against NVD and other sources to confirm these are zero-days. This information is available from our internal logs and will be summarized appropriately.
Revision: yes
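The promised funnel statistics are easy to derive once per-stage logs exist. A minimal sketch under an assumed log schema, one record per candidate per stage with an accepted flag; the real TitanCA logging format is not public.

# Per-module acceptance rates from pipeline logs. The record schema
# {"stage": ..., "candidate_id": ..., "accepted": ...} is hypothetical.
from collections import defaultdict

def funnel_stats(log_records):
    seen = defaultdict(int)
    accepted = defaultdict(int)
    for rec in log_records:
        seen[rec["stage"]] += 1
        accepted[rec["stage"]] += int(rec["accepted"])
    return {stage: {"in": seen[stage],
                    "out": accepted[stage],
                    "acceptance_rate": accepted[stage] / seen[stage]}
            for stage in ("matching", "filtering", "inspection", "adaptation")
            if seen[stage] > 0}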
- Referee: [Evaluation / Results] No precision, recall, or false-positive metrics are supplied for the individual modules or the end-to-end system, nor is there a comparison against baseline SAST tools or alternative LLM prompting strategies. Without these, the contribution of the four-module orchestration cannot be isolated from the headline numbers.
Authors: The paper's primary contribution lies in the practical lessons learned from orchestrating LLM agents in a real deployment scenario, rather than in providing a benchmark-style evaluation. As such, we did not compute precision/recall metrics or run systematic comparisons, which would have required a different experimental setup. We will add a paragraph in the Discussion section explaining this focus and noting that the real-world validation through 118 CVEs provides strong evidence of effectiveness. We can partially address this by including high-level qualitative comparisons to SAST tools based on observed false-positive rates in practice, but full quantitative baselines are beyond the scope of this lessons-oriented work.
Revision: partial
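Even without module-level metrics, report-level precision is one division away once the total number of reports filed is disclosed; the sketch below keeps that denominator symbolic because the paper does not state it.

# Report-level precision: confirmed zero-days over total reports filed.
# The paper gives the numerator (203 confirmed) but not the denominator,
# so total_reports_filed stays a symbolic input here.

def report_precision(confirmed: int, total_reports_filed: int) -> float:
    if total_reports_filed <= 0:
        raise ValueError("total_reports_filed must be positive")
    return confirmed / total_reports_filed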
Circularity Check
No circularity: empirical systems report with no derivations or self-referential predictions
Full rationale
The paper describes a four-module LLM agent pipeline (matching, filtering, inspection, adaptation) applied to open-source software and reports empirical outcomes: 203 confirmed zero-day vulnerabilities yielding 118 CVEs. No equations, fitted parameters, mathematical predictions, or derivation chains appear. The claims rest on external CVE assignment and maintainer validation, processes independent of the system's internal construction. No self-citations form load-bearing premises, no uniqueness theorems are invoked, and no ansatzes or renamings reduce results to inputs. This is a standard, non-circular empirical systems paper.