Retrieval-Augmented LLMs for Security Incident Analysis
Pith reviewed 2026-05-15 08:27 UTC · model grok-4.3
The pith
A RAG system with targeted query filtering lets LLMs reconstruct security incidents from raw logs within context limits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The system extracts indicators from raw logs using a query library linked to MITRE ATT&CK techniques, then retrieves only the relevant passages to answer forensic questions and rebuild attack timelines. Across 17 malware traffic scenarios, the best of eight LLM configurations (Claude Sonnet 4) reaches 94 percent average recall. On the multi-stage Active Directory scenario, attack-step detection reaches 100 percent precision and up to 96 percent recall. Ablation studies show that both the query filter and RAG retrieval are required to stay within context limits while preserving accuracy. Different LLMs trade recall against cost, and a locally hosted model delivers usable performance (81 percent recall) at zero per-query cost.
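The precision and recall figures above are set-overlap metrics. The sketch below shows one way to compute them. It assumes attack steps are compared as exact string labels, which may not match the paper's own matching procedure. The step labels and the 25-step scenario size are hypothetical, and averaging per-scenario recall with a plain mean is also an assumption.

```python
from __future__ import annotations

from statistics import mean


def attack_step_metrics(predicted: set[str], ground_truth: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for detected attack steps vs. ground truth."""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


def average_recall(per_scenario_recall: list[float]) -> float:
    """Macro-average recall across scenarios (assumed aggregation)."""
    return mean(per_scenario_recall)


# Hypothetical scenario: 25 ground-truth steps, 24 detected, no false positives.
truth = {f"step-{i}" for i in range(25)}
detected = {f"step-{i}" for i in range(24)}
precision, recall = attack_step_metrics(detected, truth)
```

With no false positives, precision stays at 1.0, while one missed step out of 25 gives a recall of 0.96. This shows how a 100 percent precision and 96 percent recall pair can arise.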
What carries the argument
The RAG pipeline driven by a predefined query library mapped to MITRE ATT&CK techniques, which first filters raw logs into compact indicator sets before semantic retrieval supplies context to the LLM.
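The filtering stage can be sketched minimally under one assumption: each library query is a predicate over a flat log record and is tagged with one ATT&CK technique. The query names, record fields, and query-to-technique tagging below are hypothetical. The two alert signatures appear in the paper's sample LLM outputs. T1059.001 (PowerShell), T1071 (Application Layer Protocol), and T1558 (Steal or Forge Kerberos Tickets) are real ATT&CK IDs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record shape: one flat dict per log event (e.g. a Suricata alert).
LogRecord = Dict[str, str]


@dataclass(frozen=True)
class Query:
    name: str                              # hypothetical query identifier
    technique: str                         # linked MITRE ATT&CK technique ID
    matches: Callable[[LogRecord], bool]   # predicate over a single record


QUERY_LIBRARY = [
    Query("powershell_file_request", "T1059.001",
          lambda r: "powershell" in r.get("signature", "").lower()),
    Query("c2_payload_request", "T1071",
          lambda r: "cnc" in r.get("signature", "").lower()),
    Query("kerberos_ticket_request", "T1558",
          lambda r: r.get("event_type") == "kerberos"),
]


def extract_indicators(logs: List[LogRecord], library: List[Query]) -> Dict[str, List[LogRecord]]:
    """Keep only records matched by some query, grouped by query name."""
    indicators: Dict[str, List[LogRecord]] = {}
    for query in library:
        hits = [record for record in logs if query.matches(record)]
        if hits:
            indicators[query.name] = hits
    return indicators


sample_logs = [
    {"signature": "ET INFO PS1 Powershell File Request", "src_ip": "10.1.17.215"},
    {"signature": "ET MALWARE Fake Microsoft Teams CnC Payload Request (GET)", "src_ip": "10.1.17.215"},
    {"event_type": "dns", "query": "example.com"},
]
indicators = extract_indicators(sample_logs, QUERY_LIBRARY)
```

Only the matched records, grouped under their query names, move on to retrieval. The unmatched DNS record is dropped. This is the compaction that keeps the later LLM context small.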
If this is right
- Claude Sonnet 4 delivers the highest recall (94 percent), while DeepSeek V3 comes within five points (89 percent) at roughly one-fifteenth the cost.
- Llama 3.1 70B achieves 81 percent recall with no per-query expense when run locally.
- Removing either the query filter or the RAG step causes measurable drops in reconstruction accuracy.
- An enumeration prompt raises attack-step detection recall to as much as 96 percent on the Active Directory scenario.
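The ablation result implies a hard budget on what reaches the model. One way to enforce that budget is greedy packing of ranked passages. The sketch below is not necessarily the paper's mechanism and rests on two assumptions: passages arrive already ranked by relevance, and a rough four-characters-per-token estimate stands in for a real tokenizer.

```python
from __future__ import annotations


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def pack_context(ranked_passages: list[str], token_budget: int) -> list[str]:
    """Greedily keep the highest-ranked passages that fit within the token budget."""
    selected: list[str] = []
    used = 0
    for passage in ranked_passages:
        cost = estimate_tokens(passage)
        if used + cost > token_budget:
            continue  # skip a passage that would overflow; a shorter one may still fit
        selected.append(passage)
        used += cost
    return selected
```

For example, with estimated costs of 100, 200, and 50 tokens and a budget of 160, the first and third passages are kept and the second is skipped.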
Where Pith is reading between the lines
- The same filtering-plus-retrieval pattern could be applied to other log-heavy domains such as fraud detection or industrial control monitoring.
- An automated way to expand or validate the query library against new attack patterns would reduce dependence on manual MITRE mapping.
- Real-time streaming of filtered indicators could turn the method into an online alert triage tool rather than a post-incident analyzer.
Load-bearing premise
The fixed query library captures every necessary indicator from the logs without missing critical events or injecting enough noise to mislead the LLM.
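This premise can be checked mechanically whenever an incident's ground-truth techniques are known. The sketch assumes each query is tagged with exactly one ATT&CK technique ID and that coverage means an exact ID match. Under that rule, a query tagged T1059 would not count as covering T1059.001. The library entries and the incident below are hypothetical.

```python
from __future__ import annotations


def coverage_report(library: dict[str, str], incident_techniques: set[str]) -> tuple[float, set[str]]:
    """Return (fraction of incident techniques covered, techniques with no query).

    `library` maps a query name to the ATT&CK technique ID it is linked to.
    """
    if not incident_techniques:
        return 1.0, set()
    covered = set(library.values())
    missing = incident_techniques - covered
    return 1 - len(missing) / len(incident_techniques), missing


# Hypothetical library and incident; T1021 (Remote Services) has no query.
library = {
    "powershell_file_request": "T1059.001",
    "kerberos_ticket_request": "T1558",
    "c2_payload_request": "T1071",
}
ratio, missing = coverage_report(library, {"T1558", "T1071", "T1021"})
```

Any technique returned in `missing` marks a region of the logs that the filter cannot surface, whatever the downstream LLM is.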
What would settle it
Run the system on a fresh incident whose key indicators are absent from the query library; if recall falls below the levels reported for the evaluated scenarios, the central claim does not hold.
Original abstract
Investigating cybersecurity incidents requires collecting and analyzing evidence from multiple log sources, including intrusion detection alerts, network traffic records, and authentication events. This process is labor-intensive: analysts must sift through large volumes of data to identify relevant indicators and piece together what happened. We present a RAG-based system that performs security incident analysis through targeted query-based filtering and LLM semantic reasoning. The system uses a query library with associated MITRE ATT&CK techniques to extract indicators from raw logs, then retrieves relevant context to answer forensic questions and reconstruct attack sequences. We evaluate the system with eight LLM configurations on malware traffic incidents and a multi-stage Active Directory attack. We find that LLMs have different performance and tradeoffs, with Claude Sonnet 4 achieving 94% and DeepSeek V3 achieving 89% average recall across 17 malware scenarios, while DeepSeek costs 15× less than Claude per analysis, and locally-deployed Llama 3.1:70b achieves 81% recall at zero per-query cost. Attack step detection on the Active Directory scenario reaches 100% precision and up to 96% recall with an enumeration prompt. These results demonstrate that combining targeted query-based filtering with RAG-based retrieval (confirmed essential by ablation studies) enables accurate, cost-effective security analysis within LLM context limits.
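The retrieval step ("retrieves relevant context to answer forensic questions") ranks filtered passages by similarity to the question. The review does not specify the paper's embedding model or vector store. The sketch therefore swaps in bag-of-words cosine similarity as a stand-in for learned embeddings. The passages and question are illustrative only.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Keeps dotted tokens such as IPs and domains intact (e.g. "10.1.17.215").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def to_vector(text: str) -> Counter:
    """Term-frequency vector; a stand-in for a learned embedding."""
    return Counter(TOKEN_RE.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages most similar to the question."""
    q = to_vector(question)
    return sorted(passages, key=lambda p: cosine(q, to_vector(p)), reverse=True)[:k]


passages = [
    "Kerberos authentication events for a user account on host desktop-l8c5gsj",
    "Suricata alert: ET MALWARE CnC traffic from 10.1.17.215 to external IP 5.252.153.241",
    "DNS queries for suspicious domains",
]
top = retrieve("Which external IP received C2 traffic from the infected host?", passages, k=1)
```

A production system would replace `to_vector` with an embedding model. The surrounding rank-then-truncate structure stays the same.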
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a RAG-based system for security incident analysis that combines a predefined query library linked to MITRE ATT&CK techniques for targeted filtering of raw logs (IDS alerts, network traffic, authentication events) with LLM semantic reasoning to answer forensic questions and reconstruct attack sequences. It evaluates the approach across eight LLM configurations on 17 malware traffic scenarios and one multi-stage Active Directory attack, reporting average recall of 94% (Claude Sonnet 4) and 89% (DeepSeek V3), 100% precision and up to 96% recall on attack-step detection, cost trade-offs (DeepSeek 15× cheaper than Claude), and zero-cost local deployment (Llama 3.1:70b at 81% recall), with ablations confirming that query filtering plus RAG is essential for staying within context limits while preserving accuracy.
Significance. If the results hold, the work provides a concrete, engineering-grounded demonstration that targeted filtering plus RAG enables accurate, cost-effective LLM-based security forensics without exceeding context windows. The ablation studies directly supporting the necessity of both components, the multi-LLM comparison including local and low-cost options, and the use of held-out scenarios with quantitative recall/precision metrics constitute a solid empirical contribution to AI-assisted incident response.
minor comments (2)
- [Evaluation] §4 (Evaluation setup): the exact composition of the eight LLM configurations is only partially illustrated by the three named models; an explicit table listing all eight with their context windows and pricing would improve reproducibility.
- [System Design] The query library is described as 'predefined' and MITRE-linked, but no quantitative breakdown of how many queries per technique or coverage statistics are provided; adding this would clarify the engineering effort and potential for omission.
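The requested breakdown is cheap to produce once each query carries its technique ID. The sketch assumes the library is a mapping from query name to technique ID. It also folds sub-techniques (e.g. T1558.003, Kerberoasting) into their parent technique. The query names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def queries_per_technique(library: dict[str, str]) -> Counter:
    """Count queries per parent ATT&CK technique, folding sub-techniques
    (e.g. T1558.003) into their parent (T1558)."""
    return Counter(technique_id.split(".")[0] for technique_id in library.values())


# Hypothetical library entries (query names are illustrative only).
library = {
    "powershell_download": "T1059.001",
    "powershell_encoded_command": "T1059.001",
    "kerberoasting": "T1558.003",
    "asrep_roasting": "T1558.004",
    "http_c2_beacon": "T1071.001",
}
breakdown = queries_per_technique(library)
```

Reporting these counts alongside the techniques observed in each scenario would show both the engineering effort and where a single missing query could cause an omission.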
Simulated Author's Rebuttal
We thank the referee for their thorough and positive review. We are pleased that the significance of the targeted filtering plus RAG approach, the ablation studies, multi-LLM cost/performance comparisons, and quantitative evaluation on held-out scenarios were recognized. We appreciate the recommendation to accept.
Circularity Check
No significant circularity identified
full rationale
The paper's central claims rest on direct empirical measurements of recall and precision for an LLM+RAG pipeline evaluated on 17 held-out malware scenarios plus one AD attack. No equations appear that define performance metrics in terms of fitted parameters from the same data. The MITRE-linked query library is presented as an engineering artifact whose sufficiency is shown by the reported numbers rather than presupposed by definition. Ablation studies compare variants on the same external test cases, providing independent confirmation rather than self-referential reduction. No self-citation chain or uniqueness theorem is invoked to force the architecture. The derivation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: LLMs can perform reliable semantic reasoning over filtered log excerpts to reconstruct attack sequences.
- Domain assumption: the query library with associated MITRE ATT&CK mappings extracts the indicators required for the tested incident types.