Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation

Abir Ashab Niloy; Ahmed Ryan; Imamul Hossain Rafi; Md Erfan; Md Rayhanur Rahman

arxiv: 2606.18190 · v1 · pith:LZ2V35PCnew · submitted 2026-06-16 · 💻 cs.CR · cs.LG

Pith reviewed 2026-06-26 23:54 UTC · model grok-4.3

classification 💻 cs.CR cs.LG

keywords cybersecurity dataset · ATT&CK labels · multi-source logs · small language models · intrusion detection · fine-tuning · log analysis · multi-stage attacks

The pith

A new multi-source log dataset with per-entry ATT&CK labels lets fine-tuned small language models classify attack chunks at 90-97 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds what it presents as the first public dataset that records system, network, and browser logs together and tags each malicious event with a specific MITRE ATT&CK technique ID. It contains 870 sessions and roughly 2.3 million events, with 70 attack sessions created using real tools such as RATs and C2 tunnels and labeled across 12 tactics and 53 techniques. Fine-tuning three small language models with LoRA on this data raises their chunk classification accuracy from about 8 percent (base models) to 90-97 percent, while technique identification reaches a best exact-match rate of 42 percent. The work shows that cross-source patterns in multi-stage attacks become learnable once labeled multi-source data exists.

Core claim

No existing public dataset supplies simultaneous system, network, and browser logs with per-entry ATT&CK technique labels. The introduced collection of 870 sessions and 2.3 million events, generated on Windows endpoints with real attack tools and labeled with 53 ATT&CK techniques, enables fine-tuned small language models to classify log chunks at 90-97 percent accuracy and identify techniques at up to 42 percent exact match.

What carries the argument

The ATT&CK-labeled multi-source log dataset, which supplies the training examples needed for models to correlate events across system, network, and browser sources.
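To make "correlating events across sources" concrete, here is a minimal Python sketch of one way a training chunk could be assembled. It assumes each source is a time-sorted stream of records and that a chunk counts as malicious when any event in it carries a technique label. The Event fields, the fixed chunk size, and the any-event labeling rule are all assumptions for illustration; this page does not describe the paper's actual chunking scheme.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical record layout; the dataset's real schema may differ.
    ts: float                  # timestamp in epoch seconds
    source: str                # "system", "network", or "browser"
    message: str               # raw log line
    technique: Optional[str]   # ATT&CK ID such as "T1071", or None if benign


def merge_sources(*streams: Iterable[Event]) -> List[Event]:
    """Interleave per-source streams (each already sorted by ts) into one timeline."""
    return list(heapq.merge(*streams, key=lambda e: e.ts))


def chunk_timeline(events: List[Event], size: int) -> List[dict]:
    """Cut the merged timeline into fixed-size chunks, each with a binary label
    (assumed rule: malicious if any event is labeled) and its technique IDs."""
    chunks = []
    for start in range(0, len(events), size):
        window = events[start:start + size]
        techniques = sorted({e.technique for e in window if e.technique})
        chunks.append({
            "text": "\n".join(f"[{e.source}] {e.message}" for e in window),
            "label": "malicious" if techniques else "benign",
            "techniques": techniques,
        })
    return chunks


system = [Event(1.0, "system", "powershell.exe -nop -w hidden", "T1059"),
          Event(4.0, "system", "service started", None)]
network = [Event(2.0, "network", "TLS to 203.0.113.5:443", "T1071")]
browser = [Event(3.0, "browser", "visit docs.example.com", None)]
for chunk in chunk_timeline(merge_sources(system, network, browser), size=2):
    print(chunk["label"], chunk["techniques"])
```

Interleaving by timestamp is what places a host-side PowerShell event and the outbound connection that follows it in the same chunk, a pairing that a classifier trained on one source at a time would never see.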

If this is right

Fine-tuned models can detect multi-stage attacks by learning patterns that span system, network, and browser logs simultaneously.
The dataset supports training for both broad chunk classification and granular ATT&CK technique identification.
Performance gains from LoRA fine-tuning hold across three different small language model architectures.
High partial-match scores indicate models capture underlying attack reasoning even when exact technique labels are missed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Security teams could incorporate the dataset into monitoring pipelines to improve detection of attacks that involve browser activity.
The exact-match limit of 42 percent on technique identification points to a need for additional data or model refinements to reach production reliability.
Extending the dataset with more attack variants or automated labeling could scale its use for broader model training.

Load-bearing premise

The 70 author-generated attack sessions using real tools represent the distribution and labeling quality of actual multi-stage cyberattacks in the wild.

What would settle it

An independent collection of real-world multi-source logs from actual incidents, labeled by experts, on which models trained solely on this dataset show low accuracy in chunk classification or technique identification.

Original abstract

Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-source data. Existing public datasets fall short. Network-only datasets such as CICIDS and UNSW-NB15 miss host and browser activity. Host-focused datasets such as LMDG and CICAPT-IIoT lack browser telemetry. ATLAS includes all three sources but labels events only as malicious or benign, without MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) technique granularity. No public dataset combines all three sources with per-entry ATT&CK technique labels. We close the gap by building a multi-source log dataset of 870 sessions (70 attack, 800 benign) and approximately 2.3 million events. We captured system, network, and browser activity simultaneously on Windows endpoints. We labeled malicious events with ATT&CK technique IDs, covering 12 tactics and 53 techniques. We generated all attack data using real tools, including Remote Access Trojan (RAT), Command and Control (C2) tunnels, and cloud exfiltration. To demonstrate learnability, we fine-tuned three Small Language Models (SLMs) (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) using Low-Rank Adaptation (LoRA). We compared each against its base variant across ten metrics on two tasks: chunk classification and ATT&CK technique identification. Fine-tuning improved every model on every metric. Chunk classification accuracy rose from approximately 8% in the base variants to between 90% and 97% after fine-tuning. Technique identification remained challenging, with the best exact-match accuracy at 42%, although high partial-match scores show the models captured most of the underlying reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main value is the new multi-source ATT&CK-labeled dataset, but labeling validation and attack diversity details are missing.

The paper's main value is the new multi-source ATT&CK-labeled dataset. No earlier public collection combines system, network, and browser logs with per-event technique labels; this one covers 53 techniques.

They assembled 870 sessions and 2.3 million events on Windows endpoints, using real tools such as RATs and C2 tunnels for the 70 attack cases. They then fine-tuned three SLMs with LoRA and reported clear gains: chunk classification accuracy moved from roughly 8% to 90-97%. Technique identification stayed harder, with the best exact-match score at 42%.

The work does well on the explicit gap analysis against CICIDS, UNSW-NB15, LMDG, CICAPT-IIoT, and ATLAS. Collecting three sources at once and labeling at technique granularity is a concrete step. The use of actual attack tooling rather than purely scripted events also helps.

The soft spots are the missing checks on label quality and representativeness. The abstract gives no inter-annotator agreement numbers, no external validation of the ATT&CK assignments, and no analysis of how the 70 sessions compare to real incident distributions. Without those, the fine-tuning results could be tied to the narrow generation method rather than general patterns.

This paper is for applied security ML researchers who need labeled cross-source data for detection models. Readers building or benchmarking SLM-based log correlation tools would get direct use from the numbers and the data if released.

It deserves a serious referee. The dataset claim is new and the evaluation numbers are specific, even though the methods need more on validation. I would recommend sending it for review rather than desk rejection.

Referee Report

3 major / 2 minor

Summary. The paper claims to address the lack of public multi-source cybersecurity log datasets with per-entry ATT&CK technique labels by constructing a dataset of 870 sessions (70 attack, 800 benign) comprising ~2.3 million events from system, network, and browser logs on Windows. Attacks are generated using real tools such as RATs and C2 tunnels and labeled with 12 tactics and 53 techniques. They fine-tune three SLMs with LoRA and show improvements in chunk classification (base ~8% to 90-97%) and technique identification (up to 42% exact match).

Significance. If the labels prove reliable, this dataset would fill a documented gap left by network-only (CICIDS, UNSW-NB15), host-only (LMDG), and coarsely labeled (ATLAS) resources, enabling cross-source correlation at technique granularity. The consistent gains across three SLMs after LoRA fine-tuning provide concrete evidence that the collected data supports supervised learning, which is a strength of the empirical component.

major comments (3)

[Data generation and labeling] Data generation and labeling section: No inter-annotator agreement, label-validation procedure, or external review of the ATT&CK assignments is described. Because the 70 attack sessions were both generated and labeled internally, the absence of these metrics directly affects the trustworthiness of the per-event technique labels that underpin both the dataset contribution and the fine-tuning results.
[Experimental evaluation] Experimental evaluation section: The manuscript provides no information on the train/test split of sessions or events, the treatment of class imbalance (800 benign vs. 70 attack), or coverage statistics across the 53 techniques. These details are load-bearing for interpreting whether the reported jumps (chunk accuracy 8% to 90–97%, exact-match 42%) reflect genuine cross-source pattern learning rather than overfitting to the authors’ synthetic distribution.
[Results] Results section: The 42% exact-match figure for technique identification is presented without an ablation on label granularity or a comparison against a non-LLM baseline; combined with the lack of diversity analysis versus real incident reports, this weakens the claim that the fine-tuned models have captured generalizable ATT&CK reasoning.
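A non-LLM baseline of the kind the third comment asks for need not be elaborate. The sketch below is a nearest-centroid classifier over lower-cased token counts per chunk, scored by cosine similarity, assuming each chunk is available as plain text. The featurization is hypothetical and is only one possible reading of a "feature-based classifier on aggregated log statistics"; it is not a method the paper reports.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def featurize(chunk_text: str) -> Counter:
    """Aggregate statistics for one chunk: lower-cased whitespace token counts."""
    return Counter(chunk_text.lower().split())


def _cosine(a: Counter, b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class NearestCentroid:
    """Assign each chunk to the label whose mean feature vector is most similar."""

    def fit(self, data: List[Tuple[str, str]]) -> "NearestCentroid":
        sums: Dict[str, Counter] = {}
        counts: Counter = Counter()
        for text, label in data:
            sums.setdefault(label, Counter()).update(featurize(text))
            counts[label] += 1
        # Centroid = average token count per chunk for each label.
        self.centroids = {
            label: {k: v / counts[label] for k, v in total.items()}
            for label, total in sums.items()
        }
        return self

    def predict(self, text: str) -> str:
        feats = featurize(text)
        return max(self.centroids, key=lambda lab: _cosine(feats, self.centroids[lab]))
```

Scoring this baseline with the same metrics as the fine-tuned SLMs would show how much of the 90–97% chunk accuracy is already reachable from surface token statistics.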

minor comments (2)

[Abstract] Abstract: the phrase 'high partial-match scores' should be accompanied by the precise definition or scoring rule used for partial matches so readers can assess what the models actually learned.
[Introduction] Introduction: verify that every cited dataset (CICIDS, UNSW-NB15, LMDG, CICAPT-IIoT, ATLAS) appears in the reference list with complete bibliographic details.
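The first minor comment matters because different partial-match rules can turn the same predictions into very different numbers. The sketch below scores one set of predictions three ways. The regex for extracting technique IDs from model output and both partial-match rules (Jaccard overlap and any-overlap) are hypothetical choices; the paper's actual partial-match rule is not stated on this page.

```python
import re
from typing import Callable, List, Set

# ATT&CK technique IDs look like T1059, sub-techniques like T1059.001.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def parse_ids(text: str) -> Set[str]:
    """Pull technique IDs out of free-form model output."""
    return set(TECHNIQUE_ID.findall(text))


def exact_match(pred: Set[str], gold: Set[str]) -> float:
    return 1.0 if pred == gold else 0.0


def jaccard(pred: Set[str], gold: Set[str]) -> float:
    """Hypothetical partial match: size of intersection over size of union."""
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)


def any_overlap(pred: Set[str], gold: Set[str]) -> float:
    """Hypothetical lenient partial match: at least one shared technique."""
    return 1.0 if pred & gold else 0.0


def mean_score(metric: Callable[[Set[str], Set[str]], float],
               preds: List[Set[str]], golds: List[Set[str]]) -> float:
    return sum(metric(p, g) for p, g in zip(preds, golds)) / len(golds)
```

On three examples where the model gets one label set fully right, one half right, and one wrong, these rules give 0.33 (exact), 0.5 (Jaccard), and 0.67 (any-overlap). Because the spread between them is so large, the abstract's "high partial-match scores" cannot be interpreted without the rule.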

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating the revisions we will incorporate to improve the manuscript.

Referee: [Data generation and labeling] Data generation and labeling section: No inter-annotator agreement, label-validation procedure, or external review of the ATT&CK assignments is described. Because the 70 attack sessions were both generated and labeled internally, the absence of these metrics directly affects the trustworthiness of the per-event technique labels that underpin both the dataset contribution and the fine-tuning results.

Authors: We acknowledge that the manuscript does not describe inter-annotator agreement, formal validation procedures, or external review. Labeling was performed internally by the authors by mapping observed tool behaviors (RAT, C2, exfiltration) to ATT&CK technique definitions from the official matrix. To address the concern, we will expand the Data generation and labeling section with a detailed description of the mapping process, including examples of how specific events were assigned technique IDs and any consistency checks performed among co-authors. We cannot retroactively add IAA metrics that were not collected, but the public release of the dataset will enable external validation. revision: partial
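The exchange above turns on label agreement. Once the dataset is released, an external reviewer could relabel a sample of attack events, and agreement could then be summarized with Cohen's kappa, κ = (p_o − p_e)/(1 − p_e). Here p_o is the observed agreement rate and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below treats "benign" as one more label; it illustrates the statistic only and does not describe a procedure the authors report.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators' per-event labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of each label's marginal frequencies, summed.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```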
Referee: [Experimental evaluation] Experimental evaluation section: The manuscript provides no information on the train/test split of sessions or events, the treatment of class imbalance (800 benign vs. 70 attack), or coverage statistics across the 53 techniques. These details are load-bearing for interpreting whether the reported jumps (chunk accuracy 8% to 90–97%, exact-match 42%) reflect genuine cross-source pattern learning rather than overfitting to the authors’ synthetic distribution.

Authors: The referee correctly identifies missing details. We will add a dedicated subsection to Experimental evaluation that specifies: the session-based train/validation/test split (preventing event leakage across sources), the method used to handle imbalance during LoRA fine-tuning (e.g., class-weighted loss), and per-technique coverage counts showing how many of the 53 techniques appear in the 70 attack sessions. These additions will allow readers to evaluate whether the accuracy gains reflect cross-source learning. revision: yes
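The split and imbalance handling promised in this response can be sketched as follows. Splitting by session rather than by event keeps all three sources of a session on the same side, which is what prevents cross-source leakage. The stratification, the 20% test fraction, and the inverse-frequency weighting w_c = N/(K·n_c) (N labels, K classes, n_c labels in class c) are assumptions. The response names class-weighted loss only as an example, and how such weights reach a LoRA trainer depends on the training library, which is not shown here.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def session_split(
    sessions: Dict[str, str],  # hypothetical: session_id -> "attack" or "benign"
    test_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Stratified split at the session level, so no session's events
    (from any of the three sources) land on both sides."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(sessions.values())):
        ids = sorted(s for s, lab in sessions.items() if lab == label)
        rng.shuffle(ids)
        n_test = round(len(ids) * test_frac)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


def inverse_frequency_weights(labels: List[str]) -> Dict[str, float]:
    """w_c = N / (K * n_c): rarer classes get proportionally larger loss weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}
```

With the dataset's 800 benign and 70 attack sessions, a 20% stratified split puts 160 benign and 14 attack sessions in the test set, and the session-level weights come out to about 0.54 for benign and 6.21 for attack. In practice the weights would be computed over training chunks rather than sessions.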
Referee: [Results] Results section: The 42% exact-match figure for technique identification is presented without an ablation on label granularity or a comparison against a non-LLM baseline; combined with the lack of diversity analysis versus real incident reports, this weakens the claim that the fine-tuned models have captured generalizable ATT&CK reasoning.

Authors: We agree that additional analyses would strengthen the results section. We will add (1) an ablation comparing exact-match performance at technique versus tactic granularity and (2) a non-LLM baseline (e.g., a feature-based classifier on aggregated log statistics). A quantitative diversity comparison against real incident reports is not feasible without equivalently labeled multi-source real-world data, which does not currently exist at this granularity; we will instead add a qualitative discussion mapping our generated attacks to techniques frequently cited in public threat reports. revision: partial
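The granularity ablation proposed in (1) can be sketched by projecting predicted and gold label sets to coarser levels before comparing them. The technique-to-tactic table below is a small, hand-picked subset of the ATT&CK matrix for illustration only; the paper's 53 techniques are not listed on this page. Techniques missing from the table silently map to no tactic, so a real version needs the full mapping.

```python
from typing import Dict, FrozenSet, List, Set

# Illustrative subset of ATT&CK technique -> tactic assignments (not the
# paper's label set). Some techniques belong to more than one tactic.
TECHNIQUE_TACTICS: Dict[str, FrozenSet[str]] = {
    "T1566": frozenset({"TA0001"}),            # Phishing -> Initial Access
    "T1059": frozenset({"TA0002"}),            # Command and Scripting Interpreter -> Execution
    "T1547": frozenset({"TA0003", "TA0004"}),  # Boot or Logon Autostart Execution -> Persistence, Privilege Escalation
    "T1071": frozenset({"TA0011"}),            # Application Layer Protocol -> Command and Control
    "T1105": frozenset({"TA0011"}),            # Ingress Tool Transfer -> Command and Control
    "T1567": frozenset({"TA0010"}),            # Exfiltration Over Web Service -> Exfiltration
}


def parent(technique_id: str) -> str:
    """Collapse a sub-technique such as T1059.001 to its parent T1059."""
    return technique_id.split(".")[0]


def to_tactics(techniques: Set[str]) -> Set[str]:
    tactics: Set[str] = set()
    for t in techniques:
        tactics |= TECHNIQUE_TACTICS.get(parent(t), frozenset())
    return tactics


def exact_match_rate(preds: List[Set[str]], golds: List[Set[str]],
                     level: str = "technique") -> float:
    """Exact-match rate after projecting both label sets to the given level."""
    project = {
        "subtechnique": lambda s: set(s),
        "technique": lambda s: {parent(t) for t in s},
        "tactic": to_tactics,
    }[level]
    hits = sum(project(p) == project(g) for p, g in zip(preds, golds))
    return hits / len(golds)
```

If the fine-tuned models score much higher at the tactic level than at the technique level, their misses are near-misses within the right phase of the attack. That is the reading the abstract's partial-match remark suggests but does not yet demonstrate.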

Circularity Check

0 steps flagged

Empirical dataset construction and SLM evaluation with no circular derivations

The paper describes capturing multi-source logs, author labeling of 70 attack sessions with ATT&CK IDs, and empirical fine-tuning/evaluation of three SLMs using LoRA, reporting direct accuracy metrics. No equations, fitted parameters renamed as predictions, self-citations load-bearing on uniqueness or ansatzes, or derivations that reduce to author-defined inputs by construction appear in the text. The central claims rest on observable data collection and measured performance deltas, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Contribution is empirical dataset construction and standard fine-tuning; no new mathematical parameters, axioms beyond domain conventions, or invented entities are introduced.

axioms (1)

domain assumption: MITRE ATT&CK provides a stable, externally maintained taxonomy suitable for per-event labeling of malicious activity.
Invoked when assigning technique IDs to the 70 attack sessions.

pith-pipeline@v0.9.1-grok · 5893 in / 1350 out tokens · 32304 ms · 2026-06-26T23:54:01.471332+00:00 · methodology

Reference graph

Works this paper leans on

30 extracted references

[1]

Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,

E. M. Hutchins, M. J. Cloppert, and R. M. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” Leading Issues in Information Warfare & Security Research, vol. 1, no. 1, p. 80, 2011

2011
[2]

MITRE ATT&CK: Design and philosophy,

B. E. Strom et al., “MITRE ATT&CK: Design and philosophy,” The MITRE Corporation, Tech. Rep. MP180360R1, 2020

2020
[3]

#StopRansomware: RansomHub Ransomware,

CISA, FBI, MS-ISAC, and HHS, “#StopRansomware: RansomHub Ransomware,” CISA, Tech. Rep. AA24-242A, 2024. [Online]. Available: https://www.cisa.gov/news-events/cybersecurity-advisories/aa24-242a

2024
[4]

2026 global threat report,

CrowdStrike, “2026 global threat report,” CrowdStrike, Tech. Rep., 2026. [Online]. Available: https://www.crowdstrike.com/en-us/global-threat-report/

2026
[5]

A detailed analysis of the KDD CUP 99 data set,

M. Tavallaee et al., “A detailed analysis of the KDD CUP 99 data set,” in Proc. IEEE Symp. Computational Intelligence for Security and Defense Applications (CISDA), 2009, pp. 1–6

2009
[6]

Toward generating a new intrusion detection dataset and intrusion traffic characterization,

I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” in Proc. 4th Int. Conf. Information Systems Security and Privacy (ICISSP), 2018, pp. 108–116

2018
[7]

UNSW-NB15: A comprehensive data set for network intrusion detection systems,

N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems,” in Proc. Military Communications and Information Systems Conf. (MilCIS), 2015, pp. 1–6

2015
[8]

An empirical comparison of botnet detection methods,

S. García et al., “An empirical comparison of botnet detection methods,” Computers & Security, vol. 45, pp. 100–123, 2014

2014
[9]

DAPT 2020 – constructing a benchmark dataset for advanced persistent threats,

S. Myneni et al., “DAPT 2020 – constructing a benchmark dataset for advanced persistent threats,” in Deployable Machine Learning for Security Defense. Springer, 2020, pp. 138–163

2020
[10]

Unraveled – a semi-synthetic dataset for advanced persistent threats,

——, “Unraveled – a semi-synthetic dataset for advanced persistent threats,” Computer Networks, vol. 227, p. 109688, 2023

2023
[11]

CICAPT-IIoT: A provenance-based APT attack dataset for IIoT environment,

E. Ghiasvand et al., “CICAPT-IIoT: A provenance-based APT attack dataset for IIoT environment,” arXiv:2407.11278, 2024

arXiv 2024
[12]

LMDG: Advancing lateral movement detection through high-fidelity dataset generation,

A. Mabrouk, M. Hatem, M. Mamun, and S. Saad, “LMDG: Advancing lateral movement detection through high-fidelity dataset generation,” arXiv:2508.02942, 2025

arXiv 2025
[13]

ATLAS: A sequence-based learning approach for attack investigation,

A. Alsaheel et al., “ATLAS: A sequence-based learning approach for attack investigation,” in Proc. 30th USENIX Security Symp., 2021, pp. 3005–3022. [Online]. Available: https://www.usenix.org/conference/usenixsecurity21/presentation/alsaheel

2021
[14]

ATLASv2: ATLAS attack engagements, version 2,

A. Riddle, K. Westfall, and A. Bates, “ATLASv2: ATLAS attack engagements, version 2,” arXiv:2401.01341, 2024

arXiv 2024
[15]

Introducing UWF-ZeekData22: A comprehensive network traffic dataset based on the MITRE ATT&CK framework,

S. S. Bagui et al., “Introducing UWF-ZeekData22: A comprehensive network traffic dataset based on the MITRE ATT&CK framework,” Data, vol. 8, no. 1, p. 18, 2023

2023
[16]

A comprehensive survey of small language models in the era of large language models,

F. Wang et al., “A comprehensive survey of small language models in the era of large language models,” arXiv:2411.03350, 2024

arXiv 2024
[17]

LoRA: Low-rank adaptation of large language models,

E. J. Hu et al., “LoRA: Low-rank adaptation of large language models,” in Int. Conf. Learning Representations (ICLR), 2022

2022
[18]

Transparent computing engagement data release,

DARPA, “Transparent computing engagement data release,” https://github.com/darpa-i2o/Transparent-Computing, 2020

2020
[19]

Analyzing the usefulness of the DARPA OpTC dataset in cyber threat detection research,

M. M. Anjum, S. Iqbal, and B. Hamelin, “Analyzing the usefulness of the DARPA OpTC dataset in cyber threat detection research,” in Proc. 26th ACM Symp. Access Control Models and Technologies (SACMAT), 2021, pp. 27–32

2021
[20]

Unified host and network data set,

M. J. M. Turcotte, A. D. Kent, and C. Hash, “Unified host and network data set,” in Data Science for Cyber-Security. World Scientific, 2018, pp. 1–22

2018
[21]

A survey of large language models for cyber threat detection,

Y. Chen et al., “A survey of large language models for cyber threat detection,” Computers & Security, vol. 145, p. 104016, 2024

2024
[22]

LogGPT: Log anomaly detection via GPT,

X. Han, S. Yuan, and M. Trabelsi, “LogGPT: Log anomaly detection via GPT,” in Proc. IEEE Int. Conf. Big Data (BigData), 2023, pp. 1117–1122

2023
[23]

LLMs cannot reliably identify and reason about security vulnerabilities (yet?),

S. Ullah et al., “LLMs cannot reliably identify and reason about security vulnerabilities (yet?),” in Proc. IEEE Symp. Security and Privacy (SP), 2024, pp. 862–880

2024
[24]

SecVulEval: Benchmarking LLMs for real-world C/C++ vulnerability detection,

M. B. U. Ahmed et al., “SecVulEval: Benchmarking LLMs for real-world C/C++ vulnerability detection,” arXiv:2505.19828, 2025

arXiv 2025
[25]

QLoRA: Efficient finetuning of quantized LLMs,

T. Dettmers et al., “QLoRA: Efficient finetuning of quantized LLMs,” in Advances in Neural Information Processing Systems (NeurIPS), 2023

2023
[26]

TTPDrill: Automatic and accurate extraction of threat actions from unstructured text of CTI sources,

G. Husari et al., “TTPDrill: Automatic and accurate extraction of threat actions from unstructured text of CTI sources,” in Proc. 33rd Annual Computer Security Applications Conf. (ACSAC), 2017, pp. 103–115

2017
[27]

Automated retrieval of ATT&CK tactics and techniques for cyber threat reports,

V. Legoy et al., “Automated retrieval of ATT&CK tactics and techniques for cyber threat reports,” arXiv:2004.14322, 2020

arXiv 2020
[28]

AttacKG: Constructing technique knowledge graph from cyber threat intelligence reports,

Z. Li, J. Zeng, Y. Chen, and Z. Liang, “AttacKG: Constructing technique knowledge graph from cyber threat intelligence reports,” in Computer Security – ESORICS 2022, vol. 13554. Springer, 2022, pp. 589–609

2022
[29]

Looking beyond IoCs: Automatically extracting attack patterns from external CTI,

M. T. Alam et al., “Looking beyond IoCs: Automatically extracting attack patterns from external CTI,” in Proc. 26th Int. Symp. Research in Attacks, Intrusions and Defenses (RAID), 2023

2023
[30]

TTPXHunter: Actionable threat intelligence extraction as TTPs from finished cyber threat reports,

N. Rani et al., “TTPXHunter: Actionable threat intelligence extraction as TTPs from finished cyber threat reports,” ACM Digital Threats: Research and Practice, 2024

2024
