ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense
Pith reviewed 2026-06-26 13:58 UTC · model grok-4.3
The pith
Treating the boundary between private production telemetry and reusable research artifacts as the design object produces a measurable privacy-utility boundary.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By treating the boundary between private production telemetry and reusable research artifacts as the design object, the methodology produces a measurable privacy-utility boundary. Two results demonstrate it: the HIKARI challenges work only when anonymization preserves temporal order and entity consistency, and across 200 SOCpilot incidents a deterministic verifier detects non-compliant LLM actions that are absent from the human baseline.
What carries the argument
The privacy boundary between private production telemetry and reusable research artifacts, which serves as the explicit design object for extraction, anonymization, structuring, and validation of SIEM data while preserving task-relevant investigative structure.
If this is right
- Anonymization must preserve temporal order and entity consistency for the artifacts to support MITRE ATT&CK-mapped HIKARI challenges.
- A deterministic verifier can detect LLM actions that deviate from observed human baselines across the 200 SOCpilot incidents.
- The same artifact can serve both as training material that fails loudly and as a measurement substrate that fails quietly.
- Research on autonomous cyber defense can use production-derived artifacts instead of synthetic or dated datasets once the privacy boundary is treated as the design object.
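To make the first bullet concrete, here is a minimal sketch of an anonymization step that keeps entity consistency and temporal order. It is not the paper's pipeline: the keyed-HMAC pseudonyms, the single constant time shift, and the field names (`ts`, `src`, `dst`, `action`) are all illustrative assumptions.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes, prefix: str = "ent") -> str:
    """Map the same input to the same token under one key (entity consistency)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


def anonymize_events(events, key: bytes, shift_seconds: int):
    """Pseudonymize entity fields and shift every timestamp by one constant offset.

    A constant offset hides absolute times while keeping event order and gaps.
    """
    return [
        {
            "ts": e["ts"] + shift_seconds,
            "src": pseudonymize(e["src"], key),
            "dst": pseudonymize(e["dst"], key),
            "action": e["action"],
        }
        for e in events
    ]


# Toy events (hypothetical, not from the paper's dataset).
events = [
    {"ts": 100, "src": "10.0.0.5", "dst": "10.0.0.9", "action": "login"},
    {"ts": 160, "src": "10.0.0.5", "dst": "10.0.0.7", "action": "smb_read"},
    {"ts": 220, "src": "10.0.0.7", "dst": "10.0.0.9", "action": "login"},
]
anon = anonymize_events(events, key=b"demo-key-not-secret", shift_seconds=-37)
```

In this sketch the host that is the destination of the second event and the source of the third still carries one token, so a lateral-movement chain remains traceable. Replacing each occurrence with a fresh random token, or jittering timestamps independently, would break exactly the structure the bullet says the challenges depend on.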
Where Pith is reading between the lines
- The same boundary-design approach could be adapted to create research artifacts from other domains that hold sensitive operational telemetry.
- The contrast between loud failure for training and quiet failure for measurement indicates that utility must be evaluated separately for each consumer type.
- Extending the verifier across a larger set of incidents would test whether the observed deviations generalize beyond the current sample.
Load-bearing premise
The deterministic verifier correctly identifies actions as non-compliant and absent from the human baseline, and the 200 SOCpilot incidents are a representative sample for measuring transferability.
What would settle it
An observation that the verifier flags actions present in the human baseline or that HIKARI challenges succeed without preservation of temporal order and entity consistency.
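The first half of that test can be sketched in a few lines. The paper does not specify the verifier's decision rules, so the two rules below, the action fields (`type`, `approved`), and the sample actions are hypothetical placeholders that only show the shape of the check.

```python
from typing import Callable, Dict, List

Action = Dict[str, object]
Rule = Callable[[Action], bool]  # returns True when the action violates the rule

# Hypothetical rules; the paper's actual rule set is not published in the abstract.
RULES: Dict[str, Rule] = {
    "isolate_requires_approval": lambda a: a["type"] == "isolate_host"
    and not a.get("approved", False),
    "no_delete_evidence": lambda a: a["type"] == "delete_logs",
}


def violations(action: Action) -> List[str]:
    """Return the names of all rules the action violates, in a fixed order."""
    return sorted(name for name, rule in RULES.items() if rule(action))


# Toy action logs (hypothetical).
human_actions = [
    {"type": "isolate_host", "approved": True},
    {"type": "close_ticket"},
]
llm_actions = [
    {"type": "isolate_host"},
    {"type": "close_ticket"},
    {"type": "delete_logs"},
]

llm_flagged = [a for a in llm_actions if violations(a)]
human_flagged = [a for a in human_actions if violations(a)]

# The paper's claim corresponds to llm_flagged being non-empty while
# human_flagged is empty; a non-empty human_flagged would count against it.
claim_holds_on_sample = bool(llm_flagged) and not human_flagged
```

Because the rules are pure functions of the action record, the same input always yields the same verdict, which is the property "deterministic" refers to here. Whether the verdicts are correct is a separate question that this check cannot answer.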
Operational evidence is not automatically scientific evidence. The most realistic Security Operations Center (SOC) data is production telemetry, yet it remains scientifically inaccessible because raw logs cannot be released; as a result, research relies on synthetic or dated datasets. We treat the boundary between private production telemetry and reusable research artifacts as the design object: a methodology that extracts, anonymizes, structures, and validates Security Information and Event Management (SIEM) data from a production financial SOC while preserving task-relevant investigative structure within a declared privacy boundary. Two consumers stress the same artifact. As training material, it fails loudly: 37 MITRE ATT&CK-mapped HIKARI challenges work only when anonymization preserves temporal order and entity consistency. As a measurement substrate, it fails quietly: across 200 SOCpilot incidents, a deterministic verifier detects non-compliant Large Language Model (LLM) actions that are absent from the human baseline. The result is a measurable privacy-utility boundary rather than a formal anonymity claim.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ARENA, an architecture that treats the boundary between private production SOC telemetry and reusable research artifacts as the design object. It describes a methodology to extract, anonymize, structure, and validate SIEM data from a production financial SOC while preserving task-relevant structure. The artifact is evaluated in two settings: as training material for 37 MITRE ATT&CK-mapped HIKARI challenges (which require preservation of temporal order and entity consistency) and as a measurement substrate for 200 SOCpilot incidents, where a deterministic verifier identifies non-compliant LLM actions absent from a human baseline, yielding a measurable privacy-utility boundary rather than a formal anonymity guarantee.
Significance. If the methodology, verifier, and baseline construction hold under scrutiny, the work would provide a practical route to making realistic production SOC data available for research on autonomous cyber defense, addressing the longstanding gap between inaccessible real telemetry and synthetic or outdated public datasets. The dual-use demonstration (training failures and measurement failures) offers a concrete, falsifiable illustration of privacy-utility trade-offs.
major comments (2)
- [Abstract] Abstract and measurement-substrate section: the central claim that the deterministic verifier detects non-compliant LLM actions absent from the human baseline across 200 SOCpilot incidents is load-bearing for the privacy-utility boundary result, yet the manuscript supplies no specification of the verifier's decision rules, how the human baseline was collected or annotated (same incidents vs. controls, inter-rater reliability), validation steps against false positives/negatives, or the selection criteria for the 200 incidents.
- [Abstract] Abstract (and any section describing the measurement substrate): without independent validation of verifier correctness and baseline construction, the observation that certain LLM actions are 'absent from the human baseline' risks circularity if the verifier's rules implicitly encode assumptions aligned with expected LLM failure modes rather than external ground truth.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review. We address each major comment below, acknowledging omissions in the current manuscript and committing to revisions that directly strengthen the claims without misrepresenting the work.
- Referee: [Abstract] Abstract and measurement-substrate section: the central claim that the deterministic verifier detects non-compliant LLM actions absent from the human baseline across 200 SOCpilot incidents is load-bearing for the privacy-utility boundary result, yet the manuscript supplies no specification of the verifier's decision rules, how the human baseline was collected or annotated (same incidents vs. controls, inter-rater reliability), validation steps against false positives/negatives, or the selection criteria for the 200 incidents.
Authors: The referee correctly identifies that these specifications are absent from the manuscript. In the revised version we will expand the measurement-substrate section to supply: the complete set of deterministic decision rules used by the verifier; the protocol for collecting and annotating the human baseline (including confirmation that the same 200 incidents were used and any inter-rater reliability statistics); the validation procedures applied to quantify false-positive and false-negative rates; and the explicit selection criteria applied to the 200 incidents. These additions will make the privacy-utility boundary result reproducible and address the load-bearing nature of the claim. revision: yes
- Referee: [Abstract] Abstract (and any section describing the measurement substrate): without independent validation of verifier correctness and baseline construction, the observation that certain LLM actions are 'absent from the human baseline' risks circularity if the verifier's rules implicitly encode assumptions aligned with expected LLM failure modes rather than external ground truth.
Authors: We agree that the absence of explicit independent validation leaves the claim open to a circularity concern. The verifier rules were constructed from pre-existing SOC operational compliance standards rather than from observed LLM behaviors; however, the manuscript does not currently document the independent validation steps taken. The revision will add a dedicated subsection that (a) traces each rule to its external SOC-standard source and (b) reports any validation performed (e.g., application to synthetic compliant/non-compliant cases or additional reviewer cross-checks). If further external validation data cannot be supplied without compromising the privacy boundary, we will state this limitation explicitly. revision: yes
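The validation statistics promised in both responses are simple to compute once labeled data exists. The sketch below shows one way to report false-positive and false-negative rates on synthetic labeled cases and Cohen's kappa for two annotators. All labels are toy values chosen for illustration, not results from the paper.

```python
def error_rates(labels, predictions):
    """False-positive and false-negative rates; True means non-compliant."""
    fp = sum(1 for y, p in zip(labels, predictions) if p and not y)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    return {
        "fpr": fp / negatives if negatives else None,
        "fnr": fn / positives if positives else None,
    }


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Toy synthetic cases: ground-truth labels vs. verifier verdicts.
truth = [True, True, False, False, False]
verdicts = [True, False, False, True, False]
rates = error_rates(truth, verdicts)

# Toy annotations from two reviewers ("c" = compliant, "n" = non-compliant).
kappa = cohens_kappa(["c", "c", "n", "n"], ["c", "n", "n", "n"])
```

Reporting these numbers alongside the rule-to-standard traceability would let a reader separate two failure sources: a verifier that misreads compliant actions, and a human baseline whose annotations are themselves unstable.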
Circularity Check
No significant circularity; empirical methodology stands on observed outcomes
The paper describes a data-anonymization pipeline and its use as both training material and measurement substrate for comparing LLM vs. human SOC actions. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The central result—that a deterministic verifier flags LLM actions absent from a human baseline across 200 incidents—is presented as an empirical observation rather than a derivation that reduces to its own inputs by construction. The absence of any load-bearing self-referential step keeps the derivation chain self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: production SOC telemetry contains task-relevant investigative structure that can be preserved under anonymization within a declared privacy boundary.
invented entities (1)
- ARENA architecture: no independent evidence