ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense

Ágney Lopes Roth Ferraz; Gioliano de Oliveira Braga; Henrique Curi de Miranda; Lourenço Alves Pereira Júnior; Sidnei Barbieri; Wagner Comin Sonaglio

arXiv: 2606.21377 · v1 · pith:DOGFLU3Ynew · submitted 2026-06-19 · cs.CR


Sidnei Barbieri, Ágney Lopes Roth Ferraz, Wagner Comin Sonaglio, Gioliano de Oliveira Braga, Henrique Curi de Miranda, Lourenço Alves Pereira Júnior

Pith reviewed 2026-06-26 13:58 UTC · model grok-4.3

classification: cs.CR

keywords: privacy-utility boundary · SIEM data anonymization · autonomous cyber defense · transferability measurement · production telemetry · SOCpilot incidents · HIKARI challenges · LLM action verification


The pith

Treating the boundary between private production telemetry and reusable research artifacts as the design object produces a measurable privacy-utility boundary.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a methodology for extracting, anonymizing, structuring, and validating SIEM data from a production financial SOC to create reusable research artifacts. It addresses a specific problem: realistic operational evidence remains inaccessible to scientific study because raw logs cannot be released. The methodology is stressed in two ways. As training material, the artifact supports 37 MITRE ATT&CK-mapped HIKARI challenges only when anonymization preserves temporal order and entity consistency. As a measurement substrate, it lets a deterministic verifier identify non-compliant LLM actions that are absent from the human baseline across 200 SOCpilot incidents. A sympathetic reader would care because the result is a concrete, testable privacy-utility boundary rather than an abstract anonymity claim.

Core claim

By treating the boundary between private production telemetry and reusable research artifacts as the design object, the methodology produces a measurable privacy-utility boundary, demonstrated by the requirement that anonymization preserve temporal order and entity consistency for HIKARI challenges and by the deterministic verifier detecting non-compliant LLM actions absent from the human baseline across 200 SOCpilot incidents.
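The verifier's decision rules are not specified on this page, so the following is only a minimal sketch under stated assumptions: an action is a (verb, target) pair, each rule is a hypothetical deterministic predicate, and "absent from the human baseline" is read as set difference against the analysts' actions on the same incident. `Action`, `RULES`, `flag_actions`, and `compare_to_baseline` are illustrative names, not the paper's or SOCpilot's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass(frozen=True)
class Action:
    verb: str    # hypothetical verb vocabulary, e.g. "isolate_host"
    target: str  # pseudonymized entity identifier, e.g. "db_01"

# Each rule returns True when the action complies (hypothetical policy rules).
Rule = Callable[[Action], bool]

ALLOWED_VERBS = {"isolate_host", "block_ip", "reset_password", "open_ticket"}

RULES: Dict[str, Rule] = {
    # Only verbs from an approved playbook may be executed.
    "approved_verb": lambda a: a.verb in ALLOWED_VERBS,
    # Database servers must never be isolated without human approval.
    "no_db_isolation": lambda a: not (a.verb == "isolate_host" and a.target.startswith("db_")),
}

def flag_actions(actions: List[Action]) -> Dict[Action, List[str]]:
    """Deterministically map each non-compliant action to the rules it violates."""
    flagged: Dict[Action, List[str]] = {}
    for action in actions:
        violated = [name for name, rule in RULES.items() if not rule(action)]
        if violated:
            flagged[action] = violated
    return flagged

def compare_to_baseline(llm: List[Action], human: List[Action]) -> Dict[str, Set[Action]]:
    """Split LLM-only violations from violations the human analysts also committed."""
    llm_flags = set(flag_actions(llm))
    human_set = set(human)
    return {
        "llm_only": llm_flags - human_set,
        "shared_with_human": llm_flags & human_set,
    }
```

Keeping rule evaluation separate from the baseline comparison means "absent from the human baseline" remains an observed outcome rather than a filter built into the verifier. A non-empty `shared_with_human` set would be exactly the falsifying observation described under "What would settle it".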

What carries the argument

The privacy boundary between private production telemetry and reusable research artifacts, which serves as the explicit design object for extraction, anonymization, structuring, and validation of SIEM data while preserving task-relevant investigative structure.
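One way to make a "declared privacy boundary" concrete is a per-field policy that says what may leave the SOC. The sketch below assumes three treatments (keep, pseudonymize, drop), a hypothetical field schema, and HMAC-SHA256 keyed pseudonyms so the same value always maps to the same token; none of these names or choices come from the paper.

```python
import hashlib
import hmac
from enum import Enum
from typing import Dict

class Treatment(Enum):
    KEEP = "keep"                  # released verbatim (task-relevant, low risk)
    PSEUDONYMIZE = "pseudonymize"  # replaced by a keyed, consistent token
    DROP = "drop"                  # never leaves the SOC

# Hypothetical declared boundary; field names are illustrative, not the paper's schema.
BOUNDARY: Dict[str, Treatment] = {
    "event_id": Treatment.KEEP,
    "technique": Treatment.KEEP,  # e.g. a MITRE ATT&CK technique ID such as "T1110"
    "src_ip": Treatment.PSEUDONYMIZE,
    "user": Treatment.PSEUDONYMIZE,
    "raw_payload": Treatment.DROP,
}

def pseudonym(field: str, value: str, key: bytes) -> str:
    """Keyed HMAC-SHA256: the same (field, value) pair always yields the same token."""
    digest = hmac.new(key, f"{field}:{value}".encode(), hashlib.sha256).hexdigest()
    return f"{field}_{digest[:12]}"

def apply_boundary(record: Dict[str, str], key: bytes) -> Dict[str, str]:
    out: Dict[str, str] = {}
    for field, value in record.items():
        # Fail closed: any field missing from the declaration is treated as DROP.
        treatment = BOUNDARY.get(field, Treatment.DROP)
        if treatment is Treatment.KEEP:
            out[field] = value
        elif treatment is Treatment.PSEUDONYMIZE:
            out[field] = pseudonym(field, value, key)
    return out
```

Because the field name is part of the HMAC input, the same IP in `src_ip` and a hypothetical `dst_ip` field would receive different tokens; cross-field correlation would need a shared namespace such as "ip". The 12-hex-character truncation is an arbitrary readability choice, and longer tokens lower collision risk.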

If this is right

Anonymization must preserve temporal order and entity consistency for the artifacts to support MITRE ATT&CK-mapped HIKARI challenges.
A deterministic verifier can detect LLM actions that deviate from observed human baselines across the 200 SOCpilot incidents.
The same artifact can serve both as training material that fails loudly and as a measurement substrate that fails quietly.
Research on autonomous cyber defense can use production-derived artifacts instead of synthetic or dated datasets once the privacy boundary is treated as the design object.
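The two structural properties named in the first consequence can be checked mechanically on a raw/anonymized pair. This is a sketch assuming a hypothetical minimal event shape of (timestamp, entity) and an anonymizer that applies one dataset-wide time offset plus a fixed entity mapping; the paper's actual validation steps are not described here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Sequence, Tuple

# (timestamp, entity identifier); a hypothetical minimal event shape.
Event = Tuple[datetime, str]

def anonymize(events: Sequence[Event], offset: timedelta, mapping: Dict[str, str]) -> List[Event]:
    """Apply one dataset-wide time offset and a fixed entity mapping."""
    return [(ts + offset, mapping[entity]) for ts, entity in events]

def order_preserved(raw: Sequence[Event], anon: Sequence[Event]) -> bool:
    """True if sorting by timestamp yields the same event order before and after."""
    def rank(evts: Sequence[Event]) -> List[int]:
        return sorted(range(len(evts)), key=lambda i: evts[i][0])
    return len(raw) == len(anon) and rank(raw) == rank(anon)

def entities_consistent(raw: Sequence[Event], anon: Sequence[Event]) -> bool:
    """True if raw and anonymized entities correspond one-to-one across all events."""
    if len(raw) != len(anon):
        return False
    forward: Dict[str, str] = {}
    backward: Dict[str, str] = {}
    for (_, r), (_, a) in zip(raw, anon):
        if forward.setdefault(r, a) != a or backward.setdefault(a, r) != r:
            return False
    return True
```

A constant offset passes both checks. Replacing each occurrence with a fresh random token fails `entities_consistent` whenever an entity repeats, and jittering timestamps independently per event can fail `order_preserved`.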

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same boundary-design approach could be adapted to create research artifacts from other domains that hold sensitive operational telemetry.
The contrast between loud failure for training and quiet failure for measurement indicates that utility must be evaluated separately for each consumer type.
Extending the verifier across a larger set of incidents would test whether the observed deviations generalize beyond the current sample.
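The loud/quiet contrast can be illustrated with two toy consumers of the same artifact. Both are hypothetical: a "challenge" with an answer key, and a metric with none. Neither is taken from HIKARI or SOCpilot.

```python
from collections import Counter
from typing import List, Tuple

# (sequence number, entity token); a hypothetical minimal event shape.
Event = Tuple[int, str]

def training_consumer(events: List[Event], expected_answer: str) -> str:
    """Toy challenge: 'which entity appears most often?' checked against an answer key."""
    answer, _ = Counter(entity for _, entity in events).most_common(1)[0]
    if answer != expected_answer:
        # The answer key exposes the damage: this consumer fails loudly.
        raise ValueError(f"challenge unsolvable: expected {expected_answer}, got {answer}")
    return answer

def measurement_consumer(events: List[Event]) -> int:
    """Toy metric: distinct entities touched. No answer key, so damage goes unnoticed."""
    return len({entity for _, entity in events})
```

With consistent tokens such as `[(1, "ip_a"), (2, "ip_b"), (3, "ip_a")]`, the challenge returns `"ip_a"` and the metric reports 2. With per-record tokens `[(1, "t0"), (2, "t1"), (3, "t2")]`, the challenge raises, while the metric silently reports 3.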

Load-bearing premise

The assumption that the deterministic verifier correctly identifies actions as non-compliant and absent from the human baseline, and that the 200 SOCpilot incidents provide a representative sample for measuring transferability.

What would settle it

An observation that the verifier flags actions present in the human baseline or that HIKARI challenges succeed without preservation of temporal order and entity consistency.

Original abstract

Operational evidence is not automatically scientific evidence. The most realistic Security Operations Center (SOC) data is production telemetry, yet it remains scientifically inaccessible because raw logs cannot be released; as a result, research relies on synthetic or dated datasets. We treat the boundary between private production telemetry and reusable research artifacts as the design object: a methodology that extracts, anonymizes, structures, and validates Security Information and Event Management (SIEM) data from a production financial SOC while preserving task-relevant investigative structure within a declared privacy boundary. Two consumers stress the same artifact. As training material, it fails loudly: 37 MITRE ATT&CK-mapped HIKARI challenges work only when anonymization preserves temporal order and entity consistency. As a measurement substrate, it fails quietly: across 200 SOCpilot incidents, a deterministic verifier detects non-compliant Large Language Model (LLM) actions that are absent from the human baseline. The result is a measurable privacy-utility boundary rather than a formal anonymity claim.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a workable way to anonymize real SOC production data around task needs rather than blanket privacy claims, with two concrete failure-mode tests, but the deterministic verifier and human baseline lack any described validation.


The main takeaway is a methodology that pulls SIEM logs from a live financial SOC, anonymizes them while trying to keep investigative structure, and then checks whether the result still works for two different jobs: training on HIKARI challenges and measuring LLM behavior on SOCpilot incidents.

What stands out is the explicit focus on the privacy-utility boundary as the thing being engineered. They report that 37 MITRE-mapped HIKARI tasks only succeed when anonymization keeps temporal order and entity consistency. On the measurement side, across 200 incidents the deterministic verifier flags LLM actions that do not appear in the human baseline. Those are specific, observable outcomes rather than abstract claims.

The soft spot is exactly where the stress-test note points: nothing is said about how the verifier decides an action is non-compliant, how the human baseline was built or checked for reliability, or why those particular 200 incidents were picked. If the verifier rules line up with the kinds of mistakes LLMs make, the "absent from baseline" finding becomes hard to interpret as evidence of transferability. The abstract supplies no independent check against that risk.

This is aimed at researchers who build or evaluate autonomous cyber-defense systems and need realistic but shareable data. A reader working on LLM agents for SOC work or on benchmark construction would find the framing useful even if the current write-up is thin on mechanics.

It deserves peer review because the underlying problem is real and the two-use-case structure is a reasonable way to test the artifact. The full paper would need to add the missing verifier and baseline details before the measurement claim can be taken as solid.

Referee Report

2 major / 0 minor

Summary. The paper presents ARENA, an architecture that treats the boundary between private production SOC telemetry and reusable research artifacts as the design object. It describes a methodology to extract, anonymize, structure, and validate SIEM data from a production financial SOC while preserving task-relevant structure. The artifact is evaluated in two settings: as training material for 37 MITRE ATT&CK-mapped HIKARI challenges (which require preservation of temporal order and entity consistency) and as a measurement substrate for 200 SOCpilot incidents, where a deterministic verifier identifies non-compliant LLM actions absent from a human baseline, yielding a measurable privacy-utility boundary rather than a formal anonymity guarantee.

Significance. If the methodology, verifier, and baseline construction hold under scrutiny, the work would provide a practical route to making realistic production SOC data available for research on autonomous cyber defense, addressing the longstanding gap between inaccessible real telemetry and synthetic or outdated public datasets. The dual-use demonstration (training failures and measurement failures) offers a concrete, falsifiable illustration of privacy-utility trade-offs.

major comments (2)

[Abstract] Abstract and measurement-substrate section: the central claim that the deterministic verifier detects non-compliant LLM actions absent from the human baseline across 200 SOCpilot incidents is load-bearing for the privacy-utility boundary result, yet the manuscript supplies no specification of the verifier's decision rules, how the human baseline was collected or annotated (same incidents vs. controls, inter-rater reliability), validation steps against false positives/negatives, or the selection criteria for the 200 incidents.
[Abstract] Abstract (and any section describing the measurement substrate): without independent validation of verifier correctness and baseline construction, the observation that certain LLM actions are 'absent from the human baseline' risks circularity if the verifier's rules implicitly encode assumptions aligned with expected LLM failure modes rather than external ground truth.
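The referee asks for inter-rater reliability on the human baseline, but the statistic is not named. Cohen's kappa is one standard choice for two annotators. The sketch below assumes, hypothetically, that two analysts each label the same actions with a categorical verdict such as "ok" or "bad".

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return math.nan  # both raters used one identical label throughout: kappa undefined
    return (observed - expected) / (1 - expected)
```

For example, if the raters agree on 5 of 6 actions and the chance agreement from their label frequencies is 0.5, kappa is (5/6 - 0.5) / 0.5 = 2/3.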

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. We address each major comment below, acknowledging omissions in the current manuscript and committing to revisions that directly strengthen the claims without misrepresenting the work.


Referee: [Abstract] Abstract and measurement-substrate section: the central claim that the deterministic verifier detects non-compliant LLM actions absent from the human baseline across 200 SOCpilot incidents is load-bearing for the privacy-utility boundary result, yet the manuscript supplies no specification of the verifier's decision rules, how the human baseline was collected or annotated (same incidents vs. controls, inter-rater reliability), validation steps against false positives/negatives, or the selection criteria for the 200 incidents.

Authors: The referee correctly identifies that these specifications are absent from the manuscript. In the revised version we will expand the measurement-substrate section to supply: the complete set of deterministic decision rules used by the verifier; the protocol for collecting and annotating the human baseline (including confirmation that the same 200 incidents were used and any inter-rater reliability statistics); the validation procedures applied to quantify false-positive and false-negative rates; and the explicit selection criteria applied to the 200 incidents. These additions will make the privacy-utility boundary result reproducible and address the load-bearing nature of the claim. revision: yes
Referee: [Abstract] Abstract (and any section describing the measurement substrate): without independent validation of verifier correctness and baseline construction, the observation that certain LLM actions are 'absent from the human baseline' risks circularity if the verifier's rules implicitly encode assumptions aligned with expected LLM failure modes rather than external ground truth.

Authors: We agree that the absence of explicit independent validation leaves the claim open to a circularity concern. The verifier rules were constructed from pre-existing SOC operational compliance standards rather than from observed LLM behaviors; however, the manuscript does not currently document the independent validation steps taken. The revision will add a dedicated subsection that (a) traces each rule to its external SOC-standard source and (b) reports any validation performed (e.g., application to synthetic compliant/non-compliant cases or additional reviewer cross-checks). If further external validation data cannot be supplied without compromising the privacy boundary, we will state this limitation explicitly. revision: yes
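The two additions promised in the rebuttal, (a) tracing each rule to an external source and (b) validating on synthetic compliant/non-compliant cases, can be sketched as follows. Rule names, source strings, and cases are hypothetical placeholders, not the paper's rules or standards, and the printed rates are properties of this toy setup only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Rule:
    name: str
    source: str                   # external SOC standard the rule is traced to ("" if none)
    check: Callable[[str], bool]  # returns True when the action is compliant

def provenance_gaps(rules: List[Rule]) -> List[str]:
    """(a) Rules that cannot be traced to an external source need circularity review."""
    return [r.name for r in rules if not r.source.strip()]

def error_rates(rules: List[Rule], cases: List[Tuple[str, bool]]) -> Dict[str, float]:
    """(b) False-positive and false-negative rates on labeled synthetic cases.

    Each case is (action, truly_noncompliant); an action is flagged if any rule fails.
    """
    fp = fn = positives = negatives = 0
    for action, truly_noncompliant in cases:
        flagged = any(not rule.check(action) for rule in rules)
        if truly_noncompliant:
            positives += 1
            fn += int(not flagged)
        else:
            negatives += 1
            fp += int(flagged)
    return {
        "fpr": fp / negatives if negatives else 0.0,
        "fnr": fn / positives if positives else 0.0,
    }
```

In a toy run where the approved-verb list omits `reset_password`, a compliant password reset is flagged and shows up as a false positive. This is the kind of over-narrow rule that synthetic validation is meant to surface before the verifier is applied to the 200 incidents.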

Circularity Check

0 steps flagged

No significant circularity; empirical methodology stands on observed outcomes


The paper describes a data-anonymization pipeline and its use as both training material and measurement substrate for comparing LLM vs. human SOC actions. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The central result—that a deterministic verifier flags LLM actions absent from a human baseline across 200 incidents—is presented as an empirical observation rather than a derivation that reduces to its own inputs by construction. The absence of any load-bearing self-referential step keeps the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that production SOC telemetry contains preservable task-relevant structure and on the invented entity of the ARENA architecture itself; no free parameters are stated.

axioms (1)

domain assumption: Production SOC telemetry contains task-relevant investigative structure that can be preserved under anonymization within a declared privacy boundary
Invoked as the design object of the methodology in the abstract

invented entities (1)

ARENA architecture: no independent evidence
purpose: Extracting, anonymizing, structuring, and validating SIEM data to measure transferability of autonomous cyber defense
New proposed system introduced in the title and abstract

pith-pipeline@v0.9.1-grok · 5733 in / 1439 out tokens · 39409 ms · 2026-06-26T13:58:10.296340+00:00 · methodology




Pith/arXiv arXiv 2026

[14] [14]

A systematic security testing approach for InterUSS-based environments,

H. Curi de Miranda, ´A. L. R. Ferraz, W. C. Sonaglio, L. A. Pe- reira J´unior, “A systematic security testing approach for InterUSS-based environments,” 2026, arXiv:2605.11339

Pith/arXiv arXiv 2026

[15] [15]

Claude models overview,

Anthropic, “Claude models overview,” https://docs.anthropic.com/en/ docs/about-claude/models/overview, 2026, accessed: 2026-06-18

2026

[16] [16]

FlexRIC tutorial: xApp development,

OpenAirInterface Alliance, “FlexRIC tutorial: xApp development,” https://openairinterface.org/flexric-tutorial-xapp-development/, 2026, accessed: 2026-06-18

2026

[17] [17]

TopVenues: A reproducible corpus and tooling substrate for cybersecurity literature reviews,

S. Barbieri, ´A. L. R. Ferraz, L. A. Pereira J ´unior, “TopVenues: A reproducible corpus and tooling substrate for cybersecurity literature reviews,” 2026. [Online]. Available: https://arxiv.org/abs/2606.18320

Pith/arXiv arXiv 2026

[18] [18]

CyberBattleSim: An experimentation and research platform for automated agents in simulated enterprise networks,

Microsoft, “CyberBattleSim: An experimentation and research platform for automated agents in simulated enterprise networks,” https://github. com/microsoft/CyberBattleSim, 2021, accessed: 2026-06-12

2021

[19] [19]

Automated repeatable adversary threat emulation with effects language (EL),

Suresh K. Damodaran and Paul D. Rowe, “Automated repeatable adversary threat emulation with effects language (EL),”Digital Threats: Research and Practice, 2026. [Online]. Available: https: //doi.org/10.1145/3816043

work page doi:10.1145/3816043 2026

[20] [20]

The science of cyber security experimentation: The DETER project,

T. Benzel, “The science of cyber security experimentation: The DETER project,” inAnnual Computer Security Applications Conf. (ACSAC), 2011

2011

[21] [21]

An integrated experimental environment for distributed systems and networks,

B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. New- bold, M. Hibler, C. Barb, A. Joglekar, “An integrated experimental environment for distributed systems and networks,” inUSENIX Symp. on Operating Systems Design and Implementation (OSDI), 2002

2002

[22] [22]

ATT&CK evaluations,

MITRE Engenuity, “ATT&CK evaluations,” https://attackevals. mitre-engenuity.org/, 2026, accessed: 2026-06-18

2026

[23] [23]

Cyber Defense Benchmark: Agentic threat hunting evaluation for LLMs in SecOps,

A. Chona, I. Kozlov, A. Kumar, “Cyber Defense Benchmark: Agentic threat hunting evaluation for LLMs in SecOps,” arXiv:2604.19533, 2026

Pith/arXiv arXiv 2026

[24] [24]

Piarena: A platform for prompt injection evaluation,

R. Geng, C. Yin, Y . Wang, Y . Chen, J. Jia, “Piarena: A platform for prompt injection evaluation,”arXiv preprint arXiv:2604.08499, 2026

Pith/arXiv arXiv 2026

[25] [25]

Safety at scale: a comprehensive survey of large model and agent safety,

X. Ma, Y . Gao, Y . Wang, R. Wang, X. Wang, Y . Sun, Y . Ding, H. Xu, Y . Chen, Y . Zhao, H. Huang, Y . Li, Y . Wu, J. Zhang, X. Zheng, Y . Bai, Y . Li, Z. Wu, X. Qiu, J. Zhang, X. Han, H. Li, J. Sun, C. Wang, J. Gu, B. Wu, S. Chen, T. Zhang, Y . Liu, M. Gong, T. Liu, S. Pan, C. Xie, T. Pang, Y . Dong, R. Jia, Y . Zhang, S. Ma, X. Zhang, N. Gong, C. Xiao,...

2025

[26] [26]

On the trustworthiness of generative foundation models: Guideline, assessment, and perspective,

Y . Huang, C. Gao, S. Wu, H. Wang, X. Wang, Y . Zhou, Y . Wang, J. Ye, J. Shi, Q. Zhang, Y . Li, H. Bao, Z. Liu, T. Guan, D. Chen, R. Chen, K. Guo, A. Zou, B. H. Kuen-Yew, C. Xiong, E. Stengel-Eskin, H. Zhang, H. Yin, H. Zhang, H. Yao, J. Yoon, J. Zhang, K. Shu, K. Zhu, R. Krishna, S. Swayamdipta, T. Shi, W. Shi, X. Li, Y . Li, Y . Hao, Z. Jia, Z. Li, X. ...

Pith/arXiv arXiv 2025

[27] [27]

Sok: On the offensive potential of ai,

S. L. Schr ¨oer, G. Apruzzese, S. Human, P. Laskov, H. S. Anderson, E. W. N. Bernroider, A. Fass, B. Nassi, V . Rimmer, F. Roli, S. Salam, C. E. A. Shen, A. Sunyaev, T. Wadhwa-Brown, I. Wagner, G. Wang, “Sok: On the offensive potential of ai,” in2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2025

2025

[28] [28]

Safeagent: Safeguarding llm agents via an automated risk simulator,

X. Zhou, W. Wang, L. Lu, J. Shi, G. Tie, Y . Xu, L. Chen, P. Zhou, N. Z. Gong, L. Sun, “Safeagent: Safeguarding llm agents via an automated risk simulator,”arXiv preprint arXiv:2505.17735, 2025

arXiv 2025

[29] [29]

Promptlocate: Localizing prompt injection attacks,

Y . Jia, Y . Liu, Z. Shao, J. Jia, N. Gong, “Promptlocate: Localizing prompt injection attacks,”arXiv preprint arXiv:2510.12252, 2025

arXiv 2025

[30] [30]

Obliinjection: Order-oblivious prompt injection attack to llm agents with multi-source data,

R. Wang, Y . Jia, N. Z. Gong, “Obliinjection: Order-oblivious prompt injection attack to llm agents with multi-source data,”arXiv preprint arXiv:2512.09321, 2025

arXiv 2025

[31] [31]

Websentinel: Detecting and localizing prompt injection attacks for web agents,

X. Wang, Y . Liu, Z. Wang, D. Song, N. Gong, “Websentinel: Detecting and localizing prompt injection attacks for web agents,”arXiv preprint arXiv:2602.03792, 2026

arXiv 2026

[32] [32]

Prompt injection attack to tool selection in llm agents,

J. Shi, Z. Yuan, G. Tie, P. Zhou, N. Z. Gong, L. Sun, “Prompt injection attack to tool selection in llm agents,”arXiv preprint arXiv:2504.19793, 2025

Pith/arXiv arXiv 2025

[33] [33]

Pisanitizer: Pre- venting prompt injection to long-context llms via prompt sanitization,

R. Geng, Y . Wang, C. Yin, M. Cheng, Y . Chen, J. Jia, “Pisanitizer: Pre- venting prompt injection to long-context llms via prompt sanitization,” arXiv preprint arXiv:2511.10720, 2025

arXiv 2025

[34] [34]

Jailbreaking safeguarded text-to-image models via large language models,

Z. Jiang, Y . Hu, Y . Yang, Y . Cao, N. Z. Gong, “Jailbreaking safeguarded text-to-image models via large language models,” inFindings of the Association for Computational Linguistics: EACL, 2026

2026

[35] [35]

Jailbreaking black box large language models in twenty queries,

P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, E. Wong, “Jailbreaking black box large language models in twenty queries,” in 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2025

2025

[36] [36]

Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,

W. Zou, R. Geng, B. Wang, J. Jia, “Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,” USENIX Security Symposium, 2025, arXiv:2402.07867

arXiv 2025

[37] [37]

Unic-rag: Universal knowledge corruption attacks to retrieval-augmented generation,

R. Geng, Y . Wang, Y . Chen, J. Jia, “Unic-rag: Universal knowledge corruption attacks to retrieval-augmented generation,”arXiv preprint arXiv:2508.18652, 2025

arXiv 2025

[38] [38]

Graphrag under fire,

J. Liang, Y . Wang, C. Li, R. Zhu, T. Jiang, N. Gong, T. Wang, “Graphrag under fire,”arXiv preprint arXiv:2501.14050, 2025

arXiv 2025

[39] [39]

Cleanbase: Detecting malicious documents in rag knowledge database,

W. Jin, X. Wang, W. Zou, J. Jia, N. Gong, “Cleanbase: Detecting malicious documents in rag knowledge database,”arXiv preprint ar- Xiv:2605.00460, 2026

Pith/arXiv arXiv 2026

[40] [40]

From static roles to context- aware decisions: Integrating llms and rag into access control frameworks for power systems,

D. Feng, W. Cui, Y . Jiang, W. Yu, D. Li, “From static roles to context- aware decisions: Integrating llms and rag into access control frameworks for power systems,” inIEEE Access, 2026

2026

[41] [41]

Maltool: Malicious tool attacks on llm agents,

Y . Hu, Y . Jia, M. Li, D. Song, N. Gong, “Maltool: Malicious tool attacks on llm agents,”arXiv preprint arXiv:2602.12194, 2026

Pith/arXiv arXiv 2026

[42] [42]

Trustdesc: Preventing tool poisoning in llm applications via trusted description generation,

H. Ye, Z. Zhang, J. Jia, H. Hu, “Trustdesc: Preventing tool poisoning in llm applications via trusted description generation,”arXiv preprint arXiv:2604.07536, 2026

Pith/arXiv arXiv 2026

[43] [43]

A2asecbench: A protocol-aware security benchmark for agent-to-agent multi-agent systems,

Anonymous, “A2asecbench: A protocol-aware security benchmark for agent-to-agent multi-agent systems,” OpenReview preprint, 2025

2025

[44] [44]

Se- cure retrieval-augmented generation against poisoning attacks,

Z. Cheng, J. Sun, A. Gao, Y . Quan, Z. Liu, X. Hu, M. Fang, “Se- cure retrieval-augmented generation against poisoning attacks,”arXiv preprint arXiv:2510.25025, 2025

arXiv 2025

[45] [45]

Traceback of poisoning attacks to retrieval-augmented generation,

B. Zhang, H. Xin, M. Fang, Z. Liu, B. Yi, T. Li, Z. Liu, “Traceback of poisoning attacks to retrieval-augmented generation,” inProceedings of the ACM on Web Conference 2025, 2025

2025

[46] [46]

De- fending against prompt injection with datafilter,

Y . Wang, S. Chen, R. Alkhudair, B. Alomair, D. Wagner, “De- fending against prompt injection with datafilter,”arXiv preprint ar- Xiv:2510.19207, 2025

arXiv 2025

[47] [47]

Preventing prompt injection with type-directed privilege separation,

D. Jacob, E. Alghamdi, Z. Hu, B. Alomair, D. Wagner, “Preventing prompt injection with type-directed privilege separation,”arXiv preprint arXiv:2509.25926, 2025

Pith/arXiv arXiv 2025

[48] [48]

AgentSpec: Customizable runtime enforcement for safe and reliable llm agents,

H. Wang, C. M. Poskitt, J. Sun, “AgentSpec: Customizable runtime enforcement for safe and reliable llm agents,”arXiv preprint ar- Xiv:2503.18666, 2025

Pith/arXiv arXiv 2025

[49] [49]

Ml-based behavioral malware detection is far from a solved problem,

Y . Kaya, Y . Chen, M. Botacin, S. Saha, F. Pierazzi, L. Cavallaro, D. Wagner, T. Dumitras ¸, “Ml-based behavioral malware detection is far from a solved problem,” in2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2025

2025

[50] [50]

The long-horizon task mirage? diagnosing where and why agentic systems break,

X. J. Wang, H. Bai, Y . Sun, H. Wang, S. Zhang, W. Hu, M. Schroder, B. Mutlu, D. Song, R. D. Nowak, “The long-horizon task mirage? diagnosing where and why agentic systems break,”arXiv preprint arXiv:2604.11978, 2026

Pith/arXiv arXiv 2026

[51] [51]

Get my drift? catching llm task drift with activation deltas,

S. Abdelnabi, A. Fay, G. Cherubin, A. Salem, M. Fritz, A. Paverd, “Get my drift? catching llm task drift with activation deltas,” in2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2025

2025

[52] [52]

Jailbreaksovertime: Detecting jailbreak attacks under distribution shift,

J. Piet, X. Huang, D. Jacob, A. Chow, M. Alrashed, G. Zhao, Z. Hu, C. Sitawarin, B. Alomair, D. Wagner, “Jailbreaksovertime: Detecting jailbreak attacks under distribution shift,” inProceedings of the 18th ACM Workshop on Artificial Intelligence and Security, 2025

2025

[53] [53]

“real attackers don’t compute gradients

G. Apruzzese, H. S. Anderson, S. Dambra, D. Freeman, F. Pierazzi, K. Roundy, ““real attackers don’t compute gradients”: Bridging the gap between adversarial ml research and practice,” in2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2023

2023

[54] [54]

Uncovering vulnerabilities of llm-assisted cyber threat intelligence,

Y . Meng, L. Tang, F. Yu, J. Jia, G. Yan, P. Yang, Z. Xi, “Uncovering vulnerabilities of llm-assisted cyber threat intelligence,”arXiv preprint arXiv:2509.23573, 2025

Pith/arXiv arXiv 2025

[55] [55]

Trident: Improving malware detection with llms and behavioral features,

R. Saul, J. Jiang, E. Chia, D. Wagner, “Trident: Improving malware detection with llms and behavioral features,”arXiv preprint ar- Xiv:2605.00297, 2026

Pith/arXiv arXiv 2026

[56] [56]

Seedaichemy: Llm-driven seed corpus generation for fuzzing,

A. Wen, N. A. Alzahrani, J. Jiang, A. Joe, K. Shieh, A. Zhang, B. Alo- mair, D. Wagner, “Seedaichemy: Llm-driven seed corpus generation for fuzzing,”arXiv preprint arXiv:2511.12448, 2025

arXiv 2025

[57] [57]

Mobillm: Enabling llm fine-tuning on the mobile device via server assisted side tuning,

L. Li, X. Yang, W. Wu, H. Wang, T. Ohtsuki, X. Fu, M. Pan, X. Shen, “Mobillm: Enabling llm fine-tuning on the mobile device via server assisted side tuning,”arXiv preprint arXiv:2502.20421, 2025

arXiv 2025

[58] [58]

Mobillm: An agentic ai framework for closed-loop threat mitigation in 6g open rans,

P. Sharma, H. Wen, V . Yegneswaran, A. Gehani, P. Porras, Z. Lin, “Mobillm: An agentic ai framework for closed-loop threat mitigation in 6g open rans,”arXiv preprint arXiv:2509.21634, 2025

arXiv 2025