
arxiv: 2606.21377 · v1 · pith:DOGFLU3Ynew · submitted 2026-06-19 · 💻 cs.CR

ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense

Pith reviewed 2026-06-26 13:58 UTC · model grok-4.3

classification 💻 cs.CR
keywords: privacy-utility boundary · SIEM data anonymization · autonomous cyber defense · transferability measurement · production telemetry · SOCpilot incidents · HIKARI challenges · LLM action verification

The pith

Treating the boundary between private production telemetry and reusable research artifacts as the design object produces a measurable privacy-utility boundary.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a methodology for extracting, anonymizing, structuring, and validating SIEM data from a production financial SOC to create reusable research artifacts. This addresses the problem that realistic operational evidence remains inaccessible for scientific study because raw logs cannot be released. Two consumers stress the same artifact. As training material, the 37 MITRE ATT&CK-mapped HIKARI challenges work only when anonymization preserves temporal order and entity consistency. As a measurement substrate, a deterministic verifier identifies non-compliant LLM actions that are absent from the human baseline across 200 SOCpilot incidents. A sympathetic reader would care because the result is a concrete, testable privacy-utility boundary rather than a formal anonymity claim.

Core claim

By treating the boundary between private production telemetry and reusable research artifacts as the design object, the methodology produces a measurable privacy-utility boundary, demonstrated by the requirement that anonymization preserve temporal order and entity consistency for HIKARI challenges and by the deterministic verifier detecting non-compliant LLM actions absent from the human baseline across 200 SOCpilot incidents.

What carries the argument

The privacy boundary between private production telemetry and reusable research artifacts, which serves as the explicit design object for extraction, anonymization, structuring, and validation of SIEM data while preserving task-relevant investigative structure.

If this is right

  • Anonymization must preserve temporal order and entity consistency for the artifacts to support MITRE ATT&CK-mapped HIKARI challenges.
  • A deterministic verifier can detect LLM actions that deviate from observed human baselines across the 200 SOCpilot incidents.
  • The same artifact can serve both as training material that fails loudly and as a measurement substrate that fails quietly.
  • Research on autonomous cyber defense can use production-derived artifacts instead of synthetic or dated datasets once the privacy boundary is treated as the design object.
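The first bullet can be illustrated with a minimal sketch. The paper does not describe its anonymization mechanism, so everything here is an assumption chosen for illustration: keyed HMAC pseudonyms stand in for entity consistency, a single constant time shift stands in for order preservation, and the names `SECRET_KEY`, `TIME_SHIFT`, `pseudonymize`, and `anonymize_event` are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

# Hypothetical secrets; in practice both would stay inside the privacy boundary.
SECRET_KEY = b"example-key-not-from-the-paper"
TIME_SHIFT = timedelta(days=-417, hours=5)


def pseudonymize(value: str, kind: str) -> str:
    """Map an entity to a stable pseudonym: the same input always yields the same output."""
    digest = hmac.new(SECRET_KEY, f"{kind}:{value}".encode(), hashlib.sha256).hexdigest()
    return f"{kind}-{digest[:10]}"


def anonymize_event(event: dict) -> dict:
    """Pseudonymize entities and shift the timestamp by one constant offset."""
    shifted = datetime.fromisoformat(event["ts"]) + TIME_SHIFT
    return {
        "ts": shifted.isoformat(),
        "user": pseudonymize(event["user"], "user"),
        "host": pseudonymize(event["host"], "host"),
        "action": event["action"],
    }


# Made-up events, not data from the paper.
events = [
    {"ts": "2025-03-01T10:00:00", "user": "alice", "host": "ws-17", "action": "login"},
    {"ts": "2025-03-01T10:00:05", "user": "alice", "host": "db-02", "action": "query"},
    {"ts": "2025-03-01T10:02:00", "user": "bob", "host": "ws-17", "action": "login"},
]
anonymized = [anonymize_event(e) for e in events]
```

Because every timestamp moves by the same offset, event order and inter-event gaps survive; because pseudonyms are keyed and deterministic, the same user or host can still be followed across events. A per-event random shift or per-record random identifiers would break exactly the properties the HIKARI challenges are said to depend on.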

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same boundary-design approach could be adapted to create research artifacts from other domains that hold sensitive operational telemetry.
  • The contrast between loud failure for training and quiet failure for measurement indicates that utility must be evaluated separately for each consumer type.
  • Extending the verifier across a larger set of incidents would test whether the observed deviations generalize beyond the current sample.

Load-bearing premise

The argument assumes that the deterministic verifier correctly classifies actions as non-compliant and as absent from the human baseline, and that the 200 SOCpilot incidents are a representative sample for measuring transferability.

What would settle it

An observation that the verifier flags actions present in the human baseline or that HIKARI challenges succeed without preservation of temporal order and entity consistency.
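The first half of this test can be sketched in code. The verifier's rules are not published, so the allowlist below is purely hypothetical, as are the names `Action`, `ALLOWED_ACTIONS`, and `verify` and the sample incidents. The sketch only shows the shape of the check: run the same deterministic rule over LLM and human actions and see whether anything in the human baseline is flagged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    incident_id: str
    name: str


# Hypothetical policy allowlist; not the rules used in the paper.
ALLOWED_ACTIONS = frozenset({"triage", "enrich_ioc", "escalate", "isolate_host"})


def verify(actions):
    """Return non-compliant actions in input order; a pure function, hence deterministic."""
    return [a for a in actions if a.name not in ALLOWED_ACTIONS]


# Made-up traces for two incidents.
human_baseline = [
    Action("INC-001", "triage"),
    Action("INC-001", "escalate"),
    Action("INC-002", "isolate_host"),
]
llm_actions = [
    Action("INC-001", "triage"),
    Action("INC-001", "delete_logs"),
    Action("INC-002", "isolate_host"),
]

flagged_llm = verify(llm_actions)
flagged_human = verify(human_baseline)
# Flagged LLM actions that no human took on the same incident.
llm_only = [a for a in flagged_llm if a not in set(human_baseline)]
```

In this toy data `flagged_human` is empty, which is consistent with the paper's claim. A non-empty `flagged_human` on the real incidents would be the kind of observation described above.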

Original abstract

Operational evidence is not automatically scientific evidence. The most realistic Security Operations Center (SOC) data is production telemetry, yet it remains scientifically inaccessible because raw logs cannot be released; as a result, research relies on synthetic or dated datasets. We treat the boundary between private production telemetry and reusable research artifacts as the design object: a methodology that extracts, anonymizes, structures, and validates Security Information and Event Management (SIEM) data from a production financial SOC while preserving task-relevant investigative structure within a declared privacy boundary. Two consumers stress the same artifact. As training material, it fails loudly: 37 MITRE ATT&CK-mapped HIKARI challenges work only when anonymization preserves temporal order and entity consistency. As a measurement substrate, it fails quietly: across 200 SOCpilot incidents, a deterministic verifier detects non-compliant Large Language Model (LLM) actions that are absent from the human baseline. The result is a measurable privacy-utility boundary rather than a formal anonymity claim.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents ARENA, an architecture that treats the boundary between private production SOC telemetry and reusable research artifacts as the design object. It describes a methodology to extract, anonymize, structure, and validate SIEM data from a production financial SOC while preserving task-relevant structure. The artifact is evaluated in two settings: as training material for 37 MITRE ATT&CK-mapped HIKARI challenges (which require preservation of temporal order and entity consistency) and as a measurement substrate for 200 SOCpilot incidents, where a deterministic verifier identifies non-compliant LLM actions absent from a human baseline, yielding a measurable privacy-utility boundary rather than a formal anonymity guarantee.

Significance. If the methodology, verifier, and baseline construction hold under scrutiny, the work would provide a practical route to making realistic production SOC data available for research on autonomous cyber defense, addressing the longstanding gap between inaccessible real telemetry and synthetic or outdated public datasets. The dual-use demonstration (training failures and measurement failures) offers a concrete, falsifiable illustration of privacy-utility trade-offs.

major comments (2)
  1. [Abstract] Abstract and measurement-substrate section: the central claim that the deterministic verifier detects non-compliant LLM actions absent from the human baseline across 200 SOCpilot incidents is load-bearing for the privacy-utility boundary result, yet the manuscript supplies no specification of the verifier's decision rules, how the human baseline was collected or annotated (same incidents vs. controls, inter-rater reliability), validation steps against false positives/negatives, or the selection criteria for the 200 incidents.
  2. [Abstract] Abstract (and any section describing the measurement substrate): without independent validation of verifier correctness and baseline construction, the observation that certain LLM actions are 'absent from the human baseline' risks circularity if the verifier's rules implicitly encode assumptions aligned with expected LLM failure modes rather than external ground truth.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. We address each major comment below, acknowledging omissions in the current manuscript and committing to revisions that directly strengthen the claims without misrepresenting the work.

Point-by-point responses
  1. Referee: [Abstract] Abstract and measurement-substrate section: the central claim that the deterministic verifier detects non-compliant LLM actions absent from the human baseline across 200 SOCpilot incidents is load-bearing for the privacy-utility boundary result, yet the manuscript supplies no specification of the verifier's decision rules, how the human baseline was collected or annotated (same incidents vs. controls, inter-rater reliability), validation steps against false positives/negatives, or the selection criteria for the 200 incidents.

    Authors: The referee correctly identifies that these specifications are absent from the manuscript. In the revised version we will expand the measurement-substrate section to supply: the complete set of deterministic decision rules used by the verifier; the protocol for collecting and annotating the human baseline (including confirmation that the same 200 incidents were used and any inter-rater reliability statistics); the validation procedures applied to quantify false-positive and false-negative rates; and the explicit selection criteria applied to the 200 incidents. These additions will make the privacy-utility boundary result reproducible and address the load-bearing nature of the claim.
    Revision: yes

  2. Referee: [Abstract] Abstract (and any section describing the measurement substrate): without independent validation of verifier correctness and baseline construction, the observation that certain LLM actions are 'absent from the human baseline' risks circularity if the verifier's rules implicitly encode assumptions aligned with expected LLM failure modes rather than external ground truth.

    Authors: We agree that the absence of explicit independent validation leaves the claim open to a circularity concern. The verifier rules were constructed from pre-existing SOC operational compliance standards rather than from observed LLM behaviors; however, the manuscript does not currently document the independent validation steps taken. The revision will add a dedicated subsection that (a) traces each rule to its external SOC-standard source and (b) reports any validation performed (e.g., application to synthetic compliant/non-compliant cases or additional reviewer cross-checks). If further external validation data cannot be supplied without compromising the privacy boundary, we will state this limitation explicitly.
    Revision: yes
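The validation promised in both responses, quantifying false-positive and false-negative rates on synthetic compliant and non-compliant cases, could look roughly like the sketch below. The rule, the labeled cases, and the names `ALLOWED_ACTIONS`, `flags`, and `error_rates` are all hypothetical; nothing here reproduces the authors' verifier or data.

```python
# Hypothetical allowlist rule standing in for the unpublished verifier.
ALLOWED_ACTIONS = frozenset({"triage", "escalate", "isolate_host"})


def flags(action_name: str) -> bool:
    """Return True when the rule treats the action as non-compliant."""
    return action_name not in ALLOWED_ACTIONS


def error_rates(verifier, labeled_cases):
    """Compute false-positive and false-negative rates over (name, is_violation) pairs."""
    fp = fn = positives = negatives = 0
    for name, is_violation in labeled_cases:
        flagged = verifier(name)
        if is_violation:
            positives += 1
            fn += int(not flagged)  # a true violation the rule missed
        else:
            negatives += 1
            fp += int(flagged)  # a compliant action the rule flagged
    return {
        "fpr": fp / negatives if negatives else 0.0,
        "fnr": fn / positives if positives else 0.0,
    }


# Synthetic cases labeled independently of the rule (made up for illustration).
synthetic_cases = [
    ("triage", False),
    ("escalate", False),
    ("enrich_ioc", False),   # compliant but missing from the allowlist
    ("delete_logs", True),
    ("isolate_host", True),  # a violation in context that a name-only rule cannot see
]
rates = error_rates(flags, synthetic_cases)
```

The labels must come from a source other than the rule under test, for example the external SOC standards the authors cite as the origin of their rules. Labels derived from the rule itself would reproduce the circularity the referee describes.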

Circularity Check

0 steps flagged

No significant circularity; empirical methodology stands on observed outcomes

Full rationale

The paper describes a data-anonymization pipeline and its use as both training material and measurement substrate for comparing LLM vs. human SOC actions. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The central result—that a deterministic verifier flags LLM actions absent from a human baseline across 200 incidents—is presented as an empirical observation rather than a derivation that reduces to its own inputs by construction. The absence of any load-bearing self-referential step keeps the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that production SOC telemetry contains preservable task-relevant structure and on the invented entity of the ARENA architecture itself; no free parameters are stated.

axioms (1)
  • domain assumption: Production SOC telemetry contains task-relevant investigative structure that can be preserved under anonymization within a declared privacy boundary
    Invoked as the design object of the methodology in the abstract
invented entities (1)
  • ARENA architecture (no independent evidence)
    purpose: Extracting, anonymizing, structuring, and validating SIEM data to measure transferability of autonomous cyber defense
    New proposed system introduced in the title and abstract

pith-pipeline@v0.9.1-grok · 5733 in / 1439 out tokens · 39409 ms · 2026-06-26T13:58:10.296340+00:00 · methodology

