Uncovering vulnerabilities of llm-assisted cyber threat intelligence

Yuqiao Meng, Luoxi Tang, Feiyang Yu, Jinyuan Jia, Guanhua Yan, Ping Yang, Zhaohan Xi · 2025 · cs.CR · arXiv 2509.23573

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

Large language models (LLMs) are increasingly used to help security analysts manage the surge of cyber threats, automating tasks from vulnerability assessment to incident response. Yet in operational CTI workflows, reliability gaps remain substantial. Existing explanations often point to generic model issues (e.g., hallucination), but we argue the dominant bottleneck is the threat landscape itself: CTI is heterogeneous, volatile, and fragmented. Under these conditions, evidence is intertwined, crowdsourced, and temporally unstable, which are properties that standard LLM-based studies rarely capture. In this paper, we present a comprehensive empirical study of LLM vulnerabilities in CTI reasoning. We introduce a human-in-the-loop categorization framework that robustly labels failure modes across the CTI lifecycle, avoiding the brittleness of automated "LLM-as-a-judge" pipelines. We identify three domain-specific cognitive failures: spurious correlations from superficial metadata, contradictory knowledge from conflicting sources, and constrained generalization to emerging threats. We validate these mechanisms via causal interventions and show that targeted defenses reduce failure rates significantly. Together, these results offer a concrete roadmap for building resilient, domain-aware CTI agents.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.

ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense

cs.CR · 2026-06-19 · unverdicted · novelty 5.0

ARENA creates anonymized SOC telemetry artifacts that reveal a measurable privacy-utility boundary when used both as training material for MITRE-mapped challenges and as a substrate to detect non-compliant LLM defender actions.

citing papers explorer

Showing 2 of 2 citing papers after filters.

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory cs.LG · 2026-05-10 · unverdicted · none · ref 21 · internal anchor
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense cs.CR · 2026-06-19 · unverdicted · none · ref 54 · internal anchor
ARENA creates anonymized SOC telemetry artifacts that reveal a measurable privacy-utility boundary when used both as training material for MITRE-mapped challenges and as a substrate to detect non-compliant LLM defender actions.

Uncovering vulnerabilities of llm-assisted cyber threat intelligence

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer