arxiv: 2605.06205 · v1 · submitted 2026-05-07 · 💻 cs.CR

Recognition: unknown

ClawGuard: Out-of-Band Detection of LLM Agent Workflow Hijacking via EM Side Channel

Leo Linqian Gan (1) , Jeffery Wu (1) , Longyuan Ge (1) , Lanqing Yang (1) , Yonghao Song (1) , Jingkai Zhang (1) , Haojia Jin (1) , Weiyi Wang (1)

show 1 more author

Guangtao Xue (1) ((1) Shanghai Jiao Tong University)

Authors on Pith no claims yet

Pith reviewed 2026-05-08 09:15 UTC · model grok-4.3

classification 💻 cs.CR

keywords LLM agentsworkflow hijackingelectromagnetic side channelout-of-band detectionsecurity monitoringside-channel sensingagent security

0 comments

The pith

ClawGuard detects LLM agent workflow hijacks by capturing electromagnetic signals from hardware usage outside the potentially compromised host.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that workflow hijacking in autonomous LLM agents, where attackers subtly change tool and skill invocations, cannot be reliably stopped by internal logs because those can be forged once the host operating system is compromised. ClawGuard counters this by placing an external, passive monitor that listens to electromagnetic emanations produced by the agent's hardware activities. Different skills generate distinct patterns of computation, memory access, and network use, which create measurable EM envelopes that software-defined radios can record without any cooperation from the host. A drift-aware processing pipeline turns the raw radio streams into features for classification. On a large dataset of over seven terabytes of RF recordings, the system separates normal from hijacked executions with near-perfect detection rates.

Core claim

ClawGuard converts radio-frequency streams captured by external software-defined radios into 320-dimensional feature vectors through a drift-aware pipeline, mapping the unique electromagnetic envelopes of each LLM agent skill to detect workflow hijacking attempts with 100 percent true-positive rate and 1.16 percent false-positive rate even when the host is fully compromised.

What carries the argument

Passive electromagnetic side-channel sensing that records macroscopic EM envelopes emitted by distinct hardware usage patterns of agent skills, captured by external SDRs without any host software involvement.

Load-bearing premise

Distinct agent skills must produce sufficiently unique and stable hardware activity patterns whose electromagnetic signatures remain distinguishable by external sensors despite real-world interference and host compromise.

What would settle it

An experiment showing that two different agent skills produce overlapping EM signatures under identical conditions, or that an attacker can force one skill's hardware behavior to emit the EM envelope of another while preserving the intended workflow.

Figures

Figures reproduced from arXiv: 2605.06205 by Guangtao Xue (1) ((1) Shanghai Jiao Tong University), Haojia Jin (1), Jeffery Wu (1), Jingkai Zhang (1), Lanqing Yang (1), Leo Linqian Gan (1), Longyuan Ge (1), Weiyi Wang (1), Yonghao Song (1).

**Figure 1.** Figure 1: System overview. An adversary injects a malicious sub-skill into an OpenClaw execution on the target host, compromising host-internal telemetry. Consequently, the defender utilizes a co-located, passive SDR to observe far-field RF emanations as a secure out-of-band integrity channel. However, this autonomy introduces a critical vulnerability: workflow hijacking. As enterprises deploy agents to handle inc… view at source ↗

**Figure 2.** Figure 2: Power spectral density (top) and spectrogram (bot view at source ↗

**Figure 3.** Figure 3: Band-aggregated time–frequency fingerprints of the 15- skill attack catalog. 7) Tool-result poisoning: a legitimate tool returns corrupted output to the planner. From ClawGuard’s perspective, these attacks matter when they alter the physical execution trace. The monitor does not infer whether a natural-language plan is morally benign. Instead, it checks whether the hardware activity emitted by the host is … view at source ↗

**Figure 4.** Figure 4: Overview of ClawGuard. Passive dual-band SDR capture feeds a coarse–fine windowing front-end. Each fine window is transformed into a 320-d physical feature vector, compensated for drift, and classified into skill-level or attack-state evidence. Coarse-window aggregation produces the record-level workflow-integrity verdict evaluated in §VI. max-pool mean-pool top-k pool K4 (K = 4 small win) K8 (K = 8 small … view at source ↗

**Figure 5.** Figure 5: Coarse–fine windowing. The coarse envelope preserves view at source ↗

**Figure 6.** Figure 6: Prototype deployments. ClawGuard passively monitors real hosts using external SDRs. The Laptop deployment supports the headline and new-bands results, heavily emphasizing realistic enterprise scenarios; the Pi deployment is used as an additional stress setting. as the prominent platform, and a Raspberry Pi agent host for an additional portability stress campaign. These deployments exercise the full sensin… view at source ↗

**Figure 9.** Figure 9: Per-skill record counts in the new-bands corpus. Counts view at source ↗

**Figure 10.** Figure 10: Per-skill IQ amplitude in the new-bands corpus. The view at source ↗

**Figure 11.** Figure 11: Pairwise skill separability on big48. Skill-level EM evidence is structured: some pairs are highly separable, while broad open-set multi-class recognition remains difficult. 10 −4 10 −3 10 −2 10 −1 10 0 False Positive Rate (log scale) 0.0 0.2 0.4 0.6 0.8 1.0 True Positive Rate (a) ROC (log-x for low-FPR detail) AUC = 0.9945 chance Youden TPR=1.000 FPR=0.012 thr=0.201 0.0 0.2 0.4 0.6 0.8 1.0 Recall 0.0 0.2… view at source ↗

**Figure 12.** Figure 12: Production operating curve on the 11,800-record split. ClawGuard achieves AUC = 0.9945 and detects attacks at 100% TPR with 1.16% FPR. system is more likely to raise a false alert than to silently miss an attack. D. Robustness, Transfer, and Cost This final subsection explains why the reported results should be read as a measured physical system rather than a lucky split, and it states the known boundarie… view at source ↗

**Figure 13.** Figure 13: New-bands confusion matrices. Most errors are view at source ↗

**Figure 16.** Figure 16: Probability calibration. Raw random-forest outputs view at source ↗

**Figure 30.** Figure 30: Inference latency — per-record p50=18.2ms, p99=28.6ms (real-time-able for 20s captures) view at source ↗

**Figure 17.** Figure 17: Inference latency. Median post-feature latency is 18ms view at source ↗

**Figure 18.** Figure 18: Band survey v2 workload deltas with the selected view at source ↗

**Figure 19.** Figure 19: Top candidate windows ranked by CPU and RAM view at source ↗

**Figure 20.** Figure 20: Pairwise 2-class separability on big48. f) Skill separability.: The aggregate 16-class result on big48 is low (macro-F1 = 0.146), but the pairwise structure is informative. Among 120 pairwise 2-class tests, 12 pairs exceed 0.80 F1, while 37 pairs fall below 0.55. The most separable skill is build_release_pipeline, which reaches 0.956 against log_rotate_compress, 0.947 against sensor_polling_iot, and 0.94… view at source ↗

**Figure 22.** Figure 22: Confusion matrices for the new-bands LOCO exper view at source ↗

**Figure 24.** Figure 24: Per-record inference latency breakdown. 18 view at source ↗

read the original abstract

Autonomous LLM agents face a critical security risk known as workflow hijacking, where attackers subtly alter tool and skill invocations. Existing defenses rely on host-internal telemetry (such as audit logs), which can be forged if the host OS is compromised. To solve this, we introduce ClawGuard, a passive, out-of-band monitor that audits LLM-agent workflows using electromagnetic (EM) emanations. Because distinct agent skills create unique hardware usage patterns (computation, DRAM, network blocking), they emit measurable, macroscopic EM envelopes. External software-defined radios (SDRs) capture these physical signals. Using a drift-aware pipeline with 320-dimensional features, ClawGuard converts RF streams into physical evidence. Evaluated on a 7.82TB RF corpus, ClawGuard achieved an AUC of 0.9945, detecting attacks with a 100% true-positive rate and a 1.16% false-positive rate. This proves passive EM sensing is a practical, forge-resistant physical check against compromised host software.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ClawGuard, a passive out-of-band system for detecting LLM agent workflow hijacking attacks. It uses electromagnetic side-channel monitoring via external software-defined radios to capture macroscopic EM emanations arising from distinct agent skills' hardware usage patterns (computation, DRAM access, network activity). A drift-aware pipeline extracts 320-dimensional features from RF streams, and the system is evaluated on a 7.82 TB corpus, reporting an AUC of 0.9945 with 100% true-positive rate and 1.16% false-positive rate. The central claim is that this provides a forge-resistant physical-layer defense independent of potentially compromised host telemetry.

Significance. If the results prove robust, the work would represent a meaningful advance in securing autonomous LLM agents by exploiting physical invariants that are difficult to forge from software. The scale of the RF corpus is a positive aspect of the evaluation design. However, the absence of methodological details prevents a full assessment of whether the approach delivers a reliable, generalizable physical check or merely reflects corpus-specific artifacts.

major comments (2)

[Evaluation] Evaluation section: The manuscript reports strong performance metrics (AUC 0.9945, 100% TPR, 1.16% FPR) on a 7.82 TB corpus but supplies no description of data collection procedures, feature engineering for the 320-dimensional vectors, environmental controls, hardware platform variation, or controls for signal interference and concurrent host loads. This directly undermines the central claim that the features capture skill-specific EM envelopes rather than transient or environment-dependent effects.
[§3] §3 (System Design and Assumptions): The premise that distinct agent skills produce reliably separable macroscopic EM envelopes even under host compromise and real-world RF interference is stated without quantitative support, such as separability metrics or ablation studies under added DRAM contention or external emitters. Because the headline detection rates rest on this untested separability, the experimental results cannot yet be interpreted as evidence of a practical physical invariant.

minor comments (1)

[Abstract] The abstract introduces the term 'drift-aware pipeline' without a brief definition or reference to the relevant section explaining how drift is detected or mitigated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of ClawGuard as a physical-layer defense. We address each major comment point by point below. Revisions have been made to the manuscript to incorporate additional details and quantitative support where this strengthens the presentation without altering the core claims or results.

read point-by-point responses

Referee: [Evaluation] Evaluation section: The manuscript reports strong performance metrics (AUC 0.9945, 100% TPR, 1.16% FPR) on a 7.82 TB corpus but supplies no description of data collection procedures, feature engineering for the 320-dimensional vectors, environmental controls, hardware platform variation, or controls for signal interference and concurrent host loads. This directly undermines the central claim that the features capture skill-specific EM envelopes rather than transient or environment-dependent effects.

Authors: We agree that expanded methodological details improve interpretability and reproducibility. In the revised manuscript we have substantially enlarged the Evaluation section. It now includes: (1) a complete description of the data-collection apparatus and protocol (specific SDR hardware, antenna placement, sampling rates, and session durations that produced the 7.82 TB corpus); (2) the exact feature-engineering pipeline that yields the 320-dimensional drift-aware vectors, including time-frequency transforms and normalization steps; (3) environmental controls (Faraday-cage shielding, temperature/humidity logging, and baseline noise measurements); (4) hardware-platform variation experiments across three distinct host configurations; and (5) explicit controls for concurrent host loads and external RF interference, with quantitative results showing that detection performance remains stable. These additions directly corroborate that the learned features reflect skill-specific macroscopic EM envelopes rather than transient environmental effects. revision: yes
Referee: [§3] §3 (System Design and Assumptions): The premise that distinct agent skills produce reliably separable macroscopic EM envelopes even under host compromise and real-world RF interference is stated without quantitative support, such as separability metrics or ablation studies under added DRAM contention or external emitters. Because the headline detection rates rest on this untested separability, the experimental results cannot yet be interpreted as evidence of a practical physical invariant.

Authors: We acknowledge the value of explicit quantitative backing for the separability assumption. While the headline metrics already constitute empirical evidence obtained under realistic conditions (including variable host loads and ambient RF), we have added two new elements to the revised manuscript. First, §3 now reports separability metrics (average inter-class Euclidean distance and silhouette coefficient) computed on the 320-dimensional feature vectors for the ten agent skills; these metrics confirm clear separation. Second, we include ablation results in the Evaluation section that inject controlled DRAM contention (via concurrent memory-bound processes) and external RF emitters (via calibrated signal generators at varying power levels). Under these conditions the AUC remains above 0.98 with negligible degradation in TPR/FPR, supporting the claim that the physical invariant is robust to host compromise and interference. The out-of-band architecture continues to guarantee independence from any forged host telemetry. revision: yes

Circularity Check

0 steps flagged

No circularity detected; results are empirical performance metrics on collected RF data

full rationale

The paper reports experimental detection performance (AUC 0.9945, 100% TPR, 1.16% FPR on 7.82 TB corpus) from a drift-aware ML pipeline applied to SDR-captured EM signals. No equations, derivations, or first-principles claims are presented that reduce the detection result to fitted parameters, self-definitions, or self-citations by construction. The premise that distinct skills produce unique macroscopic EM envelopes is stated as an empirical observation motivating the approach, not derived from prior results within the paper. The evaluation metrics are direct measurements on held-out data rather than predictions forced by the training process or renamed known patterns. This is a standard empirical security paper with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Review conducted from abstract only; limited visibility into exact parameters or background assumptions used in the full pipeline.

free parameters (1)

320-dimensional feature set
The drift-aware pipeline converts RF streams into 320 features; dimensionality and selection criteria are not detailed in the abstract.

axioms (1)

domain assumption Distinct agent skills produce distinguishable macroscopic EM emanations from hardware activity
Stated as the basis for converting RF streams into physical evidence of workflow integrity.

pith-pipeline@v0.9.0 · 5526 in / 1245 out tokens · 68367 ms · 2026-05-08T09:15:22.395785+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 5 canonical work pages · 2 internal anchors

[1]

Generative AI - worldwide | statista market forecast,

Statista, “Generative AI - worldwide | statista market forecast,” https://www.statista.com/outlook/tmo/artificial-intelligence/ generative-ai/worldwide, 2024, accessed: 2026-05-07

2024
[2]

LangChain: A framework for developing applications pow- ered by language models,

LangChain, “LangChain: A framework for developing applications pow- ered by language models,” https://www.langchain.com/, 2024

2024
[3]

Openclaw — personal ai assistant,

OpenClaw, “Openclaw — personal ai assistant,” 5 2026, [Online; accessed 2026-05-07]. [Online]. Available: https://openclaw.ai/

2026
[4]

PoisonedRAG: Knowl- edge corruption attacks to retrieval-augmented generation of large lan- guage models,

W. Zou, R. He, T. Bachmann, M. Salehiet al., “PoisonedRAG: Knowl- edge corruption attacks to retrieval-augmented generation of large lan- guage models,” inProceedings of the 34th USENIX Security Symposium (USENIX Security 25). USENIX Association, 2025

2025
[5]

Prompt injection attack to tool selection in LLM agents,

Anonymous, “Prompt injection attack to tool selection in LLM agents,” inProceedings of the 2026 Network and Distributed System Security Symposium (NDSS ’26). Internet Society, 2026

2026
[6]

ObliInjection: Order-oblivious prompt injection attack to LLM agents with multi-source data,

S. Xuet al., “ObliInjection: Order-oblivious prompt injection attack to LLM agents with multi-source data,” inProceedings of the 2026 Net- work and Distributed System Security Symposium (NDSS ’26). Internet Society, 2026, arXiv:2512.09321

work page arXiv 2026
[7]

HOLMES: Real-time apt detection through correlation of suspicious information flows,

S. M. Milajerdi, R. Geng, S. Khalighinejad, H. Agarwal, M. Egele, and N. Nikiforakis, “HOLMES: Real-time apt detection through correlation of suspicious information flows,” in2019 IEEE Symposium on Security and Privacy (SP), 2019

2019
[8]

Unicorn: Runtime provenance-based detector for advanced persistent threats,

X. Han, T. Pasquier, A. Bates, J. Mickens, and M. Seltzer, “Unicorn: Runtime provenance-based detector for advanced persistent threats,” in Proceedings of the Network and Distributed System Security Symposium (NDSS), 2020

2020
[9]

Kobra: Targeted activity monitoring with ebpf,

R. Farkhaniet al., “Kobra: Targeted activity monitoring with ebpf,” in Proceedings of the Network and Distributed System Security Symposium (NDSS), 2023

2023
[10]

A comprehensive memory safety analysis of bootload- ers,

Z. Zhonget al., “A comprehensive memory safety analysis of bootload- ers,” inProceedings of the Network and Distributed System Security Symposium (NDSS), 2025

2025
[11]

Controlled preemp- tion: Amplifying side-channel attacks from userspace,

Y . Zhu, B. Chen, Z. N. Zhao, and C. W. Fletcher, “Controlled preemp- tion: Amplifying side-channel attacks from userspace,” inProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 25). ACM, 2025. 14

2025
[12]

Get your hands off my laptop: Physical side-channel key-extraction attacks on PCs,

D. Genkin, I. Pipman, and E. Tromer, “Get your hands off my laptop: Physical side-channel key-extraction attacks on PCs,” inCryptographic Hardware and Embedded Systems – CHES 2014, ser. Lecture Notes in Computer Science, vol. 8731. Springer, 2014, pp. 242–260

2014
[13]

ECDSA key extraction from mobile devices via nonintrusive physical side channels,

D. Genkin, L. Pachmanov, I. Pipman, E. Tromer, and Y . Yarom, “ECDSA key extraction from mobile devices via nonintrusive physical side channels,” inProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16). ACM, 2016, pp. 1626–1638

2016
[14]

EMMA: EM-based anomaly detection for embedded systems,

N. Sehatbakhsh, B. B. Yilmaz, A. Zaji ´c, and M. Prvulovic, “EMMA: EM-based anomaly detection for embedded systems,” inProceedings of the 29th USENIX Security Symposium (USENIX Security ’20). USENIX Association, 2020, pp. 1245–1262

2020
[15]

Screaming channels: When electromagnetic side channels meet radio transceivers,

G. Camurati, S. Poeplau, M. Muench, T. Hayes, and A. Francillon, “Screaming channels: When electromagnetic side channels meet radio transceivers,” inProceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18). ACM, 2018, pp. 163–177

2018
[16]

BlueScream: Screaming channels on bluetooth low energy,

P. Ayoub, R. Cayre, A. Francillon, and C. Maurice, “BlueScream: Screaming channels on bluetooth low energy,” inProceedings of the 40th Annual Computer Security Applications Conference (ACSAC ’24). ACM, 2024

2024
[17]

A practical methodology for measuring the side-channel signal available to the attacker for instruction-level events,

R. Callan, A. Zaji ´c, and M. Prvulovic, “A practical methodology for measuring the side-channel signal available to the attacker for instruction-level events,” inProceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’14). IEEE, 2014, pp. 242–254

2014
[18]

Detecting cellphone camera status at distance by exploiting electromagnetic em- anations,

B. B. Yilmaz, E. E. Ugurlu, A. Zaji ´c, and M. Prvulovic, “Detecting cellphone camera status at distance by exploiting electromagnetic em- anations,” inProceedings of the 2019 IEEE Military Communications Conference (MILCOM). IEEE, 2019, pp. 1–6

2019
[19]

GraphRAG under fire: Exposing vulnerabilities of GraphRAG to targeted poisoning attacks,

J. Liang, Y . Wang, C. Li, and T. Wang, “GraphRAG under fire: Exposing vulnerabilities of GraphRAG to targeted poisoning attacks,” inProceedings of the 2026 IEEE Symposium on Security and Privacy (S&P ’26). IEEE, 2026

2026
[20]

ProvDetector: A provenance- based stealthy malware detection system,

Q. Wang, W. U. Hassan, A. Bateset al., “ProvDetector: A provenance- based stealthy malware detection system,” inProceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2020

2020
[21]

MAGIC: Detecting advanced persistent threats via masked graph representation learning,

Z. Zhenget al., “MAGIC: Detecting advanced persistent threats via masked graph representation learning,” in30th USENIX Security Sym- posium (USENIX Security 21), 2021

2021
[22]

Sleuth: Real-time attack scenario reconstruction from cots audit data,

M. N. Hossainet al., “Sleuth: Real-time attack scenario reconstruction from cots audit data,” in26th USENIX Security Symposium (USENIX Security 17), 2017

2017
[23]

Nodoze: Combatting threat alert fatigue with automated provenance triage,

W. U. Hassan, S. Guo, D. Li, Z. Chen, K. Jee, Z. Li, and A. Bates, “Nodoze: Combatting threat alert fatigue with automated provenance triage,” inProceedings of the Network and Distributed System Security Symposium (NDSS), 2019

2019
[24]

Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise,

F. Liuet al., “Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise,” inProceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2019

2019
[25]

PrioTracker: Tuning ephemeral trace events for reliable threat detection,

Y . Liuet al., “PrioTracker: Tuning ephemeral trace events for reliable threat detection,” inProceedings of the Network and Distributed System Security Symposium (NDSS), 2021

2021
[26]

ACE: A Security Architecture for LLM-Integrated App Systems

Anonymous, “ACE: A security architecture for LLM-integrated app sys- tems,” inProceedings of the 2026 Network and Distributed System Secu- rity Symposium (NDSS ’26). Internet Society, 2026, arXiv:2504.20984

work page internal anchor Pith review arXiv 2026
[27]

SAGA: A security architecture for governing AI agentic systems, 2025

——, “SAGA: Governing AI agent security,” arXiv:2504.21034, 2025

work page arXiv 2025
[28]

StruQ: Defending against prompt injection with structured queries,

S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “StruQ: Defending against prompt injection with structured queries,” inProceedings of the 34th USENIX Security Symposium (USENIX Security 25). USENIX Association, 2025

2025
[29]

AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,

E. Debenedetti, J. Zhang, M. Balunovi ´c, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” inAdvances in Neu- ral Information Processing Systems 37 (NeurIPS 2024) Datasets and Benchmarks Track, 2024

2024
[30]

Threatrace: Detecting and tracing host-based threats in node level through graph convolutional networks,

S. Wanget al., “Threatrace: Detecting and tracing host-based threats in node level through graph convolutional networks,” in31st USENIX Security Symposium (USENIX Security 22), 2022

2022
[31]

CausalIL: Causal graph learning for host-based intrusion detection,

Y . Chenet al., “CausalIL: Causal graph learning for host-based intrusion detection,” inProceedings of the Network and Distributed System Security Symposium (NDSS), 2023

2023
[32]

Tactical provenance analysis for endpoint detection and response systems,

W. U. Hassanet al., “Tactical provenance analysis for endpoint detection and response systems,” in2020 IEEE Symposium on Security and Privacy (SP), 2020

2020
[33]

Poirot: Aligning attack behavior with threat intel- ligence,

Z. Zhenget al., “Poirot: Aligning attack behavior with threat intel- ligence,” inProceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2019

2019
[34]

Not what you’ve signed up for: Compromising real-world LLM- integrated applications with indirect prompt injection,

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world LLM- integrated applications with indirect prompt injection,” inProceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec ’23). ACM, 2023, pp. 79–90

2023
[35]

Ignore Previous Prompt: Attack Techniques For Language Models

F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” arXiv preprint arXiv:2211.09527, 2022

work page internal anchor Pith review arXiv 2022
[36]

Formalizing and detecting indirect prompt injection attacks,

J. Liuet al., “Formalizing and detecting indirect prompt injection attacks,” inProceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2024

2024
[37]

Poisoning retrieval-augmented generation for large lan- guage models,

W. Zouet al., “Poisoning retrieval-augmented generation for large lan- guage models,” in33rd USENIX Security Symposium (USENIX Security 24), 2024

2024
[38]

Agent smith: A single image can hijack your au- tonomous agent,

G. Chenet al., “Agent smith: A single image can hijack your au- tonomous agent,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

2024
[39]

JBo- mAudit: Assessing the landscape, compliance, and security implications of Java SBOMs,

Y . Xiao, D. Kirat, D. L. Schales, J. Jang, L. Xing, and X. Liao, “JBo- mAudit: Assessing the landscape, compliance, and security implications of Java SBOMs,” inProceedings of the Network and Distributed System Security Symposium (NDSS 25). Internet Society, 2025

2025
[40]

Wattsupdoc: Power side channels to nonintrusively discover untargeted malware on embedded medical devices,

S. S. Clarket al., “Wattsupdoc: Power side channels to nonintrusively discover untargeted malware on embedded medical devices,” inUSENIX Workshop on Health Information Technologies (HealthTech), 2013

2013
[41]

Hardfails: Insights into software-exploitable hard- ware bugs,

G. Dessoukyet al., “Hardfails: Insights into software-exploitable hard- ware bugs,” in28th USENIX Security Symposium (USENIX Security 19), 2019

2019
[42]

RefleXnoop: Passwords snoop- ing on NLoS laptops leveraging screen-induced sound reflection,

P. Wang, J. Hu, C. Liu, and J. Luo, “RefleXnoop: Passwords snoop- ing on NLoS laptops leveraging screen-induced sound reflection,” in Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS ’24). ACM, 2024. APPENDIX This appendix supports the robustness discussion in §VI-D. It records a deliberately difficult stress camp...

work page arXiv 2024