Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks

2024 · arXiv 2403.03792

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

cs.CR · 2024-06-19 · unverdicted · novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

Trustworthiness in Retrieval-Augmented Generation Systems: A Survey

cs.IR · 2024-09-16 · unverdicted · novelty 7.0

Introduces the Trust-RAG Compass framework and the TRC Bench benchmark to assess RAG trustworthiness across factuality, robustness, fairness, transparency, accountability, and privacy; the evaluations show performance gaps between LLMs.

ACE: A Security Architecture for LLM-Integrated App Systems

cs.CR · 2025-04-29 · unverdicted · novelty 6.0

ACE decouples planning into abstract and concrete phases with static information-flow verification and enforces execution barriers to secure LLM app systems against prompt injection and related attacks.

Evaluation of Prompt Injection Defenses in Large Language Models

cs.CR · 2026-04-26 · unverdicted · novelty 5.0 · 2 refs

Only output filtering with hardcoded rules in application code prevented prompt injection leaks in LLMs, as all model-based defenses were defeated by an adaptive attacker.

citing papers explorer

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents · cs.CR · 2024-06-19 · unverdicted · none · ref 42
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey · cs.IR · 2024-09-16 · unverdicted · none · ref 81
ACE: A Security Architecture for LLM-Integrated App Systems · cs.CR · 2025-04-29 · unverdicted · none · ref 27
Evaluation of Prompt Injection Defenses in Large Language Models · cs.CR · 2026-04-26 · unverdicted · none · ref 14 · 2 links
