A three-layer framework combining input filtering, provenance hierarchy, and output auditing reduces prompt injection attack success rate in RAG chatbots from 71.4% to 11.3% on 5,080 samples across three models.
Do llms know they are being tested? evaluation awareness and incentive-sensitive failures in gpt-oss-20b,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots
A three-layer framework combining input filtering, provenance hierarchy, and output auditing reduces prompt injection attack success rate in RAG chatbots from 71.4% to 11.3% on 5,080 samples across three models.