Do LLMs Know They Are Being Tested? Evaluation Awareness and Incentive-Sensitive Failures in GPT-OSS-20B

2025 · arXiv 2510.08624

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots

cs.CR · 2026-06-17 · unverdicted · novelty 5.0

A three-layer framework combining input filtering, provenance hierarchy, and output auditing reduces prompt injection attack success rate in RAG chatbots from 71.4% to 11.3% on 5,080 samples across three models.

citing papers explorer

A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots

cs.CR · 2026-06-17 · unverdicted · none · ref 14