A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.
A Closer Look at System Prompt Robustness
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.
LLMs display political plasticity via prompt-driven ideological adaptation that is more reliable in larger newer models, but inverted questions produce counterintuitive shifts suggesting data leakage.
KV cache compression causes certain instructions to degrade rapidly and be ignored in multi-instruction prompting, with system prompt leakage worsened by method choice, instruction order, and eviction bias; simple policy changes can mitigate this.
citing papers explorer
-
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents
A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.
-
Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
-
A Study of LLMs' Preferences for Libraries and Programming Languages
Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.
-
Political Plasticity: An Analysis of Ideological Adaptability in Large Language Models
LLMs display political plasticity via prompt-driven ideological adaptation that is more reliable in larger newer models, but inverted questions produce counterintuitive shifts suggesting data leakage.
-
The Pitfalls of KV Cache Compression
KV cache compression causes certain instructions to degrade rapidly and be ignored in multi-instruction prompting, with system prompt leakage worsened by method choice, instruction order, and eviction bias; simple policy changes can mitigate this.