Bias and fairness in large language models: A survey. Computational Linguistics, 50(3):1097–1179, 2024.
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: unverdicted (2)
citing papers explorer
- When Efficiency Backfires: Cascading LLMs Trigger Cascade Failure under Adversarial Attack
  LLM cascade systems are vulnerable to a new adversarial attack that simultaneously degrades accuracy and destroys the intended cost savings by targeting both the lightweight models and the escalation decision mechanism.
- Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs
  Guardian-as-an-Advisor prepends risk labels and explanations from a guardian model to queries, improving LLM safety compliance and reducing over-refusal while adding minimal compute overhead.