Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers:

- Beyond Context: Large Language Models' Failure to Grasp Users' Intent
  LLMs fail to detect hidden harmful intent, allowing systematic bypass of safety mechanisms through framing techniques, with reasoning modes often worsening the issue.
- LLM Harms: A Taxonomy and Discussion
  This paper proposes a taxonomy of LLM harms in five categories and suggests mitigation strategies plus a dynamic auditing system for responsible development.