A Novel Evaluation Benchmark for Medical LLMs: Illuminating Safety and Effectiveness in Clinical Domains
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Survey of RLM adoption in 28 disciplines reveals maturity disparities via a new assessment framework, with focus on development, evaluation, and public resources.
citing papers explorer
- MultiTurnPSB: Evaluating Multi-Turn Jailbreak Attacks and Classifier-Based Defenses for Medical AI Safety
Multi-turn jailbreak attacks on medical AI increase unsafe responses from 35% to 80% by turn 4 and expose 19x model gaps invisible in single-turn tests; a lightweight classifier reduces unsafe outputs by 52 points at the cost of 45% false alarms on benign queries.
- Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches
Survey of RLM adoption in 28 disciplines reveals maturity disparities via a new assessment framework, with focus on development, evaluation, and public resources.