Fine-tuned ModernBERT-family encoders match LLM judges on F1, false negative rate, and precision-recall for harmful output detection across adversarial datasets and attack types while promising lower cost and latency.
arXiv:2504.00441 (2025)
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.
citing papers explorer
- Do Encoders Suffice? A Systematic Comparison of Encoder and Decoder Safety Judges for LLM Adversarial Evaluation
  Fine-tuned ModernBERT-family encoders match LLM judges on F1, false negative rate, and precision-recall for harmful output detection across adversarial datasets and attack types while promising lower cost and latency.
- Prompt Governance? On Governing Technologies Governed by Natural Language
  Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.