In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers
- ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models
  ML-Bench is a multilingual safety benchmark derived from actual regional laws and regulations, paired with ML-Guard guardrail models that outperform 11 baselines on existing and new benchmarks.
- Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring
  RCS learns projections on LVLM internal representations to produce contrastive scores that separate malicious jailbreaks from benign inputs, with MCD and KCD variants claiming SOTA generalization to unseen attacks.