Reliable and

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Why Do Safety Guardrails Degrade Across Languages?

cs.CL · 2026-05-16 · conditional · novelty 6.0

A latent variable IRT framework decouples four safety-driving factors across 61 model configurations and 10 languages using 1.9 million evaluations, revealing that safety is largely unidimensional and that high cross-lingual gaps cluster in physical harm prompts and lower-resource languages.
