Pavel Dolin, Weizhi Li, Gautam Dasarathy, and Visar Berisha

3 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (3)

representative citing papers

Why Do Safety Guardrails Degrade Across Languages?

cs.CL · 2026-05-16 · conditional · novelty 6.0

A latent variable IRT framework decouples four safety-driving factors across 61 model configurations and 10 languages using 1.9 million evaluations, revealing that safety is largely unidimensional and that high cross-lingual gaps cluster in physical harm prompts and lower-resource languages.
