XSTest: A test suite for identifying exaggerated safety behaviours in large language models

Paul Röttger, Hannah Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

An Interpretable and Scalable Framework for Evaluating Large Language Models

stat.ML · 2026-05-07 · unverdicted · novelty 6.0

A majorization-minimization framework turns IRT into scalable matrix factorization subproblems for LLM evaluation, delivering orders-of-magnitude speedups with identifiability guarantees.

citing papers explorer

Showing 1 of 1 citing paper.

An Interpretable and Scalable Framework for Evaluating Large Language Models stat.ML · 2026-05-07 · unverdicted · none · ref 49
A majorization-minimization framework turns IRT into scalable matrix factorization subproblems for LLM evaluation, delivering orders-of-magnitude speedups with identifiability guarantees.

XSTest: A test suite for identifying exaggerated safety behaviours in large language models

fields

years

verdicts

representative citing papers

citing papers explorer