Can we trust AI benchmarks? An interdisciplinary review of current issues in AI evaluation. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(1):850–864, Oct
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
An Interpretable and Scalable Framework for Evaluating Large Language Models
A majorization-minimization framework turns IRT into scalable matrix factorization subproblems for LLM evaluation, delivering orders-of-magnitude speedups with identifiability guarantees.