Time to impeach LLM-as-a-judge: Programs are the future of evaluation

Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala · 2025 · arXiv 2506.10403

2 Pith papers cite this work.

citing papers

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation

cs.CL · 2026-04-06 · unverdicted · novelty 6.0 · ref 3

OmniScore is a family of lightweight deterministic learned metrics that approximate LLM-judge behavior for reliable multilingual evaluation of generative text in tasks such as QA, translation, and summarization.

Multilingual Prompt Localization for Agent-as-a-Judge: Language and Backbone Sensitivity in Requirement-Level Evaluation

cs.CL · 2026-04-06 · unreviewed · ref 13