OmniScore is a family of lightweight, deterministic learned metrics that approximate LLM-judge behavior for reliable multilingual evaluation of generated text in tasks such as QA, translation, and summarization.
Time to impeach LLM-as-a-judge: Programs are the future of evaluation
2 Pith papers cite this work. Polarity classification is still indexing.