G-Eval: NLG evaluation using GPT-4 with better human alignment

Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu · 2023

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

VERDI derives three structural confidence signals from decomposed LLM verification traces and calibrates them with Platt-scaled logistic regression to achieve AUROC 0.72-0.91 on GPT models and 0.56-0.70 on Qwen models where log-probabilities fail.

citing papers explorer

VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference

cs.LG · 2026-05-11 · unverdicted · none · ref 5
