G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

· 2023

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation

cs.CV · 2026-03-28 · unverdicted · novelty 7.0

RailVQA-bench supplies 21,168 QA pairs for ATO visual cognition while RailVQA-CoM combines large-model reasoning with small-model efficiency via transparent modules and temporal sampling.

LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation

cs.CL · 2026-04-28 · unverdicted · novelty 6.0

LLM-ReSum uses LLM self-evaluation in a closed feedback loop to refine summaries, improving factual accuracy by up to 33% and coverage by 39% with 89% human preference.

citing papers explorer

Showing 2 of 2 citing papers.

RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation

cs.CV · 2026-03-28 · unverdicted · none · ref 48

LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation

cs.CL · 2026-04-28 · unverdicted · none · ref 15
