PandaLM: An automatic evaluation benchmark for LLM instruction tuning optimization. arXiv preprint arXiv:2306.05087, 2023

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 2 · background 1

representative citing papers

Large Language Models are not Fair Evaluators

cs.CL · 2023-05-29 · conditional · novelty 6.0

LLMs used as evaluators show strong position bias when scoring model outputs, so rankings can be manipulated simply by changing the order in which responses appear. Calibrating with multiple evidence, balancing positions, and adding selective human input reduce this bias and bring the scores closer to human judgments.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
