Pith

arXiv preprint arXiv:2303.13809

3 Pith papers cite this work. Polarity classification is still being indexed.


fields

cs.CL 3

representative citing papers

Large Language Models are not Fair Evaluators

cs.CL · 2023-05-29 · conditional · novelty 6.0

LLMs used as evaluators show strong position bias when scoring model outputs, which makes rankings easy to manipulate; multiple-evidence calibration, balanced position calibration, and selective human-in-the-loop calibration reduce this bias so that the resulting judgments better match human ones.

citing papers explorer

Showing 3 of 3 citing papers.

  • The Prompt Report: A Systematic Survey of Prompt Engineering Techniques cs.CL · 2024-06-06 · accept · none · ref 14

    This systematic survey organizes prompt engineering into a taxonomy of 58 text-based LLM prompting techniques and 40 techniques for other modalities, establishes a shared vocabulary, and offers guidelines for prompting state-of-the-art models.

  • Large Language Models are not Fair Evaluators cs.CL · 2023-05-29 · conditional · none · ref 75

    LLMs used as evaluators show strong position bias when scoring model outputs, which makes rankings easy to manipulate; multiple-evidence calibration, balanced position calibration, and selective human-in-the-loop calibration reduce this bias so that the resulting judgments better match human ones.

  • Calibrating Model-Based Evaluation Metrics for Summarization cs.CL · 2026-04-19 · unverdicted · none · ref 24

    A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.