How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Human-annotated preference data play an important role in aligning large language models (LLMs). In this paper, we study two connected questions: how to monitor the quality of human preference annotators and how to incentivize them to provide high-quality annotations. In current practice, expert-based monitoring is a natural workhorse for quality control, but it performs poorly in preference annotation because annotators are heterogeneous and downstream model performance is an indirect and noisy proxy for annotation quality. We therefore propose a self-consistency monitoring scheme tailored to preference annotation, and analyze the statistical sample complexity of both methods. This practitioner-facing analysis identifies how many inspected samples are needed to reliably assess an annotator and shows when self-consistency monitoring can outperform expert-based monitoring. We then use the resulting monitoring signal as the performance measure in a principal-agent model, which lets us study a second sample-complexity question: how many monitored samples are needed before simple contracts perform close to the ideal benchmark in which annotation quality is perfectly observable. Under a continuous action space, we show that this shortfall scales as $\Theta(1/\sqrt{\mathcal{I} n \log n})$ for binary contracts and $\Theta(1/(\mathcal{I}n))$ for linear contracts, where $\mathcal{I}$ is the Fisher information and $n$ is the number of samples; we further show that linear contracts are rate-optimal among general contracts. This contrasts with the known result that, when the action space is discrete, binary contracts are optimal with a shortfall of order $\exp(-\Theta(n))$ \citep{frick2023monitoring}.
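
To get a feel for the two continuous-action rates quoted in the abstract, the following minimal sketch evaluates the rate terms $1/\sqrt{\mathcal{I} n \log n}$ and $1/(\mathcal{I} n)$ at a few sample sizes. It is illustrative only: the $\Theta(\cdot)$ constants are not given in the abstract, so they are assumed to be 1 here, and the function names and the default Fisher information value are hypothetical.

```python
import math


def binary_rate(n: int, fisher_info: float) -> float:
    """Rate term 1/sqrt(I * n * log n) for binary contracts.

    Assumption: the hidden Theta(.) constant is taken to be 1.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log n is positive")
    return 1.0 / math.sqrt(fisher_info * n * math.log(n))


def linear_rate(n: int, fisher_info: float) -> float:
    """Rate term 1/(I * n) for linear contracts.

    Assumption: the hidden Theta(.) constant is taken to be 1.
    """
    if n < 1:
        raise ValueError("n must be positive")
    return 1.0 / (fisher_info * n)


def rate_table(sample_sizes, fisher_info: float = 1.0):
    """Return (n, binary rate, linear rate, binary/linear ratio) per sample size."""
    rows = []
    for n in sample_sizes:
        b = binary_rate(n, fisher_info)
        lin = linear_rate(n, fisher_info)
        rows.append((n, b, lin, b / lin))
    return rows


if __name__ == "__main__":
    for n, b, lin, ratio in rate_table([10, 100, 1000, 10000]):
        print(f"n={n:>6}  binary~{b:.5f}  linear~{lin:.5f}  ratio~{ratio:.1f}")
```

Under these assumptions the ratio of the two terms equals $\sqrt{\mathcal{I} n / \log n}$, so the gap between binary and linear contracts widens as more samples are monitored.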

fields

cs.CL 1 cs.GT 1

years

2025 2

verdicts

UNVERDICTED 2

representative citing papers

Incentivizing High-Quality Human Annotations with Golden Questions

cs.GT · 2025-05-25 · unverdicted · novelty 7.0

The paper derives a Θ(1/√(n log n)) hypothesis testing rate under strategic annotator behavior and shows that high-certainty, format-similar golden questions better reveal annotation quality than standard checks.