Cicero: A dataset for contextualized commonsense inference in dialogues
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2025 (3)
Verdicts: unverdicted (3)
citing papers explorer
- Incentivizing High-Quality Human Annotations with Golden Questions
  The paper derives a Θ(1/√(n log n)) hypothesis testing rate under strategic annotator behavior and shows that high-certainty, format-similar golden questions better reveal annotation quality than standard checks.
- How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators
  Develops self-consistency monitoring for preference annotators and derives sample-complexity bounds showing linear contracts achieve near-ideal performance faster than binary ones under continuous actions.
- Users as Annotators: LLM Preference Learning from Comparison Mode
  Introduces a latent user quality model and EM algorithm to infer and filter noisy user-provided pairwise preferences for improved LLM alignment.