Self-evolving rubric with anti-gaming fitness reveals that objective capability scaling fails to transfer to subjective LLM behaviors, with advice-restraint as the universal lowest dimension that can regress.
The “problem” of human label variation: On ground truth in data, modeling and evaluation
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 9verdicts
UNVERDICTED 9roles
background 1polarities
background 1representative citing papers
LLM responses to moral judgment queries reinforce implicit humanization, potentially exacerbating overreliance and misplaced trust.
Highlighting is largely social (crowd predicts salience better than personal history), but individuality appears strongly in which salient passages a person selects, driven by thematic preferences.
Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.
A framework treating clinician overrides as implicit preferences to jointly train reward and capability models for clinical AI, with a taxonomy and alternating optimization to prevent suppression bias.
A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
Automatic translation metrics show lower agreement with humans on unseen technical domains than humans show with each other, and their robustness claims weaken when benchmarked against inter-annotator agreement instead of raw scores.
A statistical framework decomposes human annotation outcomes into four interpretable variation sources and extends classical measurement-error models to handle both shared and individualized notions of truth.
A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.
citing papers explorer
No citing papers match the current filters.