The “problem” of human label variation: On ground truth in data, modeling and evaluation
7 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (7)
verdicts: UNVERDICTED (7)
roles: background (1)
polarities: background (1)
citing papers explorer
- Implicit Humanization in Everyday LLM Moral Judgments
  LLM responses to moral judgment queries reinforce implicit humanization, potentially exacerbating overreliance and misplaced trust.
- Understanding Annotator Safety Policy with Interpretability
  Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.
- Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care
  A framework treating clinician overrides as implicit preferences to jointly train reward and capability models for clinical AI, with a taxonomy and alternating optimization to prevent suppression bias.
- Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives
  A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
- Who Watches the Watchmen? Humans Disagree With Translation Metrics on Unseen Domains
  Automatic translation metrics show lower agreement with humans on unseen technical domains than humans show with each other, and their robustness claims weaken when benchmarked against inter-annotator agreement instead of raw scores.
- From Ground Truth to Measurement: A Statistical Framework for Human Labeling
  A statistical framework decomposes human annotation outcomes into four interpretable variation sources and extends classical measurement-error models to handle both shared and individualized notions of truth.
- Calibrating Model-Based Evaluation Metrics for Summarization
  A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.