From Noise to Signal to Selbstzweck: Reframing Human Label Variation in the Era of Post-training in NLP

2025 · cs.CL · arXiv 2510.12817

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Human Label Variation (HLV) refers to legitimate disagreement in annotation that reflects the diversity of human perspectives rather than mere error. Long treated in NLP as noise to be eliminated, HLV has only recently been reframed as a signal for improving model robustness. With the rise of large language models (LLMs) and post-training methods such as human feedback-based alignment, the role of HLV has become increasingly consequential. Yet current preference-learning datasets routinely collapse multiple annotations into a single label, flattening diverse perspectives into artificial consensus. Preserving HLV is necessary not only for pluralistic alignment but also for sociotechnical safety evaluation, where model behavior must be assessed in relation to human interaction and societal context. This position paper argues that preserving HLV as an embodiment of human pluralism must be treated as a Selbstzweck, an intrinsic value in itself. We analyze the limitations of existing preference datasets and propose actionable strategies for incorporating HLV into dataset construction to better preserve pluralistic human values.

representative citing papers

LP-Eval: Rubric and Dataset for Measuring the Quality of Legal Proposition Generation

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

LP-Eval is a new expert-co-designed rubric and annotated dataset showing that LLMs mostly produce well-formed legal propositions from EU court decisions, with higher expert-rated quality for established cases and improved LLM-as-judge alignment when using the rubric.

citing papers explorer

LP-Eval: Rubric and Dataset for Measuring the Quality of Legal Proposition Generation cs.CL · 2026-05-19 · unverdicted · none · ref 30 · internal anchor
