RLHF preference measurement is a social science validity problem because annotators routinely produce non-attitudes, constructed responses, and artifacts rather than stable values.
1 Pith paper cites this work.
Fields: cs.HC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing paper:
Measuring Human Preferences in RLHF is a Social Science Problem