LLM annotations for social science tasks vary substantially with prompt wording in interpretive cases but become more stable when majority voting is applied across multiple equivalent prompts.
This paper introduces Inter-Prompt Reliability (IPR), a framework for evaluating the stability of LLM outputs across semantically equivalent but linguistically varied prompts.
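The idea of comparing labels across equivalent prompts and then aggregating them by majority vote can be illustrated with a minimal sketch. This is not the paper's IPR metric, whose exact definition is not given here; raw pairwise agreement is used as a simple stand-in, and the item names, labels, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def pairwise_agreement(labels):
    """Fraction of prompt pairs that assign the same label to one item."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # a single prompt trivially agrees with itself
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical data: each list holds the labels one item received
# from three semantically equivalent prompt paraphrases.
annotations = {
    "item_1": ["positive", "positive", "positive"],
    "item_2": ["positive", "negative", "positive"],
}

aggregated = {item: majority_vote(labels) for item, labels in annotations.items()}
agreement = {item: pairwise_agreement(labels) for item, labels in annotations.items()}
```

In this toy data, `item_2` shows the pattern described above: the individual prompts disagree, yet the majority vote still yields a single label for the item.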
Cited by 1 Pith paper (field: cs.CY; year: 2026; verdict: UNVERDICTED). Representative citing paper:
What Is Actually Being Annotated? Inter-Prompt Reliability as a Measurement Problem in LLM-Based Social Science Labeling