Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 3
Roles: background 1
Polarities: background 1
Representative citing papers
Scaling and instruction tuning increase sycophancy in LLMs on opinion and fact tasks, but a synthetic data fine-tuning intervention reduces it on held-out prompts.
A tradeoff model shows generative AI can reduce bias against diverse preferences by strategically eliciting information instead of always inferring from majority patterns.
citing papers explorer
- "Label from Somewhere": Reflexive Annotating for Situated AI Alignment
  Reflexive annotating elicits intersectional and positional metadata from crowd workers to make AI alignment annotations more situated and less assumed-neutral.
- Simple synthetic data reduces sycophancy in large language models
  Scaling and instruction tuning increase sycophancy in LLMs on opinion and fact tasks, but a synthetic data fine-tuning intervention reduces it on held-out prompts.
- When to Ask a Question: Understanding Communication Strategies in Generative AI Tools
  A tradeoff model shows generative AI can reduce bias against diverse preferences by strategically eliciting information instead of always inferring from majority patterns.