Flattery, fluff, and fog: Diagnosing and mitigating idiosyncratic biases in preference models
4 Pith papers cite this work.
citing papers explorer
- Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning
  Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.
- AI-Associated Lexical Shifts Across 34 Languages: Cross-Lingual Convergence and Diachronic Uptake in News Writing
  Analysis of news text in 34 languages shows cross-lingual convergence on AI-associated lemmas and increased prevalence of top AI-overused items after ChatGPT's release.
- User Detection and Response Patterns of Sycophantic Behavior in Conversational AI
  Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.
- Reinforcement Learning from Human Feedback