Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.
Flattery, fluff, and fog: Diagnosing and mitigating idiosyncratic biases in preference models
4 Pith papers cite this work.
representative citing papers
Analysis of news text in 34 languages shows cross-lingual convergence on AI-associated lemmas and increased prevalence of top AI-overused items after ChatGPT's release.
Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.
citing papers explorer
- User Detection and Response Patterns of Sycophantic Behavior in Conversational AI
Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.