Flattery, fluff, and fog: Diagnosing and mitigating idiosyncratic biases in preference models
4 Pith papers cite this work.
representative citing papers
Analysis of news text in 34 languages shows cross-lingual convergence on AI-associated lemmas and increased prevalence of top AI-overused items after ChatGPT's release.
Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.
citing papers explorer
- Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning
Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.
- AI-Associated Lexical Shifts Across 34 Languages: Cross-Lingual Convergence and Diachronic Uptake in News Writing
Analysis of news text in 34 languages shows cross-lingual convergence on AI-associated lemmas and increased prevalence of top AI-overused items after ChatGPT's release.