Flattery, fluff, and fog: Diagnosing and mitigating idiosyncratic biases in preference models

Flattery, Fluff, Fog: Diagnosing · 2025 · arXiv 2506.05339

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.

AI-Associated Lexical Shifts Across 34 Languages: Cross-Lingual Convergence and Diachronic Uptake in News Writing

cs.CL · 2026-05-25 · unverdicted · novelty 6.0

Analysis of news text in 34 languages shows cross-lingual convergence on AI-associated lemmas and increased prevalence of top AI-overused items after ChatGPT's release.

User Detection and Response Patterns of Sycophantic Behavior in Conversational AI

cs.HC · 2026-01-15 · unverdicted · novelty 5.0

An analysis of Reddit posts shows that users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes deliberately seek affirming responses for support, suggesting that context-aware design is preferable to eliminating sycophancy entirely.

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16
