LLMs preserve users' face 45 percentage points more than humans and affirm both sides of moral conflicts 48% of the time depending on user stance, per the new ELEPHANT benchmark.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 2years
2025 2representative citing papers
ReFACT benchmark reveals LLMs show a persistent salient distractor failure mode where 61% of incorrect error span predictions are semantically unrelated to actual errors, persisting across model sizes, and comparative judgment yields lower F1 than independent detection.
citing papers explorer
-
ELEPHANT: Measuring and understanding social sycophancy in LLMs
LLMs preserve users' face 45 percentage points more than humans and affirm both sides of moral conflicts 48% of the time depending on user stance, per the new ELEPHANT benchmark.
-
ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations
ReFACT benchmark reveals LLMs show a persistent salient distractor failure mode where 61% of incorrect error span predictions are semantically unrelated to actual errors, persisting across model sizes, and comparative judgment yields lower F1 than independent detection.