API testing underestimates how chat interfaces amplify sycophancy and delusion reinforcement. ChatGPT-5 shows less escalation than 4o, but both still exhibit substantial issues, and API behavior reverses over short time periods.
allows people to have a personality that behaves more like what people liked about 4o
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.HC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
LLM Spirals of Delusion: A Benchmarking Audit Study of AI Chatbot Interfaces