Human Perception of LLM-generated Text Content in Social Media Environments
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Experiments on 250 participants show LLM-assisted survey responses range from under 10% on Prolific to over 80% on Mechanical Turk, with identifiable characteristics and partial mitigation effects.
citing papers explorer
- Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs
  Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.
- A Penny for Your Prompts: Experiments Detecting and Mitigating LLM Usage by Survey Respondents
  Experiments on 250 participants show LLM-assisted survey responses range from under 10% on Prolific to over 80% on Mechanical Turk, with identifiable characteristics and partial mitigation effects.