Explaining Too Much? Understanding How Large Language Model Reasoning Traces Influence Performance and Metacognition

Daniela Fernandes; Daniel Buschek; Lev Tankelevitch; Robin Welsch; Thomas Kosch

arxiv: 2605.25856 · v1 · pith:CUMOVX5Tnew · submitted 2026-05-25 · 💻 cs.HC · cs.AI

keywords: reasoning, traces, performance, model, baseline, alongside, answer, answer-only

Large Language Model interfaces are increasingly verbose, exposing intermediate reasoning traces alongside final answers. Traces are framed as transparency mechanisms, yet it is unclear how people use them to solve problems. We report a preregistered between-subjects study (N = 559) in which participants solved ten LSAT-style reasoning problems under one of three conditions: an Answer-only baseline, a Full-trace revealed before the answer, and a Summary-trace presented alongside the answer. Summaries preserved task performance at the no-trace baseline while significantly elevating trust and hedonic appeal, establishing that trace exposure shifts subjective appraisal of the interaction without bringing performance benefits. Under an open-weight reasoning model exposing verbose intermediate output, full traces additionally impaired performance relative to the answer-only baseline. Across all conditions, participants substantially overestimated their performance, and no trace format supported calibrated self-evaluation. Further analysis indicates that hedonic appeal, not trust, carries the indirect path to overestimation, consistent with a processing-fluency account. Reasoning traces are best understood as user-facing interface artifacts rather than transparent windows into model cognition, and calibration is unlikely to emerge from the traces themselves and may best be scaffolded by interactions that elicit users' own reasoning first.

Forward citations

Cited by 1 Pith paper

When LLM Rationales Become User-Facing: Effects on Trust Perception, Decision-Making, and Gaze Behaviors
cs.HC · 2026-06 · unverdicted · novelty 5.0

Two linked user studies find that LLM rationale correctness and certainty framing affect trust and decision confidence while presentation format does not, and incorrect rationales increase gaze attention and pupil size.