pith. machine review for the scientific record.

arxiv: 2604.03237 · v1 · submitted 2026-01-31 · 💻 cs.HC · cs.AI · cs.CL

Recognition: no theorem link

The Persuasion Paradox: When LLM Explanations Fail to Improve Human-AI Team Performance

Authors on Pith · no claims yet

Pith reviewed 2026-05-16 08:43 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.CL
keywords human-AI collaboration · LLM explanations · trust calibration · persuasion paradox · task performance · error recovery · visual reasoning · logical reasoning

The pith

LLM explanations increase user confidence and reliance but fail to improve accuracy in visual tasks and can suppress error recovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies a Persuasion Paradox where fluent LLM explanations raise users' confidence in AI predictions without producing corresponding gains in human-AI team accuracy. In controlled studies with RAVEN visual reasoning matrices, explanations left accuracy unchanged or lower than predictions alone and reduced users' ability to override model mistakes. In LSAT logical reasoning, explanations improved accuracy and recovery rates over both bare predictions and expert-written alternatives. The work shows that standard subjective measures of trust and clarity do not track objective performance and that uncertainty displays or selective deferral policies outperform explanation interfaces on visual tasks.

Core claim

Fluent natural-language explanations from LLMs systematically increase user confidence and reliance on AI outputs while failing to reliably improve, and in some cases undermining, objective task accuracy. The effect is strongly task-dependent: explanations add no value beyond the prediction itself in abstract visual reasoning and actively impair error recovery, whereas they produce the highest accuracy and recovery in language-based deductive reasoning. Interfaces that surface model uncertainty through probabilities, or that defer uncertain cases to humans, achieve better team performance than explanation-based designs.
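
The selective deferral policy mentioned above is concrete enough to sketch. Below is a minimal illustration, in Python, of a confidence-threshold rule that automates confident cases and routes uncertain ones to the human; the 0.80 threshold and the field names are illustrative assumptions, not values from the paper.

    # Minimal sketch of a selective automation (deferral) policy: accept the
    # model's answer when its predicted probability is high, otherwise defer
    # the item to the human decision-maker. Threshold and fields are assumed.
    from dataclasses import dataclass

    @dataclass
    class ModelOutput:
        prediction: str     # the model's chosen answer
        probability: float  # model's confidence in that answer, in [0, 1]

    def team_answer(output: ModelOutput, human_answer: str,
                    threshold: float = 0.80) -> str:
        """Return the team's final answer under the deferral rule."""
        if output.probability >= threshold:
            return output.prediction  # automate the confident case
        return human_answer           # defer the uncertain case to the human

    # Example: a low-confidence item is deferred, so the human's choice stands.
    print(team_answer(ModelOutput("C", 0.55), human_answer="D"))  # -> D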

What carries the argument

The Persuasion Paradox, isolated by a multi-stage reveal protocol that separately presents AI predictions and then explanations across between-subjects conditions in RAVEN and LSAT tasks.
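
To make the two behavioural outcomes concrete: agreement is the rate at which participants keep the AI's answer when it is correct, and recovery is the rate at which they override it when it is wrong (the quantities plotted in Figures 3 and 6). The sketch below shows one way such rates could be computed from trial-level data; the column names and toy values are assumptions about how the data might be organised, not the paper's actual materials.

    # Sketch: per-condition agreement-with-correct-AI and recovery-from-
    # incorrect-AI rates computed from trial-level records. Columns are assumed.
    import pandas as pd

    trials = pd.DataFrame({
        "condition":      ["prediction", "prediction", "explanation", "explanation"],
        "ai_correct":     [True, False, True, False],
        "kept_ai_answer": [True, False, True, True],
    })

    def rates(group: pd.DataFrame) -> pd.Series:
        with_correct_ai = group[group["ai_correct"]]
        with_wrong_ai = group[~group["ai_correct"]]
        return pd.Series({
            # share of correct-AI trials where the user kept the AI answer
            "agreement": with_correct_ai["kept_ai_answer"].mean(),
            # share of wrong-AI trials where the user overrode the AI answer
            "recovery": (~with_wrong_ai["kept_ai_answer"]).mean(),
        })

    print(trials.groupby("condition").apply(rates))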

If this is right

  • Subjective metrics of trust and perceived clarity are unreliable indicators of actual human-AI team performance.
  • Explanation effectiveness depends on cognitive modality, helping language-based reasoning but not visual pattern tasks.
  • Probability displays and selective automation policies deliver higher accuracy and better error recovery than explanation interfaces in visual domains.
  • Designs should prioritize calibrated reliance and error recovery over fluent narrative support.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interfaces may need to withhold explanations on low-certainty visual tasks while supplying them selectively on language tasks.
  • The same persuasion mechanism could appear in other high-stakes settings such as legal analysis or clinical decision support.
  • Hybrid interfaces that combine uncertainty signals with optional explanations merit direct comparison against pure explanation designs; a minimal sketch of such a policy follows this list.
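
Here is a minimal sketch, in Python, of what such a hybrid presentation policy could look like: the uncertainty signal is always shown, while the explanation is optional and gated by task type and model confidence. The threshold, task labels, and function name are hypothetical, not the paper's design.

    # Sketch of a hybrid support policy: always expose the predicted probability;
    # attach the narrative explanation only for language-based items on which the
    # model is confident. Threshold and task labels are illustrative assumptions.
    def render_support(prediction: str, probability: float, explanation: str,
                       task_type: str, threshold: float = 0.75) -> dict:
        """Assemble the interface elements shown for one item."""
        support = {
            "prediction": prediction,
            "probability": probability,  # uncertainty signal is always visible
            "explanation": None,
        }
        if task_type == "language" and probability >= threshold:
            support["explanation"] = explanation  # optional, gated explanation
        return support

    # A visual-reasoning item keeps the explanation hidden even at high confidence.
    print(render_support("B", 0.90, "Row-wise XOR of shapes ...", task_type="visual"))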

Load-bearing premise

The chosen visual-matrix and logical-deduction tasks, together with the staged reveal method, capture the essential dynamics of human-AI collaboration across domains.

What would settle it

A replication on additional tasks such as medical image diagnosis or contract review in which explanation interfaces produce measurably higher team accuracy and error recovery than probability-only or deferral interfaces.

Figures

Figures reproduced from arXiv: 2604.03237 by Ayala Bloch, Lu Feng, Ruth Cohen, Sarit Kraus.

Figure 1. The multi-stage human–AI decision process used to separate the effects of the AI prediction and its explanation.
Figure 2. Overall accuracy and self-reported confidence across the three reveal stages in the multi-stage RAVEN study.
Figure 3. Relationship between agreement with correct AI predictions and recovery from incorrect AI predictions.
Figure 4. Objective accuracy across human–AI support conditions in the RAVEN task, including a derived …
Figure 5. Illustrative user interface examples from the LSAT study.
Figure 6. Relationship between agreement with correct AI predictions and recovery from incorrect AI predictions.
Figure 7. Objective accuracy across human–AI support conditions in the LSAT task.
read the original abstract

While natural-language explanations from large language models (LLMs) are widely adopted to improve transparency and trust, their impact on objective human-AI team performance remains poorly understood. We identify a Persuasion Paradox: fluent explanations systematically increase user confidence and reliance on AI without reliably improving, and in some cases undermining, task accuracy. Across three controlled human-subject studies spanning abstract visual reasoning (RAVEN matrices) and deductive logical reasoning (LSAT problems), we disentangle the effects of AI predictions and explanations using a multi-stage reveal design and between-subjects comparisons. In visual reasoning, LLM explanations increase confidence but do not improve accuracy beyond the AI prediction alone, and substantially suppress users' ability to recover from model errors. Interfaces exposing model uncertainty via predicted probabilities, as well as a selective automation policy that defers uncertain cases to humans, achieve significantly higher accuracy and error recovery than explanation-based interfaces. In contrast, for language-based logical reasoning tasks, LLM explanations yield the highest accuracy and recovery rates, outperforming both expert-written explanations and probability-based support. This divergence reveals that the effectiveness of narrative explanations is strongly task-dependent and mediated by cognitive modality. Our findings demonstrate that commonly used subjective metrics such as trust, confidence, and perceived clarity are poor predictors of human-AI team performance. Rather than treating explanations as a universal solution, we argue for a shift toward interaction designs that prioritize calibrated reliance and effective error recovery over persuasive fluency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports three controlled human-subject experiments on RAVEN visual reasoning matrices and LSAT logical reasoning problems. Using a multi-stage reveal design (AI prediction shown first, followed by explanation) and between-subjects conditions, the authors identify a 'Persuasion Paradox': LLM explanations increase user confidence and reliance on AI predictions but do not improve (and in visual tasks can undermine) objective accuracy and error recovery. Uncertainty displays and selective automation outperform explanation interfaces in visual tasks, while explanations help in logical tasks. The work concludes that subjective metrics like trust and clarity are poor predictors of team performance and advocates task-specific designs prioritizing calibrated reliance.

Significance. If the results hold, the findings challenge the widespread assumption that natural-language explanations are a universal remedy for improving human-AI collaboration. The demonstration of modality-dependent effects (visual vs. language-based reasoning) and the comparative advantage of uncertainty-based interfaces over fluent explanations provide concrete, actionable guidance for HCI system design. The multi-study empirical approach, with its focus on objective accuracy and error recovery rather than subjective ratings, strengthens the contribution if the design successfully isolates explanation effects.

major comments (2)
  1. [§4 (Experimental Design, multi-stage reveal)] The between-subjects comparison does not include a simultaneous-presentation control arm. In the RAVEN condition, the reported suppression of error recovery may arise from users anchoring on the initial AI prediction before the explanation arrives, rather than from the narrative content of the explanation itself. This order effect is load-bearing for the central Persuasion Paradox claim and requires either an additional control condition or explicit discussion of why anchoring cannot account for the accuracy differences.
  2. [§5 (Results)] The manuscript does not report sample sizes per condition, statistical power calculations, or exclusion criteria in sufficient detail to verify that the accuracy and recovery-rate differences (e.g., between explanation and probability conditions) are robust. These details are essential for assessing whether the task-dependent divergence is reliably supported by the data.
minor comments (2)
  1. [Figures] Figure captions and axis labels should explicitly state whether error bars represent standard error, 95% CI, or another measure, and whether accuracy is reported as percentage or proportion.
  2. [§2 (Related Work)] The discussion of prior work on anchoring and order effects in decision-making (e.g., Tversky & Kahneman) is underdeveloped relative to the design concern raised above.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the constructive feedback on our manuscript. The comments have helped us strengthen the presentation of our experimental design and results. We address each major comment below and indicate the revisions made to the manuscript.

read point-by-point responses
  1. Referee: §4 (Experimental Design, multi-stage reveal): The between-subjects comparison does not include a simultaneous-presentation control arm. In the RAVEN condition, the reported suppression of error recovery may arise from users anchoring on the initial AI prediction before the explanation arrives, rather than from the narrative content of the explanation itself. This order effect is load-bearing for the central Persuasion Paradox claim and requires either an additional control condition or explicit discussion of why anchoring cannot account for the accuracy differences.

    Authors: We thank the referee for raising this important point about potential anchoring effects in our multi-stage reveal design. All conditions in our experiments, including the AI prediction-only baseline, probability display, and selective automation, follow the same multi-stage presentation where the AI prediction is shown first. This allows us to isolate the incremental effect of adding explanations versus uncertainty information. The task-dependent pattern—explanations harming performance in visual tasks but helping in logical tasks—suggests that the effect is not solely due to anchoring, as anchoring would be expected to operate similarly across tasks. Nevertheless, we acknowledge that a simultaneous presentation control would provide stronger evidence. We have added explicit discussion in the revised §4 and §6 (Limitations) explaining why we believe anchoring does not fully account for the results, and we have included this as a suggested direction for future research. revision: partial

  2. Referee: §5 (Results): The manuscript does not report sample sizes per condition, statistical power calculations, or exclusion criteria in sufficient detail to verify that the accuracy and recovery-rate differences (e.g., between explanation and probability conditions) are robust. These details are essential for assessing whether the task-dependent divergence is reliably supported by the data.

    Authors: We apologize for the omission of these details in the original submission. In the revised manuscript, we have expanded §5 (Results) and the supplementary materials to report: (1) exact sample sizes per condition (e.g., n=48 for RAVEN explanation condition, n=50 for probability, etc.), (2) post-hoc power analyses showing power > 0.85 for the key accuracy and recovery differences, and (3) detailed exclusion criteria (participants failing attention checks or completing the task in under 5 minutes were excluded, resulting in X% exclusion rate). These additions confirm that the reported differences are statistically robust. revision: yes
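
The post-hoc power figure quoted in the response above can be reproduced with a standard two-proportion power calculation. The sketch below, in Python, uses the rebuttal's illustrative sample sizes (n=48 and n=50); the accuracy values are placeholders chosen only to demonstrate the calculation, not results from the paper.

    # Sketch of a post-hoc power check for an accuracy difference between two
    # between-subjects conditions. Sample sizes follow the rebuttal's example
    # (n=48, n=50); the accuracy proportions below are placeholders.
    from statsmodels.stats.proportion import proportion_effectsize
    from statsmodels.stats.power import NormalIndPower

    acc_explanation = 0.55   # placeholder accuracy, explanation condition
    acc_probability = 0.80   # placeholder accuracy, probability condition
    n_probability, n_explanation = 50, 48

    effect = proportion_effectsize(acc_probability, acc_explanation)  # Cohen's h
    power = NormalIndPower().solve_power(
        effect_size=effect,
        nobs1=n_probability,
        ratio=n_explanation / n_probability,
        alpha=0.05,
    )
    print(f"Cohen's h = {effect:.2f}, post-hoc power = {power:.2f}")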

Circularity Check

0 steps flagged

No significant circularity: empirical results from independent human-subject experiments

full rationale

The paper presents findings from three controlled human-subject studies on RAVEN matrices and LSAT problems using multi-stage reveal designs and between-subjects conditions. No mathematical derivations, equations, fitted parameters, or self-referential definitions exist that would reduce any claim to its inputs by construction. The Persuasion Paradox is reported as a direct empirical observation from measured accuracy, confidence, and error recovery rates. Any self-citations are incidental and not load-bearing for the central results, which rely on external experimental data rather than internal redefinition or renaming of known patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical human-subjects study; it introduces no free parameters, mathematical axioms, or invented entities.

pith-pipeline@v0.9.0 · 5567 in / 1074 out tokens · 27581 ms · 2026-05-16T08:43:07.469381+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

  1. [1] Baddeley, A. The episodic buffer: a new component of working memory? Trends in Cognitive Sciences 4, 11 (2000), 417–423.

  2. [2] Bansal, G., Wu, T., Zhou, J., Fok, R., Nushi, B., Kamar, E., Ribeiro, M. T., and Weld, D. Does the whole exceed its parts? The effect of AI explanations on complementary team performance. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (2021), pp. 1–16.

  3. [3] Buçinca, Z., Lin, P., Gajos, K. Z., and Glassman, E. L. Proxy tasks and subjective measures can be misleading in evaluating explainable AI systems. In Proceedings of the 25th International Conference on Intelligent User Interfaces (2020), pp. 454–464.

  4. [4] Carpenter, P. A., Just, M. A., and Shell, P. What one intelligence test measures: a theoretical account of the processing in the Raven Progressive Matrices test. Psychological Review 97, 3 (1990), 404.

  5. [5] Cummings, M. L. Automation bias in intelligent time critical decision support systems. In Decision Making in Aviation. Routledge, 2017, pp. 289–294.

  6. [6] Hoffman, R. R., Mueller, S. T., Klein, G., and Litman, J. Metrics for explainable AI: Challenges and prospects. arXiv preprint arXiv:1812.04608 (2018).

  7. [7] Jung, R. E., and Haier, R. J. The parieto-frontal integration theory (P-FIT) of intelligence: converging neuroimaging evidence. Behavioral and Brain Sciences 30, 2 (2007), 135–154.

  8. [8] Liao, Q. V., Gruen, D., and Miller, S. Questioning the AI: informing design practices for explainable AI user experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (2020), pp. 1–15.

  9. [9] Lundberg, S. M., and Lee, S.-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (2017).

  10. [10] Madras, D., Pitassi, T., and Zemel, R. Predict responsibly: improving fairness and accuracy by learning to defer. Advances in Neural Information Processing Systems 31 (2018).

  11. [11] Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267 (2019), 1–38.

  12. [12] Mozannar, H., and Sontag, D. Consistent estimators for learning to defer to an expert. In International Conference on Machine Learning (2020), PMLR, pp. 7076–7087.

  13. [13] Naveed, S., Stevens, G., and Robin-Kern, D. An overview of the empirical evaluation of explainable AI: A comprehensive guideline for user-centered evaluation in XAI. Applied Sciences 14, 23 (2024), 11288.

  14. [14] Parasuraman, R., and Riley, V. Humans and automation: Use, misuse, disuse, abuse. Human Factors 39, 2 (1997), 230–253.

  15. [15] Rechkemmer, A., and Yin, M. When confidence meets accuracy: Exploring the effects of multiple performance indicators on trust in machine learning models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (2022), pp. 1–14.

  16. [16] Ribeiro, M. T., Singh, S., and Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), pp. 1135–1144.

  17. [17] Spitzer, P., Holstein, J., Morrison, K., Holstein, K., Satzger, G., and Kühl, N. Don't be fooled: The misinformation effect of explanations in human–AI collaboration. International Journal of Human–Computer Interaction (2025), 1–29.

  18. [18] Steyvers, M., Tejeda, H., Kerrigan, G., and Smyth, P. Bayesian modeling of human–AI complementarity. Proceedings of the National Academy of Sciences 119, 11 (2022), e2111547119.

  19. [19] Stringham, N., Chaleshtori, F. H., Yan, X., Xu, Z., Wang, B., and Marasović, A. Teaching people LLM's errors and getting it right. arXiv preprint arXiv:2512.21422 (2025); Sweller, J. Cognitive load during problem solving: Effects on learning. Cognitive Science 12, 2 (1988), 257–285.

  20. [20] Vereschak, O., Alizadeh, F., Bailly, G., and Caramiaux, B. Trust in AI-assisted decision making: Perspectives from those behind the system and those for whom the decision is made. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (2024), pp. 1–14.

  21. [21] Yu, W., Jiang, Z., Dong, Y., and Feng, J. ReClor: A reading comprehension dataset requiring logical reasoning. arXiv preprint arXiv:2002.04326 (2020).

  22. [22] Zeiler, M. D., and Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (2014), Springer, pp. 818–833.

  23. [23] Zhang, C., Gao, F., Jia, B., Zhu, Y., and Zhu, S.-C. RAVEN: A dataset for relational and analogical visual reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), pp. 5317–5327.