Humans incorrectly reject confident accusatory AI judgments

Bennett Kleinberg; Bruno Verschuere; Merylin Monaro; Pietro Pietrini; Riccardo Loconte

arxiv: 2512.02848 · v1 · submitted 2025-12-02 · 💻 cs.HC


Riccardo Loconte, Merylin Monaro, Pietro Pietrini, Bruno Verschuere, Bennett Kleinberg

Pith reviewed 2026-05-17 02:10 UTC · model grok-4.3

classification 💻 cs.HC

keywords deception detection · AI judgments · human-AI interaction · algorithmic aversion · prediction confidence · veracity judgments · accusatory predictions


The pith

People reject confident AI accusations of deception even from accurate models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how people respond to AI predictions about whether statements are truthful or deceptive. It varies the AI's overall accuracy and the confidence level reported for each individual prediction. Participants adopted judgments from a high-accuracy AI more often than from a low-accuracy one. Yet they deviated more as the AI's stated confidence rose, especially when the prediction accused someone of lying. Human adjustments to these predictions either left performance unchanged or made it worse than the AI operating alone.

Core claim

Humans adopt AI deception-detection judgments more when the model has high overall accuracy, but they deviate from individual predictions as the model's confidence increases, especially for accusations of deception. Human interaction with the algorithmic outputs either reduces accuracy or produces no gain relative to the AI alone. This pattern is partly explained by participants overestimating human deception-detection ability.

What carries the argument

Experimental variation of AI overall accuracy (high versus low) and per-prediction confidence levels, followed by measurement of human agreement and combined human-AI accuracy.

If this is right

Humans adopt predictions from a high-accuracy AI model more often than from a low-accuracy model.
Higher stated confidence in a prediction reduces human acceptance, especially when the AI accuses deception.
Allowing humans to adjust AI deception judgments does not improve and can lower overall accuracy.
Overestimation of personal deception-detection skill contributes to rejection of confident AI accusations.
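A minimal sketch of the accuracy comparison in the third point, using invented labels rather than the study's data. The names `truth`, `ai_pred`, `final`, and `accuracy` are hypothetical, and the sketch assumes each trial ends in a binary truthful/deceptive call scored against ground truth. The supplementary tables suggest the study itself worked with a continuous deviation score (Δy), so this only illustrates the logic of "human-AI accuracy versus AI-alone accuracy".

```python
# Hypothetical example: compare AI-alone accuracy with accuracy after human adjustment.
# The labels below are invented for illustration; they are not the study's data.


def accuracy(predictions, truth):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Assumed coding: 1 = deceptive, 0 = truthful
truth = [1, 1, 0, 0, 1, 0, 1, 0]
ai_pred = [1, 1, 0, 1, 1, 0, 0, 0]  # AI alone: 6 of 8 correct
final = [0, 1, 0, 1, 1, 0, 0, 0]    # the human overrode one correct accusation

ai_acc = accuracy(ai_pred, truth)
human_ai_acc = accuracy(final, truth)
gain = human_ai_acc - ai_acc  # negative means the human adjustment lowered accuracy
print(f"AI alone: {ai_acc:.3f}, human-AI: {human_ai_acc:.3f}, gain: {gain:+.3f}")
```

In this toy case the single override of a correct accusation turns a 0.750 AI-alone accuracy into 0.625, the kind of loss the third point describes.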

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

AI systems used for high-stakes accusations may need to lower displayed confidence or add context to reduce automatic rejection.
Similar patterns of rejecting confident algorithmic accusations could appear in other domains such as medical diagnosis or credit decisions.
Explicit training that corrects overestimation of human deception-detection ability might increase acceptance of accurate AI predictions.

Load-bearing premise

Participants understood and reacted to the displayed model accuracy and confidence levels exactly as the experiment intended, without other unmeasured factors shaping their choices.

What would settle it

An experiment in which participants accept high-confidence AI deception predictions at rates equal to or higher than low-confidence ones, and the resulting human-AI accuracy exceeds AI-alone accuracy.
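The falsification criterion above can be written as a small check. This is a hypothetical sketch: the `Trial` record, its fields, the `acceptance_rate` and `would_settle` helpers, and the two-level high/low confidence split are illustrative assumptions (the study presented several graded confidence levels), and the accuracies passed in are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    confidence: str          # "high" or "low" (assumed two-level split)
    ai_says_deceptive: bool  # True when the AI's prediction is an accusation
    accepted: bool           # True when the participant kept the AI's judgment


def acceptance_rate(trials, confidence):
    """Share of accusatory (deceptive) predictions accepted at one confidence level."""
    subset = [t for t in trials if t.confidence == confidence and t.ai_says_deceptive]
    if not subset:
        raise ValueError(f"no accusatory trials at confidence={confidence!r}")
    return sum(t.accepted for t in subset) / len(subset)


def would_settle(trials, human_ai_acc, ai_alone_acc):
    """True only if high-confidence accusations are accepted at least as often as
    low-confidence ones AND the human-AI team is more accurate than the AI alone."""
    rates_ok = acceptance_rate(trials, "high") >= acceptance_rate(trials, "low")
    return rates_ok and human_ai_acc > ai_alone_acc


# Invented trials that mirror the paper's reported direction (not real data).
trials = [
    Trial("high", True, False), Trial("high", True, True),
    Trial("low", True, True), Trial("low", True, True),
]
print(acceptance_rate(trials, "high"), acceptance_rate(trials, "low"))
print(would_settle(trials, human_ai_acc=0.70, ai_alone_acc=0.75))
```

Both conditions are required, matching the criterion as stated: equal-or-higher acceptance of confident accusations alone would not settle the question without a human-AI accuracy gain.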


Automated verbal deception detection using methods from Artificial Intelligence (AI) has been shown to outperform humans in disentangling lies from truths. Research suggests that transparency and interpretability of computational methods tend to increase human acceptance of using AI to support decisions. However, the extent to which humans accept AI judgments for deception detection remains unclear. We experimentally examined how an AI model's accuracy (i.e., its overall performance in deception detection) and confidence (i.e., the model's uncertainty in single-statement predictions) influence human adoption of the model's judgments. Participants (n=373) were presented with veracity judgments of an AI model with high or low overall accuracy and various degrees of prediction confidence. The results showed that humans followed predictions from a highly accurate model more than from a less accurate one. Interestingly, the more confident the model, the more people deviated from it, especially if the model predicted deception. We also found that human interaction with algorithmic predictions either worsened the machine's performance or was ineffective. While this human aversion to accept highly confident algorithmic predictions was partly explained by participants' tendency to overestimate humans' deception detection abilities, we also discuss how truth-default theory and the social costs of accusing someone of lying help explain the findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The experiment shows people reject high-confidence AI accusations of deception more than low-confidence ones, even from accurate models, and that overriding the AI does not improve outcomes.


The main thing to know is that this study finds an interaction in which higher model confidence leads to greater human deviation from the prediction, especially when the AI accuses someone of lying. Overall accuracy increases adoption, but confidence works against it in the accusatory case, and letting people adjust the judgments either hurts performance or adds nothing. The authors link part of the pattern to participants overestimating human deception-detection skill and to the social costs of accusations. That specific confidence-by-accusation effect is the piece that extends earlier transparency work without merely repeating it.

The sample of 373 is reasonable for a behavioral experiment, and the abstract reports the directional results on acceptance and performance clearly enough to see the pattern. The design treats overall accuracy and per-prediction confidence as separate factors, which is a straightforward way to isolate the variables. The finding that human overrides do not help is a practical note for anyone considering deployment.

The soft spot is the missing manipulation check on whether participants actually registered the stated confidence levels as reliability information, rather than reacting to the accusation content or its social implications alone. Without a post-task probe or similar, the deviation may reflect aversion to confident-sounding accusations more than to the numeric or verbal confidence cue itself. The abstract also omits details on the exact statistical tests, stimulus construction, and exclusion rules, so the strength of the interaction rests on how those were handled.

This paper would interest researchers working on human-AI decision support in security, HR, or legal screening. A reader looking for evidence on when transparency backfires in deception detection would find something usable here. It is not a broad theoretical advance, but the empirical result is focused and the methods appear sound enough on the surface to merit checking. I would send it for peer review; the core claim deserves a closer look even if the methods section needs expansion to pin down the confidence manipulation.

Referee Report

2 major / 1 minor

Summary. The paper reports a controlled experiment with n=373 participants examining how an AI model's overall accuracy and per-prediction confidence influence human acceptance of its veracity judgments in a deception detection task. Results indicate greater adherence to high-accuracy models, but increased deviation from high-confidence predictions (especially accusatory/deception ones), with human overrides either worsening or not improving the model's performance. The aversion is partly attributed to overestimation of human deception-detection ability, alongside truth-default theory and social costs of accusations.

Significance. If the results hold after addressing the noted gaps, this work offers useful empirical evidence on human-AI interaction biases in high-stakes judgment domains. The reasonably large sample and focus on both accuracy and confidence factors provide a concrete behavioral dataset that can inform HCI design for algorithmic decision support. It also supplies falsifiable predictions about when humans will reject confident AI outputs, which is a strength for the field.

major comments (2)

[Methods] Methods section: No manipulation check is reported to confirm that participants perceived and responded to the model's stated confidence levels as intended by the experimental design. Without this, the central claim that humans deviate more from 'confident' predictions (especially accusatory ones) risks being driven instead by statement content, accusation valence, or demand characteristics.
[Results] Results section: The manuscript provides insufficient detail on the statistical tests, exclusion criteria, stimulus construction, and potential confounds used to establish the directional effects on acceptance rates and performance changes. This leaves the reported interactions only moderately supported.

minor comments (1)

[Abstract] Abstract: The description of confidence levels as 'various degrees' would be clearer if the exact operationalization (e.g., numeric or verbal labels and their values) were stated.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which help clarify how to strengthen the manuscript. We address each major comment below and indicate the revisions we will make.


Referee: [Methods] Methods section: No manipulation check is reported to confirm that participants perceived and responded to the model's stated confidence levels as intended by the experimental design. Without this, the central claim that humans deviate more from 'confident' predictions (especially accusatory ones) risks being driven instead by statement content, accusation valence, or demand characteristics.

Authors: We agree this is a legitimate concern. The original study did not include an explicit manipulation check for perceived confidence. In the revision we will add a dedicated paragraph in the Methods section describing the instructions given to participants (which explicitly directed attention to the confidence ratings) and will report any available attention-check data. We will also add a limitations subsection noting the absence of a direct manipulation check and outlining how future work could include one. We do not believe the core pattern is reducible to demand characteristics or valence alone, because the deviation was selective (stronger for high-confidence deception accusations) and consistent with the pre-registered hypotheses derived from truth-default theory and social-cost considerations. revision: partial
Referee: [Results] Results section: The manuscript provides insufficient detail on the statistical tests, exclusion criteria, stimulus construction, and potential confounds used to establish the directional effects on acceptance rates and performance changes. This leaves the reported interactions only moderately supported.

Authors: We accept that additional detail is required. In the revised Results section we will: (a) specify the exact mixed-effects logistic regression models, including all fixed and random effects, software, and any corrections for multiple comparisons; (b) state the precise exclusion criteria and the number of participants removed at each stage; (c) describe stimulus construction, including how the 20 statements were selected, how the AI model’s accuracy and per-prediction confidence values were assigned, and how accusatory versus exonerating predictions were balanced; and (d) present supplementary analyses or discussion addressing potential confounds such as statement content, accusation valence, and order effects. These expansions will be supported by the data already collected and will be accompanied by the full analysis code and stimuli in the supplementary materials. revision: yes

Circularity Check

0 steps flagged

Empirical behavioral experiment with fresh data shows no circularity


This paper presents results from a new experiment with 373 participants who viewed AI veracity judgments varying in stated accuracy and confidence. All reported effects on human adoption, deviation rates, and performance changes derive from direct statistical analysis of collected behavioral responses rather than any equations, fitted parameters, self-referential predictions, or load-bearing self-citations. No derivation chain exists that could reduce outputs to inputs by construction; the study is self-contained against external benchmarks of participant data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard experimental psychology assumptions about participant perception and response to presented information, plus external theories (truth-default theory, social costs of accusation) used for post-hoc explanation rather than derivation.

axioms (1)

Domain assumption: Participants can accurately perceive and respond to manipulated AI accuracy and confidence levels in a controlled presentation.
Invoked throughout the experimental design where high/low accuracy and varying confidence are presented to participants.

pith-pipeline@v0.9.0 · 5520 in / 1262 out tokens · 96608 ms · 2026-05-17T02:10:39.445774+00:00 · methodology


Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[1]

Preliminary Analysis. TABLE S1. Mean, standard deviation, and median values of participants’ deviation from AI (Δy) across conditions (excerpt, truncated).
Classification / Confidence: Accuracy low (M, SD, Median) | Accuracy high (M, SD, Median)
Deceptive / Indecisive: 5.83, 24.75, 8 | 7.85, 21.80, 6
Deceptive / Poorly Confident: 15.87, 26.31, 15.5 | 11.57, 23.51, 6
Deceptive / Moderately Confident: 21.01, 25.82, 15 | 14.9, 23.43, 10
Co...

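The visible rows of Table S1 already show the confidence effect descriptively. The sketch below copies those means and checks whether mean deviation rises with confidence for deceptive predictions. It is an unadjusted, illustrative comparison (the names `mean_dy` and `confidence_trend` are ours), not a substitute for the paper's mixed-model contrasts, and it ignores the truncated rows.

```python
# Mean deviation from AI (Δy) for "Deceptive" predictions, copied from the
# visible rows of Table S1. Rows after "Moderately Confident" are truncated
# in the excerpt and are deliberately left out.
ORDER = ["Indecisive", "Poorly Confident", "Moderately Confident"]

mean_dy = {
    "low": {"Indecisive": 5.83, "Poorly Confident": 15.87, "Moderately Confident": 21.01},
    "high": {"Indecisive": 7.85, "Poorly Confident": 11.57, "Moderately Confident": 14.90},
}


def confidence_trend(row, order=ORDER):
    """Return (last mean minus first mean, whether the means strictly increase)."""
    means = [row[level] for level in order]
    increasing = all(a < b for a, b in zip(means, means[1:]))
    return means[-1] - means[0], increasing


for accuracy, row in mean_dy.items():
    change, increasing = confidence_trend(row)
    print(f"{accuracy}-accuracy model: change={change:+.2f}, strictly increasing={increasing}")
```

With these values the mean deviation rises by about 15.18 points for the low-accuracy model and about 7.05 points for the high-accuracy model between the Indecisive and Moderately Confident levels, in both cases monotonically.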
[2]

Variance, Standard Deviation (SD), and adjusted Intraclass Correlation Coefficient (ICC) for Random Effects from Model 2

Confirmatory Analysis. TABLE S2. Variance, Standard Deviation (SD), and adjusted Intraclass Correlation Coefficient (ICC) for Random Effects from Model 2.
Participant_id: variance 39.44, SD 6.28, ICC 0.07
Statement_id: variance 35.91, SD 5.99, ICC 0.06
Residual: variance 501.00, SD 22.38, ICC -
TABLE S3. Pairwise contrasts for Confidence range by Classification. Pairwise cont...

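The SD and ICC columns of Table S2 can be recovered from the variances. This sketch assumes each group's ICC is its variance divided by the total variance (both random effects plus residual), which reproduces the reported 0.07 and 0.06 after rounding. The authors' software may define "adjusted ICC" somewhat differently, so treat this as a consistency check rather than their computation; `variance_shares` is a hypothetical helper.

```python
import math

# Variance components reported in Table S2 (random effects from Model 2).
components = {"Participant_id": 39.44, "Statement_id": 35.91, "Residual": 501.00}


def variance_shares(components, residual_key="Residual"):
    """Share of total variance (random effects + residual) for each random-effect group."""
    total = sum(components.values())
    return {name: var / total for name, var in components.items() if name != residual_key}


sds = {name: math.sqrt(var) for name, var in components.items()}  # SD = sqrt(variance)
shares = variance_shares(components)

for name, var in components.items():
    share = f"{shares[name]:.2f}" if name in shares else "-"
    print(f"{name}: variance={var:.2f}, SD={sds[name]:.2f}, ICC~{share}")
```

Low shares like these mean most of the variation in deviation sits at the trial level rather than being attributable to particular participants or statements.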
[3]

How much were you motivated to perform well?

Robustness check by including covariates. Covariates:
- Motivation: “How much were you motivated to perform well?” (0=Not at all, 10=Very much)
- Difficulty: “How difficult did you find the study?” (0=Very easy, 10=Very difficult)
- ML familiarity: “How familiar are you with AI-based algorithms?” (0=Not familiar at all, 5=Neutral, 10=Very familiar)
- AI vs...

[4]

Type III Analysis of Variance for the Linear Mixed Model predicting human deviation

Robustness check by including previously excluded participants. TABLE S7. Type III Analysis of Variance for the Linear Mixed Model predicting human deviation.
Columns: Effect, Sample, Sum of Sq, Mean Sq, Num DF, Den DF, F-value, η² (99% CI)
Accuracy, Reduced: 9, 9, 1, 377.5, 0.02, 0.00 (0.00, 0.00)
Accuracy, Full: 53, 53, 1, 499.7, 0.10, 0.00 (0.00, 0.00)
Confidence, Reduced: 928, 232, 4, 180.0, 0.46...

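Two columns of Table S7 can be sanity-checked from the others. Mean Sq should equal Sum of Sq divided by Num DF, and, assuming the η² column is partial η² obtained from the F-value with the standard conversion η²p = F·df_num / (F·df_num + df_den), the visible Accuracy rows should round to 0.00. Whether the authors used exactly this conversion is an assumption, and the helper names are hypothetical.

```python
import math


def mean_square(sum_sq, df_num):
    """Mean square = sum of squares divided by numerator degrees of freedom."""
    return sum_sq / df_num


def partial_eta_squared(f_value, df_num, df_den):
    """Standard conversion from an F statistic to partial eta squared."""
    return (f_value * df_num) / (f_value * df_num + df_den)


# Visible rows of Table S7: (sum_sq, reported mean_sq, num_df, den_df, F or None if truncated)
rows = {
    ("Accuracy", "Reduced"): (9, 9, 1, 377.5, 0.02),
    ("Accuracy", "Full"): (53, 53, 1, 499.7, 0.10),
    ("Confidence", "Reduced"): (928, 232, 4, 180.0, None),  # F-value truncated in the excerpt
}

for (effect, sample), (ss, ms, df1, df2, f) in rows.items():
    ms_ok = math.isclose(mean_square(ss, df1), ms)
    eta = "n/a" if f is None else f"{partial_eta_squared(f, df1, df2):.4f}"
    print(f"{effect} ({sample}): Mean Sq consistent={ms_ok}, partial eta^2={eta}")
```

The truncated Confidence F-value is left out on purpose rather than guessed.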
[5]

Variance, Standard Deviation (SD), and adjusted Intraclass Correlation Coefficient (ICC) for Random Effects

Absolute values of human deviation. Model equation: |Δy| = Accuracy × Confidence × Classification + (1 | Participant_id) + (1 | Statement_id)
TABLE S8. Variance, Standard Deviation (SD), and adjusted Intraclass Correlation Coefficient (ICC) for Random Effects.
Participant_id: variance 45.84, SD 6.77, ICC 0.13
Statement_id: variance 2.05, SD 1.43, ICC 0.01
Residua...

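The crossed term Accuracy × Confidence × Classification in the Table S8 model expands into three main effects, three two-way interactions, and one three-way interaction. This sketch lists those terms and their degrees of freedom. The level counts are assumptions: two accuracy levels (high, low), five confidence levels (inferred from Num DF = 4 for Confidence in Table S7), and two classification levels (deceptive, truthful), which the visible excerpt does not confirm; `crossed_terms` is a hypothetical helper.

```python
from itertools import combinations
from math import prod

# Assumed level counts (see lead-in): Confidence = 5 is inferred from Num DF = 4 in
# Table S7; Classification = 2 (deceptive vs truthful) is an assumption.
LEVELS = {"Accuracy": 2, "Confidence": 5, "Classification": 2}


def crossed_terms(levels):
    """Expand A * B * C into every main effect and interaction, mapping each
    term to its degrees of freedom (product of levels - 1 over its factors)."""
    names = list(levels)
    terms = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            terms[":".join(combo)] = prod(levels[n] - 1 for n in combo)
    return terms


terms = crossed_terms(LEVELS)
for term, df in terms.items():
    print(f"{term}: {df} df")
print("fixed-effect parameters incl. intercept:", 1 + sum(terms.values()))
```

Under these assumptions the fixed part of the model has 19 degrees of freedom plus an intercept, while the two `(1 | ...)` terms add only one variance parameter each.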
