Pith · machine review for the scientific record

arXiv: 2102.09692 · v1 · submitted 2021-02-19 · 💻 cs.HC · cs.AI


To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 17:58 UTC · model grok-4.3

classification: 💻 cs.HC · cs.AI
keywords: cognitive forcing · overreliance · explainable AI · AI-assisted decision-making · dual-process theory · human-AI interaction · need for cognition · trust in AI

The pith

Cognitive forcing interventions reduce overreliance on incorrect AI suggestions by prompting deeper analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

People often accept AI recommendations even when those recommendations are wrong, because they apply general heuristics instead of analyzing each case and its explanation. The paper tests three cognitive forcing designs that require users to engage analytically with the AI output before deciding. In an experiment with 199 participants, these designs cut overreliance compared with standard explainable AI approaches. The designs that worked best received the lowest user satisfaction ratings, and their benefits were larger for participants who scored higher on Need for Cognition. The work therefore suggests that the success of explainable AI depends on whether users are motivated to think through the information it provides.

Core claim

Cognitive forcing functions compel people to engage more thoughtfully with AI-generated explanations rather than relying on heuristics, and this engagement significantly reduces overreliance on wrong AI suggestions relative to simple explainable AI baselines. The reduction comes at the cost of lower subjective satisfaction, and it benefits people higher in Need for Cognition more.

What carries the argument

Cognitive forcing interventions: interface designs that require users to perform additional analytical steps with the AI explanation before accepting or rejecting the suggestion.
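
To make the mechanism concrete, a minimal sketch of one such gate follows. This is a hypothetical design in the spirit of the interventions, not a reproduction of the paper's three designs, which this review does not enumerate: the interface withholds the AI suggestion until the user has committed to an independent judgment.

```python
# Minimal illustrative sketch of a cognitive forcing gate (hypothetical,
# not one of the paper's three interventions): the AI suggestion stays
# hidden until the user commits to an independent answer, blocking the
# heuristic "just accept the AI" path.

from dataclasses import dataclass

@dataclass
class Trial:
    case_id: str
    ai_suggestion: str
    ai_explanation: str

def run_forced_trial(trial: Trial, prompt=input, show=print) -> dict:
    # Step 1: force an independent answer before any AI output is visible.
    own_answer = prompt(f"[{trial.case_id}] Your answer, before seeing the AI: ")

    # Step 2: only now reveal the AI suggestion and its explanation.
    show(f"AI suggests: {trial.ai_suggestion}")
    show(f"Because: {trial.ai_explanation}")

    # Step 3: final decision, made with an already-formed independent view.
    final = prompt("Final answer (keep yours or adopt the AI's): ")
    return {"own": own_answer, "ai": trial.ai_suggestion, "final": final}

# Non-interactive demo with canned responses:
answers = iter(["option B", "option B"])
result = run_forced_trial(
    Trial("case-01", "option A", "features most resemble option A"),
    prompt=lambda msg: next(answers),
    show=lambda msg: None,
)
assert result["final"] == "option B"
```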

If this is right

  • Cognitive forcing can be used to lower acceptance of erroneous AI advice in decision tasks.
  • Reducing overreliance through forcing comes at the cost of lower subjective ratings of the system.
  • The benefit of forcing is moderated by individual differences in motivation to think effortfully.
  • Explainable AI solutions will not work equally well for all users without accounting for cognitive motivation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Forcing mechanisms might transfer to high-stakes domains such as medical or financial decisions where overreliance carries larger costs.
  • Designers could explore milder versions of forcing that preserve user satisfaction while still increasing analysis.
  • Personalizing the level of forcing based on a user's measured Need for Cognition could improve both effectiveness and acceptance.

Load-bearing premise

The three interventions actually compel deeper analytical thinking rather than simply adding friction or prompting other behavioral changes.

What would settle it

An experiment that measures actual analytical processing (for example via eye-tracking or think-aloud protocols) and finds no increase in depth of engagement despite the forcing designs.

read the original abstract

People supported by AI-powered decision support tools frequently overrely on the AI: they accept an AI's suggestion even when that suggestion is wrong. Adding explanations to the AI decisions does not appear to reduce the overreliance and some studies suggest that it might even increase it. Informed by the dual-process theory of cognition, we posit that people rarely engage analytically with each individual AI recommendation and explanation, and instead develop general heuristics about whether and when to follow the AI suggestions. Building on prior research on medical decision-making, we designed three cognitive forcing interventions to compel people to engage more thoughtfully with the AI-generated explanations. We conducted an experiment (N=199), in which we compared our three cognitive forcing designs to two simple explainable AI approaches and to a no-AI baseline. The results demonstrate that cognitive forcing significantly reduced overreliance compared to the simple explainable AI approaches. However, there was a trade-off: people assigned the least favorable subjective ratings to the designs that reduced the overreliance the most. To audit our work for intervention-generated inequalities, we investigated whether our interventions benefited equally people with different levels of Need for Cognition (i.e., motivation to engage in effortful mental activities). Our results show that, on average, cognitive forcing interventions benefited participants higher in Need for Cognition more. Our research suggests that human cognitive motivation moderates the effectiveness of explainable AI solutions.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports results from a controlled experiment (N=199) comparing three cognitive forcing interventions—designed to promote analytical engagement with AI explanations per dual-process theory—to two simple explainable-AI baselines and a no-AI condition. It claims that the forcing designs significantly reduce overreliance on incorrect AI recommendations, albeit with lower subjective ratings, and that the benefit is moderated by Need for Cognition such that higher-NFC participants gain more from the interventions.

Significance. If the core empirical result holds, the work supplies concrete design evidence that cognitive forcing can mitigate overreliance in AI-assisted decisions, documents a satisfaction trade-off, and identifies cognitive motivation as a moderator relevant to equitable XAI deployment.

major comments (2)
  1. [Methods and Results sections] The central interpretation—that reduced overreliance results from compelled deeper analytical processing rather than non-analytic mechanisms such as added friction—is load-bearing for the dual-process framing, yet the design reports only outcome metrics (overreliance rates) without direct process measures (response latencies on explanations, eye-tracking dwell times, or comprehension probes of the AI rationale). This leaves the mechanism unverified.
  2. [Results section] The moderation analysis by Need for Cognition is presented as evidence of differential benefit, but the manuscript does not report the full regression model (including interaction term, covariates, and effect-size details) or power calculations for the subgroup comparisons, making it difficult to assess whether the reported average benefit for higher-NFC participants is robust.
minor comments (2)
  1. [Abstract] The abstract states the sample size and key comparisons but omits the decision task domain and the precise operationalization of overreliance; adding one sentence would improve standalone readability (a sketch of one common operationalization follows this list).
  2. [Results] Subjective rating scales are mentioned but the exact items, anchors, and reliability statistics are not tabulated; a supplementary table would clarify the reported trade-off.
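
As a companion to minor comment 1, here is a minimal sketch of one common operationalization of overreliance, assuming trial-level records of whether the AI was correct and whether the participant followed it; the paper's exact definition may differ.

```python
# Sketch of one common operationalization of overreliance (an assumption,
# not necessarily the paper's definition): among trials where the AI
# suggestion was wrong, the fraction on which the participant's final
# answer nevertheless followed it.

def overreliance_rate(trials: list[dict]) -> float:
    """trials: [{'ai_correct': bool, 'followed_ai': bool}, ...]"""
    wrong_ai = [t for t in trials if not t["ai_correct"]]
    if not wrong_ai:
        return float("nan")  # undefined when the AI made no errors
    return sum(t["followed_ai"] for t in wrong_ai) / len(wrong_ai)

# Example: the AI erred on 4 trials; the participant followed 3 of them.
demo = (
    [{"ai_correct": False, "followed_ai": True}] * 3
    + [{"ai_correct": False, "followed_ai": False}]
    + [{"ai_correct": True, "followed_ai": True}] * 6
)
assert overreliance_rate(demo) == 0.75
```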

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, indicating planned revisions to improve clarity, transparency, and completeness of the manuscript.

read point-by-point responses
  1. Referee: [Methods and Results sections] The central interpretation—that reduced overreliance results from compelled deeper analytical processing rather than non-analytic mechanisms such as added friction—is load-bearing for the dual-process framing, yet the design reports only outcome metrics (overreliance rates) without direct process measures (response latencies on explanations, eye-tracking dwell times, or comprehension probes of the AI rationale). This leaves the mechanism unverified.

    Authors: We agree that the absence of direct process measures (e.g., response latencies, eye-tracking, or comprehension probes) leaves the precise cognitive mechanism somewhat inferential rather than directly verified. Our study was designed to evaluate behavioral outcomes of the forcing interventions, which were constructed to interrupt heuristic reliance and require explicit engagement with the AI rationale, consistent with dual-process theory and prior medical decision-making research. We cannot rule out non-analytic factors such as added friction with the current data. In revision we will (1) expand the Discussion to explicitly address alternative mechanisms, (2) add a dedicated Limitations subsection noting the lack of process-tracing data, and (3) propose future studies that incorporate such measures. This is a partial revision focused on improved interpretation and transparency. revision: partial

  2. Referee: [Results section] The moderation analysis by Need for Cognition is presented as evidence of differential benefit, but the manuscript does not report the full regression model (including interaction term, covariates, and effect-size details) or power calculations for the subgroup comparisons, making it difficult to assess whether the reported average benefit for higher-NFC participants is robust.

    Authors: We thank the referee for highlighting this reporting gap. The original manuscript summarized the NFC moderation but omitted the full model specification. In the revised version we will include the complete regression results: the full model equation with the condition × NFC interaction term, all covariates, coefficient estimates with confidence intervals, effect-size metrics, and any available power or robustness checks for the subgroup analyses. This will allow readers to evaluate the strength and stability of the moderation finding directly. revision: yes
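
For concreteness, a minimal sketch of the kind of model this response promises, assuming a trial-level binary overreliance outcome, a categorical condition factor with a simple-XAI reference level, and mean-centered NFC scores; all column names are hypothetical, and this is not the authors' actual specification.

```python
# Illustrative condition x NFC moderation model. Column names and the file
# are hypothetical; the authors' actual specification (e.g., a mixed-effects
# model with participant and case random effects) may well differ.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trials.csv")               # hypothetical trial-level data
df["nfc_c"] = df["nfc"] - df["nfc"].mean()   # center NFC for interpretable main effects

# Overreliance is defined on AI-error trials: did the participant follow
# a wrong suggestion? Fit a logistic model with the condition x NFC term.
wrong = df[df["ai_correct"] == 0]
model = smf.logit(
    "overrelied ~ C(condition, Treatment('simple_xai')) * nfc_c",
    data=wrong,
).fit()

print(model.summary())    # the interaction coefficients carry the moderation claim
print(model.conf_int())   # confidence intervals for all estimates
```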

Circularity Check

0 steps flagged

No significant circularity in this empirical HCI study

full rationale

This paper reports results from a controlled experiment (N=199) comparing three cognitive forcing interventions to explainable AI baselines and a no-AI condition. All central claims—reduced overreliance, subjective rating trade-offs, and moderation by Need for Cognition—are direct empirical outcome measures from participant decisions and self-reports. The work draws on dual-process theory for intervention design but presents no equations, fitted parameters renamed as predictions, self-citation chains, or uniqueness theorems that reduce any result to its own inputs by construction. The derivation chain is self-contained experimental reporting with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the dual-process theory of cognition as a domain assumption and on the validity of the experimental measures of overreliance and Need for Cognition; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Dual-process theory of cognition: people rarely engage analytically with each individual AI recommendation and instead develop general heuristics
    Invoked to justify why explanations alone fail and why forcing functions are needed

pith-pipeline@v0.9.0 · 5566 in / 1139 out tokens · 38384 ms · 2026-05-15T17:58:59.505519+00:00 · methodology

discussion (0)


Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. "It became a self-fulfilling prophecy": How Lived Experiences are Entangled with AI Predictions in Menstrual Cycle Tracking Apps

    cs.HC 2026-05 conditional novelty 6.0

    Users entangle their lived experiences with AI predictions in menstrual tracking apps, leading to self-fulfilling prophecies, limited critical awareness from UI, and isolation for non-normative users.

  2. Optimized but Unowned: How AI-Authored Goals Undermine the Motivation They Are Meant to Drive

    cs.HC 2026-05 accept novelty 6.0

    AI-authored goals produce higher SMART quality scores but lower psychological ownership, commitment, importance, and goal-directed behavior than self-authored goals, with ownership as the mediating mechanism.

  3. Optimized but Unowned: How AI-Authored Goals Undermine the Motivation They Are Meant to Drive

    cs.HC 2026-05 conditional novelty 6.0

    AI-authored goals are objectively higher quality but produce lower psychological ownership, commitment, importance, and behavioral action than self-authored goals, with ownership as the mediating mechanism.

  4. Evaluating the False Trust engendered by LLM Explanations

    cs.HC 2026-05 unverdicted novelty 6.0

    A user study finds that LLM reasoning traces and post-hoc explanations create false trust by increasing acceptance of incorrect answers, whereas contrastive dual explanations improve users' ability to detect errors.

  5. Sustaining Cooperation in Populations Guided by AI: A Folk Theorem for LLMs

    cs.GT 2026-05 unverdicted novelty 6.0

    A folk theorem for LLMs proves that all feasible and individually rational outcomes can be sustained as ε-equilibria in repeated games where LLMs advise client populations, despite indirect observation.

  6. What Did They Mean? How LLMs Resolve Ambiguous Social Situations across Perspectives and Roles

    cs.HC 2026-04 unverdicted novelty 6.0

    LLMs produce interpretive closure in 87.5% of ambiguous social scenarios through narrative alignment, reversal, or normative advice, with first-person perspectives increasing alignment tendencies.

  7. Gradual Voluntary Participation: A Framework for Participatory AI Governance in Journalism

    cs.HC 2026-04 unverdicted novelty 6.0

    The study proposes the Gradual Voluntary Participation (GVP) framework to reconceptualize participatory AI governance in journalism as a gradual and voluntary process using a bidimensional matrix.

  8. Agentivism: a learning theory for the age of artificial intelligence

    cs.AI 2026-04 unverdicted novelty 6.0

    Agentivism defines learning as durable growth in human capability through selective AI delegation, epistemic monitoring and verification, reconstructive internalization of AI outputs, and transfer under reduced support.

  9. Effects of Generative AI Errors on User Reliance Across Task Difficulty

    cs.CY 2026-04 unverdicted novelty 6.0

    Higher generative AI error rates reduce user reliance, but task difficulty does not significantly moderate this effect.

  10. Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

    cs.CR 2026-04 conditional novelty 6.0

    A new benchmark exposes food-safety gaps in current LLMs and guardrails, and a fine-tuned 4B model is offered as a domain-specific fix.

  11. Analyzing the Presentation, Content, and Utilization of References in LLM-powered Conversational AI Systems

    cs.HC 2026-03 unverdicted novelty 6.0

    LLM chat systems show large differences in reference quantity and quality, but users rarely click or engage with them.

  12. Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows

    cs.CY 2026-04 unverdicted novelty 5.0

    Recruiters perceive themselves as retaining agency over GenAI in hiring pipelines, yet GenAI invisibly architects core evaluation inputs, producing only marginal efficiency gains at the cost of deskilling.

  13. Auditing and Controlling AI Agent Actions in Spreadsheets

    cs.HC 2026-04 unverdicted novelty 5.0

    Pista decomposes AI agent actions in spreadsheets into auditable steps, enabling real-time user intervention that improves task outcomes, user comprehension, agent perception, and sense of co-ownership over baseline agents.

  14. Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research

    cs.HC 2026-04 unverdicted novelty 5.0

    AVA is a specialized GenAI platform for development policy research that provides verifiable syntheses from World Bank reports and is associated with 2.4-3.9 hours of weekly time savings in a large-scale user evaluation.

  15. Toward Human-AI Complementarity Across Diverse Tasks

    cs.HC 2026-04 unverdicted novelty 5.0

    Human-AI hybrids achieve only +0.4pp over AI alone on diverse tasks because confidence routing fails to identify the small set of cases where humans can correct AI errors.

  16. Emergent Social Intelligence Risks in Generative Multi-Agent Systems

    cs.MA 2026-03 unverdicted novelty 5.0

    Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.

  17. Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction

    cs.HC 2026-03 unverdicted novelty 5.0

    Analysis of 1,223 AI-HCI papers shows declining focus on human epistemic sovereignty and rising optimization of autonomous agents, leading to a proposal for scaffolded cognitive friction via multi-agent systems to pre...

  18. Exploring Instant Photography using Generative AI: A Design Probe with the UnReality Camera

    cs.HC 2026-05 unverdicted novelty 4.0

    The UnReality Camera augments instant photos with generative AI from spoken input, and a design probe found users balancing artistic control with appreciation for unpredictability, suspense during printing, and owners...

  19. From Trust to Appropriate Reliance: Measurement Constructs in Human-AI Decision-Making

    cs.HC 2026-04 unverdicted novelty 4.0

    A literature review shows that constructs for appropriate reliance on AI are fragmented, presents three views on the topic, and calls for consensus on objective metrics to enable better comparisons across studies.

  20. Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions

    cs.CY 2026-02 unverdicted novelty 4.0

    Current XAI methods for DNNs and LLMs rest on paradoxes and false assumptions that demand a paradigm shift to verification protocols, scientific foundations, context-aware design, and faithful model analysis rather th...

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · cited by 19 Pith papers · 1 internal anchor

  1. [1]

    Gagan Bansal, Besmira Nushi, Ece Kamar, Walter S Lasecki, Daniel S Weld, and Eric Horvitz. 2019. Beyond accuracy: The role of mental models in human-AI team performance. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Vol. 7. 2–11

  2. [2]

    Gagan Bansal, Tongshuang Wu, Joyce Zhu, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, and Daniel S Weld. 2021. Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing...

  3. [3]

    Dale J Barr, Roger Levy, Christoph Scheepers, and Harry J Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of memory and language 68, 3 (2013), 255–278

  4. [4]

    Eta S Berner and Mark L Graber. 2008. Overconfidence as a cause of diagnostic error in medicine. The American journal of medicine 121, 5 (2008), S2–S23

  5. [5]

    Umang Bhatt, Adrian Weller, and José M. F. Moura. 2020. Evaluating and Aggregating Feature-based Model Explanations. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 , Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 3016–3022. https://doi.org/10.24963/...

  6. [6]

    Brian H Bornstein and A Christine Emler. 2001. Rationality in medical decision making: a review of the literature on doctors’ decision-making biases. Journal of evaluation in clinical practice 7, 2 (2001), 97–107

  7. [7]

    Zana Buçinca, Phoebe Lin, Krzysztof Z. Gajos, and Elena L. Glassman. 2020. Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems. In Proceedings of the 25th International Conference on Intelligent User Interfaces (IUI ’20). ACM, New York, NY, USA

  8. [8]

    Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency . 77–91

  9. [9]

    Adrian Bussone, Simone Stumpf, and Dympna O’Sullivan. 2015. The role of explanations on trust and reliance in clinical decision support systems. In 2015 International Conference on Healthcare Informatics . IEEE, 160–169

  10. [10]

    John T. Cacioppo and Richard E. Petty. 1982. The need for cognition. Journal of Personality and Social Psychology 42, 1 (1982), 116–131. https://doi.org/10.1037/0022-3514.42.1.116

  11. [11]

    J T Cacioppo, R E Petty, and C F Kao. 1984. The efficient assessment of need for cognition. Journal of personality assessment 48, 3 (1984), 306–307. https://doi.org/10.1207/s15327752jpa4803_13

  12. [12]

    Giuseppe Carenini. 2001. An Analysis of the Influence of Need for Cognition on Dynamic Queries Usage. In CHI ’01 Extended Abstracts on Human Factors in Computing Systems (Seattle, Washington) (CHI EA ’01). ACM, New York, NY, USA, 383–384. https://doi.org/10.1145/634067.634293

  13. [13]

    Ana-Maria Cazan and Simona Elena Indreica. 2014. Need for cognition and approaches to learning among university students. Procedia-Social and Behavioral Sciences 127 (2014), 134–138

  14. [14]

    Jim Q Chen and Sang M Lee. 2003. An exploratory cognitive DSS for strategic decision making. Decision support systems 36, 2 (2003), 147–160

  15. [15]

    Glinda S Cooper and Vanessa Meterko. 2019. Cognitive bias research in forensic science: a systematic review. Forensic science international 297 (2019), 35–46

  16. [16]

    Pat Croskerry. 2003. Cognitive forcing strategies in clinical decisionmaking. Annals of emergency medicine 41, 1 (2003), 110–120

  17. [17]

    Pat Croskerry. 2003. The importance of cognitive errors in diagnosis and strategies to minimize them. Academic medicine 78, 8 (2003), 775–780

  18. [18]

    Louis Deslauriers, Logan S McCarty, Kelly Miller, Kristina Callaghan, and Greg Kestin. 2019. Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences (2019), 201821936

  19. [19]

    Jennifer L Eberhardt. 2020. Biased: Uncovering the hidden prejudice that shapes what we see, think, and do . Penguin Books

  20. [20]

    John W Ely, Mark L Graber, and Pat Croskerry. 2011. Checklists to reduce diagnostic errors. Academic Medicine 86, 3 (2011), 307–313

  21. [21]

    Gavan J Fitzsimons and Donald R Lehmann. 2004. Reactance to recommendations: When unsolicited advice yields contrary responses. Marketing Science 23, 1 (2004), 82–94

  22. [22]

    Krzysztof Z. Gajos and Krysta Chauncey. 2017. The Influence of Personality Traits and Cognitive Load on the Use of Adaptive User Interfaces. In Proceedings of the 22Nd International Conference on Intelligent User Interfaces (Limassol, Cyprus) (IUI ’17). ACM, New York, NY, USA, 301–306. https://doi.org/10.1145/3025171.3025192

  23. [23]

    Neelansh Garg, Apuroop Sethupathy, Rudraksh Tuwani, Rakhi Nk, Shubham Dokania, Arvind Iyer, Ayushi Gupta, Shubhra Agrawal, Navjot Singh, Shubham Shukla, et al. 2018. FlavorDB: a database of flavor molecules. Nucleic acids research 46, D1 (2018), D1210–D1216

  24. [24]

    Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang, Rachel Bellamy, and Klaus Mueller. 2021. Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers. Proc. ACM Hum.-Comput. Interact. 4, CSCW3, Article 235 (2021), 28 pages. https://doi.org/10.1145/3432934

  25. [25]

    Mark L Graber, Stephanie Kissam, Velma L Payne, Ashley ND Meyer, Asta Sorensen, Nancy Lenfestey, Elizabeth Tant, Kerm Henriksen, Kenneth LaBresh, and Hardeep Singh. 2012. Cognitive interventions to reduce diagnostic error: a narrative review. BMJ quality & safety 21, 7 (2012), 535–557

  26. [26]

    Ben Green and Yiling Chen. 2019. The principles and limits of algorithm-in-the-loop decision making. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–24

  27. [27]

    Curtis P Haugtvedt, Richard E Petty, and John T Cacioppo. 1992. Need for cognition and advertising: Understanding the role of personality variables in consumer behavior. Journal of Consumer Psychology 1, 3 (1992), 239–260

  28. [28]

    Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 2 (1979), 65–70

  29. [29]

    Jessica Hullman, Eytan Adar, and Priti Shah. 2011. Benefitting infovis with visual difficulties. IEEE Transactions on Visualization and Computer Graphics 17, 12 (2011), 2213–2222

  30. [30]

    Maia Jacobs, Melanie F. Pradier, Thomas H. McCoy Jr, Roy H. Perlis, Finale Doshi-Velez, and Krzysztof Z. Gajos. 2021. How machine-learning recommendations influence clinician treatment selections: the example of the antidepressant selection. Translational Psychiatry 11 (2021). https://doi.org/10.1038/s41398-021-01224-x

  31. [31]

    Heinrich Jiang, Been Kim, Melody Y Guan, and Maya Gupta. 2018. To trust or not to trust a classifier. In Proceedings of the 32nd International Conference on Neural Information Processing Systems . 5546–5557

  32. [32]

    Daniel Kahneman. 2011. Thinking, fast and slow . Macmillan

  33. [33]

    Daniel Kahneman and Shane Frederick. 2002. Representativeness revisited: Attribute substitution in intuitive judgment. In Representativeness revisited: Attribute substitution in intuitive judgment . New York. Cambridge University Press., 49–81

  34. [34]

    Daniel Kahneman, Stewart Paul Slovic, Paul Slovic, and Amos Tversky. 1982. Judgment under uncertainty: Heuristics and biases. Cambridge university press

  35. [35]

    Ece Kamar. 2016. Directions in Hybrid Intelligence: Complementing AI Systems with Human Intelligence.. In IJCAI. 4070–4073

  36. [36]

    Ece Kamar, Severin Hacker, and Eric Horvitz. 2012. Combining human and machine intelligence in large-scale crowdsourcing. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems- Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 467–474

  37. [37]

    Eric Kearney, Diether Gebert, and Sven C Voelpel. 2009. When and how diversity benefits teams: The importance of team members’ need for cognition. Academy of Management journal 52, 3 (2009), 581–598

  38. [38]

    Wouter Kool and Matthew Botvinick. 2018. Mental labour. Nature human behaviour 2, 12 (2018), 899–908

  39. [39]

    Todd Kulesza, Simone Stumpf, Margaret Burnett, Sherry Yang, Irwin Kwan, and Weng-Keen Wong. 2013. Too much, too little, or just right? Ways explanations impact end users’ mental models. In 2013 IEEE Symposium on Visual Languages and Human Centric Computing. IEEE, 3–10

  40. [40]

    Vivian Lai, Han Liu, and Chenhao Tan. 2020. "Why is ’Chicago’ Deceptive?" Towards Building Model-Driven Tutorials for Humans. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA)

  41. [41]

    Vivian Lai and Chenhao Tan. 2019. On human predictions with explanations and predictions of machine learning models: A case study on deception detection. In Proceedings of the Conference on Fairness, Accountability, and Transparency . 29–38

  42. [42]

    Kathryn Ann Lambe, Gary O’Reilly, Brendan D Kelly, and Sarah Curristan. 2016. Dual-process cognitive interventions to enhance diagnostic reasoning: a systematic review. BMJ quality & safety 25, 10 (2016), 808–820

  43. [43]

    G Daniel Lassiter, Michael A Briggs, and R David Slaw. 1991. Need for cognition, causal processing, and memory for behavior. Personality and Social Psychology Bulletin 17, 6 (1991), 694–700

  44. [44]

    John D Lee and Katrina A See. 2004. Trust in automation: Designing for appropriate reliance. Human factors 46, 1 (2004), 50–80

  45. [45]

    Chin-Lung Lin, Sheng-Hsien Lee, and Der-Juinn Horng. 2011. The effects of online reviews on purchasing intention: The moderating role of need for cognition. Social Behavior and Personality: an international journal 39, 1 (2011), 71–81

  46. [46]

    Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 . Curran Associates, Inc., 4765–4774

  47. [47]

    Martijn Millecamp, Nyi Nyi Htun, Cristina Conati, and Katrien Verbert. 2019. To Explain or Not to Explain: The Effects of Personal Characteristics When Explaining Music Recommendations. In Proceedings of the 24th International Conference on Intelligent User Interfaces (Marina del Ray, California) (IUI ’19). Association for Computing Machinery, New York, N...

  48. [48]

    Martijn Millecamp, Nyi Nyi Htun, Cristina Conati, and Katrien Verbert. 2020. What’s in a User? Towards Personalising Transparency for Music Recommender Interfaces. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization (Genoa, Italy) (UMAP ’20). Association for Computing Machinery, New York, NY, USA, 173–182. https://do...

  49. [49]

    Carol-anne E Moulton, Glenn Regehr, Maria Mylopoulos, and Helen M MacRae. 2007. Slowing down when you should: a new model of expert judgment. Academic Medicine 82, 10 (2007), S109–S116

  50. [50]

    Joon Sung Park, Rick Barber, Alex Kirlik, and Karrie Karahalios. 2019. A Slow Algorithm Improves Users’ Assessments of the Algorithm’s Accuracy. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–15

  51. [51]

    Avi Parush, Shir Ahuvia, and Ido Erev. 2007. Degradation in spatial knowledge acquisition when using automatic navigation systems. In International conference on spatial information theory . Springer, 238–254

  52. [52]

    A "missing" family of classical orthogonal polynomials

    Richard E. Petty and John T. Cacioppo. 1986. The Elaboration Likelihood Model of Persuasion. Communication and Persuasion 19 (1986), 1–24. https://doi.org/10.1007/978-1-4612-4964-1_1 arXiv:arXiv:1011.1669v3

  53. [53]

    Vlad L Pop, Alex Shrewsbury, and Francis T Durso. 2015. Individual differences in the calibration of trust in automation. Human factors 57, 4 (2015), 545–556

  54. [54]

    Inioluwa Deborah Raji, Andrew Smart, Rebecca N. White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barce...

  55. [55]

    Jonathan Sherbino, Kulamakan Kulasegaram, Elizabeth Howey, and Geoffrey Norman. 2014. Ineffectiveness of cognitive forcing strategies to reduce biases in diagnostic reasoning: a controlled trial. Canadian Journal of Emergency Medicine 16, 1 (2014), 34–40

  56. [56]

    Maria Sicilia, Salvador Ruiz, and Jose L Munuera. 2005. Effects of interactivity in a web site: The moderating effect of need for cognition. Journal of advertising 34, 3 (2005), 31–44

  57. [57]

    J Spilke, HP Piepho, and X Hu. 2005. Analysis of unbalanced data by mixed linear models using the MIXED procedure of the SAS system. Journal of Agronomy and crop science 191, 1 (2005), 47–54

  58. [58]

    Tracy L Tuten and Michael Bosnjak. 2001. Understanding differences in web usage: The role of need for cognition and the five factor model of personality. Social Behavior and Personality: an international journal 29, 4 (2001), 391–398

  59. [59]

    Michelle Vaccaro and Jim Waldo. 2019. The effects of mixing machine learning and human judgment. Commun. ACM 62, 11 (2019), 104–110

  60. [60]

    Tiffany C Veinot, Hannah Mitchell, and Jessica S Ancker. 2018. Good intentions are not enough: how informatics interventions can worsen inequality. Journal of the American Medical Informatics Association 25, 8 (2018), 1080–1088

  61. [61]

    Jennifer Irvin Vidrine, Vani Nath Simmons, and Thomas H. Brandon. 2007. Construction of smoking-relevant risk perceptions among college students: The influence of need for cognition and message content. Journal of Applied Social Psychology 37, 1 (2007), 91–114. https://doi.org/10.1111/j.0021-9029.2007.00149.x

  62. [62]

    Alan R Wagner, Jason Borenstein, and Ayanna Howard. 2018. Overtrust in the robotic age. Commun. ACM 61, 9 (2018), 22–24

  63. [63]

    Peter C Wason and J St BT Evans. 1974. Dual processes in reasoning? Cognition 3, 2 (1974), 141–154

  64. [64]

    Jacob Westfall, David A Kenny, and Charles M Judd. 2014. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General 143, 5 (2014)

  65. [65]

    Pamela Williams-Piehota, Tamera R Schneider, Linda Mowad, and Peter Salovey. 2003. Matching Health Messages to Information-Processing Styles : Need for Cognition and Mammography Utilization. Health Communication 15, 4 (2003), 375–392

  66. [66]

    Ming Yin, Jennifer Wortman Vaughan, and Hanna Wallach. 2019. Understanding the effect of accuracy on trust in machine learning models. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems . 1–12

  67. [67]

    Yunfeng Zhang, Q. Vera Liao, and Rachel K. E. Bellamy. 2020. Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona, Spain) (FAT* ’20). Association for Computing Machinery, New York, NY, USA, 295–305. https://doi.o...