Pith · machine review for the scientific record

arXiv: 2102.09692 · v1 · submitted 2021-02-19 · 💻 cs.HC · cs.AI


To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 17:58 UTC · model grok-4.3

classification: 💻 cs.HC · cs.AI
keywords: cognitive forcing · overreliance · explainable AI · AI-assisted decision-making · dual-process theory · human-AI interaction · need for cognition · trust in AI

The pith

Cognitive forcing interventions reduce overreliance on incorrect AI suggestions by prompting deeper analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

People often accept AI recommendations even when those recommendations are wrong, because they apply general heuristics instead of analyzing each case and its explanation. The paper tests three cognitive forcing designs that require users to engage analytically with the AI output before deciding. In an experiment with 199 participants, these designs cut overreliance compared with standard explainable AI approaches. The designs that worked best received the lowest user satisfaction ratings, and their benefits were larger for participants who scored higher on Need for Cognition. The work therefore suggests that the success of explainable AI depends on whether users are motivated to think through the information it provides.

Core claim

Cognitive forcing functions compel people to engage more thoughtfully with AI-generated explanations rather than relying on heuristics, and this engagement significantly reduces overreliance on wrong AI suggestions relative to simple explainable AI baselines. The reduction comes at the cost of lower subjective satisfaction, and it benefits people higher in Need for Cognition more.

What carries the argument

Cognitive forcing interventions: interface designs that require users to perform additional analytical steps with the AI explanation before accepting or rejecting the suggestion.
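
To make the mechanism concrete, a minimal sketch of one such gate follows. This is a hypothetical design in the spirit of the interventions, not a reproduction of the paper's three designs, which this review does not enumerate: the interface withholds the AI suggestion until the user has committed to an independent judgment.

```python
# Minimal illustrative sketch of a cognitive forcing gate (hypothetical,
# not one of the paper's three interventions): the AI suggestion stays
# hidden until the user commits to an independent answer, blocking the
# heuristic "just accept the AI" path.

from dataclasses import dataclass

@dataclass
class Trial:
    case_id: str
    ai_suggestion: str
    ai_explanation: str

def run_forced_trial(trial: Trial, prompt=input, show=print) -> dict:
    # Step 1: force an independent answer before any AI output is visible.
    own_answer = prompt(f"[{trial.case_id}] Your answer, before seeing the AI: ")

    # Step 2: only now reveal the AI suggestion and its explanation.
    show(f"AI suggests: {trial.ai_suggestion}")
    show(f"Because: {trial.ai_explanation}")

    # Step 3: final decision, made with an already-formed independent view.
    final = prompt("Final answer (keep yours or adopt the AI's): ")
    return {"own": own_answer, "ai": trial.ai_suggestion, "final": final}

# Non-interactive demo with canned responses:
answers = iter(["option B", "option B"])
result = run_forced_trial(
    Trial("case-01", "option A", "features most resemble option A"),
    prompt=lambda msg: next(answers),
    show=lambda msg: None,
)
assert result["final"] == "option B"
```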

If this is right

  • Cognitive forcing can be used to lower acceptance of erroneous AI advice in decision tasks.
  • Reducing overreliance through forcing comes at the cost of lower subjective ratings of the system.
  • The benefit of forcing is moderated by individual differences in motivation to think effortfully.
  • Explainable AI solutions will not work equally well for all users without accounting for cognitive motivation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Forcing mechanisms might transfer to high-stakes domains such as medical or financial decisions where overreliance carries larger costs.
  • Designers could explore milder versions of forcing that preserve user satisfaction while still increasing analysis.
  • Personalizing the level of forcing based on a user's measured Need for Cognition could improve both effectiveness and acceptance.

Load-bearing premise

The three interventions actually compel deeper analytical thinking rather than simply adding friction or prompting other behavioral changes.

What would settle it

An experiment that measures actual analytical processing (for example via eye-tracking or think-aloud protocols) and finds no increase in depth of engagement despite the forcing designs.

read the original abstract

People supported by AI-powered decision support tools frequently overrely on the AI: they accept an AI's suggestion even when that suggestion is wrong. Adding explanations to the AI decisions does not appear to reduce the overreliance and some studies suggest that it might even increase it. Informed by the dual-process theory of cognition, we posit that people rarely engage analytically with each individual AI recommendation and explanation, and instead develop general heuristics about whether and when to follow the AI suggestions. Building on prior research on medical decision-making, we designed three cognitive forcing interventions to compel people to engage more thoughtfully with the AI-generated explanations. We conducted an experiment (N=199), in which we compared our three cognitive forcing designs to two simple explainable AI approaches and to a no-AI baseline. The results demonstrate that cognitive forcing significantly reduced overreliance compared to the simple explainable AI approaches. However, there was a trade-off: people assigned the least favorable subjective ratings to the designs that reduced the overreliance the most. To audit our work for intervention-generated inequalities, we investigated whether our interventions benefited equally people with different levels of Need for Cognition (i.e., motivation to engage in effortful mental activities). Our results show that, on average, cognitive forcing interventions benefited participants higher in Need for Cognition more. Our research suggests that human cognitive motivation moderates the effectiveness of explainable AI solutions.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports results from a controlled experiment (N=199) comparing three cognitive forcing interventions—designed to promote analytical engagement with AI explanations per dual-process theory—to two simple explainable-AI baselines and a no-AI condition. It claims that the forcing designs significantly reduce overreliance on incorrect AI recommendations, albeit with lower subjective ratings, and that the benefit is moderated by Need for Cognition such that higher-NFC participants gain more from the interventions.

Significance. If the core empirical result holds, the work supplies concrete design evidence that cognitive forcing can mitigate overreliance in AI-assisted decisions, documents a satisfaction trade-off, and identifies cognitive motivation as a moderator relevant to equitable XAI deployment.

major comments (2)
  1. [Methods and Results sections] The central interpretation—that reduced overreliance results from compelled deeper analytical processing rather than non-analytic mechanisms such as added friction—is load-bearing for the dual-process framing, yet the design reports only outcome metrics (overreliance rates) without direct process measures (response latencies on explanations, eye-tracking dwell times, or comprehension probes of the AI rationale). This leaves the mechanism unverified.
  2. [Results section] The moderation analysis by Need for Cognition is presented as evidence of differential benefit, but the manuscript does not report the full regression model (including interaction term, covariates, and effect-size details) or power calculations for the subgroup comparisons, making it difficult to assess whether the reported average benefit for higher-NFC participants is robust.
minor comments (2)
  1. [Abstract] The abstract states the sample size and key comparisons but omits the decision task domain and the precise operationalization of overreliance; adding one sentence would improve standalone readability (a sketch of one common operationalization follows this list).
  2. [Results] Subjective rating scales are mentioned but the exact items, anchors, and reliability statistics are not tabulated; a supplementary table would clarify the reported trade-off.
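
As a companion to minor comment 1, here is a minimal sketch of one common operationalization of overreliance, assuming trial-level records of whether the AI was correct and whether the participant followed it; the paper's exact definition may differ.

```python
# Sketch of one common operationalization of overreliance (an assumption,
# not necessarily the paper's definition): among trials where the AI
# suggestion was wrong, the fraction on which the participant's final
# answer nevertheless followed it.

def overreliance_rate(trials: list[dict]) -> float:
    """trials: [{'ai_correct': bool, 'followed_ai': bool}, ...]"""
    wrong_ai = [t for t in trials if not t["ai_correct"]]
    if not wrong_ai:
        return float("nan")  # undefined when the AI made no errors
    return sum(t["followed_ai"] for t in wrong_ai) / len(wrong_ai)

# Example: the AI erred on 4 trials; the participant followed 3 of them.
demo = (
    [{"ai_correct": False, "followed_ai": True}] * 3
    + [{"ai_correct": False, "followed_ai": False}]
    + [{"ai_correct": True, "followed_ai": True}] * 6
)
assert overreliance_rate(demo) == 0.75
```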

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, indicating planned revisions to improve clarity, transparency, and completeness of the manuscript.

read point-by-point responses
  1. Referee: [Methods and Results sections] The central interpretation—that reduced overreliance results from compelled deeper analytical processing rather than non-analytic mechanisms such as added friction—is load-bearing for the dual-process framing, yet the design reports only outcome metrics (overreliance rates) without direct process measures (response latencies on explanations, eye-tracking dwell times, or comprehension probes of the AI rationale). This leaves the mechanism unverified.

    Authors: We agree that the absence of direct process measures (e.g., response latencies, eye-tracking, or comprehension probes) leaves the precise cognitive mechanism somewhat inferential rather than directly verified. Our study was designed to evaluate behavioral outcomes of the forcing interventions, which were constructed to interrupt heuristic reliance and require explicit engagement with the AI rationale, consistent with dual-process theory and prior medical decision-making research. We cannot rule out non-analytic factors such as added friction with the current data. In revision we will (1) expand the Discussion to explicitly address alternative mechanisms, (2) add a dedicated Limitations subsection noting the lack of process-tracing data, and (3) propose future studies that incorporate such measures. This is a partial revision focused on improved interpretation and transparency. revision: partial

  2. Referee: [Results section] The moderation analysis by Need for Cognition is presented as evidence of differential benefit, but the manuscript does not report the full regression model (including interaction term, covariates, and effect-size details) or power calculations for the subgroup comparisons, making it difficult to assess whether the reported average benefit for higher-NFC participants is robust.

    Authors: We thank the referee for highlighting this reporting gap. The original manuscript summarized the NFC moderation but omitted the full model specification. In the revised version we will include the complete regression results: the full model equation with the condition × NFC interaction term, all covariates, coefficient estimates with confidence intervals, effect-size metrics, and any available power or robustness checks for the subgroup analyses. This will allow readers to evaluate the strength and stability of the moderation finding directly. revision: yes
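
For concreteness, a minimal sketch of the kind of model this response promises, assuming a trial-level binary overreliance outcome, a categorical condition factor with a simple-XAI reference level, and mean-centered NFC scores; all column names are hypothetical, and this is not the authors' actual specification.

```python
# Illustrative condition x NFC moderation model. Column names and the file
# are hypothetical; the authors' actual specification (e.g., a mixed-effects
# model with participant and case random effects) may well differ.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trials.csv")               # hypothetical trial-level data
df["nfc_c"] = df["nfc"] - df["nfc"].mean()   # center NFC for interpretable main effects

# Overreliance is defined on AI-error trials: did the participant follow
# a wrong suggestion? Fit a logistic model with the condition x NFC term.
wrong = df[df["ai_correct"] == 0]
model = smf.logit(
    "overrelied ~ C(condition, Treatment('simple_xai')) * nfc_c",
    data=wrong,
).fit()

print(model.summary())    # the interaction coefficients carry the moderation claim
print(model.conf_int())   # confidence intervals for all estimates
```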

Circularity Check

0 steps flagged

No significant circularity in this empirical HCI study

full rationale

This paper reports results from a controlled experiment (N=199) comparing three cognitive forcing interventions to explainable AI baselines and a no-AI condition. All central claims—reduced overreliance, subjective rating trade-offs, and moderation by Need for Cognition—are direct empirical outcome measures from participant decisions and self-reports. The work draws on dual-process theory for intervention design but presents no equations, fitted parameters renamed as predictions, self-citation chains, or uniqueness theorems that reduce any result to its own inputs by construction. The derivation chain is self-contained experimental reporting with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the dual-process theory of cognition as a domain assumption and on the validity of the experimental measures of overreliance and Need for Cognition; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Dual-process theory of cognition: people rarely engage analytically with each individual AI recommendation and instead develop general heuristics
    Invoked to justify why explanations alone fail and why forcing functions are needed

pith-pipeline@v0.9.0 · 5566 in / 1139 out tokens · 38384 ms · 2026-05-15T17:58:59.505519+00:00 · methodology

discussion (0)


Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. "It became a self-fulfilling prophecy": How Lived Experiences are Entangled with AI Predictions in Menstrual Cycle Tracking Apps

    cs.HC 2026-05 conditional novelty 6.0

    Users entangle their lived experiences with AI predictions in menstrual tracking apps, leading to self-fulfilling prophecies, limited critical awareness from UI, and isolation for non-normative users.

  2. Optimized but Unowned: How AI-Authored Goals Undermine the Motivation They Are Meant to Drive

    cs.HC 2026-05 accept novelty 6.0

    AI-authored goals produce higher SMART quality scores but lower psychological ownership, commitment, importance, and goal-directed behavior than self-authored goals, with ownership as the mediating mechanism.

  3. Optimized but Unowned: How AI-Authored Goals Undermine the Motivation They Are Meant to Drive

    cs.HC 2026-05 conditional novelty 6.0

    AI-authored goals are objectively higher quality but produce lower psychological ownership, commitment, importance, and behavioral action than self-authored goals, with ownership as the mediating mechanism.

  4. Evaluating the False Trust engendered by LLM Explanations

    cs.HC 2026-05 unverdicted novelty 6.0

    A user study finds that LLM reasoning traces and post-hoc explanations create false trust by increasing acceptance of incorrect answers, whereas contrastive dual explanations improve users' ability to detect errors.

  5. Sustaining Cooperation in Populations Guided by AI: A Folk Theorem for LLMs

    cs.GT 2026-05 unverdicted novelty 6.0

    A folk theorem for LLMs proves that all feasible and individually rational outcomes can be sustained as ε-equilibria in repeated games where LLMs advise client populations, despite indirect observation.

  6. What Did They Mean? How LLMs Resolve Ambiguous Social Situations across Perspectives and Roles

    cs.HC 2026-04 unverdicted novelty 6.0

    LLMs produce interpretive closure in 87.5% of ambiguous social scenarios through narrative alignment, reversal, or normative advice, with first-person perspectives increasing alignment tendencies.

  7. Gradual Voluntary Participation: A Framework for Participatory AI Governance in Journalism

    cs.HC 2026-04 unverdicted novelty 6.0

    The study proposes the Gradual Voluntary Participation (GVP) framework to reconceptualize participatory AI governance in journalism as a gradual and voluntary process using a bidimensional matrix.

  8. Agentivism: a learning theory for the age of artificial intelligence

    cs.AI 2026-04 unverdicted novelty 6.0

    Agentivism defines learning as durable growth in human capability through selective AI delegation, epistemic monitoring and verification, reconstructive internalization of AI outputs, and transfer under reduced support.

  9. Effects of Generative AI Errors on User Reliance Across Task Difficulty

    cs.CY 2026-04 unverdicted novelty 6.0

    Higher generative AI error rates reduce user reliance, but task difficulty does not significantly moderate this effect.

  10. Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

    cs.CR 2026-04 conditional novelty 6.0

    A new benchmark exposes food-safety gaps in current LLMs and guardrails, and a fine-tuned 4B model is offered as a domain-specific fix.

  11. Analyzing the Presentation, Content, and Utilization of References in LLM-powered Conversational AI Systems

    cs.HC 2026-03 unverdicted novelty 6.0

    LLM chat systems show large differences in reference quantity and quality, but users rarely click or engage with them.

  12. Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows

    cs.CY 2026-04 unverdicted novelty 5.0

    Recruiters perceive themselves as retaining agency over GenAI in hiring pipelines, yet GenAI invisibly architects core evaluation inputs, producing only marginal efficiency gains at the cost of deskilling.

  13. Auditing and Controlling AI Agent Actions in Spreadsheets

    cs.HC 2026-04 unverdicted novelty 5.0

    Pista decomposes AI agent actions in spreadsheets into auditable steps, enabling real-time user intervention that improves task outcomes, user comprehension, agent perception, and sense of co-ownership over baseline agents.

  14. Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research

    cs.HC 2026-04 unverdicted novelty 5.0

    AVA is a specialized GenAI platform for development policy research that provides verifiable syntheses from World Bank reports and is associated with 2.4-3.9 hours of weekly time savings in a large-scale user evaluation.

  15. Toward Human-AI Complementarity Across Diverse Tasks

    cs.HC 2026-04 unverdicted novelty 5.0

    Human-AI hybrids achieve only +0.4pp over AI alone on diverse tasks because confidence routing fails to identify the small set of cases where humans can correct AI errors.

  16. Emergent Social Intelligence Risks in Generative Multi-Agent Systems

    cs.MA 2026-03 unverdicted novelty 5.0

    Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.

  17. Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction

    cs.HC 2026-03 unverdicted novelty 5.0

    Analysis of 1,223 AI-HCI papers shows declining focus on human epistemic sovereignty and rising optimization of autonomous agents, leading to a proposal for scaffolded cognitive friction via multi-agent systems to pre...

  18. Exploring Instant Photography using Generative AI: A Design Probe with the UnReality Camera

    cs.HC 2026-05 unverdicted novelty 4.0

    The UnReality Camera augments instant photos with generative AI from spoken input, and a design probe found users balancing artistic control with appreciation for unpredictability, suspense during printing, and owners...

  19. From Trust to Appropriate Reliance: Measurement Constructs in Human-AI Decision-Making

    cs.HC 2026-04 unverdicted novelty 4.0

    A literature review shows that constructs for appropriate reliance on AI are fragmented, presents three views on the topic, and calls for consensus on objective metrics to enable better comparisons across studies.

  20. Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions

    cs.CY 2026-02 unverdicted novelty 4.0

    Current XAI methods for DNNs and LLMs rest on paradoxes and false assumptions that demand a paradigm shift to verification protocols, scientific foundations, context-aware design, and faithful model analysis rather th...

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · cited by 19 Pith papers · 1 internal anchor

  1. [1]

    Gagan Bansal, Besmira Nushi, Ece Kamar, Walter S Lasecki, Daniel S Weld, and Eric Horvitz. 2019. Beyond accuracy: The role of mental models in human-AI team performance. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Vol. 7. 2–11

  2. [2]

    Gagan Bansal, Tongshuang Wu, Joyce Zhu, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, and Daniel S Weld. 2021. Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing...

  3. [3]

    Dale J Barr, Roger Levy, Christoph Scheepers, and Harry J Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of memory and language 68, 3 (2013), 255–278

  4. [4]

    Eta S Berner and Mark L Graber. 2008. Overconfidence as a cause of diagnostic error in medicine. The American journal of medicine 121, 5 (2008), S2–S23

  5. [5]

    Umang Bhatt, Adrian Weller, and José M. F. Moura. 2020. Evaluating and Aggregating Feature-based Model Explanations. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 , Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 3016–3022. https://doi.org/10.24963/...

  6. [6]

    Brian H Bornstein and A Christine Emler. 2001. Rationality in medical decision making: a review of the literature on doctors’ decision-making biases. Journal of evaluation in clinical practice 7, 2 (2001), 97–107

  7. [7]

    Zana Buçinca, Phoebe Lin, Krzysztof Z. Gajos, and Elena L. Glassman. 2020. Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems. In Proceedings of the 25th International Conference on Intelligent User Interfaces (IUI ’20). ACM, New York, NY, USA

  8. [8]

    Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency . 77–91

  9. [9]

    Adrian Bussone, Simone Stumpf, and Dympna O’Sullivan. 2015. The role of explanations on trust and reliance in clinical decision support systems. In 2015 International Conference on Healthcare Informatics . IEEE, 160–169

  10. [10]

    John T. Cacioppo and Richard E. Petty. 1982. The need for cognition. Journal of Personality and Social Psychology 42, 1 (1982), 116–131. https://doi.org/10.1037/0022-3514.42.1.116

  11. [11]

    J T Cacioppo, R E Petty, and C F Kao. 1984. The efficient assessment of need for cognition. Journal of personality assessment 48, 3 (1984), 306–307. https://doi.org/10.1207/s15327752jpa4803_13

  12. [12]

    Giuseppe Carenini. 2001. An Analysis of the Influence of Need for Cognition on Dynamic Queries Usage. In CHI ’01 Extended Abstracts on Human Factors in Computing Systems (Seattle, Washington) (CHI EA ’01). ACM, New York, NY, USA, 383–384. https://doi.org/10.1145/634067.634293

  13. [13]

    Ana-Maria Cazan and Simona Elena Indreica. 2014. Need for cognition and approaches to learning among university students. Procedia-Social and Behavioral Sciences 127 (2014), 134–138

  14. [14]

    Jim Q Chen and Sang M Lee. 2003. An exploratory cognitive DSS for strategic decision making. Decision support systems 36, 2 (2003), 147–160

  15. [15]

    Glinda S Cooper and Vanessa Meterko. 2019. Cognitive bias research in forensic science: a systematic review. Forensic science international 297 (2019), 35–46

  16. [16]

    Pat Croskerry. 2003. Cognitive forcing strategies in clinical decisionmaking. Annals of emergency medicine 41, 1 (2003), 110–120

  17. [17]

    Pat Croskerry. 2003. The importance of cognitive errors in diagnosis and strategies to minimize them. Academic medicine 78, 8 (2003), 775–780

  18. [18]

    Louis Deslauriers, Logan S McCarty, Kelly Miller, Kristina Callaghan, and Greg Kestin. 2019. Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences (2019), 201821936

  19. [19]

    Jennifer L Eberhardt. 2020. Biased: Uncovering the hidden prejudice that shapes what we see, think, and do . Penguin Books

  20. [20]

    John W Ely, Mark L Graber, and Pat Croskerry. 2011. Checklists to reduce diagnostic errors. Academic Medicine 86, 3 (2011), 307–313

  21. [21]

    Gavan J Fitzsimons and Donald R Lehmann. 2004. Reactance to recommendations: When unsolicited advice yields contrary responses. Marketing Science 23, 1 (2004), 82–94

  22. [22]

    Krzysztof Z. Gajos and Krysta Chauncey. 2017. The Influence of Personality Traits and Cognitive Load on the Use of Adaptive User Interfaces. In Proceedings of the 22Nd International Conference on Intelligent User Interfaces (Limassol, Cyprus) (IUI ’17). ACM, New York, NY, USA, 301–306. https://doi.org/10.1145/3025171.3025192

  23. [23]

    Neelansh Garg, Apuroop Sethupathy, Rudraksh Tuwani, Rakhi Nk, Shubham Dokania, Arvind Iyer, Ayushi Gupta, Shubhra Agrawal, Navjot Singh, Shubham Shukla, et al. 2018. FlavorDB: a database of flavor molecules. Nucleic acids research 46, D1 (2018), D1210–D1216

  24. [24]

    Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang, Rachel Bellamy, and Klaus Mueller. 2021. Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers. Proc. ACM Hum.-Comput. Interact. 4, CSCW3, Article 235 (2021), 28 pages. https://doi.org/10.1145/3432934

  25. [25]

    Mark L Graber, Stephanie Kissam, Velma L Payne, Ashley ND Meyer, Asta Sorensen, Nancy Lenfestey, Elizabeth Tant, Kerm Henriksen, Kenneth LaBresh, and Hardeep Singh. 2012. Cognitive interventions to reduce diagnostic error: a narrative review. BMJ quality & safety 21, 7 (2012), 535–557

  26. [26]

    Ben Green and Yiling Chen. 2019. The principles and limits of algorithm-in-the-loop decision making. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–24

  27. [27]

    Curtis P Haugtvedt, Richard E Petty, and John T Cacioppo. 1992. Need for cognition and advertising: Understanding the role of personality variables in consumer behavior. Journal of Consumer Psychology 1, 3 (1992), 239–260

  28. [28]

    Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 2 (1979), 65–70

  29. [29]

    Jessica Hullman, Eytan Adar, and Priti Shah. 2011. Benefitting infovis with visual difficulties. IEEE Transactions on Visualization and Computer Graphics 17, 12 (2011), 2213–2222

  30. [30]

    Maia Jacobs, Melanie F. Pradier, Thomas H. McCoy Jr, Roy H. Perlis, Finale Doshi-Velez, and Krzysztof Z. Gajos. 2021. How machine-learning recommendations influence clinician treatment selections: the example of the antidepressant selection. Translational Psychiatry 11 (2021). https://doi.org/10.1038/s41398-021-01224-x

  31. [31]

    Heinrich Jiang, Been Kim, Melody Y Guan, and Maya Gupta. 2018. To trust or not to trust a classifier. In Proceedings of the 32nd International Conference on Neural Information Processing Systems . 5546–5557

  32. [32]

    Daniel Kahneman. 2011. Thinking, fast and slow . Macmillan

  33. [33]

    Daniel Kahneman and Shane Frederick. 2002. Representativeness revisited: Attribute substitution in intuitive judgment. In Representativeness revisited: Attribute substitution in intuitive judgment . New York. Cambridge University Press., 49–81

  34. [34]

    Daniel Kahneman, Stewart Paul Slovic, Paul Slovic, and Amos Tversky. 1982. Judgment under uncertainty: Heuristics and biases. Cambridge university press

  35. [35]

    Ece Kamar. 2016. Directions in Hybrid Intelligence: Complementing AI Systems with Human Intelligence.. In IJCAI. 4070–4073

  36. [36]

    Ece Kamar, Severin Hacker, and Eric Horvitz. 2012. Combining human and machine intelligence in large-scale crowdsourcing. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems- Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 467–474

  37. [37]

    Eric Kearney, Diether Gebert, and Sven C Voelpel. 2009. When and how diversity benefits teams: The importance of team members’ need for cognition. Academy of Management journal 52, 3 (2009), 581–598

  38. [38]

    Wouter Kool and Matthew Botvinick. 2018. Mental labour. Nature human behaviour 2, 12 (2018), 899–908

  39. [39]

    Todd Kulesza, Simone Stumpf, Margaret Burnett, Sherry Yang, Irwin Kwan, and Weng-Keen Wong. 2013. Too much, too little, or just right? Ways explanations impact end users’ mental models. In 2013 IEEE Symposium on Visual Languages and Human Centric Computing. IEEE, 3–10

  40. [40]

    Vivian Lai, Han Liu, and Chenhao Tan. 2020. "Why is ’Chicago’ Deceptive?" Towards Building Model-Driven Tutorials for Humans. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA)

  41. [41]

    Vivian Lai and Chenhao Tan. 2019. On human predictions with explanations and predictions of machine learning models: A case study on deception detection. In Proceedings of the Conference on Fairness, Accountability, and Transparency . 29–38

  42. [42]

    Kathryn Ann Lambe, Gary O’Reilly, Brendan D Kelly, and Sarah Curristan. 2016. Dual-process cognitive interventions to enhance diagnostic reasoning: a systematic review. BMJ quality & safety 25, 10 (2016), 808–820

  43. [43]

    G Daniel Lassiter, Michael A Briggs, and R David Slaw. 1991. Need for cognition, causal processing, and memory for behavior. Personality and Social Psychology Bulletin 17, 6 (1991), 694–700

  44. [44]

    John D Lee and Katrina A See. 2004. Trust in automation: Designing for appropriate reliance. Human factors 46, 1 (2004), 50–80

  45. [45]

    Chin-Lung Lin, Sheng-Hsien Lee, and Der-Juinn Horng. 2011. The effects of online reviews on purchasing intention: The moderating role of need for cognition. Social Behavior and Personality: an international journal 39, 1 (2011), 71–81

  46. [46]

    Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 . Curran Associates, Inc., 4765–4774

  47. [47]

    Martijn Millecamp, Nyi Nyi Htun, Cristina Conati, and Katrien Verbert. 2019. To Explain or Not to Explain: The Effects of Personal Characteristics When Explaining Music Recommendations. In Proceedings of the 24th International Conference on Intelligent User Interfaces (Marina del Ray, California) (IUI ’19). Association for Computing Machinery, New York, N...

  48. [48]

    Martijn Millecamp, Nyi Nyi Htun, Cristina Conati, and Katrien Verbert. 2020. What’s in a User? Towards Personalising Transparency for Music Recommender Interfaces. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization (Genoa, Italy) (UMAP ’20). Association for Computing Machinery, New York, NY, USA, 173–182. https://do...

  49. [49]

    Carol-anne E Moulton, Glenn Regehr, Maria Mylopoulos, and Helen M MacRae. 2007. Slowing down when you should: a new model of expert judgment. Academic Medicine 82, 10 (2007), S109–S116

  50. [50]

    Joon Sung Park, Rick Barber, Alex Kirlik, and Karrie Karahalios. 2019. A Slow Algorithm Improves Users’ Assessments of the Algorithm’s Accuracy. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–15

  51. [51]

    Avi Parush, Shir Ahuvia, and Ido Erev. 2007. Degradation in spatial knowledge acquisition when using automatic navigation systems. In International conference on spatial information theory . Springer, 238–254

  52. [52]

    A "missing" family of classical orthogonal polynomials

    Richard E. Petty and John T. Cacioppo. 1986. The Elaboration Likelihood Model of Persuasion. Communication and Persuasion 19 (1986), 1–24. https://doi.org/10.1007/978-1-4612-4964-1_1 arXiv:arXiv:1011.1669v3

  53. [53]

    Vlad L Pop, Alex Shrewsbury, and Francis T Durso. 2015. Individual differences in the calibration of trust in automation. Human factors 57, 4 (2015), 545–556

  54. [54]

    Inioluwa Deborah Raji, Andrew Smart, Rebecca N. White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barce...

  55. [55]

    Jonathan Sherbino, Kulamakan Kulasegaram, Elizabeth Howey, and Geoffrey Norman. 2014. Ineffectiveness of cognitive forcing strategies to reduce biases in diagnostic reasoning: a controlled trial. Canadian Journal of Emergency Medicine 16, 1 (2014), 34–40

  56. [56]

    Maria Sicilia, Salvador Ruiz, and Jose L Munuera. 2005. Effects of interactivity in a web site: The moderating effect of need for cognition. Journal of advertising 34, 3 (2005), 31–44

  57. [57]

    J Spilke, HP Piepho, and X Hu. 2005. Analysis of unbalanced data by mixed linear models using the MIXED procedure of the SAS system. Journal of Agronomy and crop science 191, 1 (2005), 47–54

  58. [58]

    Tracy L Tuten and Michael Bosnjak. 2001. Understanding differences in web usage: The role of need for cognition and the five factor model of personality. Social Behavior and Personality: an international journal 29, 4 (2001), 391–398

  59. [59]

    Michelle Vaccaro and Jim Waldo. 2019. The effects of mixing machine learning and human judgment. Commun. ACM 62, 11 (2019), 104–110

  60. [60]

    Tiffany C Veinot, Hannah Mitchell, and Jessica S Ancker. 2018. Good intentions are not enough: how informatics interventions can worsen inequality. Journal of the American Medical Informatics Association 25, 8 (2018), 1080–1088

  61. [61]

    Jennifer Irvin Vidrine, Vani Nath Simmons, and Thomas H. Brandon. 2007. Construction of smoking-relevant risk perceptions among college students: The influence of need for cognition and message content. Journal of Applied Social Psychology 37, 1 (2007), 91–114. https://doi.org/10.1111/j.0021-9029.2007.00149.x

  62. [62]

    Alan R Wagner, Jason Borenstein, and Ayanna Howard. 2018. Overtrust in the robotic age. Commun. ACM 61, 9 (2018), 22–24

  63. [63]

    Peter C Wason and J St BT Evans. 1974. Dual processes in reasoning? Cognition 3, 2 (1974), 141–154

  64. [64]

    Jacob Westfall, David A Kenny, and Charles M Judd. 2014. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General 143, 5 (2014)

  65. [65]

    Pamela Williams-Piehota, Tamera R Schneider, Linda Mowad, and Peter Salovey. 2003. Matching Health Messages to Information-Processing Styles : Need for Cognition and Mammography Utilization. Health Communication 15, 4 (2003), 375–392

  66. [66]

    Ming Yin, Jennifer Wortman Vaughan, and Hanna Wallach. 2019. Understanding the effect of accuracy on trust in machine learning models. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems . 1–12

  67. [67]

    Yunfeng Zhang, Q. Vera Liao, and Rachel K. E. Bellamy. 2020. Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona, Spain) (FAT* ’20). Association for Computing Machinery, New York, NY, USA, 295–305. https://doi.o...