Measuring and mitigating overreliance to build human-compatible AI

Alia El Kattan; Andrew Strait; Anka Reuel; Diyi Yang; Ilia Sucholutsky; Katherine M. Collins; Kevin Feng; Lama Ahmad; Lujain Ibrahim; Max Lamparth

arxiv: 2509.08010 · v2 · pith:ZI7I3C2Jnew · submitted 2025-09-08 · 💻 cs.CY · cs.AI · cs.CL · cs.HC

Lujain Ibrahim, Katherine M. Collins, Sunnie S. Y. Kim, Anka Reuel, Max Lamparth, Kevin Feng, Lama Ahmad, Prajna Soni, Alia El Kattan, Merlin Stein, Siddharth Swaroop, Vishakh Padmakumar, Ilia Sucholutsky, Andrew Strait, Diyi Yang, Q. Vera Liao, Umang Bhatt

Pith reviewed 2026-05-21 22:33 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.CL · cs.HC

keywords overreliance · large language models · human-AI collaboration · cognitive biases · AI safety · measurement methods · mitigation strategies

The pith

Measuring and mitigating overreliance must become central to LLM research and deployment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models function as collaborative thought partners that engage fluidly in natural language on a range of tasks. This sets them apart from earlier technologies and raises the risk of overreliance, where people depend on the models beyond their actual capabilities. The paper consolidates individual and societal risks including high-stakes errors, governance challenges, and cognitive deskilling. It reviews historical measurement approaches, identifies three gaps, and proposes new directions along with mitigation strategies to ensure LLMs augment rather than undermine human capabilities.

Core claim

Large language models distinguish themselves from previous technologies by functioning as collaborative thought partners capable of engaging more fluidly in natural language on a range of tasks. As LLMs increasingly influence consequential decisions across diverse domains from healthcare to personal advice, the risk of overreliance grows. This paper argues that measuring and mitigating overreliance must become central to LLM research and deployment because LLM characteristics, system design features, and user cognitive biases together raise serious and unique concerns about overreliance in practice.

What carries the argument

Overreliance, defined as relying on LLMs beyond their capabilities, is driven by the combined effects of three factors: LLM characteristics as fluid natural-language thought partners, system design features, and user cognitive biases.

Load-bearing premise

The premise that LLM characteristics, system design features, and user cognitive biases together raise serious and unique concerns about overreliance that prior technologies did not.

What would settle it

A controlled study finding comparable rates of overreliance and comparable downstream harms when users interact with LLMs versus earlier technologies such as web search tools or rule-based decision aids on matched tasks would undermine the claim of unique concerns.
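As a rough illustration of the comparison such a study would report, the sketch below computes an overreliance rate per condition and the difference between conditions. It assumes one common operationalization from the human-AI decision-making literature (the share of wrong-advice trials in which the participant still followed the advice), which is not necessarily the paper's own measure. The `Trial` record, the condition labels, and the demo data are hypothetical and only exist to make the example runnable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One participant decision. All field names are illustrative assumptions."""
    condition: str         # hypothetical label, e.g. "llm" or "search"
    advice_correct: bool   # was the system's advice correct on this task?
    followed_advice: bool  # did the participant's final answer match the advice?


def overreliance_rate(trials, condition):
    """Share of wrong-advice trials in which the participant followed the advice."""
    wrong = [t for t in trials if t.condition == condition and not t.advice_correct]
    if not wrong:
        raise ValueError(f"no wrong-advice trials for condition {condition!r}")
    return sum(t.followed_advice for t in wrong) / len(wrong)


def rate_difference(trials, condition_a, condition_b):
    """Overreliance rate of condition_a minus that of condition_b."""
    return overreliance_rate(trials, condition_a) - overreliance_rate(trials, condition_b)


# Made-up demo data: four wrong-advice trials per condition, plus one
# correct-advice trial each, which the rate ignores by construction.
demo_trials = [
    Trial("llm", False, True),
    Trial("llm", False, True),
    Trial("llm", False, True),
    Trial("llm", False, False),
    Trial("llm", True, True),
    Trial("search", False, True),
    Trial("search", False, False),
    Trial("search", False, False),
    Trial("search", False, False),
    Trial("search", True, True),
]

print(overreliance_rate(demo_trials, "llm"))
print(overreliance_rate(demo_trials, "search"))
print(rate_difference(demo_trials, "llm", "search"))
```

A difference near zero on matched tasks would be the kind of result that counts against the uniqueness claim. A real study would also need matched task difficulty, adequate sample sizes, a measure of downstream harm, and a proper statistical test, none of which this sketch attempts.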

Figures

Figures reproduced from arXiv: 2509.08010 by Alia El Kattan, Andrew Strait, Anka Reuel, Diyi Yang, Ilia Sucholutsky, Katherine M. Collins, Kevin Feng, Lama Ahmad, Lujain Ibrahim, Max Lamparth, Merlin Stein, Prajna Soni, Q. Vera Liao, Siddharth Swaroop, Sunnie S. Y. Kim, Umang Bhatt, Vishakh Padmakumar.

Original abstract

Large language models (LLMs) distinguish themselves from previous technologies by functioning as collaborative "thought partners," capable of engaging more fluidly in natural language on a range of tasks. As LLMs increasingly influence consequential decisions across diverse domains from healthcare to personal advice, the risk of overreliance (relying on LLMs beyond their capabilities) grows. This paper argues that measuring and mitigating overreliance must become central to LLM research and deployment. First, we consolidate risks from overreliance at both the individual and societal levels, including high-stakes errors, governance challenges, and cognitive deskilling. Then, we explore LLM characteristics, system design features, and user cognitive biases that together raise serious and unique concerns about overreliance on LLMs in practice. We also examine historical approaches for measuring overreliance, identifying three important gaps and proposing three promising directions to improve measurement. Finally, we propose mitigation strategies that can be pursued to ensure LLMs augment rather than undermine human capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that overreliance on LLMs—relying on them beyond their capabilities—poses serious individual and societal risks (high-stakes errors, governance challenges, cognitive deskilling) that are qualitatively distinct from prior technologies due to LLMs' fluid natural-language collaboration. It consolidates these risks, attributes them to LLM characteristics, system design features, and user cognitive biases, reviews historical measurement approaches to identify three gaps, proposes three new measurement directions, and outlines mitigation strategies to ensure LLMs augment rather than replace human capabilities.

Significance. If the uniqueness premise holds and the proposed measurement directions can be operationalized, this position paper could usefully shift priorities in human-AI interaction and AI alignment research toward systematic evaluation of reliance behaviors. The consolidation of risks across domains and explicit identification of measurement gaps provide a clear agenda for subsequent empirical studies.

major comments (2)

[Section 3] The assertion that LLM traits, design features, and cognitive biases 'raise serious and unique concerns about overreliance' (abstract paragraph 2) rests on illustrative examples rather than comparative incidence rates, error-severity metrics, or longitudinal deskilling data against baselines such as rule-based expert systems, web search, or GPS navigation. This gap directly undermines the load-bearing premise that these issues warrant elevating measurement and mitigation to a central research priority.
[Section 2] The consolidation of individual and societal risks is logically structured but lacks quantitative contrasts (e.g., error rates or deskilling trajectories) with historical technologies, leaving the claim that LLM overreliance introduces qualitatively new governance and capability-undermining challenges without sufficient empirical anchoring for the central argument.

minor comments (1)

[Abstract] The abstract would benefit from briefly enumerating the three proposed measurement directions and the main mitigation strategies to improve reader orientation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. We appreciate the recognition that the paper could usefully shift priorities in human-AI interaction research if the uniqueness premise holds and the proposed directions are operationalized. As a position paper, our goal is to consolidate risks, identify measurement gaps, and outline an agenda rather than deliver new comparative empirical data. We address the major comments point by point below and have revised the manuscript to clarify the scope and nature of our arguments.

Referee: [Section 3] The assertion that LLM traits, design features, and cognitive biases 'raise serious and unique concerns about overreliance' (abstract paragraph 2) rests on illustrative examples rather than comparative incidence rates, error-severity metrics, or longitudinal deskilling data against baselines such as rule-based expert systems, web search, or GPS navigation. This gap directly undermines the load-bearing premise that these issues warrant elevating measurement and mitigation to a central research priority.

Authors: We agree that the paper relies on illustrative examples and qualitative distinctions rather than new comparative quantitative data. The central claim for uniqueness rests on LLMs' capacity for fluid, open-ended natural-language collaboration, which enables forms of interaction and potential cognitive integration not present in rule-based systems, search engines, or navigation tools. This distinction is drawn from existing human-AI interaction literature rather than asserted as proven by new metrics. We acknowledge the absence of direct incidence-rate or longitudinal comparisons as a limitation of the current evidence base. In the revised manuscript we have added explicit language in Section 3 stating that the uniqueness argument is a hypothesis to be tested through the measurement directions we propose, rather than a claim supported by new comparative data. This is a partial revision that clarifies scope without changing the position paper's core contribution.
revision: partial

Referee: [Section 2] The consolidation of individual and societal risks is logically structured but lacks quantitative contrasts (e.g., error rates or deskilling trajectories) with historical technologies, leaving the claim that LLM overreliance introduces qualitatively new governance and capability-undermining challenges without sufficient empirical anchoring for the central argument.

Authors: We accept that the risk consolidation would be strengthened by quantitative contrasts with prior technologies. The paper synthesizes risks reported across domains and attributes them to LLM-specific characteristics, but does not conduct or cite new comparative error-rate or deskilling analyses. Such direct contrasts remain limited in the literature precisely because LLMs are recent; this scarcity is one of the measurement gaps the paper identifies. We have revised Section 2 to include a short discussion noting the difficulty of apples-to-apples comparisons and explaining how the three proposed measurement directions are intended to generate the empirical anchors needed for future governance and deskilling studies. This partial revision improves anchoring while preserving the paper's focus on agenda-setting.
revision: partial

Circularity Check

0 steps flagged

No significant circularity; position paper relies on external citations and observations

The paper is a position and review piece that consolidates individual/societal risks, attributes concerns to LLM traits and design features via illustrative examples, reviews historical measurement approaches from prior literature, identifies gaps, and proposes mitigation strategies. No equations, fitted parameters, self-definitional constructs, or predictions appear in the provided abstract or described structure. The central argument draws on cited historical methods and domain observations rather than reducing to self-referential inputs or self-citation chains by construction. This is a standard self-contained review format with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on domain assumptions about LLM capabilities and user behavior drawn from the abstract; no free parameters or invented entities are introduced.

axioms (2)

domain assumption: LLMs function as collaborative thought partners capable of engaging fluidly in natural language on a range of tasks.
Stated in the first sentence of the abstract as the distinguishing feature of LLMs.
domain assumption: Overreliance on LLMs creates distinct individual and societal risks not fully addressed by prior technologies.
Invoked when the abstract consolidates risks and states that LLM characteristics raise unique concerns.

pith-pipeline@v0.9.0 · 5777 in / 1216 out tokens · 31509 ms · 2026-05-21T22:33:23.641695+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

LLMs distinguish themselves from previous technologies by functioning as collaborative 'thought partners,' capable of engaging more fluidly in natural language.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The efficiency-gain illusion: People underestimate the rate of AI use and overestimate its benefits on simple tasks
cs.CY 2026-05 accept novelty 6.0

Three pre-registered studies with 2691 participants show people underestimate their AI usage rate and overestimate efficiency gains on simple tasks, with prior use entrenching further adoption.
Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows
cs.CY 2026-04 unverdicted novelty 5.0

Recruiters perceive themselves as retaining agency over GenAI in hiring pipelines, yet GenAI invisibly architects core evaluation inputs, producing only marginal efficiency gains at the cost of deskilling.

Reference graph

Works this paper leans on

136 extracted references · 136 canonical work pages · cited by 2 Pith papers · 3 internal anchors

[1]

Mirages: On anthropomorphism in dialogue systems.arXiv preprint arXiv:2305.09800, 2023

Gavin Abercrombie, Amanda Cercas Curry, Tanvi Dinkar, Verena Rieser, and Zeerak Talat. Mirages: On anthropomorphism in dialogue systems.arXiv preprint arXiv:2305.09800, 2023

work page arXiv 2023
[2]

Incident 838: Microsoft Copilot Allegedly Provides Unsafe Medical Advice with High Risk of Severe Harm — incidentdatabase.ai

AIID. Incident 838: Microsoft Copilot Allegedly Provides Unsafe Medical Advice with High Risk of Severe Harm — incidentdatabase.ai. https://incidentdatabase.ai/cite/838/,

work page
[3]

[Accessed 09-05-2025]

work page 2025
[4]

Allen, C.I

J.E. Allen, C.I. Guinn, and E. Horvtz. Mixed-initiative interaction.IEEE Intelligent Systems and their Applications, 14(5):14–23, 1999

work page 1999
[5]

Guidelines for human-AI interaction

Saleema Amershi, Dan Weld, Mihaela V orvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, et al. Guidelines for human-AI interaction. InProceedings of the 2019 chi conference on human factors in computing systems, pages 1–13, 2019

work page 2019
[6]

How AI ideas affect the creativity, diversity, and evolution of human ideas: Evidence from a large, dynamic experiment.arXiv preprint arXiv:2401.13481, 2024

Joshua Ashkinaze, Julia Mendelsohn, Qiwei Li, Ceren Budak, and Eric Gilbert. How AI ideas affect the creativity, diversity, and evolution of human ideas: Evidence from a large, dynamic experiment.arXiv preprint arXiv:2401.13481, 2024

work page arXiv 2024
[7]

Artificial intelligence and machine learning in finance: Key concepts, ap- plications, and regulatory considerations

Alessio Azzutti. Artificial intelligence and machine learning in finance: Key concepts, ap- plications, and regulatory considerations. InThe Emerald Handbook of Fintech: Reshaping Finance, pages 315–339. Emerald Publishing Limited, 2024

work page 2024
[8]

An empirical exploration of trust dynamics in llm supply chains.arXiv preprint arXiv:2405.16310, 2024

Agathe Balayn, Mireia Yurrita, Fanny Rancourt, Fabio Casati, and Ujwal Gadiraju. An empirical exploration of trust dynamics in llm supply chains.arXiv preprint arXiv:2405.16310, 2024

work page arXiv 2024
[9]

Algorithm overdependence: How the use of algorithmic recommendation systems can increase risks to consumer well-being.Journal of Public Policy & Marketing, 38(4):500–515, 2019

Sachin Banker and Salil Khetani. Algorithm overdependence: How the use of algorithmic recommendation systems can increase risks to consumer well-being.Journal of Public Policy & Marketing, 38(4):500–515, 2019

work page 2019
[10]

Feedbacklogs: Recording and incorporating stakeholder feedback into machine learning pipelines

Matthew Barker, Emma Kallina, Dhananjay Ashok, Katherine Collins, Ashley Casovan, Adrian Weller, Ameet Talwalkar, Valerie Chen, and Umang Bhatt. Feedbacklogs: Recording and incorporating stakeholder feedback into machine learning pipelines. InProceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, pages 1–15, 2023

work page 2023
[11]

On the dangers of stochastic parrots: Can language models be too big? InProceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021

Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? InProceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021

work page 2021
[12]

Learning personalized decision support policies

Umang Bhatt, Valerie Chen, Katherine M Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, and Ameet Talwalkar. Learning personalized decision support policies. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 14203–14211, 2025

work page 2025
[13]

When should algorithms resign? a proposal for AI gover- nance.Computer, 57(10):99–103, 2024

Umang Bhatt and Holli Sargeant. When should algorithms resign? a proposal for AI gover- nance.Computer, 57(10):99–103, 2024. 10

work page 2024
[14]

To rely or not to rely? evaluating interven- tions for appropriate reliance on large language models.arXiv preprint arXiv:2412.15584, 2024

Jessica Y Bo, Sophia Wan, and Ashton Anderson. To rely or not to rely? evaluating interven- tions for appropriate reliance on large language models.arXiv preprint arXiv:2412.15584, 2024

work page arXiv 2024
[15]

Silvia Bonaccio and Reeshad S. Dalal. Advice taking and decision-making: An integrative literature review, and implications for the organizational sciences.Organizational Behavior and Human Decision Processes, 101(2):127–151, 2006

work page 2006
[16]

We need an interventionist mindset, Mar 2025

danah boyd. We need an interventionist mindset, Mar 2025

work page 2025
[17]

Machine culture.Nature Human Behaviour, 7(11):1855–1868, 2023

Levin Brinkmann, Fabian Baumann, Jean-François Bonnefon, Maxime Derex, Thomas F Müller, Anne-Marie Nussberger, Agnieszka Czaplicka, Alberto Acerbi, Thomas L Griffiths, Joseph Henrich, et al. Machine culture.Nature Human Behaviour, 7(11):1855–1868, 2023

work page 2023
[18]

Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z Gajos. To trust or to think: Cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making.Proceedings of the ACM on Human-computer Interaction, 5(CSCW1):1–21, 2021

work page 2021
[19]

The need for cognition.Journal of personality and social psychology, 42(1):116, 1982

John T Cacioppo and Richard E Petty. The need for cognition.Journal of personality and social psychology, 42(1):116, 1982

work page 1982
[20]

Understanding user reliance on AI in assisted decision- making.Proceedings of the ACM on Human-Computer Interaction, 6(CSCW2):1–23, 2022

Shiye Cao and Chien-Ming Huang. Understanding user reliance on AI in assisted decision- making.Proceedings of the ACM on Human-Computer Interaction, 6(CSCW2):1–23, 2022

work page 2022
[21]

Pitfalls of evidence-based AI policy.arXiv preprint arXiv:2502.09618, 2025

Stephen Casper, David Krueger, and Dylan Hadfield-Menell. Pitfalls of evidence-based AI policy.arXiv preprint arXiv:2502.09618, 2025

work page arXiv 2025
[22]

Kevin Castel

P. Kevin Castel. Mata v. avianca, inc. United States District Court, Southern District of New York, June 2023. No. 1:2022cv01461, Document 54 (S.D.N.Y . 2023)

work page 2023
[23]

Harms from increasingly agentic algorithmic systems

Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, et al. Harms from increasingly agentic algorithmic systems. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 651–666, 2023

work page 2023
[24]

Probabilistic biases meet the bayesian brain.Current Directions in Psychological Science, 29(5):506–512, 2020

Nick Chater, Jian-Qiao Zhu, Jake Spicer, Joakim Sundh, Pablo León-Villagrá, and Adam Sanborn. Probabilistic biases meet the bayesian brain.Current Directions in Psychological Science, 29(5):506–512, 2020

work page 2020
[25]

Random House, 2025

Kyle Chayka.Filterworld: How algorithms flattened culture. Random House, 2025

work page 2025
[26]

Allison Chen, Sunnie S. Y . Kim, Amaya Dharmasiri, Olga Russakovsky, and Judith E. Fan. Portraying large language models as machines, tools, or companions affects what mental capacities humans attribute to them. InProceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA ’25, New York, NY , USA,

work page
[27]

Association for Computing Machinery

work page
[28]

Understanding the role of human intuition on reliance in human-AI decision-making with explanations

Valerie Chen, Q Vera Liao, Jennifer Wortman Vaughan, and Gagan Bansal. Understanding the role of human intuition on reliance in human-AI decision-making with explanations. Proceedings of the ACM on Human-computer Interaction, 7(CSCW2):1–32, 2023

work page 2023
[29]

ELEPHANT: Measuring and understanding social sycophancy in LLMs

Myra Cheng, Sunny Yu, Cinoo Lee, Pranav Khadpe, Lujain Ibrahim, and Dan Jurafsky. Social sycophancy: A broader understanding of llm sycophancy.arXiv preprint arXiv:2505.13995, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

How individual traits and language styles shape preferences in open-ended user-llm interaction: A preliminary study

Rendi Chevi, Kentaro Inui, Thamar Solorio, and Alham Fikri Aji. How individual traits and language styles shape preferences in open-ended user-llm interaction: A preliminary study. arXiv preprint arXiv:2504.17083, 2025

work page arXiv 2025
[31]

Avishek Choudhury and Zaira Chaudhry. Large language models and user trust: consequence of self-referential learning loop and the deskilling of health care professionals.Journal of Medical Internet Research, 26:e56764, 2024

work page 2024
[32]

arXiv preprint arXiv:2501.10476 (2025)

Katherine M Collins, Umang Bhatt, and Ilia Sucholutsky. Revisiting rogers’ paradox in the context of human-AI interaction.arXiv preprint arXiv:2501.10476, 2025. 11

work page arXiv 2025
[33]

arXiv preprint arXiv:2407.12804 (2024)

Katherine M Collins, Valerie Chen, Ilia Sucholutsky, Hannah Rose Kirk, Malak Sadek, Holli Sargeant, Ameet Talwalkar, Adrian Weller, and Umang Bhatt. Modulating language model experiences through frictions.arXiv preprint arXiv:2407.12804, 2024

work page arXiv 2024
[34]

Building machines that learn and think with people.Nature human behaviour, 8(10):1851–1863, 2024

Katherine M Collins, Ilia Sucholutsky, Umang Bhatt, Kartik Chandra, Lionel Wong, Mina Lee, Cedegao E Zhang, Tan Zhi-Xuan, Mark Ho, Vikash Mansinghka, et al. Building machines that learn and think with people.Nature human behaviour, 8(10):1851–1863, 2024

work page 2024
[35]

Survival of the best fit.USA

Gabor Csapo, Jihyun Kim, Miha Klasinc, and Alia ElKattan. Survival of the best fit.USA. https://www. survivalofthebestfit. com, 2019

work page 2019
[36]

Can Democracy Survive the Disruptive Power of AI? — carnegieendowment.org

Raluca Csernatoni. Can Democracy Survive the Disruptive Power of AI? — carnegieendowment.org. https://carnegieendowment.org/research/2024/12/ can-democracy-survive-the-disruptive-power-of-ai?lang=en , 2024. [Accessed 09-05-2025]

work page 2024
[37]

AI and procurement.Manufacturing & Service Operations Management, 24(2):691–706, 2022

Ruomeng Cui, Meng Li, and Shichen Zhang. AI and procurement.Manufacturing & Service Operations Management, 24(2):691–706, 2022

work page 2022
[38]

Automation and accountability in decision support system interface design

Mary L Cummings. Automation and accountability in decision support system interface design. 2006

work page 2006
[39]

Mixed-initiative creative interfaces

Sebastian Deterding, Jonathan Hook, Rebecca Fiebrink, Marco Gillies, Jeremy Gow, Memo Akten, Gillian Smith, Antonios Liapis, and Kate Compton. Mixed-initiative creative interfaces. InProceedings of the 2017 CHI conference extended abstracts on human factors in computing systems, pages 628–635, 2017

work page 2017
[40]

Multicalibration for confidence scoring in llms.arXiv preprint arXiv:2404.04689, 2024

Gianluca Detommaso, Martin Bertran, Riccardo Fogliato, and Aaron Roth. Multicalibration for confidence scoring in llms.arXiv preprint arXiv:2404.04689, 2024

work page arXiv 2024
[41]

Algorithm aversion: people erroneously avoid algorithms after seeing them err.Journal of experimental psychology: General, 144(1):114, 2015

Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. Algorithm aversion: people erroneously avoid algorithms after seeing them err.Journal of experimental psychology: General, 144(1):114, 2015

work page 2015
[42]

Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them.Management Science, 64(3):1155–1170, 2018

Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them.Management Science, 64(3):1155–1170, 2018

work page 2018
[43]

The role of trust in automation reliance.International journal of human-computer studies, 58(6):697–718, 2003

Mary T Dzindolet, Scott A Peterson, Regina A Pomranky, Linda G Pierce, and Hall P Beck. The role of trust in automation reliance.International journal of human-computer studies, 58(6):697–718, 2003

work page 2003
[44]

Relational norms for human-AI cooperation.arXiv preprint arXiv:2502.12102, 2025

Brian D Earp, Sebastian Porsdam Mann, Mateo Aboy, Edmond Awad, Monika Betzler, Marietjie Botes, Rachel Calcott, Mina Caraccio, Nick Chater, Mark Coeckelbergh, et al. Relational norms for human-AI cooperation.arXiv preprint arXiv:2502.12102, 2025

work page arXiv 2025
[45]

How AI and Human Behaviors Shape Psychosocial Effects of Extended Chatbot Use: A Longitudinal Randomized Controlled Study

Cathy Mengying Fang, Auren R Liu, Valdemar Danry, Eunhae Lee, Samantha WT Chan, Pat Pataranutaporn, Pattie Maes, Jason Phang, Michael Lampe, Lama Ahmad, et al. How AI and human behaviors shape psychosocial effects of chatbot use: A longitudinal randomized controlled study.arXiv preprint arXiv:2503.17473, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[46]

Kevin Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S Weld, Amy X Zhang, and Joseph Chee Chang

K.J. Kevin Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S Weld, Amy X Zhang, and Joseph Chee Chang. Cocoa: Co-planning and co-execution with AI agents.arXiv preprint arXiv:2412.10999, 2024

work page arXiv 2024
[47]

The human factor of AI: Implications for critical thinking and societal anxieties.TECHNOLOGY AND SOCIETY: Boon or Bane?, page 8, 2025

Michael Gerlich. The human factor of AI: Implications for critical thinking and societal anxieties.TECHNOLOGY AND SOCIETY: Boon or Bane?, page 8, 2025

work page 2025
[48]

Human trust in artificial intelligence: Review of empirical research.Academy of management annals, 14(2):627–660, 2020

Ella Glikson and Anita Williams Woolley. Human trust in artificial intelligence: Review of empirical research.Academy of management annals, 14(2):627–660, 2020

work page 2020
[49]

Paul Grice

H. Paul Grice. Logic and conversation. In Donald Davidson, editor,The logic of grammar, pages 64–75. Dickenson Pub. Co., 1975. 12

work page 1975
[50]

Griffiths

Thomas L. Griffiths. Understanding human intelligence through human limitations.Trends in Cognitive Sciences, 24(11):873–883, 2020

work page 2020
[51]

MIT Press, 2024

Thomas L Griffiths, Nick Chater, and Joshua B Tenenbaum.Bayesian models of cognition: reverse engineering the mind. MIT Press, 2024

work page 2024
[52]

A decision theoretic frame- work for measuring AI reliance

Ziyang Guo, Yifan Wu, Jason D Hartline, and Jessica Hullman. A decision theoretic frame- work for measuring AI reliance. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 221–236, 2024

work page 2024
[53]

Taking advice: Accepting help, improving judgment, and sharing responsibility.Organizational Behavior and Human Decision Processes, 70(2):117– 133, 1997

Nigel Harvey and Ilan Fischer. Taking advice: Accepting help, improving judgment, and sharing responsibility.Organizational Behavior and Human Decision Processes, 70(2):117– 133, 1997

work page 1997
[54]

Plan-then-execute: An empirical study of user trust and team performance when using llm agents as a daily assistant

Gaole He, Gianluca Demartini, and Ujwal Gadiraju. Plan-then-execute: An empirical study of user trust and team performance when using llm agents as a daily assistant. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25, New York, NY , USA, 2025. Association for Computing Machinery

work page 2025
[55]

Knowing about knowing: An illusion of human competence can hinder appropriate reliance on AI systems

Gaole He, Lucie Kuiper, and Ujwal Gadiraju. Knowing about knowing: An illusion of human competence can hinder appropriate reliance on AI systems. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY , USA, 2023. Association for Computing Machinery

work page 2023
[56]

Trust in automation: Integrating empirical evidence on factors that influence trust.Human factors, 57(3):407–434, 2015

Kevin Anthony Hoff and Masooda Bashir. Trust in automation: Integrating empirical evidence on factors that influence trust.Human factors, 57(3):407–434, 2015

work page 2015
[57]

Principles of mixed-initiative user interfaces

Eric Horvitz. Principles of mixed-initiative user interfaces. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’99, page 159–166, New York, NY , USA, 1999. Association for Computing Machinery

work page 1999
[58]

Yoyo Tsung-Yu Hou and Malte F Jung. Who is the expert? reconciling algorithm aversion and algorithm appreciation in AI-supported decision making.Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2):1–25, 2021

work page 2021
[59]

Position: We need an adaptive interpretation of helpful, honest, and harmless principles.arXiv preprint arXiv:2502.06059, 2025

Yue Huang, Chujie Gao, Yujun Zhou, Kehan Guo, Xiangqi Wang, Or Cohen-Sasson, Max Lamparth, and Xiangliang Zhang. Position: We need an adaptive interpretation of helpful, honest, and harmless principles.arXiv preprint arXiv:2502.06059, 2025

[60]

Jessica Hullman, Alex Kale, and Jason Hartline. Decision theoretic foundations for experiments evaluating human decisions. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1–15, 2025.

[61]

Rosco Hunter, Richard Moulange, Jamie Bernardi, and Merlin Stein. Monitoring human dependence on AI systems with reliance drills. arXiv preprint arXiv:2409.14055, 2024.

[62]

Lujain Ibrahim, Canfer Akbulut, Rasmi Elasmar, Charvi Rastogi, Minsuk Kahng, Meredith Ringel Morris, Kevin R McKee, Verena Rieser, Murray Shanahan, and Laura Weidinger. Multi-turn evaluation of anthropomorphic behaviours in large language models. arXiv preprint arXiv:2502.07077, 2025.

[63]

Lujain Ibrahim, Franziska Sofia Hafner, and Luc Rocher. Training language models to be warm and empathetic makes them less reliable and more sycophantic. arXiv preprint arXiv:2507.21919, 2025.

[64]

Lujain Ibrahim, Saffron Huang, Umang Bhatt, Lama Ahmad, and Markus Anderljung. Towards interactive evaluations for interaction harms in human-AI systems. arXiv preprint arXiv:2405.10632, 2024.

[65]

Patricia K. Kahr, Gerrit Rooks, Chris Snijders, and Martijn C. Willemsen. The trust recovery journey. The effect of timing of errors on the willingness to follow AI advice. In Proceedings of the 29th International Conference on Intelligent User Interfaces, IUI ’24, pages 609–622, New York, NY, USA, 2024. Association for Computing Machinery.

[66]

Markelle Kelly, Aakriti Kumar, Padhraic Smyth, and Mark Steyvers. Capturing humans’ mental models of AI: An item response theory approach. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 1723–1734, 2023.

[67]

Sunnie S. Y. Kim, Q Vera Liao, Mihaela Vorvoreanu, Stephanie Ballard, and Jennifer Wortman Vaughan. "I’m Not Sure, But...": Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 822–835, 2024.

[68]

Sunnie S. Y. Kim, Jennifer Wortman Vaughan, Q. Vera Liao, Tania Lombrozo, and Olga Russakovsky. Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies. In ACM Conference on Human Factors in Computing Systems (CHI), 2025.

[69]

Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, and Andrés Monroy-Hernández. Humans, AI, and Context: Understanding End-Users’ Trust in a Real-World Computer Vision Application. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’23, pages 77–88, New York, NY, USA, 2023. Association for C...

[70]

Jon Kleinberg and Manish Raghavan. Algorithmic monoculture and social welfare. Proceedings of the National Academy of Sciences, 118(22):e2018340118, 2021.

[71]

Olya Kudina and Bas de Boer. Large language models, politics, and the functionalization of language. AI and Ethics, pages 1–13, 2024.

[72]

Vivian Lai, Chacha Chen, Alison Smith-Renner, Q Vera Liao, and Chenhao Tan. Towards a science of human-AI decision making: An overview of design space in empirical human-subject studies. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 1369–1385, 2023.

[73]

Vivian Lai, Yiming Zhang, Chacha Chen, Q Vera Liao, and Chenhao Tan. Selective explanations: Leveraging human input to align explainable AI. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2):1–35, 2023.

[74]

Hao-Ping (Hank) Lee, Advait Sarkar, Lev Tankelevitch, Ian Drosos, Sean Rintel, Richard Banks, and Nicholas Wilson. The impact of generative AI on critical thinking: Self-reported reductions in cognitive effort and confidence effects from a survey of knowledge workers. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25...

[75]

John D Lee and Neville Moray. Trust, control strategies and allocation of function in human-machine systems. Ergonomics, 35(10):1243–1270, 1992.

[76]

John D Lee and Neville Moray. Trust, self-confidence, and operators’ adaptation to automation. International journal of human-computer studies, 40(1):153–184, 1994.

[77]

John D Lee and Katrina A See. Trust in automation: Designing for appropriate reliance. Human factors, 46(1):50–80, 2004.

[78]

Falk Lieder and Thomas L. Griffiths. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 43:e1, 2020.

[79]

Ryan Liu, Jiayi Geng, Joshua C Peterson, Ilia Sucholutsky, and Thomas L Griffiths. Large language models assume people are more rational than we really are. arXiv preprint arXiv:2406.17055, 2024.

[80]

Jennifer M. Logg, Julia A. Minson, and Don A. Moore. Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes, 151:90–103, 2019.
