The efficiency-gain illusion: People underestimate the rate of AI use and overestimate its benefits on simple tasks
Pith reviewed 2026-05-22 03:24 UTC · model grok-4.3
The pith
People underestimate how often they use AI and overestimate the time savings it delivers on simple tasks
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Participants display self-estimate miscalibration by underreporting their actual AI usage relative to logged behavior, and efficiency-gain illusions by believing AI produces larger reductions in time and effort than the measured differences show. Prior AI use within a session increases adoption in the following session and entrenches the overestimation of benefits, raising the prospect of an overreliance feedback loop.
What carries the argument
The efficiency-gain illusion, the systematic overestimation of time and effort savings from AI on cognitively simple tasks, paired with underestimation of personal usage frequency.
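The two miscalibration measures can be made concrete with a minimal sketch. It assumes a hypothetical per-participant record; the field names are illustrative and are not the study's data schema. Usage miscalibration is taken as the self-reported share of AI use minus the logged share, and the efficiency-gain illusion as estimated savings minus measured savings.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical per-participant record; field names are illustrative only.
    reported_ai_rate: float     # self-estimated share of trials done with AI (0-1)
    ai_choices: list[bool]      # logged choice per trial: True if AI was used
    estimated_savings_s: float  # self-estimated seconds saved per task by AI
    actual_savings_s: float     # measured seconds saved per task (no-AI minus AI time)


def logged_ai_rate(p: Participant) -> float:
    """Share of logged trials on which the participant chose AI."""
    return sum(p.ai_choices) / len(p.ai_choices)


def usage_miscalibration(p: Participant) -> float:
    """Negative values mean the participant under-reports their AI use."""
    return p.reported_ai_rate - logged_ai_rate(p)


def efficiency_gain_illusion(p: Participant) -> float:
    """Positive values mean the participant overestimates AI's time savings."""
    return p.estimated_savings_s - p.actual_savings_s


p = Participant(reported_ai_rate=0.3,
                ai_choices=[True, True, False, True, False],
                estimated_savings_s=20.0,
                actual_savings_s=2.0)
print(usage_miscalibration(p))      # about -0.3: reports 30% use, logs show 60%
print(efficiency_gain_illusion(p))  # 18.0 seconds of overestimated savings
```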
If this is right
- People select AI assistance for tasks that deliver no meaningful time or effort reduction.
- AI use in one session raises the probability of AI use in the next session.
- Overestimation of savings grows stronger with repeated AI exposure.
- Without external feedback the pattern can settle into a self-sustaining overreliance loop.
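The session-level carryover in the second bullet can be checked descriptively by comparing the probability of AI use in a session conditional on whether AI was used in the previous one. The sketch below assumes a hypothetical layout in which each participant contributes a list of per-session AI-use flags. It pools consecutive session pairs and does not adjust for individual propensity, so a gap between the two rates is consistent with carryover but does not by itself establish it.

```python
import math


def carryover_rates(sessions: list[list[bool]]) -> tuple[float, float]:
    """Return (P(AI next | AI now), P(AI next | no AI now)).

    Each inner list holds one participant's AI-use flags in session order
    (hypothetical layout). Consecutive session pairs are pooled across
    participants; a rate is NaN if its conditioning event never occurs.
    """
    after_ai = [0, 0]     # [next-session AI uses, pairs] following an AI session
    after_no_ai = [0, 0]  # same counts, following a session without AI
    for seq in sessions:
        for now, nxt in zip(seq, seq[1:]):
            bucket = after_ai if now else after_no_ai
            bucket[0] += int(nxt)
            bucket[1] += 1

    def rate(b: list[int]) -> float:
        return b[0] / b[1] if b[1] else math.nan

    return rate(after_ai), rate(after_no_ai)


sessions = [
    [True, True, True],    # uses AI every session
    [False, False, True],  # adopts AI late
    [True, False, False],  # drops AI after the first session
]
print(carryover_rates(sessions))  # about (0.667, 0.333)
```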
Where Pith is reading between the lines
- Interfaces that display objective usage statistics or actual time costs could interrupt the miscalibration before it becomes habitual.
- The same pattern may appear when people decide whether to adopt AI for moderately complex work, not only simple tasks.
- Training or design interventions that make real savings visible could shift choices even if the underlying illusion remains.
Load-bearing premise
Participants' self-reports of usage frequency and time savings accurately reflect their genuine perceptions rather than study-specific influences or task familiarity.
What would settle it
A replication that logs AI usage and task completion times, and finds that self-reported usage rates match the logs and that reported savings match or fall below the measured time differences, would undermine the miscalibration findings.
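One simple way to run the self-report check described above is a paired comparison of each participant's reported and logged usage rates. This is a minimal sketch using a plain paired t statistic, not the paper's pre-registered analysis; the function name and example values are hypothetical. A mean difference near zero with a small |t| would point toward calibration, and the same function can compare estimated against measured savings.

```python
import math
import statistics


def paired_t(reported: list[float], logged: list[float]) -> tuple[float, float]:
    """Mean paired difference (reported - logged) and its t statistic."""
    if len(reported) != len(logged) or len(reported) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [r - g for r, g in zip(reported, logged)]
    mean = statistics.fmean(diffs)
    # Standard error of the mean difference (zero if all differences are equal).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean / se


# Illustrative values only: each participant under-reports their logged rate.
reported = [0.2, 0.4, 0.5, 0.3]
logged = [0.5, 0.6, 0.6, 0.6]
mean, t = paired_t(reported, logged)
print(round(mean, 3), round(t, 2))  # mean about -0.225, t about -4.7
```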
Original abstract
People are increasingly turning to AI assistance for simple tasks, e.g., arithmetic, spell-check, and answering simple questions. But does AI assistance actually save users time and effort? We investigate people's propensity to use AI for cognitively simple tasks and assess whether their reliance is well-calibrated. Across three pre-registered user studies (N = 2691), we find that people frequently choose to use AI even when doing so is inefficient (i.e. provides no meaningful time or effort savings). We identify systematic miscalibration at two levels: (1) a self-estimate miscalibration where people on average believe that they are using AI less than they actually are, and (2) efficiency-gain illusions where people overestimate how much time and effort savings AI use affords. We also identify a session-level carryover effect where a participant's prior AI use leads to further AI adoption and entrenches their miscalibration about time savings. Our results shed light on the mechanisms and biases underlying people's choice of whether to use AI as well as the risk of an overreliance feedback loop.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports three pre-registered studies (total N=2691) showing that participants choose AI assistance for simple tasks (arithmetic, spell-check, simple questions) even when it yields no meaningful time or effort savings. It documents two forms of miscalibration: underestimation of personal AI usage rates relative to actual behavior, and overestimation of time/effort savings from AI; it further identifies a session-level carryover effect in which prior AI use increases subsequent adoption and entrenches miscalibrations about savings.
Significance. If the central behavioral patterns hold, the work is significant for human-AI interaction research: it supplies large-sample, pre-registered evidence of systematic miscalibration and a potential overreliance feedback loop on routine tasks. The pre-registration and sample size are clear strengths that reduce selection bias and support falsifiable claims about usage choices and subjective estimates. These results could guide interface design and user training aimed at aligning expectations with actual efficiency gains.
major comments (2)
- [Methods (Studies 1–3)] The efficiency-gain illusion claim rests on comparing estimated versus actual time/effort savings, yet the manuscript does not specify whether 'actual' savings are derived from objective logs that include prompting, verification, and error-correction overhead or from post-task subjective ratings; without this distinction the overestimation finding risks conflating measurement artifacts with true miscalibration.
- [Results (carryover effect)] The session-level carryover finding is load-bearing for the overreliance feedback-loop interpretation, but the reported analyses do not appear to include controls for within-session learning, task-order effects, or individual differences in baseline AI propensity; adding these would be required to rule out alternative explanations for increased subsequent AI adoption.
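The distinction raised in the first major comment hinges on how 'actual' savings are computed. Assuming, as the authors' rebuttal states, that each trial is timestamped from task start to submission, total completion time already contains any prompting, reviewing, and correction overhead. The sketch below uses a hypothetical log format (a used-AI flag plus ISO 8601 start and submit times) and takes actual savings as mean no-AI time minus mean AI time; this simple contrast ignores which trials participants chose to use AI on.

```python
from datetime import datetime
from statistics import fmean

# Hypothetical trial log row: (used_ai, start timestamp, submit timestamp).
TrialLog = tuple[bool, str, str]


def completion_seconds(start: str, submit: str) -> float:
    """Start-to-submission time, so any AI prompting and checking is included."""
    delta = datetime.fromisoformat(submit) - datetime.fromisoformat(start)
    return delta.total_seconds()


def actual_savings(trials: list[TrialLog]) -> float:
    """Mean no-AI time minus mean AI time, in seconds (positive: AI was faster).

    Raises statistics.StatisticsError if either condition has no trials.
    """
    ai = [completion_seconds(s, e) for used, s, e in trials if used]
    no_ai = [completion_seconds(s, e) for used, s, e in trials if not used]
    return fmean(no_ai) - fmean(ai)


trials = [
    (True, "2025-01-01T10:00:00", "2025-01-01T10:00:12"),
    (True, "2025-01-01T10:01:00", "2025-01-01T10:01:14"),
    (False, "2025-01-01T10:02:00", "2025-01-01T10:02:13"),
    (False, "2025-01-01T10:03:00", "2025-01-01T10:03:15"),
]
print(actual_savings(trials))  # 1.0 second saved on average
```

A participant who estimated, say, 20 seconds of savings on these trials would show an efficiency-gain illusion of 19 seconds under this measure.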
minor comments (2)
- [Figures] Figure legends: error bars should be explicitly labeled as 95% CI or SE to aid interpretation of the miscalibration magnitudes.
- [Discussion] The generalizability statement could be tightened by noting that the chosen tasks are deliberately simple and that external pressures or multi-session learning were not tested.
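The first minor comment matters because the two conventions differ by roughly a factor of two: under a normal approximation, a 95% confidence interval extends about 1.96 standard errors on each side of the mean. A minimal sketch with illustrative values; for small samples a t critical value would be more appropriate than 1.96.

```python
import math
import statistics


def se_and_ci95(values: list[float]) -> tuple[float, tuple[float, float]]:
    """Standard error of the mean and a normal-approximation 95% CI."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    half = 1.96 * se  # z critical value; use a t quantile for small n
    return se, (mean - half, mean + half)


se, (lo, hi) = se_and_ci95([10.0, 12.0, 14.0, 16.0])
print(round(se, 2), (round(lo, 2), round(hi, 2)))  # SE bar about 1.29; CI bar about 2.53 each side
```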
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. The comments highlight opportunities to improve methodological transparency and analytical robustness, and we address each point below with specific plans for revision.
- Referee: Methods (Studies 1–3): the efficiency-gain illusion claim rests on comparing estimated versus actual time/effort savings, yet the manuscript does not specify whether 'actual' savings are derived from objective logs that include prompting, verification, and error-correction overhead or from post-task subjective ratings; without this distinction the overestimation finding risks conflating measurement artifacts with true miscalibration.
  Authors: We appreciate this observation on measurement clarity. In our studies, actual time and effort savings were computed from objective platform logs that recorded total task completion time for each trial, explicitly incorporating all overhead associated with prompting the AI, reviewing outputs, and making corrections. These logs were timestamped from task start to submission and did not rely on post-task subjective ratings for the 'actual' values. Participant estimates were collected separately via self-report after each block. We will revise the Methods section (and add a dedicated subsection on measurement) to explicitly describe this distinction and the logging procedure, thereby eliminating any ambiguity about potential measurement artifacts. (Revision: yes.)
- Referee: Results (carryover effect): the session-level carryover finding is load-bearing for the overreliance feedback-loop interpretation, but the reported analyses do not appear to include controls for within-session learning, task-order effects, or individual differences in baseline AI propensity; adding these would be required to rule out alternative explanations for increased subsequent AI adoption.
  Authors: We agree that strengthening the carryover analysis with additional controls would enhance confidence in the feedback-loop interpretation. Our pre-registered primary models already incorporated session fixed effects and participant-level random intercepts to account for repeated measures. To directly address the referee's concern, we will add exploratory (non-pre-registered) regression specifications in the revision that include (a) task-order position as a covariate, (b) an interaction term for within-session learning (trial number within block), and (c) a participant-level baseline AI propensity score derived from the first block. These supplementary analyses will be clearly labeled as robustness checks and will be reported alongside the pre-registered results. (Revision: partial.)
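The three covariates listed in the second response can be derived from trial-level logs before fitting any model. The sketch assumes a hypothetical record layout (participant id, block number, trial number within block, task-order position, and an AI-use flag) and defines baseline propensity as the share of first-block trials done with AI, as the response describes. First-block trials are excluded from the outcome rows because they supply the baseline; the learning interaction, the session fixed effects, and the participant random intercepts would be specified in whatever mixed model is then fitted.

```python
from collections import defaultdict

# Hypothetical trial record:
# (participant_id, block, trial_in_block, task_order, used_ai)
Trial = tuple[str, int, int, int, bool]


def baseline_propensity(trials: list[Trial]) -> dict[str, float]:
    """Share of first-block trials (block == 1) on which each participant used AI."""
    counts = defaultdict(lambda: [0, 0])  # participant -> [AI trials, all trials]
    for pid, block, _trial, _order, used_ai in trials:
        if block == 1:
            counts[pid][0] += int(used_ai)
            counts[pid][1] += 1
    return {pid: used / total for pid, (used, total) in counts.items()}


def design_rows(trials: list[Trial]) -> list[dict]:
    """One row per later-block trial carrying the three robustness covariates."""
    base = baseline_propensity(trials)
    return [
        {
            "participant": pid,
            "used_ai": int(used_ai),   # outcome
            "task_order": order,       # (a) task-order position
            "trial_in_block": trial,   # (b) within-block learning term
            "baseline_ai": base[pid],  # (c) first-block AI propensity
        }
        for pid, block, trial, order, used_ai in trials
        if block > 1 and pid in base  # block 1 only supplies the baseline
    ]


trials: list[Trial] = [
    ("p1", 1, 1, 1, True), ("p1", 1, 2, 2, False),
    ("p1", 2, 1, 3, True), ("p1", 2, 2, 4, True),
    ("p2", 1, 1, 2, False), ("p2", 1, 2, 1, False),
    ("p2", 2, 1, 4, True),
]
rows = design_rows(trials)
print(len(rows), rows[0]["baseline_ai"], rows[2]["baseline_ai"])  # 3 0.5 0.0
```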
Circularity Check
No circularity: empirical claims rest on independent behavioral data
The paper reports findings from three pre-registered user studies with direct measurements of AI usage choices, self-estimates, and perceived time/effort savings across N=2691 participants. No mathematical derivations, equations, fitted parameters presented as predictions, or uniqueness theorems appear in the abstract or described methods. Central claims derive from observed participant behavior rather than reducing by construction to self-citations, ansatzes, or input data. The study is self-contained against external benchmarks of experimental reporting.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Participants provide honest and accurate self-reports of their AI usage frequency and perceived time savings.