The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning
Pith reviewed 2026-05-22 09:03 UTC · model grok-4.3
The pith
Heavy AI use in logical reasoning tasks weakens skill development once assistance is removed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In experiments with on-demand AI during logical reasoning, greater AI usage correlated with weaker skill development after removal: heavy users underperformed comparable peers while light users matched those with no AI. These patterns were mediated by AI informativeness. Low-information AI improved neither immediate nor post-removal performance and linked to weaker learning overall. High-information AI raised short-run performance without lowering average post-AI outcomes but showed heterogeneous effects.
What carries the argument
Mediation through AI usage intensity and informativeness: the contrast between heavy versus light use and between high- versus low-information AI outputs determines whether assistance amplifies or substitutes for independent reasoning.
If this is right
- High-informativeness AI can support immediate gains while preserving long-term skill levels on average.
- Light or limited AI access keeps post-assistance performance comparable to no access at all.
- Heavy reliance on low-information AI can substitute for reasoning and reduce independent skill growth.
- Regulating AI availability in learning contexts may be needed to avoid undermining skill development.
Where Pith is reading between the lines
- The same usage-and-informativeness pattern could be tested in math problem-solving or coding tasks to check domain generality.
- Heterogeneous effects under high-information AI suggest room to study which users benefit most and design targeted prompts.
- Training people to treat AI as a reasoning partner rather than an answer source might shift heavy users toward lighter, more beneficial patterns.
Load-bearing premise
That differences in how much and what kind of AI people choose can be separated from their starting reasoning ability or motivation so performance gaps can be credited to usage rather than who selected what.
What would settle it
A follow-up experiment that randomly assigns fixed AI-usage quotas, matches participants on initial ability, and finds no post-removal performance gap between heavy and light users would undermine the central claim.
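The load-bearing premise and the proposed test can be illustrated with a toy simulation. This is a minimal sketch, not the paper's data or design: the function name `post_removal_gap`, the normal score distributions, and the rule that lower-ability participants choose heavier use are all assumptions made for illustration. In this model AI usage has no causal effect on post-removal scores at all.

```python
import random
from statistics import mean


def post_removal_gap(n: int, self_selected: bool, seed: int = 0) -> float:
    """Mean post-removal score of heavy users minus that of light users.

    Toy model: usage has NO causal effect on the post-removal score.
    """
    rng = random.Random(seed)
    heavy, light = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)               # hypothetical baseline ability
        post_score = ability + rng.gauss(0.0, 1.0)  # depends on ability only
        if self_selected:
            # Assumption: lower-ability participants lean on AI more.
            is_heavy = ability + rng.gauss(0.0, 1.0) < 0.0
        else:
            # Randomly assigned usage quota, independent of ability.
            is_heavy = rng.random() < 0.5
        (heavy if is_heavy else light).append(post_score)
    return mean(heavy) - mean(light)


selected_gap = post_removal_gap(20_000, self_selected=True)
randomized_gap = post_removal_gap(20_000, self_selected=False)
print(f"self-selected gap: {selected_gap:.2f}")
print(f"randomized gap:    {randomized_gap:.2f}")
```

Under these assumptions the self-selected comparison shows a clearly negative gap even though usage does nothing, while the randomized comparison stays near zero. That is why a gap that survives random assignment would count as evidence for a usage effect, and a gap that vanishes would point to selection.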
Original abstract
Artificial intelligence (AI) is being increasingly integrated into human problem-solving, yet its effects on individual skill development remain unclear. We examine how both AI usage and informativeness can shape learning in the context of a controlled logical reasoning task with on-demand access to AI assistance. We find that greater AI usage is associated with weaker skill development: heavy AI users underperform relative to comparable peers, whereas light AI users perform similarly to matched users who do not use AI. We also find in our study that these patterns are mediated by AI informativeness. Low-information AI neither improves immediate performance nor preserves performance after AI assistance is removed, and is linked to weaker learning overall. On the other hand, high-information AI was found to improve short-run performance without reducing post-AI outcomes on average in our experiments, but with heterogeneous effects. Our findings in general suggest that AI can, depending on context, either complement human skill development by amplifying independent reasoning or can act as a substitute that undermines such reasoning, with the implication that regulating AI access and usage will be important for promoting skill development in the presence of AI assistance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports results from a controlled logical reasoning experiment with on-demand AI access. It claims that greater AI usage is associated with weaker skill development after AI removal: heavy users underperform relative to comparable peers, while light users perform similarly to matched non-users. These patterns are mediated by AI informativeness: low-information AI is linked to no immediate gains and weaker overall learning, whereas high-information AI improves short-run performance without reducing post-AI outcomes on average (with noted heterogeneity). The authors conclude that AI can either complement or substitute for independent reasoning depending on usage and informativeness.
Significance. If the identification strategy successfully isolates usage effects from baseline ability and motivation, the findings would be significant for understanding when AI assistance supports versus undermines cognitive skill development. The mediation analysis by informativeness and the distinction between heavy/light usage add nuance beyond simple usage-volume claims. The study also offers falsifiable predictions about post-AI performance that could be tested in follow-up work.
major comments (1)
- The central causal interpretation—that heavy AI usage weakens skill development relative to comparable peers—rests on the claim that usage intensity can be isolated from pre-existing differences in reasoning aptitude or motivation. The abstract invokes 'comparable peers' and 'matched users' but supplies no information on sample size, baseline measures, matching procedure, or regression controls. This is load-bearing: without these details, observed post-task gaps could reflect selection rather than usage effects, and the same concern applies to the informativeness mediation (which is only observed among users who actually query the system).
minor comments (2)
- The abstract states that high-information AI 'was found to improve short-run performance without reducing post-AI outcomes on average' but does not report effect sizes, confidence intervals, or the exact definition of 'on average' versus heterogeneous effects.
- Notation for 'AI informativeness' and 'skill development' should be defined explicitly in the methods section with reference to the specific logical reasoning measures used.
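The first minor comment asks for effect sizes and confidence intervals. A minimal sketch of what such reporting could look like follows; the function names `cohens_d` and `mean_diff_ci` and the score lists are hypothetical, and the interval uses a large-sample normal approximation, which is crude for groups this small. None of the numbers come from the paper.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def mean_diff_ci(a: Sequence[float], b: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval for mean(a) - mean(b) (normal approximation)."""
    diff = mean(a) - mean(b)
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return diff - z * se, diff + z * se


# Hypothetical short-run scores, for illustration only.
high_info = [6.0, 7.0, 8.0, 9.0, 10.0]
no_ai = [4.0, 5.0, 6.0, 7.0, 8.0]

d = cohens_d(high_info, no_ai)
ci_low, ci_high = mean_diff_ci(high_info, no_ai)
print(f"d = {d:.3f}, CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Reporting both quantities per condition, and separately per subgroup, would make the "on average" versus heterogeneous-effects distinction concrete.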
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive report. We address the major comment on the identification of usage effects below, and we plan to incorporate clarifications in the revised manuscript.
-
Referee: The central causal interpretation—that heavy AI usage weakens skill development relative to comparable peers—rests on the claim that usage intensity can be isolated from pre-existing differences in reasoning aptitude or motivation. The abstract invokes 'comparable peers' and 'matched users' but supplies no information on sample size, baseline measures, matching procedure, or regression controls. This is load-bearing: without these details, observed post-task gaps could reflect selection rather than usage effects, and the same concern applies to the informativeness mediation (which is only observed among users who actually query the system).
Authors: We agree that the abstract does not provide sufficient detail on these methodological aspects, which are important for evaluating the causal claims. The full paper includes a description of the sample size and experimental procedure in the Methods section. Baseline measures of reasoning aptitude were collected via a pre-experiment test, and we include these as controls in our main regressions. To address selection into usage intensity, we use a matching procedure based on baseline aptitude, self-reported motivation, and other observables to compare heavy users to similar light or non-users. We will revise the manuscript to explicitly summarize these details in the abstract and to add a subsection on the matching method and its assumptions. We will also expand the discussion of the informativeness mediation to note that it is estimated conditional on AI queries being made and to include additional analyses addressing potential selection into querying.
Revision: yes
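The matching idea in the rebuttal can be sketched in a few lines. This is not the authors' procedure: the paper reportedly uses propensity score matching, whereas the sketch below swaps in plain nearest-neighbor matching on a single baseline score with a caliper. The `Participant` fields, the `caliper` value, and the data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Participant:
    baseline: float  # hypothetical pre-experiment (Phase 1) score
    post: float      # hypothetical score after AI is removed


def nearest_control(p: Participant, controls: List[Participant],
                    caliper: float) -> Optional[Participant]:
    """Closest control on baseline score, or None if outside the caliper."""
    if not controls:
        return None
    best = min(controls, key=lambda c: abs(c.baseline - p.baseline))
    return best if abs(best.baseline - p.baseline) <= caliper else None


def matched_gap(users: List[Participant], non_users: List[Participant],
                caliper: float = 0.5) -> Tuple[Optional[float], int]:
    """Average post-score difference over matched pairs (matching with replacement)."""
    diffs = []
    for u in users:
        match = nearest_control(u, non_users, caliper)
        if match is not None:
            diffs.append(u.post - match.post)
    return (mean(diffs) if diffs else None), len(diffs)


# Illustrative data only.
heavy_users = [Participant(3.0, 4.0), Participant(5.0, 5.0), Participant(9.0, 9.0)]
zero_users = [Participant(3.2, 5.0), Participant(5.1, 7.0), Participant(6.0, 8.0)]

gap, n_pairs = matched_gap(heavy_users, zero_users)
print(gap, n_pairs)
```

The third heavy user has no control within the caliper and is dropped, which is the kind of detail (how many participants are left unmatched, and who they are) the referee asks the authors to report. Matching of this kind balances only observed covariates, so unobserved motivation can still confound the comparison.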
Circularity Check
Empirical study reports observed associations with no derivation chain or fitted predictions
full rationale
This is a controlled empirical study that measures participant-chosen AI usage intensity and informativeness during a logical reasoning task, then reports post-task performance associations relative to matched peers. No equations, first-principles derivations, or model parameters are presented as predictions that could reduce to the inputs by construction. Claims rest on direct experimental observations and comparisons rather than self-referential definitions or self-citation chains that bear the load of the central result. The design is therefore self-contained against external benchmarks and exhibits no circularity of the enumerated kinds.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We find that greater AI usage is associated with weaker skill development: heavy AI users underperform relative to comparable peers..."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "propensity score matching (PSM) ... matching Light and Heavy users separately to Zero-usage participants based on Phase 1 performance"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.