Generative AI and the Productivity Divide: Human-AI Complementarities in Education

Bharat Anand; Lihi Idan

arxiv: 2605.18143 · v1 · pith:2NR73ZP6new · submitted 2026-05-18 · 💻 cs.AI

Pith reviewed 2026-05-20 10:41 UTC · model grok-4.3

classification 💻 cs.AI

keywords generative AI · productivity · AI interaction competence · human-AI complementarity · education · inequality · randomized experiment · scaffolding

The pith

Generative AI boosts average performance on learning tasks but creates large gaps based on how skillfully users interact with the models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper runs a randomized experiment in which participants similar to early-career knowledge workers studied a technical topic either with standard resources or with large-language-model help. Average performance rose with AI access, yet the extra gains were not uniform. They were predicted by AI Interaction Competence, the skill of posing effective queries, judging model outputs, and correcting errors. Participants strong in this competence captured most of the benefit; those weak in it saw small or even negative returns. Adding simple conceptual-map scaffolding narrowed the spread of results across users.

Core claim

In the controlled self-study setting, GenAI access raised mean task performance while widening outcome variance along a new dimension. Gains were unrelated to GPA or prior knowledge and instead tracked AI Interaction Competence, defined as the capacity to elicit useful outputs, filter inaccuracies, and verify information from the model. High-competence users achieved substantially larger improvements; low-competence users realized limited or negative marginal returns. A scaffolding intervention using conceptual maps reduced this variance, showing that standardized workflows can compress inequality in AI-mediated performance.
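
The pattern in this claim (a higher mean with a wider spread, and a spread that narrows under scaffolding) can be made concrete with a small Python sketch. The scores are invented toy numbers, not data from the paper, and the arm names (`control`, `ai`, `ai_scaffold`) and the helper `summarize` are hypothetical.

```python
from statistics import mean, variance


def summarize(scores):
    """Return (mean, sample variance) for one experimental arm."""
    return mean(scores), variance(scores)


# Toy scores, invented for illustration only (not the paper's data).
arms = {
    "control": [60.0, 62.0, 64.0],
    "ai": [58.0, 70.0, 82.0],
    "ai_scaffold": [66.0, 70.0, 74.0],
}

summary = {name: summarize(scores) for name, scores in arms.items()}
for name, (avg, var) in summary.items():
    print(f"{name}: mean={avg:.1f} variance={var:.1f}")
```

In this toy example the AI arm has a higher mean than control but a much larger variance, while the scaffolded arm keeps the higher mean with a smaller variance. A real analysis would also need a formal test for equality of variances.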

What carries the argument

AI Interaction Competence (AIC), the ability to elicit, filter, and verify generative-model outputs, which determines the size of productivity gains from AI access and therefore governs the distribution of benefits.

If this is right

GenAI raises average productivity while adding a new axis of inequality tied to human-AI interaction skill.
Firms can reduce uneven adoption by offering short micro-training on effective AI interaction.
Simple standardized procedures such as conceptual maps can lower outcome variance across users.
Consistent value capture requires pairing tool access with both training and operating protocols rather than access alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Educational programs may need to treat AI interaction skill as a teachable component alongside domain content to avoid creating a new performance divide.
In hiring or team design, organizations could treat measured interaction competence as a distinct capability worth developing or screening for.
The same pattern could appear in other knowledge tasks such as code review or market analysis, suggesting the competence factor is not limited to educational settings.

Load-bearing premise

That differences in performance on this short self-study task with student-like participants will translate to productivity differences in real professional knowledge-work settings.

What would settle it

A field study that measures participants' AI Interaction Competence before they use generative AI on actual job tasks and checks whether competence scores predict the size of performance gains in the workplace.

Figures

Figures reproduced from arXiv: 2605.18143 by Bharat Anand, Lihi Idan.

Original abstract

Generative Artificial Intelligence (GenAI) is transforming how firms create, process, and apply knowledge, yet little is known about the heterogeneity of its productivity effects across users. We report results from a randomized controlled experiment in which participants (analogs of early-career knowledge workers) were assigned to self-study a technical domain using either traditional resources or large-language-model (LLM) assistance. On average, GenAI access significantly increased task performance, but the distribution of gains was highly uneven. Improvements were not predicted by GPA or prior knowledge, but by AI Interaction Competence (AIC), the ability to elicit, filter, and verify model outputs. High-AIC participants realized outsized gains; low-AIC participants saw limited or even negative marginal returns. A scaffolding intervention (conceptual maps) reduced outcome variance, indicating that standardized workflows can mitigate inequality in AI-mediated performance. We interpret these findings through the lens of human-AI complementarities: GenAI raises mean productivity while introducing a new axis of capability inequality. Managerially, firms should pair GenAI access with short AIC micro-training and simple standard operating procedures to capture value consistently and avoid uneven adoption outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The RCT shows that GenAI access lifts average task performance, but the gains concentrate among users skilled at prompting and at verifying outputs. Whether the lab result maps onto real-world productivity remains untested.

This paper runs a randomized experiment where people studied a technical topic either with standard resources or LLM assistance. Average performance went up with AI access, but the benefits skewed heavily toward participants who handled the model well: writing effective prompts, filtering results, and checking accuracy. That skill, which the authors label AI Interaction Competence, predicted outcomes better than GPA or prior knowledge. Low-skill users got little or even negative returns. Adding conceptual maps as scaffolding narrowed the outcome spread, pointing to a possible way to reduce uneven results.

Referee Report

3 major / 2 minor

Summary. The paper reports results from a randomized controlled experiment in which participants (analogs of early-career knowledge workers) were assigned to self-study a technical domain using either traditional resources or LLM assistance. On average, GenAI access increased task performance, but gains were highly uneven and predicted by AI Interaction Competence (AIC)—the ability to elicit, filter, and verify model outputs—rather than by GPA or prior knowledge. High-AIC participants realized large gains while low-AIC participants saw limited or negative returns. A scaffolding intervention using conceptual maps reduced outcome variance. The authors interpret the findings as evidence that GenAI raises mean productivity while introducing a new axis of capability inequality via human-AI complementarities, and recommend short AIC micro-training plus standard operating procedures for firms.

Significance. If the results hold, the paper contributes empirical evidence from an RCT on heterogeneous GenAI effects in knowledge-work tasks, moving beyond average-effect studies to identify user competencies as a moderator. The randomized design and scaffolding test are strengths that support causal claims about complementarities and inequality mitigation. The work is notable for its focus on actionable managerial implications and for introducing AIC as a measurable construct. These elements could inform both education research and organizational AI adoption strategies, provided the lab task generalizes to professional settings.

major comments (3)

[Methods] Methods section: the manuscript does not report sample size, exact outcome measures, statistical power, or the regression specifications used to test heterogeneous effects by AIC versus GPA/prior knowledge. These details are load-bearing for evaluating the reliability of the average and distributional claims in the abstract and results.
[Results] Results section: the claim that performance gains are predicted by AIC but not by GPA or prior knowledge requires the specific statistical output (e.g., regression coefficients, interaction terms, or Table reporting these tests) to be shown; without it the moderator result cannot be fully assessed.
[Discussion] Discussion section: the managerial recommendations (AIC micro-training and SOPs) rest on the assumption that lab self-study performance differences map to real professional productivity, yet the paper provides no additional evidence or robustness discussion addressing differences in task duration, team integration, or long-horizon outcomes.
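
The heterogeneous-effects test requested in the Methods and Results comments can be sketched for the simplest case: a binary treatment and a binary high/low AIC split. In that saturated design, the interaction coefficient in score = b0 + b1·treated + b2·high_aic + b3·treated·high_aic equals a difference of differences in cell means. The following Python sketch is illustrative only: the paper's actual specification is not reported, and the field names, the binary AIC split, and the toy records are all hypothetical.

```python
from statistics import mean


def cell_mean(rows, treated, high_aic):
    """Mean score within one treatment-by-AIC cell."""
    scores = [r["score"] for r in rows
              if r["treated"] == treated and r["high_aic"] == high_aic]
    if not scores:
        raise ValueError("empty cell: cannot estimate its mean")
    return mean(scores)


def heterogeneous_effect(rows):
    """Treatment effect per AIC group, plus their difference (the interaction)."""
    effect_low = cell_mean(rows, True, False) - cell_mean(rows, False, False)
    effect_high = cell_mean(rows, True, True) - cell_mean(rows, False, True)
    return {
        "effect_low": effect_low,
        "effect_high": effect_high,
        "interaction": effect_high - effect_low,
    }


# Toy records, invented for illustration only (not the paper's data).
toy = [
    {"treated": False, "high_aic": False, "score": 50.0},
    {"treated": False, "high_aic": False, "score": 54.0},
    {"treated": False, "high_aic": True, "score": 52.0},
    {"treated": False, "high_aic": True, "score": 56.0},
    {"treated": True, "high_aic": False, "score": 51.0},
    {"treated": True, "high_aic": False, "score": 53.0},
    {"treated": True, "high_aic": True, "score": 68.0},
    {"treated": True, "high_aic": True, "score": 72.0},
]

result = heterogeneous_effect(toy)
print(result)
```

A full analysis would treat AIC as continuous, add GPA and prior knowledge as competing moderators, and report standard errors. This sketch only shows what the interaction term measures.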

minor comments (2)

[Abstract] Abstract: the term AI Interaction Competence (AIC) is introduced without a short parenthetical definition, which would improve accessibility for readers.
Throughout: ensure consistent definition of all acronyms at first use and verify that figure captions fully describe the scaffolding intervention and outcome measures.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below and have made revisions to strengthen the paper where possible.

Referee: [Methods] Methods section: the manuscript does not report sample size, exact outcome measures, statistical power, or the regression specifications used to test heterogeneous effects by AIC versus GPA/prior knowledge. These details are load-bearing for evaluating the reliability of the average and distributional claims in the abstract and results.

Authors: We agree that these methodological details are essential for assessing the robustness of our results. In the revised manuscript, we will expand the Methods section to explicitly report the sample size, precise definitions and operationalization of the outcome measures, a post-hoc statistical power analysis, and the full regression equations and specifications used to examine heterogeneous treatment effects by AI Interaction Competence (AIC) as well as comparisons with GPA and prior knowledge.
revision: yes

Referee: [Results] Results section: the claim that performance gains are predicted by AIC but not by GPA or prior knowledge requires the specific statistical output (e.g., regression coefficients, interaction terms, or Table reporting these tests) to be shown; without it the moderator result cannot be fully assessed.

Authors: We acknowledge that the Results section would benefit from more transparent reporting of the statistical tests supporting the moderator claims. We will add a table presenting the regression results, including coefficients, standard errors, t-statistics or p-values for the main effects and interaction terms involving AIC, GPA, and prior knowledge. This will allow readers to directly evaluate the differential predictive power of AIC.
revision: yes

Referee: [Discussion] Discussion section: the managerial recommendations (AIC micro-training and SOPs) rest on the assumption that lab self-study performance differences map to real professional productivity, yet the paper provides no additional evidence or robustness discussion addressing differences in task duration, team integration, or long-horizon outcomes.

Authors: We concur that generalizability from the laboratory setting to professional environments is an important consideration. Our study employs a task and participant pool designed to approximate early-career knowledge work, but we do not claim direct equivalence to all aspects of professional productivity. In the revised Discussion, we will include an expanded limitations subsection that addresses potential differences in task duration, team integration, and long-horizon outcomes, along with a call for future research in field settings. We believe the RCT design still offers valuable insights into the mechanisms of human-AI complementarities that can guide initial managerial strategies, while noting the need for further validation.
revision: partial
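
The first response promises a power analysis. As a rough illustration of what such a calculation involves, the Python sketch below computes the related a priori quantity: the per-arm sample size needed to detect a standardized mean difference with a two-sided, two-sample comparison under the normal approximation. The function name `n_per_group` and the example effect size are assumptions for illustration; the paper reports neither its sample size nor its effect sizes.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical medium effect (d = 0.5), chosen only for illustration.
print(n_per_group(0.5))  # prints 63
```

Detecting an interaction such as treatment by AIC generally needs a considerably larger sample than detecting a main effect of the same size, which is why the referee's request for sample size and power matters for the moderator claim.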

Circularity Check

0 steps flagged

Empirical RCT with no mathematical derivation or self-referential reduction

The paper reports results from a randomized controlled experiment assigning participants to self-study tasks with or without LLM assistance. Central claims rest on measured performance differences and a constructed AIC metric (ability to elicit, filter, and verify outputs), with no equations, fitted parameters, or derivations that reduce outcomes to inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing. The design is self-contained against external benchmarks via direct experimental comparison, yielding no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on standard RCT assumptions and introduces AIC as a new explanatory construct without external validation in the abstract.

axioms (1)

standard math: Random assignment balances, in expectation, observable and unobservable characteristics across treatment and control groups
Invoked implicitly by the description of the randomized controlled experiment
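
The balance axiom holds in expectation, so any single randomization can still leave groups somewhat unequal on a covariate. One common check is the standardized mean difference (SMD) of a pre-treatment variable across arms. The Python sketch below is a minimal illustration with simulated GPA values; the function names, the seeds, and the GPA range are all hypothetical and do not come from the paper.

```python
import random
from statistics import mean, variance


def assign(ids, seed=0):
    """Randomly split participant ids into two near-equal groups."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


# Simulated covariate, for illustration only (not the paper's data).
rng = random.Random(1)
gpa = {pid: rng.uniform(2.0, 4.0) for pid in range(200)}

treated_ids, control_ids = assign(gpa.keys(), seed=42)
smd = standardized_mean_difference(
    [gpa[p] for p in treated_ids],
    [gpa[p] for p in control_ids],
)
print(f"SMD for GPA: {smd:.3f}")
```

An absolute SMD below about 0.1 is a common rule of thumb for adequate balance, though the threshold is a convention rather than a formal test.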

invented entities (1)

AI Interaction Competence (AIC) · no independent evidence
purpose: Explains why some users gain more from GenAI than others
Defined in abstract as ability to elicit, filter, and verify model outputs; treated as the key predictor of heterogeneous gains

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Improvements were not predicted by GPA or prior knowledge, but by AI Interaction Competence (AIC) -- the ability to elicit, filter, and verify model outputs."

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "A scaffolding intervention (conceptual maps) reduced outcome variance"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
