pith. sign in

arxiv: 2606.26959 · v1 · pith:DMZRAXSFnew · submitted 2026-06-25 · 💰 econ.GN · q-fin.EC

The Shift to Agentic AI: Evidence from Codex

Pith reviewed 2026-06-26 01:50 UTC · model grok-4.3

classification 💰 econ.GN q-fin.EC
keywords agentic AICodexusage patternstask complexityproductivityworkflow changesorganizational adoptionoutput tokens
0
0 comments X

The pith

Codex usage data shows agentic AI adoption grew more than fivefold in the first half of 2026 with nearly tenfold higher task complexity and sharply rising output tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines internal logs from OpenAI's Codex tool to track how agentic AI systems that act on users' behalf are altering work patterns across different groups. It documents rapid growth in active users, especially outside software development, alongside increases in the sophistication of requests and the volume of generated output. A sympathetic reader would care because these trends suggest concrete shifts in daily workflows and potential effects on productivity and job structure as such tools spread.

Core claim

We use an automated, privacy-protecting pipeline to contrast usage across three populations: external personal-account users, external organizational-account users, and workers within OpenAI. Agentic AI usage grew more than fivefold in the first half of 2026, with the most rapid increase outside the initial audience of software developers. Within OpenAI, Codex usage is nearly universal and has largely replaced business usage of ChatGPT. A similar though slower shift occurs outside OpenAI, particularly within organizations. More than 10% of users manage three or more concurrent Codex agents each week, 26.6% use skills for sharing complex workflows, the share of users submitting tasks estimate

What carries the argument

An automated privacy-protecting pipeline that classifies Codex requests by estimated human completion time and attributes output tokens across user populations and tools.

If this is right

  • The number of active Codex users grew more than fivefold in the first half of 2026, fastest outside software developers.
  • Within OpenAI, Codex usage became nearly universal and replaced most business ChatGPT use.
  • Over 10 percent of users managed three or more concurrent agents weekly and 26.6 percent used skills to share complex workflows.
  • The share of users submitting requests for tasks estimated to take more than eight human hours rose nearly tenfold since the start of the year.
  • Median monthly output tokens rose 13 times for legal roles and 50 times for researchers inside OpenAI between November 2025 and June 2026.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the internal patterns generalize, organizations adopting agentic tools may experience measurable gains in task throughput without proportional increases in headcount.
  • The faster uptake inside OpenAI compared with external groups points to company-level policies or infrastructure as accelerators of the shift.
  • Rising concurrent agent use and skill sharing could lead to new job roles centered on orchestrating and maintaining multiple AI agents.
  • Comparing productivity metrics such as project completion rates before and after Codex rollout would test whether the observed token increases translate into net output gains.

Load-bearing premise

The automated pipeline correctly classifies tasks by estimated human completion time and accurately attributes output tokens to Codex versus ChatGPT without systematic measurement error or selection bias in the three user populations.

What would settle it

A re-analysis or external audit of the raw request logs that finds the human-time estimates or Codex-versus-ChatGPT token attributions deviate by more than a factor of two from the reported figures on average.

Figures

Figures reproduced from arXiv: 2606.26959 by Aaron Chatterji, Alex Martin Richmond, Christopher Ong, David Holtz, Drew Johnston, Prasanna Tambe.

Figure 12
Figure 12. Figure 12: Change in median output tokens per person, by OpenAI job function Plot shows the normalized number of output tokens generated in the trailing 28 days by the median worker in each job function within OpenAI. Output tokens include those generated on ChatGPT and those generated on Codex. Each point includes only users active in the preceding 28 days. All series are normalized to have a value of 1 on November… view at source ↗
read the original abstract

We analyze usage data from OpenAI's Codex tool to present large-scale evidence of how agentic AI technology, which can take actions on a user's behalf, changes how people work. We use an automated, privacy-protecting pipeline to contrast usage across three populations: external personal-account users, external organizational-account users, and workers within OpenAI. We find that agentic AI usage is growing rapidly: the number of active users has grown more than fivefold in the first half of 2026, with the most rapid increase occurring outside the initial audience of software developers. Uptake is uneven: within OpenAI, Codex usage is nearly universal and has largely replaced business usage of ChatGPT. We document a similar shift to agentic tooling outside OpenAI, particularly within organizations, although external adoption remains lower and more uneven. In addition to headline usage figures, we observe measures of sophistication, and find that a growing number of users have used Codex to change their workflows substantially. More than 10% of users manage three or more concurrent Codex agents at some point each week and that 26.6% use skills, which allow users to share instructions for complex workflows. Alongside these changes in usage practices, request complexity has increased: since the start of the year, the share of individual Codex users who submit at least one request for a task estimated to require more than eight hours for an experienced human to complete has increased nearly tenfold. Concurrently, output has grown rapidly -- in June 2026, the median OpenAI employee in a legal role generated 13 times more monthly output tokens across Codex and ChatGPT than they did in November 2025, while the median researcher generated more than 50 times as many. We conclude by discussing the implications of these patterns for productivity, job reorganization, and workforce restructuring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper analyzes usage logs from OpenAI's Codex tool across external personal, external organizational, and internal OpenAI user populations. It reports rapid growth in agentic AI adoption (more than fivefold increase in active users in H1 2026), rising workflow sophistication (over 10% managing three or more concurrent agents weekly; 26.6% using skills), increased task complexity (nearly tenfold rise in share of users submitting >8-hour tasks), and large output growth (13x median monthly tokens for legal roles and 50x for researchers within OpenAI), all derived from an automated privacy-protecting pipeline that classifies tasks by estimated human completion time and attributes tokens between Codex and ChatGPT.

Significance. If the pipeline measurements prove reliable, the manuscript supplies large-scale descriptive evidence on the diffusion of agentic AI, documenting shifts away from ChatGPT, uneven adoption, and proxies for productivity gains. Such data could inform models of technology adoption and labor reorganization in economics. The internal OpenAI population offers a useful benchmark, but the absence of validation or external benchmarks limits the strength of the claims.

major comments (3)
  1. [Methods] Methods: The automated pipeline that estimates human task completion time and partitions output tokens between Codex and ChatGPT underpins every headline statistic (fivefold user growth, tenfold complexity increase, 13x/50x token multipliers), yet the manuscript reports no validation data, inter-rater reliability metrics, sensitivity checks, or handling of novel agentic workflows and concurrent agents.
  2. [Results] Results: No sample sizes, confidence intervals, or robustness checks accompany the reported growth rates or median token increases, preventing assessment of precision or sensitivity to pipeline assumptions.
  3. [Data] Data section: Potential selection bias in the three user populations and measurement error in the internal OpenAI logs are not addressed, even though the pipeline is the sole source for all complexity and attribution classifications.
minor comments (1)
  1. [Abstract] Abstract: Dates (November 2025–June 2026) are referenced without stating the exact observation window or data-collection start date.

Simulated Author's Rebuttal

3 responses · 1 unresolved

Thank you for the opportunity to respond to the referee's report. The referee correctly identifies that the pipeline is central to our findings and that additional details on methods, results precision, and data limitations would improve the manuscript. We address each point below and propose revisions accordingly. We note that certain validation aspects cannot be provided due to the privacy design of our data pipeline.

read point-by-point responses
  1. Referee: [Methods] Methods: The automated pipeline that estimates human task completion time and partitions output tokens between Codex and ChatGPT underpins every headline statistic (fivefold user growth, tenfold complexity increase, 13x/50x token multipliers), yet the manuscript reports no validation data, inter-rater reliability metrics, sensitivity checks, or handling of novel agentic workflows and concurrent agents.

    Authors: We acknowledge the importance of validating the pipeline. Unfortunately, the privacy-protecting nature of the pipeline prevents us from retaining or accessing data necessary for external validation or inter-rater reliability studies. We will, however, add sensitivity checks to the revised manuscript and provide more explicit discussion of how the pipeline handles concurrent agents and novel workflows based on the classification logic. revision: partial

  2. Referee: [Results] Results: No sample sizes, confidence intervals, or robustness checks accompany the reported growth rates or median token increases, preventing assessment of precision or sensitivity to pipeline assumptions.

    Authors: We agree that including sample sizes, confidence intervals, and robustness checks would enhance the presentation of results. In the revised version, we will report the underlying sample sizes for key statistics and include confidence intervals where feasible. We will also conduct and report additional robustness checks regarding the pipeline assumptions. revision: yes

  3. Referee: [Data] Data section: Potential selection bias in the three user populations and measurement error in the internal OpenAI logs are not addressed, even though the pipeline is the sole source for all complexity and attribution classifications.

    Authors: We will revise the data section to explicitly discuss potential selection biases across the external personal, organizational, and internal populations, as well as any measurement considerations in the internal logs. This will include a more detailed description of the populations and limitations. revision: yes

standing simulated objections not resolved
  • Validation data, inter-rater reliability metrics, or external benchmarks for the automated pipeline due to its privacy-protecting design

Circularity Check

0 steps flagged

No circularity: purely descriptive statistics from usage logs

full rationale

The paper reports raw counts, shares, and growth multipliers computed directly from observed Codex usage logs via an automated pipeline. No equations, fitted parameters, predictions, or derivations are presented that reduce by construction to the inputs. The central claims (fivefold user growth, tenfold rise in complex tasks, 13x/50x token increases) are direct empirical summaries with no self-definitional loops, self-citation load-bearing steps, or ansatz smuggling. The pipeline's classification rules are an external measurement procedure, not a mathematical identity that forces the reported results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is purely empirical and relies on standard statistical aggregation of usage logs. No free parameters are fitted to produce the headline growth rates. No new entities are postulated. The only background assumptions are that the privacy pipeline preserves accurate attribution and that task-duration estimates reflect real human effort.

axioms (1)
  • domain assumption The automated pipeline correctly maps raw API requests to user identities, agent counts, and estimated human completion times without material error.
    Invoked implicitly when reporting user counts, concurrent agents, and complexity shares; no validation details given in abstract.

pith-pipeline@v0.9.1-grok · 5876 in / 1352 out tokens · 15358 ms · 2026-06-26T01:50:42.385337+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 2 canonical work pages

  1. [1]

    Why Do People Use

    Skjuve, Marita and Brandtzaeg, Petter Bae and F. Why Do People Use. First Monday , volume =. 2024 , doi =

  2. [2]

    and Hitzig, Zoe and Ong, Christopher and Shan, Carl Yan and Wadman, Kevin , institution =

    Chatterji, Aaron and Cunningham, Thomas and Deming, David J. and Hitzig, Zoe and Ong, Christopher and Shan, Carl Yan and Wadman, Kevin , institution =. How People Use. 2025 , doi =

  3. [3]

    The Adoption and Usage of

    Yang, Jeremy and Yonack, Noah and Zyskowski, Kate and Yarats, Denis and Ho, Johnny and Ma, Jerry , journal =. The Adoption and Usage of. 2025 , doi =

  4. [4]

    , institution =

    Sarkar, Suproteem K. , institution =. 2026 , doi =

  5. [5]

    , journal =

    Huang, Lingxiao and Xiao, Wenyang and Vishnoi, Nisheeth K. , journal =. Delegation and Verification Under. 2026 , doi =

  6. [6]

    Collaborating with

    Ju, Harang and Aral, Sinan , journal =. Collaborating with. 2025 , doi =

  7. [7]

    arXiv preprint arXiv:2508.02966 , year =

    Measuring Human Leadership Skills with Artificially Intelligent Agents , author =. arXiv preprint arXiv:2508.02966 , year =. doi:10.48550/arXiv.2508.02966 , note =

  8. [8]

    2025 , doi =

    Korinek, Anton , institution =. 2025 , doi =

  9. [9]

    The Emerging Market for Intelligence: Pricing, Supply, and Demand for

    Demirer, Mert and Fradkin, Andrey and Tadelis, Nadav and Peng, Sida , institution =. The Emerging Market for Intelligence: Pricing, Supply, and Demand for. 2025 , doi =

  10. [10]

    2026 , month = may, note =

    Task-Completion Time Horizons of Frontier. 2026 , month = may, note =

  11. [11]

    Bai, Longju and Huang, Zhemin and Wang, Xingyao and Sun, Jiao and Mihalcea, Rada and Brynjolfsson, Erik and Pentland, Alex and Pei, Jiaxin , journal =. How Do. 2026 , doi =

  12. [12]

    and Amodei, Dario and Kaplan, Jared and Clark, Jack and Ganguli, Deep , journal =

    Handa, Kunal and Tamkin, Alex and McCain, Miles and Huang, Saffron and Durmus, Esin and Heck, Sarah and Mueller, Jared and Hong, Jerry and Ritchie, Stuart and Belonax, Tim and Troy, Kevin K. and Amodei, Dario and Kaplan, Jared and Clark, Jack and Ganguli, Deep , journal =. Which Economic Tasks Are Performed with. 2025 , doi =

  13. [13]

    , journal =

    Bick, Alexander and Blandin, Adam and Deming, David J. , journal =. The Rapid Adoption of Generative. 2026 , doi =

  14. [14]

    Generative

    Brynjolfsson, Erik and Li, Danielle and Raymond, Lindsey , journal =. Generative. 2025 , doi =

  15. [15]

    Science , volume =

    Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence , author =. Science , volume =. 2023 , doi =

  16. [16]

    Organization Science , year =

    Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality , author =. Organization Science , year =

  17. [17]

    The Cybernetic Teammate: A Field Experiment on Generative

    Dell'Acqua, Fabrizio and Ayoubi, Charles and Lifshitz, Hila and Sadun, Raffaella and Mollick, Ethan and Mollick, Lilach and Han, Yi and Goldman, Jeff and Nair, Hari and Taub, Stewart and Lakhani, Karim , institution =. The Cybernetic Teammate: A Field Experiment on Generative. 2025 , doi =

  18. [18]

    and Jaffe, Sonia and Immorlica, Nicole and Stanton, Christopher T

    Dillon, Eleanor W. and Jaffe, Sonia and Immorlica, Nicole and Stanton, Christopher T. , journal =. Shifting Work Patterns with Generative. 2025 , note =

  19. [19]

    and Immorlica, Nicole and Lucier, Brendan and Shahidi, Peyman , institution =

    Demirer, Mert and Horton, John J. and Immorlica, Nicole and Lucier, Brendan and Shahidi, Peyman , institution =. Chaining Tasks, Redefining Work: A Theory of. 2026 , month = feb, doi =

  20. [20]

    2026 , note =

    Artificial Intelligence in the Firm , author =. 2026 , note =

  21. [21]

    2026 , doi =

    Artificial Intelligence, Productivity, and the Workforce: Evidence from Corporate Executives , author =. 2026 , doi =

  22. [22]

    Understanding Firms'

    Babina, Tania , institution =. Understanding Firms'. 2026 , doi =

  23. [23]

    Journal of Economic Perspectives , year =

    Automation and New Tasks: How Technology Displaces and Reinstates Labor , author =. Journal of Economic Perspectives , year =

  24. [24]

    2024 , volume =

    Eloundou, Tyna and Manning, Sam and Mishkin, Pamela and Rock, Daniel , journal =. 2024 , volume =

  25. [25]

    and Raj, Manav and Seamans, Robert , institution =

    Felten, Edward W. and Raj, Manav and Seamans, Robert , institution =. Occupational Heterogeneity in Exposure to Generative. 2023 , doi =

  26. [26]

    The Productivity

    Brynjolfsson, Erik and Rock, Daniel and Syverson, Chad , journal =. The Productivity. 2021 , volume =

  27. [27]

    The Economics of Artificial Intelligence: An Agenda , editor =

    Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics , author =. The Economics of Artificial Intelligence: An Agenda , editor =. 2019 , pages =

  28. [28]

    The Impact of

    Peng, Sida and Kalliamvakou, Eirini and Cihon, Peter and Demirer, Mert , journal =. The Impact of. 2023 , doi =

  29. [29]

    , journal =

    Tambe, Prasanna B. , journal =. Reskilling the Workforce for. 2026 , volume =

  30. [30]

    The Effects of Generative

    Cui, Zheyuan Kevin and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias , journal =. The Effects of Generative. 2026 , note =

  31. [31]

    and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik , booktitle =

    Jimenez, Carlos E. and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik , booktitle =

  32. [32]

    and Wettig, Alexander and Lieret, Kilian and Yao, Shunyu and Narasimhan, Karthik and Press, Ofir , booktitle =

    Yang, John and Jimenez, Carlos E. and Wettig, Alexander and Lieret, Kilian and Yao, Shunyu and Narasimhan, Karthik and Press, Ofir , booktitle =. 2024 , doi =

  33. [33]

    arXiv preprint arXiv:2107.03374 , year =

    Evaluating Large Language Models Trained on Code , author =. arXiv preprint arXiv:2107.03374 , year =

  34. [34]

    Measuring Coding Challenge Competence with

    Hendrycks, Dan and Basart, Steven and Kadavath, Saurav and Mazeika, Mantas and Arora, Akul and Guo, Ethan and Burns, Collin and Puranik, Samir and He, Horace and Song, Dawn and Steinhardt, Jacob , booktitle =. Measuring Coding Challenge Competence with

  35. [35]

    Lu, Shuai and Guo, Daya and Ren, Shuo and Huang, Junjie and Svyatkovskiy, Alexey and Blanco, Ambrosio and Clement, Colin and Drain, Dawn and Jiang, Daxin and Tang, Duyu and Li, Ge and Zhou, Lidong and Shou, Linjun and Zhou, Long and Tufano, Michele and Gong, Ming and Zhou, Ming and Duan, Nan and Sundaresan, Neel and Deng, Shao Kun and Fu, Shengyu and Liu,...

  36. [36]

    2025 , doi =

    Miserendino, Samuel and Wang, Michele and Patwardhan, Tejal and Heidecke, Johannes , journal =. 2025 , doi =

  37. [37]

    2024 , doi =

    Zhang, Yuntong and Ruan, Haifeng and Fan, Zhiyu and Roychoudhury, Abhik , booktitle =. 2024 , doi =

  38. [38]

    Wang, Xingyao and Li, Boxuan and Song, Yufan and Xu, Frank F. and Tang, Xiangru and Zhuge, Mingchen and Pan, Jiayi and Song, Yueqi and Li, Bowen and Singh, Jaskirat and Tran, Hoang and Li, Fuqiang and Ma, Ren and Zheng, Mingzhang and Qian, Bill and Shao, Daniel and Muennighoff, Niklas and Zhang, Yizhe and Hui, Binyuan and Lin, Junyang and Brennan, Robert ...

  39. [39]

    Demystifying

    Xia, Chunqiu Steven and Deng, Yinlin and Dunn, Soren and Zhang, Lingming , journal =. Demystifying. 2025 , doi =

  40. [40]

    Agentic Coding and Persistent Returns to Expertise , author =

  41. [41]

    American Economic Review , volume =

    The Dynamo and the Computer: An Historical Perspective on the Modern Productivity Paradox , author =. American Economic Review , volume =

  42. [42]

    Yang, Jeremy and Zyskowski, Kate and Yonack, Noah and Ma, Jerry , journal =. How. 2026 , doi =

  43. [43]

    Writing Code vs

    Demirer, Mert and Musolff, Leon and Yang, Liyuan , institution =. Writing Code vs. Shipping Code: Productivity Effects Across Generations of. 2026 , month = may, doi =

  44. [44]

    2026 , doi =

    Baumann, Joachim and Padmakumar, Vishakh and Li, Xiang and Yang, John and Yang, Diyi and Koyejo, Sanmi , journal =. 2026 , doi =

  45. [45]

    Strategic Management Journal , year =

    Artificial Intelligence Adoption and the Demand for Managerial Expertise , author =. Strategic Management Journal , year =. doi:10.1002/smj.70099 , note =