From Model Design to Organizational Design: Complexity Redistribution and Trade-Offs in Generative AI

Alexander Oettl; Sampsa Samila; Sharique Hasan

arxiv: 2506.22440 · v2 · pith:WN6XEI25new · submitted 2025-06-10 · 💻 cs.CY · cs.LG · cs.MA · econ.GN · q-fin.EC

Pith reviewed 2026-05-22 01:12 UTC · model grok-4.3

classification 💻 cs.CY cs.LG cs.MA econ.GN q-fin.EC

keywords generative AI, LLMs, organizational design, complexity redistribution, GAS framework, AI strategy, trade-offs, competitive advantage

The pith

LLMs do not erase AI trade-offs but relocate complexity from users to organizational infrastructure, compliance, and specialists.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Generality-Accuracy-Simplicity framework to examine how large language models reshape organizations. It claims that LLMs deliver high generality and accuracy through simple user interfaces, yet the underlying trade-offs do not vanish. Instead, complexity moves to infrastructure, compliance processes, and specialized personnel inside the firm. A reader would care because this relocation means competitive advantage now depends on how well an organization designs abstraction layers, aligns workflows, and builds complementary expertise rather than on adopting the technology alone.

Core claim

The authors argue that the inherent trade-offs among generality, accuracy, and simplicity in AI systems persist with large language models. User-facing simplicity masks a relocation of complexity to the organizational level, where it appears as demands on infrastructure, compliance, and expert staff. Competitive advantage therefore stems from mastering this redistributed complexity through the design of abstraction layers, workflow alignment, and integration of complementary expertise, rather than from simple model adoption.

What carries the argument

The Generality-Accuracy-Simplicity (GAS) framework, which tracks how trade-offs among generality, accuracy, and simplicity are redistributed across organizational stakeholders rather than eliminated.

If this is right

Competitive advantage shifts from AI adoption to the design of abstraction layers that manage relocated complexity.
Workflow alignment and complementary expertise become necessary conditions for effective technology integration.
New managerial challenges emerge, especially around ensuring accuracy in high-stakes applications.
Organizations must treat AI strategy as an exercise in organizational design rather than model selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Firms in regulated sectors may face steeper internal complexity burdens and could test whether sector-specific rules amplify the relocation effect.
The pattern raises the question of whether future model improvements will continue shifting complexity or eventually lower the total amount.
One could examine whether certain organizational structures, such as flatter hierarchies, absorb the relocated complexity more efficiently than others.

Load-bearing premise

Trade-offs among generality, accuracy, and simplicity are fixed properties of AI systems that technology can only relocate rather than reduce or remove.

What would settle it

A controlled comparison of organizations before and after LLM adoption that shows a net drop in total infrastructure costs, compliance overhead, and need for specialized personnel would challenge the redistribution claim.
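The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cost buckets, the `OrgCosts` and `classify_shift` names, and the 5% tolerance band are hypothetical and do not come from the paper, which proposes no measurement procedure of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgCosts:
    """Hypothetical annual cost buckets, all in the same currency unit."""
    infrastructure: float
    compliance: float
    specialists: float

    def total(self) -> float:
        # Sum of the three organizational buckets named in the text.
        return self.infrastructure + self.compliance + self.specialists


def classify_shift(before: OrgCosts, after: OrgCosts, tolerance: float = 0.05) -> str:
    """Compare organizational cost totals before and after LLM adoption.

    The tolerance band is an assumption: changes smaller than it are
    treated as noise rather than as evidence either way.
    """
    lower = before.total() * (1 - tolerance)
    upper = before.total() * (1 + tolerance)
    if after.total() < lower:
        return "challenges redistribution"
    if after.total() > upper:
        return "consistent with redistribution"
    return "inconclusive"


if __name__ == "__main__":
    pre = OrgCosts(infrastructure=100.0, compliance=50.0, specialists=50.0)
    post = OrgCosts(infrastructure=150.0, compliance=80.0, specialists=70.0)
    print(classify_shift(pre, post))
```

A real study would also need a control group and a way to attribute cost changes to LLM adoption rather than to other causes; the sketch covers only the final arithmetic.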

Figures

Figures reproduced from arXiv: 2506.22440 by Alexander Oettl, Sampsa Samila, Sharique Hasan.

**Figure 2.** Tasks vary in their accuracy and generality demands, shaping where they fall along ...

**Figure 3.** To achieve high task-specific accuracy, organizations layer technical and organiza...

Original abstract

This paper introduces the Generality-Accuracy-Simplicity (GAS) framework to analyze how large language models (LLMs) are reshaping organizations and competitive strategy. We argue that viewing AI as a simple reduction in input costs overlooks two critical dynamics: (a) the inherent trade-offs among generality, accuracy, and simplicity, and (b) the redistribution of complexity across stakeholders. While LLMs appear to defy the traditional trade-off by offering high generality and accuracy through simple interfaces, this user-facing simplicity masks a significant shift of complexity to infrastructure, compliance, and specialized personnel. The GAS trade-off, therefore, does not disappear but is relocated from the user to the organization, creating new managerial challenges, particularly around accuracy in high-stakes applications. We contend that competitive advantage no longer stems from mere AI adoption, but from mastering this redistributed complexity through the design of abstraction layers, workflow alignment, and complementary expertise. This study advances AI strategy by clarifying how scalable cognition relocates complexity and redefines the conditions for technology integration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's GAS framework usefully organizes AI trade-offs for org strategy but treats relocation as given without evidence that the dimensions can't improve together.

Hi colleague, the main thing here is that the authors introduce the Generality-Accuracy-Simplicity framework to argue LLMs shift complexity from users to infrastructure, compliance, and experts rather than erasing the underlying tensions. They link this shift to new requirements for abstraction layers, workflow design, and complementary skills as the real source of advantage in AI adoption. This gives a clean way to talk about why simple interfaces still create managerial headaches, especially around accuracy in high-stakes settings. It pulls together familiar points on AI limitations and applies them directly to organizational choices, which is a reasonable extension for the management and strategy audience. The framing is coherent on its own terms and could help readers think through integration decisions more systematically than just calling AI a cost reducer.

The soft spot is the lack of grounding for the relocation claim itself. The paper asserts that the trade-offs are inherent and conserved without a derivation, bounding argument, or data showing why scaling and tooling cannot ease generality, accuracy, and simplicity at the same time. The implications for competitive advantage therefore rest on the framework's definitions more than on independent checks, which leaves the central mechanism feeling circular. No empirical mapping or falsification criteria appear in what is presented.

This is for readers in strategy and organizational theory who are looking for conceptual tools to discuss AI's internal effects. A practitioner or early-stage researcher might find the vocabulary helpful for framing problems, while someone needing tests or formal models will see the gaps quickly. It deserves a serious referee because the lens is clear and could be developed, even if the current version needs more support for why the trade-offs behave as claimed. I would send it out for review with that expectation.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Generality-Accuracy-Simplicity (GAS) framework to analyze LLM impacts on organizations. It claims that LLMs deliver high generality and accuracy via simple user interfaces, but this masks relocation of complexity to infrastructure, compliance, and specialized personnel; competitive advantage therefore stems from designing abstraction layers and workflows to manage the redistributed complexity rather than from adoption alone.

Significance. If the relocation mechanism can be grounded, the framework offers a potentially useful conceptual lens for AI strategy literature by shifting focus from technical capabilities to organizational design challenges in high-stakes settings.

major comments (2)

[GAS framework (core definitions and trade-off section)] The central claim that generality-accuracy-simplicity trade-offs are inherent and fixed, and therefore must be relocated rather than jointly improved, is introduced axiomatically in the GAS framework without derivation, bounding argument, or engagement with scaling literature that suggests simultaneous gains are possible. This makes the relocation conclusion non-falsifiable on the evidence presented.
[Redistribution and managerial implications discussion] The assertion that user-facing simplicity necessarily shifts complexity to infrastructure, compliance, and personnel is presented as a direct consequence of the framework but supplies no empirical mapping, case evidence, or measurement protocol to demonstrate the relocation mechanism or its magnitude.

minor comments (1)

[Abstract and §1] The abstract and introduction could more explicitly separate definitional claims of the GAS framework from the interpretive claims about organizational relocation to avoid circular reading.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which help sharpen the positioning of the GAS framework as a conceptual contribution. We respond to each major point below and indicate planned revisions.

Referee: [GAS framework (core definitions and trade-off section)] The central claim that generality-accuracy-simplicity trade-offs are inherent and fixed, and therefore must be relocated rather than jointly improved, is introduced axiomatically in the GAS framework without derivation, bounding argument, or engagement with scaling literature that suggests simultaneous gains are possible. This makes the relocation conclusion non-falsifiable on the evidence presented.

Authors: We agree that the trade-off is presented conceptually rather than formally derived. The framework synthesizes existing ideas from AI systems literature and organizational economics rather than claiming a new empirical law. In revision we will add a short subsection that grounds the GAS trade-offs in the no-free-lunch theorem and related results, while explicitly discussing scaling-law findings (Kaplan et al. and subsequent work). We will state the boundary conditions under which joint improvements are observed and clarify that the relocation claim applies primarily to high-stakes, regulated settings where residual complexity cannot be fully automated. This makes the argument more falsifiable by specifying when and where relocation is expected.
revision: yes
Referee: [Redistribution and managerial implications discussion] The assertion that user-facing simplicity necessarily shifts complexity to infrastructure, compliance, and personnel is presented as a direct consequence of the framework but supplies no empirical mapping, case evidence, or measurement protocol to demonstrate the relocation mechanism or its magnitude.

Authors: The manuscript is a conceptual paper; the relocation mechanism follows logically from the framework and is illustrated with references to documented enterprise LLM deployments. We do not claim to have conducted primary empirical research. In the revised version we will add a brief subsection with illustrative cases drawn from publicly reported LLM implementations in legal, healthcare, and financial services domains to map the specific complexity shifts. We will also sketch a high-level measurement protocol (e.g., tracking changes in infrastructure spend, compliance review hours, and required specialist roles) as a direction for future empirical work. This addresses the request for clearer mapping while preserving the paper’s conceptual focus.
revision: partial

Circularity Check

1 step flagged

GAS framework posits inherent trade-offs by definition, making relocation a direct consequence rather than independently derived

specific steps

self-definitional [Abstract]
"We argue that viewing AI as a simple reduction in input costs overlooks two critical dynamics: (a) the inherent trade-offs among generality, accuracy, and simplicity, and (b) the redistribution of complexity across stakeholders. While LLMs appear to defy the traditional trade-off by offering high generality and accuracy through simple interfaces, this user-facing simplicity masks a significant shift of complexity to infrastructure, compliance, and specialized personnel. The GAS trade-off, therefore, does not disappear but is relocated from the user to the organization"

The paper defines the GAS framework around inherent and fixed trade-offs, then concludes that these trade-offs must be relocated rather than reduced or eliminated. The relocation mechanism and resulting managerial challenges are entailed by the definitional premise itself, without a separate argument demonstrating conservation of the trade-off surface or ruling out simultaneous improvement across all three dimensions.

full rationale

The manuscript introduces the GAS framework as capturing fixed trade-offs among generality, accuracy, and simplicity, then interprets LLM user interfaces as necessarily relocating rather than relaxing those trade-offs. The central claims about complexity shifting to infrastructure and personnel follow tautologically from the initial assumption that such trade-offs are conserved quantities. No equations, bounding arguments, empirical mappings, or external benchmarks are supplied to establish why the three dimensions cannot improve jointly. The derivation is therefore self-contained within the framework's definitional premises.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper rests on conceptual domain assumptions about the persistence of GAS trade-offs and the necessity of complexity relocation; no free parameters or invented physical entities are present because the work is a management framework rather than a formal model.

axioms (2)

domain assumption: Trade-offs among generality, accuracy, and simplicity are inherent in AI systems and persist even with large language models.
Invoked to argue that LLMs only appear to break the trade-off while actually relocating it.
domain assumption: User-facing simplicity necessarily shifts complexity to infrastructure, compliance, and specialized personnel.
Central premise used to derive new managerial challenges and sources of competitive advantage.

invented entities (1)

GAS framework (no independent evidence)
purpose: Analytical lens for examining trade-offs and complexity redistribution in generative AI adoption.
New conceptual construct introduced by the authors without external empirical grounding shown in the abstract.

pith-pipeline@v0.9.0 · 5730 in / 1437 out tokens · 70029 ms · 2026-05-22T01:12:07.905658+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "no model can perfectly replicate the phenomena it represents. Instead, effective models must balance competing priorities across three dimensions: generality, accuracy, and simplicity"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "Tesler’s Law, or the Law of Conservation of Complexity... there is an inherent amount of complexity that cannot be eliminated, only redistributed"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Vibecoding and Digital Entrepreneurship
econ.GN 2025-11 unverdicted novelty 6.0

GenAI coding automation boosts first-time launches and shortens time to market, yet viable entry rises 11% only in product segments where AI augments engineering capabilities, driven by STEM-educated founders.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

Abrego, M. (2025). JPMorgan execs detail how AI is transforming the bank. Business Insider. Accessed June 8,
[2]

Agarwal, M., Qureshi, A., Sardana, N., Li, L., Quevedo, J., and Khudia, D. (2023). LLM inference performance engineering: Best practices. Mosaic AI Research blog (Databricks). Accessed 10 May

[3]

Agrawal, A., McHale, J., and Oettl, A. (2018). Finding needles in haystacks: Artificial intelligence and recombinant growth. In The economics of artificial intelligence: An agenda, pages 149–174. University of Chicago Press. Alekseeva, L., Azar, J., Giné, M., and Samila, S. (2024). AI adoption and the demand for managerial expertise. Working Paper. Aleks...
[4]

Arthur Jr., W., Bennett Jr., W., Stanush, P. L., and McNelly, T. L. (1998). Factors that influence skill decay and retention: A quantitative review and analysis. Human Performance, 11(1):57–101. Bank of America (2024). BofA’s Erica surpasses 2 billion interactions, helping 42 million clients since launch. Accessed June 8,
[5]

Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., and Mariman, R. (2024). Generative AI can harm learning. The Wharton School Research Paper. Accessed May 19,
[6]

Bauer, J. (2024). Does GitHub Copilot improve code quality? Here’s what the data says. The GitHub Blog. Accessed: 2025-06-08. Note: The user-provided PDF does not have a specific day, but online versions point to Nov 14,
[7]

Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, page 610–623, New York, NY, USA. Association for Computing Machinery. Bloomberg (2023). Introducing BloombergGPT: b...
[8]

Brodeur, P. G., Buckley, T. A., Kanjee, Z., Goh, E., Ling, E. B., Jain, P., Cabral, S., Abdulnour, R.-E., Haimovich, A. D., Freed, J. A., Olson, A., Morgan, D. J., Hom, J., Gallo, R., McCoy, L. G., Mombini, H., Lucas, C., Fotoohi, M., Gwiazdon, M., Restifo, D., Restrepo, D., Horvitz, E., Chen, J., Manrai, A. K., and Rodman, A. (2024). Superhuman performance o...
[9]

Claburn, T. (2024). GitHub’s boast that Copilot produces high-quality code challenged. The Register. Accessed: 2025-06-08. Cottier, B., Snodin, B., Owen, D., and Adamczewski, T. (2025). LLM inference prices have fallen rapidly but unequally across tasks. Accessed: 2025-05-10. Cui, K. Z., Demirer, M., Jaffe, S., Musolff, L., Peng, S., and Salz, T. (2024). T...
[10]

Early impacts of M365 Copilot. arXiv preprint arXiv:2504.11443, 2025

Also circulated as “The Wharton School Research Paper”. Denny, P., Leinonen, J., Prather, J., Luxton-Reilly, A., Amarouche, T., Becker, B. A., and Reeves, B. N. (2024). Prompt problems: A new programming exercise for the generative AI era. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1 (SIGCSE 2024), pages 296–302...
[11]

Eloundou, T., Manning, S., Mishkin, P., and Rock, D. (2023). GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv preprint arXiv:2303.10130. Epstein, D. (2019). Range: Why Generalists Triumph in a Specialized World. Riverhead Books. Ferguson, J.-P. and Hasan, S. (2013). Specialization and career dynamics: Evidence from ...
[12]

URL: https://verissimo.substack.com/p/the-unreliability-of-llms-and-what-that-means-for-builders. Haase, J., Hanel, P. H. P., and Pokutta, S. (2025). Has the creativity of large-language models peaked? An analysis of inter- and intra-LLM variability. Hackenburg, K., Tappin, B. M., Röttger, P., Hale, S. A., Bright, J., and Margetts, H. (2025). Scaling lan...
[13]

Handa, K., Tamkin, A., McCain, M., Huang, S., Durmus, E., Heck, S., Mueller, J., Hong, J., Ritchie, S., Belonax, T., Troy, K. K., Amodei, D., Kaplan, J., Clark, J., and Ganguli, D. (2025). Which economic tasks are performed with AI? Evidence from millions of Claude conversations. arXiv preprint arXiv:2503.04761. Harris, L. and Heikkilä, M. (2025). Insu...
[14]

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

Hoffman, M., Puranik, H., Srivastava, S. B., and Brynjolfsson, E. (2024). Generative AI and distributed work: Evidence from open source software. Working Paper. Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). LoRA: Low-rank adaptation of large language models. Hu, K. (2023). ChatGPT sets record for fastest-...
[15]

Ju, H. and Aral, S. (2025). Collaborating with AI agents: Field experiments on teamwork, productivity, and performance. arXiv:2503.18238v1 [cs.CY]. Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. (2020). Scaling laws for neural language models. Kasneci, E., Sessler, K., Küchemann, ...
[16]

Klarna AI assistant handles two-thirds of customer service chats in its first month

Klarna (2024). Klarna AI assistant handles two-thirds of customer service chats in its first month. Press release. Koomen, P. (2025). AI horseless carriages. Kuhn, T. (1977). Objectivity, value judgment, and theory choice. In The Essential Tension, pages 320–39. University of Chicago Press. Laban, P., Hayashi, H., Zhou, Y., and Neville, J. (2025). LLMs get...
[17]

Available at SSRN: 5166938. Lehmann, M., Cornelius, P. B., and Sting, F. J. (2024). AI meets the classroom: When does ChatGPT harm learning? Levins, R. (1966). The strategy of model building in population biology. American Scientist, 54(4):421–431. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t...
[18]

GPT-4 Technical Report

OpenAI (2023). GPT-4 technical report. Technical Report arXiv:2303.08774, OpenAI. Accessed 10 May
[19]

Morgan Stanley uses AI evals to shape the future of financial services

OpenAI (2024). Morgan Stanley uses AI evals to shape the future of financial services. Accessed 2025-05-20. OpenAI (2025). OpenAI o3 and o4-mini system card. Osmani, A. (2024). The 70% problem: Hard truths about AI-assisted coding. Addy Osmani’s Substack. Osmani, A. (2025). Beyond the 70%: Maximizing the human 30% of AI-assisted coding. Addy Osmani’s Substa...
[20]

Ptacek, T. (2025). My AI skeptic friends are all nuts. Blog post, The Fly Blog. Ring, S. (2025). AI law firm offering £2 legal letters wins ‘landmark’ approval. Financial Times. Accessed May 18,
[21]

Roldán-Monés, A. (2024). When GenAI increases inequality: Evidence from a university debating competition. Working Paper 24/09, EsadeEcPol and London School of Economics, Barcelona. Working Paper, version of 26 August
[22]

Schwarcz, D., Manning, S., Barry, P., Cleveland, D. R., Prescott, J. J., and Rich, B. (2025). AI-powered lawyering: AI reasoning models, retrieval augmented generation, and the future of legal practice. Research Paper 25-16, University of Minnesota Law School. Posted 4 Mar 2025; last revised 4 Mar
[23]

Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., and Dennison, D. (2015). Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems 28 (NIPS 2015). Sellen, A. and Horvitz, E. (2024). The rise of the AI co-pilot: Lessons for design from aviation ...
[24]

Zhang, R., Zhao, W., and Eger, S. (2025). How good are LLMs for literary translation, really? Literary translation evaluation with humans and LLMs. Zhang, Z., Zheng, C., Tang, D., Sun, K., Ma, Y., Bu, Y., Zhou, X., and Zhao, L. (2023). Balancing specialized and general skills in LLMs: The impact of modern tuning and data strategy.
