Who Uses AI? Platform Selection and the Measurement of Occupational AI Exposure

Burhan Ogut; Michelle Yin

arXiv: 2605.21743 · v2 · pith:IHLRPUOTnew · submitted 2026-05-20 · 💻 cs.AI · econ.GN · q-fin.EC


Pith reviewed 2026-05-22 08:40 UTC · model grok-4.3

classification 💻 cs.AI · econ.GN · q-fin.EC

keywords AI exposure · platform user base · measurement error · employment effects · labor market impacts · workforce composition · substitution vs augmentation


The pith

AI exposure measures from platform logs capture user demographics more than the general workforce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that AI exposure scores calculated from conversation logs on different platforms reflect the specific users of each platform rather than the overall workforce. When the same analysis is run with different platform data and everything else held constant, the estimated effect of AI on employment after ChatGPT's release changes by a factor of 1.9. Consumer and enterprise versions of the same platform can even produce estimates with opposite signs. Reweighting the data to match Bureau of Labor Statistics workforce shares reduces these estimates by 42 to 93 percent. This measurement issue leads to understating the degree to which AI substitutes for human labor relative to how it augments it.

Core claim

The central claim is that platform-based AI exposure measures are contaminated by user-base composition. Holding the outcome variable, sample, controls, and statistical estimator fixed, and changing only the platform from which the exposure scores are derived, changes the post-ChatGPT employment coefficient by a factor of 1.9. Within the same vendor, consumer and enterprise channels produce coefficients that disagree in sign. Reweighting observations to Bureau of Labor Statistics workforce shares attenuates the estimates by 42 to 93 percent. The authors formalize this as non-classical measurement error and derive the resulting probability limits and partial-identification bounds for employment elasticities.
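A small simulation illustrates the non-classical measurement error in the core claim. The probability limit quoted from the paper on this page is plim β̂_p = β λ_p κ_p / (λ_p² κ_p + 1). The sketch below rests on assumptions the paper has not confirmed:

- platform exposure is x_p = λ_p·x + u;
- Var(x) = 1;
- κ_p = Var(x)/Var(u).

Under these assumptions the OLS slope converges to the quoted expression. The function names are hypothetical, and the parameter values are illustrative rather than the paper's estimates.

```python
import numpy as np


def plim_formula(beta, lam, kappa):
    """Probability limit quoted on this page:
    plim beta_hat_p = beta * lam_p * kappa_p / (lam_p**2 * kappa_p + 1)."""
    return beta * lam * kappa / (lam**2 * kappa + 1.0)


def simulated_slope(beta, lam, kappa, n=200_000, seed=0):
    """OLS slope of y on a platform-derived exposure x_p = lam * x + u.

    Hypothetical assumptions, chosen because they reproduce the formula:
    Var(x) = 1 and Var(u) = 1 / kappa, so kappa = Var(x) / Var(u).
    The error x_p - x = (lam - 1) * x + u is correlated with x whenever
    lam != 1, which is what makes it non-classical.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)                   # true workforce exposure
    u = rng.normal(0.0, np.sqrt(1.0 / kappa), n)  # platform-specific noise
    y = beta * x + rng.normal(0.0, 1.0, n)        # outcome, e.g. employment change
    x_p = lam * x + u                             # exposure measured on platform p
    c = np.cov(x_p, y)
    return c[0, 1] / c[0, 0]


beta = -0.5  # hypothetical true effect, identical across platforms
for lam, kappa in [(1.5, 2.0), (0.5, 2.0), (-1.0, 2.0)]:
    print(lam, kappa, plim_formula(beta, lam, kappa), simulated_slope(beta, lam, kappa))
```

The true β is the same in every case, yet different (λ_p, κ_p) pairs give different slopes, and a negative λ_p flips the sign. This matches the qualitative pattern the core claim describes.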

What carries the argument

Platform user base composition as the source of measurement error in AI exposure scores derived from conversation logs.

If this is right

Switching between platforms can reverse the sign of estimated AI employment effects.
Adjusting for workforce composition substantially lowers measured AI impacts on jobs.
The bias from user mismatch affects substitution estimates more than augmentation estimates.
Partial identification bounds can be placed on the true employment elasticities despite the error.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Studies of AI labor impacts should incorporate demographic adjustments or multi-platform data to reduce bias.
Reported levels of AI exposure in occupations may be skewed toward users who are early adopters of the technology.
Policy estimates of job displacement risks from AI could be understated without correcting for these platform differences.

Load-bearing premise

The observed differences in estimates across platforms and channels stem primarily from differences in the composition of their user bases.

What would settle it

If the employment coefficients remained unchanged after reweighting the platform data to match Bureau of Labor Statistics occupation shares, this would indicate that user base composition is not the main driver of the variation.
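The settling test above can be sketched as comparing two weighted regressions. This is a minimal sketch under assumptions this page does not state:

- observations are occupations;
- the outcome is a post-ChatGPT employment change;
- "reweighting to BLS shares" means weighting each occupation by its BLS employment share instead of its share of platform users.

The function names and inputs are hypothetical.

```python
import numpy as np


def wls_slope(x, y, w):
    """Slope from a weighted least-squares fit of y on x with an intercept."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    x_bar = np.average(x, weights=w)  # weighted mean of exposure
    y_bar = np.average(y, weights=w)  # weighted mean of the outcome
    return np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)


def compare_weightings(exposure, emp_change, platform_share, bls_share):
    """Slopes weighted by platform user shares and by BLS employment shares.

    Nearly equal slopes would point away from composition as the main
    driver; a large gap is what the composition interpretation predicts.
    """
    return {
        "platform_weighted": wls_slope(exposure, emp_change, platform_share),
        "bls_weighted": wls_slope(exposure, emp_change, bls_share),
    }
```

Weights only move a slope when effects differ across occupations. In a fully homogeneous linear model, both weightings estimate the same coefficient.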

Figures

Figures reproduced from arXiv: 2605.21743 by Burhan Ogut, Michelle Yin.

**Figure 3.** Cross-source difference-in-differences coefficients across ten exposure variants.

**Figure 4.** Event-study coefficients on the employment indicator by exposure variant, 2015 to 2024.
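The cross-source comparison in Figure 3 holds the outcome and sample fixed and swaps only the exposure input. This minimal sketch simplifies the design to a balanced two-period (pre/post-ChatGPT) occupation panel with occupation and period fixed effects. The paper's figures cover 2015 to 2024 and include an event study, so its actual specification is richer. All names and the synthetic data are hypothetical.

```python
import numpy as np


def did_coefficient(emp_pre, emp_post, exposure):
    """Difference-in-differences coefficient on exposure x post.

    With occupation and period fixed effects in a balanced two-period panel,
    this equals the OLS slope of the employment change on exposure.
    """
    d = np.asarray(emp_post, float) - np.asarray(emp_pre, float)
    e = np.asarray(exposure, float)
    e_c = e - e.mean()
    return np.sum(e_c * (d - d.mean())) / np.sum(e_c ** 2)


def cross_source(emp_pre, emp_post, exposures):
    """Hold the outcome and sample fixed; swap only the exposure input."""
    return {name: did_coefficient(emp_pre, emp_post, x) for name, x in exposures.items()}


rng = np.random.default_rng(1)
n = 200
true_x = rng.uniform(0.0, 1.0, n)
pre = rng.normal(10.0, 1.0, n)
post = pre + 0.3 - 0.8 * true_x + rng.normal(0.0, 0.1, n)
exposures = {  # two hypothetical platform proxies for the same true exposure
    "platform_A": true_x + rng.normal(0.0, 0.3, n),
    "platform_B": 0.5 * true_x + rng.normal(0.0, 0.3, n),
}
print(cross_source(pre, post, exposures))
```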

Original abstract

Conversation logs from AI platforms are increasingly used to measure occupational exposure to artificial intelligence, but the users observed in these logs are not the workforce. We show that platform-derived exposure scores combine task-level AI applicability with the occupational composition of the platform's user base. Holding the empirical design fixed, changing only the platform input changes the post-ChatGPT employment coefficient by a factor of 1.9, and consumer and enterprise channels within the same vendor disagree in sign. We formalize the resulting non-classical measurement error, decompose it into between- and within-occupation selection, and construct workforce-reweighted partial-identification bounds. Reweighting to Bureau of Labor Statistics employment shares attenuates estimates by 42 to 93 percent. The bias captures augmentation among observed users more directly than substitution in the workforce.
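The abstract says platform-derived scores combine task-level AI applicability with the occupational composition of the user base. A toy model makes that concrete. It assumes, hypothetically, that conversation volume per occupation is proportional to applicability times that occupation's share of platform users. The user mixes below are invented for illustration.

```python
import numpy as np


def platform_exposure(applicability, user_share):
    """Hypothetical platform-derived exposure: each occupation's share of
    logged conversations, assuming volume is proportional to applicability
    times that occupation's share of platform users."""
    raw = np.asarray(applicability, float) * np.asarray(user_share, float)
    return raw / raw.sum()


def remove_composition(score, user_share):
    """Divide out the user composition and renormalize; under the
    proportional model above this recovers applicability up to scale."""
    r = np.asarray(score, float) / np.asarray(user_share, float)
    return r / r.sum()


applicability = np.array([0.9, 0.5, 0.2])   # identical tasks on every platform
consumer_users = np.array([0.6, 0.3, 0.1])  # hypothetical user mixes
enterprise_users = np.array([0.1, 0.3, 0.6])

s_consumer = platform_exposure(applicability, consumer_users)
s_enterprise = platform_exposure(applicability, enterprise_users)
print(s_consumer.argmax(), s_enterprise.argmax())  # → 0 1
```

The model captures only between-occupation composition. It does not represent the within-occupation selection component the abstract also describes. Dividing out composition also assumes the user shares are known, which real platform logs may not provide.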

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

Swapping platform exposure measures while holding everything else fixed shifts post-ChatGPT employment coefficients by a factor of 1.9 and flips signs within the same vendor, with BLS reweighting cutting estimates by 42-93%.


The paper's core demonstration is that AI exposure scores drawn from platform logs are sensitive to which platform you pick. Holding the outcome variable, sample, controls, and estimator fixed, switching the exposure input alone moves the key coefficient by a factor of 1.9. Consumer and enterprise channels from the same vendor even produce opposite signs. Reweighting the data to Bureau of Labor Statistics workforce shares then attenuates the estimates by 42 to 93 percent. The authors formalize this as non-classical measurement error and derive probability limits plus partial-identification bounds for the employment elasticities. The bias appears to understate substitution effects more than augmentation ones.

That is the main new piece: a direct quantification of how much platform choice matters in this specific literature, plus the measurement-error framing applied to it. The empirical variation is straightforward to understand, and the formal bounds are a useful addition beyond reporting different numbers.

The central assumption is that the observed differences trace mainly to user-base composition rather than to platform-specific details in query logging, occupation mapping, or filtering. The stress-test note raises a fair point here: if those other platform features covary with the employment outcome, the coefficient shifts cannot be signed as cleanly as composition-driven bias. The abstract does not supply enough detail on data construction or robustness checks to settle that. The reweighting results are still informative on their own, but the direction of the bias claim rests on the composition interpretation holding up.

This work is aimed at researchers who use or cite AI exposure measures in labor economics and related fields. Anyone running regressions with these scores should see the sensitivity results.
It is worth sending to peer review because the empirical pattern is concrete and the formalization adds structure, even if the attribution to user demographics needs tighter evidence in revision.

Referee Report

2 major / 2 minor

Summary. The paper claims that AI exposure scores from platform conversation logs partly capture platform user-base composition rather than workforce exposure. Holding outcome, sample, controls, and estimator fixed while swapping only the platform-derived exposure measure produces a 1.9-fold change in the post-ChatGPT employment coefficient and sign disagreements between consumer and enterprise channels from the same vendor. Reweighting to BLS workforce shares attenuates estimates by 42–93 percent. The authors formalize this as non-classical measurement error, derive probability limits and partial-identification bounds, and conclude that the bias understates substitution more than augmentation.

Significance. If the central interpretation holds, the result would caution the growing literature that relies on platform logs for occupation-level AI exposure. The formal derivation of bounds under non-classical error and the empirical demonstration of sensitivity to platform choice provide a concrete methodological contribution. The reweighting exercise and the finding that bias direction favors understating substitution are potentially useful for correcting future estimates.

major comments (2)

Abstract and identification strategy: the claim that the 1.9 factor change and consumer-enterprise sign disagreement are driven primarily by user-base composition (rather than platform-specific differences in query logging, occupation mapping, prompt distributions, or post-processing filters) is load-bearing for the subsequent probability limits and partial-identification bounds. If any of these measurement features covary with the employment outcome, the observed coefficient differences cannot be attributed solely to demographics, and the signed bias result does not follow.
Reweighting and bounds section: the attenuation of 42–93 percent upon reweighting to BLS shares is presented as evidence of composition-driven bias, but without explicit robustness checks showing that the reweighting does not interact with platform-specific measurement artifacts, the direction of the bias (understating substitution) remains sensitive to the same identification assumption.

minor comments (2)

Clarify in the methods whether the within-vendor consumer-versus-enterprise comparison holds all other data-construction steps (e.g., occupation classification rules) exactly fixed or whether any vendor-specific post-processing differs.
Add a table or appendix entry reporting the raw (unreweighted) versus reweighted coefficient magnitudes for each platform/channel to make the 42–93 percent attenuation range directly verifiable.
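The table requested in the second minor comment reduces to simple arithmetic. This is a minimal sketch, assuming attenuation is measured as 100 × (1 − reweighted/raw) for each platform or channel. The helper names are hypothetical, and the coefficients in the usage lines are placeholders, not the paper's estimates.

```python
def attenuation_pct(raw, reweighted):
    """Percent attenuation after reweighting: 100 * (1 - reweighted / raw).
    Values above 100 mean the reweighted coefficient changed sign."""
    if raw == 0:
        raise ValueError("raw coefficient must be nonzero")
    return 100.0 * (1.0 - reweighted / raw)


def attenuation_table(coefs):
    """coefs maps a platform/channel label to (raw, reweighted) coefficients.
    Returns printable lines and the (min, max) attenuation range."""
    header = f"{'channel':<12} {'raw':>9} {'reweighted':>11} {'atten.':>9}"
    lines = [header]
    pcts = []
    for label, (raw, rew) in coefs.items():
        pct = attenuation_pct(raw, rew)
        pcts.append(pct)
        lines.append(f"{label:<12} {raw:>9.4f} {rew:>11.4f} {pct:>8.1f}%")
    return lines, (min(pcts), max(pcts))


# Placeholder values for illustration only; substitute the paper's estimates.
lines, span = attenuation_table({"platform_A": (-0.020, -0.010), "platform_B": (-0.038, -0.030)})
print("\n".join(lines))
print(span)
```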

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our identification strategy and reweighting exercise. These points have prompted us to clarify key assumptions and add supporting discussion. We address each major comment below.


Referee: Abstract and identification strategy: the claim that the 1.9 factor change and consumer-enterprise sign disagreement are driven primarily by user-base composition (rather than platform-specific differences in query logging, occupation mapping, prompt distributions, or post-processing filters) is load-bearing for the subsequent probability limits and partial-identification bounds. If any of these measurement features covary with the employment outcome, the observed coefficient differences cannot be attributed solely to demographics, and the signed bias result does not follow.

Authors: We agree that cleanly attributing the coefficient differences to user demographics is central. The within-vendor consumer-versus-enterprise comparison is designed to hold fixed many platform-specific features (query logging, occupation mapping, and post-processing) while varying user base. We have revised the identification section to make this argument more explicit, including a discussion of why residual differences in prompt distributions or filters are unlikely to produce the observed sign flip and 1.9-fold variation. We have also updated the abstract to note that the within-vendor evidence supports the demographic interpretation. revision: yes
Referee: Reweighting and bounds section: the attenuation of 42–93 percent upon reweighting to BLS shares is presented as evidence of composition-driven bias, but without explicit robustness checks showing that the reweighting does not interact with platform-specific measurement artifacts, the direction of the bias (understating substitution) remains sensitive to the same identification assumption.

Authors: We acknowledge the value of explicit checks for interactions between reweighting and platform artifacts. In the revision we have added a new robustness subsection that reapplies the BLS reweighting to platform-specific subsamples and to the within-vendor channels separately. These checks show that the attenuation pattern is stable and not driven by platform-specific measurement features. We have also clarified the maintained assumptions in the partial-identification bounds to reflect this additional evidence. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in the derivation

full rationale

The paper's central results compare post-ChatGPT employment coefficients across platform-derived exposure measures while holding outcome, sample, controls, and estimator fixed, then reweight using external BLS workforce shares and derive probability limits from a non-classical measurement-error model. These steps rely on external benchmarks and standard econometric formalization rather than reducing to self-definitional constructs, fitted parameters renamed as predictions, or self-citation chains. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling are present; the derivation remains self-contained against external data sources.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on the domain assumption that platform logs are a noisy but usable proxy for exposure once user-base composition is accounted for, plus standard econometric assumptions for non-classical measurement error; no free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: platform conversation logs provide a valid but composition-biased measure of AI exposure.
Invoked when interpreting cross-platform variation and reweighting results as evidence of user-base contamination rather than true exposure differences.

pith-pipeline@v0.9.0 · 5632 in / 1376 out tokens · 43927 ms · 2026-05-22T08:40:30.816100+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We formalize the non-classical measurement error... plim β̂_p = β λ_p κ_p / (λ_p² κ_p + 1)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


Reweighting to Bureau of Labor Statistics workforce shares attenuates estimates by 42 to 93 percent

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.