
arxiv: 2507.22748 · v3 · submitted 2025-07-30 · 💰 econ.GN · q-fin.EC

How Exposed Are UK Jobs to Generative AI? Developing and Applying a Novel Task-Based Index

Pith reviewed 2026-05-19 02:52 UTC · model grok-4.3

keywords: generative AI, labor market exposure, task-based approach, UK jobs, large language models, GAISI, employment surveys, occupational analysis

The pith

A new index finds that 94% of UK jobs have some exposure to large language models but only 13% are heavily exposed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a Generative AI Susceptibility Index (GAISI) by asking large language models to rate worker-reported tasks from UK employment surveys on whether they could be completed at least 25% faster than with existing tools. This produces a job-level score: the share of a job's activities that meet that criterion. The results show that exposure is now nearly universal across the workforce yet remains light for most roles, with the strongest concentration in scientific and technical fields. Exposure has grown modestly since 2017, mainly because workers have shifted into more exposed occupations rather than because tasks inside jobs have changed. Over the same period the wage premium for exposed tasks fell, and since ChatGPT's release job postings have contracted relatively in more exposed occupations. These patterns are consistent with early labor-market effects, though the paper stops short of causal claims.

Core claim

The Generative AI Susceptibility Index measures the share of a job's tasks that large language models can complete at least 25% faster than existing tools. When applied to British Skills and Employment Survey data, the index shows 94% of UK jobs now carry some exposure while only 13% exceed a heavy-exposure threshold of 0.5, with the highest values concentrated in scientific and technical professions. Overall exposure rose by about 16% of a standard deviation between 2017 and 2023/24, driven by occupational shifts rather than within-job task changes, and the wage premium attached to exposed tasks declined 12% over the same period.

What carries the argument

The Generative AI Susceptibility Index (GAISI), which scores each job as the share of its tasks that LLMs rate as completable at least 25% faster than existing tools, constructed by linking probabilistic LLM ratings to worker-reported task data from the British Skills and Employment Surveys.
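The aggregation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the E0/E1/E2 classes come from the paper's classification prompt, but the rule that a task counts as exposed when P(E1) + P(E2) reaches 0.5, the optional task weights, and every function name here are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# (P(E0), P(E1), P(E2)) for one task, as returned by an LLM rater.
TaskProbs = Tuple[float, float, float]


def task_is_exposed(probs: TaskProbs, cutoff: float = 0.5) -> bool:
    """Assumed binarisation rule: exposed if P(E1) + P(E2) >= cutoff."""
    p_e0, p_e1, p_e2 = probs
    total = p_e0 + p_e1 + p_e2
    if total <= 0:
        raise ValueError("probabilities must sum to a positive number")
    # Normalise in case the rater's probabilities do not sum exactly to 1.
    return (p_e1 + p_e2) / total >= cutoff


def gaisi_score(
    task_probs: Sequence[TaskProbs],
    task_weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted share of a job's tasks that are classed as exposed.

    task_weights is a hypothetical stand-in for worker-reported task
    importance; equal weights are used when it is omitted.
    """
    if not task_probs:
        raise ValueError("a job needs at least one task")
    if task_weights is None:
        task_weights = [1.0] * len(task_probs)
    if len(task_weights) != len(task_probs):
        raise ValueError("one weight per task is required")
    total_weight = float(sum(task_weights))
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    exposed_weight = sum(
        w for probs, w in zip(task_probs, task_weights) if task_is_exposed(probs)
    )
    return exposed_weight / total_weight


def is_heavily_exposed(score: float, heavy_cutoff: float = 0.5) -> bool:
    """The paper reports heavy exposure as GAISI > 0.5 (strict inequality)."""
    return score > heavy_cutoff


if __name__ == "__main__":
    # Made-up ratings for a four-task job, for illustration only.
    job = [(0.8, 0.1, 0.1), (0.2, 0.6, 0.2), (0.3, 0.3, 0.4), (0.9, 0.05, 0.05)]
    score = gaisi_score(job)
    print(score, is_heavily_exposed(score))
```

Under these assumptions a job with exactly half of its tasks exposed scores 0.5 and falls just outside the heavy-exposure group, which shows how much the reported shares can depend on the binarisation rule and the cutoff.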

If this is right

  • Scientific and technical professions carry the highest share of heavily exposed jobs.
  • Aggregate exposure increased mainly through shifts of workers into different occupations rather than changes in tasks performed inside occupations.
  • Wage premiums for tasks rated as AI-exposed fell by 12% between 2017 and 2023/24.
  • Job postings contracted relatively in occupations with higher exposure after the release of ChatGPT.
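The second bullet separates two sources of change in aggregate exposure: workers moving between occupations and tasks changing inside occupations. One standard way to make that split is a symmetric shift-share decomposition, sketched below. The paper's exact decomposition is not given in this summary, so the method, the function name, and the numbers are illustrative assumptions.

```python
from typing import Dict, Tuple


def decompose_exposure_change(
    shares_start: Dict[str, float],
    shares_end: Dict[str, float],
    scores_start: Dict[str, float],
    scores_end: Dict[str, float],
) -> Tuple[float, float]:
    """Split the change in mean exposure into (between, within) parts.

    between: change in employment shares, valued at the average score.
    within:  change in occupation scores, valued at the average share.
    The two parts sum exactly to the total change in the weighted mean.
    """
    occupations = set(shares_start) | set(shares_end)
    between = 0.0
    within = 0.0
    for occ in occupations:
        s0 = shares_start.get(occ, 0.0)
        s1 = shares_end.get(occ, 0.0)
        g0 = scores_start.get(occ, 0.0)
        g1 = scores_end.get(occ, 0.0)
        between += (s1 - s0) * (g0 + g1) / 2.0
        within += (g1 - g0) * (s0 + s1) / 2.0
    return between, within


if __name__ == "__main__":
    # Made-up two-occupation economy: shares move, scores stay fixed.
    shares_2017 = {"technical": 0.2, "manual": 0.8}
    shares_2024 = {"technical": 0.3, "manual": 0.7}
    scores = {"technical": 0.6, "manual": 0.1}
    between, within = decompose_exposure_change(
        shares_2017, shares_2024, scores, scores
    )
    print(f"between={between:.3f} within={within:.3f}")
```

In this toy case the whole rise comes from the between term, which is the pattern the paper reports for the UK: aggregate exposure rose through occupational shifts rather than within-occupation task changes.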

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The index could serve as a monitoring tool for policymakers tracking how exposure evolves with successive improvements in model capabilities.
  • If exposure remains concentrated in a narrow set of occupations, targeted training programs might focus on those roles rather than broad workforce retraining.
  • Longer-term productivity effects would likely appear first in the scientific and technical sectors where GAISI scores are highest.

Load-bearing premise

That LLM judgments about whether a task can be sped up by 25% provide a valid stand-in for real labor-market exposure once matched to survey responses.

What would settle it

Direct measurement of time savings or output gains from LLM use in high-GAISI versus low-GAISI jobs, or continued divergence in hiring and wage trends between high- and low-exposure occupations.

Figures

Figures reproduced from arXiv: 2507.22748 by Alan Felstead, Duncan Gallie, Francis Green, Golo Henseke, Rhys Davies, Ying Zhou.

Figure 1. GAISI Classification Pipeline.
Figure 2. Distribution of Generative AI Exposure Across Jobs in 2023-24.
Figure 3. Average Generative AI Exposure across Demographic Groups.
Figure 4. Predicted Probability of AI Use by GAISI Quintile.
Figure 5. Changes in Generative AI Exposure, 2017–2023/24.
Figure 6. Estimated average marginal effect of AI exposure on job postings, relative to Q3 2022. The effects fluctuated around zero before GPT (average slope = 0.034, standard error = 0.010, p < 0.001), indicating a mostly stable relationship between occupations’ AI exposure and hiring patterns from 2017 through the pandemic, up to the launch of ChatGPT. However, after GPT, the relationship changed: pos…
Figure 7. Heatmap of Dominant Exposure Class (E0, E1, E2) across the 12 SES Task Categories.
Figure 8. Distribution of Generative AI Affordances across SES Task Categories.
Original abstract

Building on the task-based approach to labour markets, we develop the Generative AI Susceptibility Index (GAISI), a job-level measure of UK exposure to large language models (LLMs). Drawing on Eloundou et al. (2024), we use LLMs as probabilistic raters to classify task exposure, linking ratings to worker-reported task data from the British Skills and Employment Surveys. GAISI measures the share of job activities where LLMs can reduce task completion time by at least 25% beyond existing tools. Systematic validations demonstrate high reliability, strong validity, and predictive power over existing exposure measures. By 2023/24, nearly all UK jobs (94%) exhibited some LLM exposure, yet only 13% were heavily exposed (GAISI > 0.5), with the highest concentration in scientific and technical professions. Aggregate exposure rose 16% of one standard deviation since 2017, driven by occupational shifts rather than within-occupation task changes. The wage premium for AI-exposed tasks declined 12% between 2017 and 2023/24, and the period since ChatGPT's release has coincided with a relative contraction of job postings in more AI-exposed occupations. These findings are consistent with generative AI beginning to affect hiring and pay in exposed occupations, though causal attribution requires further research. GAISI offers policymakers and researchers a validated, replicable tool for monitoring AI exposure at the job level as this technology diffuses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops the Generative AI Susceptibility Index (GAISI), a job-level measure of UK exposure to large language models. It uses LLMs as probabilistic raters to score tasks from the British Skills and Employment Surveys on whether they can be completed at least 25% faster than with existing tools, then aggregates these into an index of the share of job activities meeting the criterion. The paper reports that by 2023/24 nearly all UK jobs (94%) have some exposure while only 13% are heavily exposed (GAISI > 0.5), with highest concentration in scientific and technical professions. Aggregate exposure rose 16% of one standard deviation since 2017 due to occupational shifts, the wage premium for AI-exposed tasks fell 12%, and job postings contracted relatively in more exposed occupations.

Significance. If the LLM ratings validly proxy real exposure, the work supplies a replicable, UK-specific monitoring tool that extends task-based approaches to a new technology. The reported internal reliability, validity, and predictive-power checks are strengths, as is the linkage to worker-reported task data rather than purely occupational aggregates. The descriptive patterns (widespread but shallow exposure, recent shifts, and early labor-market signals) could usefully inform policy discussion, though the authors correctly note that causal attribution requires further research.

major comments (2)
  1. [Methods (LLM rating and aggregation)] Methods section on LLM rating procedure and aggregation: The central exposure percentages (94% some exposure, 13% GAISI > 0.5) are direct aggregates of the LLM probabilistic ratings. The manuscript should supply fuller detail on the exact prompting template, number of independent LLM evaluations per task, how probabilities are converted to binary exposure indicators, and any sensitivity tests around the 25% time-reduction threshold and the GAISI > 0.5 heavy-exposure cutoff. These choices are load-bearing for the headline numbers and the 16% SD rise since 2017.
  2. [Validation and robustness checks] Validation section: The paper states that systematic validations demonstrate high reliability, strong validity, and predictive power. However, these appear to be internal consistency and correlation checks. Because the index is interpreted as a measure of actual labor-market exposure, external validation against observed productivity gains, measured time savings, or human-expert benchmarks on a subset of tasks would materially strengthen the claim that LLM ratings serve as a valid proxy, especially given possible LLM biases in assessing tacit knowledge.
minor comments (2)
  1. [Abstract] Abstract: Define 'some LLM exposure' more explicitly in relation to the GAISI scale or any minimum threshold applied.
  2. [Results (wage premium and postings)] Results on wage premium and job postings: Provide the precise econometric specification (e.g., controls, fixed effects, sample) used to estimate the 12% decline and the relative contraction in postings.
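The sensitivity test requested in major comment 1 can be partly sketched for the heavy-exposure cutoff. The example below recomputes the share of jobs above a range of cutoffs from job-level scores. The scores are made up and the function names are hypothetical. Varying the 25% time-reduction threshold is not covered, because that threshold sits inside the rating prompt and would require re-rating the tasks.

```python
from typing import Dict, Iterable, Sequence


def share_above(scores: Sequence[float], cutoff: float) -> float:
    """Share of jobs whose score is strictly greater than the cutoff."""
    if not scores:
        raise ValueError("at least one job-level score is required")
    return sum(1 for s in scores if s > cutoff) / len(scores)


def cutoff_sensitivity(
    scores: Sequence[float], cutoffs: Iterable[float]
) -> Dict[float, float]:
    """Map each candidate cutoff to the share of jobs above it."""
    return {c: share_above(scores, c) for c in cutoffs}


if __name__ == "__main__":
    # Made-up job-level scores, for illustration only.
    scores = [0.0, 0.1, 0.2, 0.3, 0.45, 0.55, 0.6, 0.9]
    print("some exposure:", share_above(scores, 0.0))
    for cutoff, share in cutoff_sensitivity(scores, [0.4, 0.5, 0.6]).items():
        print(f"heavy exposure at > {cutoff}: {share}")
```

A table of this kind would show whether the reported 13% heavily exposed share is stable or moves sharply when the cutoff shifts slightly from 0.5.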

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the presentation of our methods and strengthen the interpretation of our validation exercises. We address each major comment below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: Methods section on LLM rating procedure and aggregation: The central exposure percentages (94% some exposure, 13% GAISI > 0.5) are direct aggregates of the LLM probabilistic ratings. The manuscript should supply fuller detail on the exact prompting template, number of independent LLM evaluations per task, how probabilities are converted to binary exposure indicators, and any sensitivity tests around the 25% time-reduction threshold and the GAISI > 0.5 heavy-exposure cutoff. These choices are load-bearing for the headline numbers and the 16% SD rise since 2017.

    Authors: We agree that greater detail on the LLM rating and aggregation procedure is warranted to support replicability. In the revised manuscript we will expand the Methods section to include the precise prompting template, the number of independent LLM evaluations performed per task, the exact procedure for converting probabilistic ratings into binary exposure indicators, and sensitivity checks that vary both the 25% time-reduction threshold and the GAISI > 0.5 heavy-exposure cutoff. These additions will directly address the load-bearing nature of these choices for the reported exposure shares and the 2017–2023/24 change. revision: yes

  2. Referee: Validation section: The paper states that systematic validations demonstrate high reliability, strong validity, and predictive power. However, these appear to be internal consistency and correlation checks. Because the index is interpreted as a measure of actual labor-market exposure, external validation against observed productivity gains, measured time savings, or human-expert benchmarks on a subset of tasks would materially strengthen the claim that LLM ratings serve as a valid proxy, especially given possible LLM biases in assessing tacit knowledge.

    Authors: Our validation exercises consist of internal consistency (inter-LLM reliability), convergent validity with prior exposure measures, and predictive tests against observed labor-market outcomes such as wages and job postings. We will revise the Validation section to describe these checks more explicitly and to acknowledge that they remain internal or correlational. We agree that direct external benchmarks—productivity gains, time-use diaries, or expert human ratings—would be desirable; however, such data are not available in the British Skills and Employment Surveys and would require new primary data collection outside the scope of the present study. In the revision we will add an explicit limitations paragraph discussing potential LLM biases in tacit-knowledge tasks and outline directions for future external validation. revision: partial
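The inter-LLM reliability check mentioned in the second response can be illustrated with a simple agreement statistic. The sketch below computes the Pearson correlation between two raters' task-level exposure probabilities. The statistic the authors actually use is not stated in this summary, so the choice of Pearson correlation, the function name, and the ratings are assumptions.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long rating vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two rating vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up exposure probabilities from two hypothetical LLM raters.
    rater_a = [0.10, 0.40, 0.75, 0.90, 0.20]
    rater_b = [0.15, 0.35, 0.80, 0.85, 0.30]
    print(f"inter-rater correlation: {pearson_correlation(rater_a, rater_b):.3f}")
```

High agreement between raters shows that the ratings are consistent, not that they are accurate. That is the referee's point: two models can share the same bias on tacit-knowledge tasks and still correlate strongly.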

Circularity Check

0 steps flagged

No significant circularity in GAISI construction or exposure claims


The paper constructs the Generative AI Susceptibility Index (GAISI) by applying LLM probabilistic ratings—drawn from the method in Eloundou et al. (2024), an independent prior study—to task-level data from the British Skills and Employment Surveys. The headline exposure statistics (94% of jobs with some exposure, 13% with GAISI > 0.5) and trends (16% SD rise since 2017) are direct aggregates of these ratings rather than outputs of any fitted model, self-referential equation, or parameter estimated from the target exposure figures themselves. No step renames a known result, imports uniqueness via self-citation, or treats a fitted input as a prediction. Reported validations address reliability and predictive power as separate checks without reducing the core claims to the same inputs by construction. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on the validity of LLM task ratings and the choice of a 25% time-saving threshold to define exposure; these are introduced without independent external benchmarks in the abstract.

free parameters (2)
  • 25% time reduction threshold
    Defines what counts as meaningful LLM exposure for task classification.
  • GAISI > 0.5 cutoff for heavy exposure
    Arbitrary threshold used to report the 13% heavily exposed share.
axioms (1)
  • domain assumption: LLMs can serve as reliable probabilistic raters of task-level time savings beyond existing tools
    Invoked when the paper uses LLMs to classify exposure in the abstract.
invented entities (1)
  • Generative AI Susceptibility Index (GAISI): no independent evidence
    purpose: Job-level measure of exposure to LLMs
    New constructed index introduced in this paper



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. What Jobs Can AI Learn? Measuring Exposure by Reinforcement Learning

    econ.GN 2026-05 unverdicted novelty 7.0

    A new RL Feasibility Index based on task learnability via reinforcement learning diverges from prior AI exposure measures, rating operational jobs like power plant operators as highly feasible while rating creative an...

  2. From Exposure to Adoption: Generative AI in European Workplaces

    econ.GN 2026-04 unverdicted novelty 5.0

    Generative AI adoption in Europe ranges from under 3% to 25%, is steeper for skilled workers in abstract-task jobs and in digitally advanced countries with training, shows a gender gap in exposed roles, and has produc...

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · cited by 2 Pith papers

  1. [1]

    Can Large Language Models Transform Computational Social Science?

    https://doi.org/10.1162/coli_a_00502 Appendices. A Appendix: Classification Prompt. <System prompt> You are an AI job expert tasked with analyzing job tasks for their exposure to AI, specifically Large Language Models (LLMs). Your goal is to determine the probability of different levels of AI exposure for each task, considering the incremental impact of ...

  2. [2]

    Analyze each task in the context of the occupation details provided

  3. [3]

    Consider how an average worker in this occupation would typically perform each task using existing tools

  4. [4]

    Assess how LLMs could potentially assist with each task, focusing on new capabilities beyond existing tools

  5. [5]

    For EACH TASK, calculate the probability of it falling into each of the following AI exposure levels: - E0: No Exposure (LLMs do not meaningfully reduce time by 25% or more) - E1: Direct Exposure (LLMs alone can reduce time by at least 25%) - E2: Exposure via Imaginable LLM-Powered Applications (LLMs + additional software could reduce time by at least ...

  6. [6]

    In your analysis for each task, include: a

    Provide a brief justification for each probability distribution. In your analysis for each task, include: a. Task summary in the context of the occupation b. Existing tools typically used for this task c. Potential LLM capabilities that could assist with the task d. Arguments for and against each exposure level First provide a high-level analysis of the e...