
arxiv: 2602.15249 · v2 · pith:5GPDW4INnew · submitted 2026-02-16 · 💻 cs.DL · cs.AI

Artificial Intelligence Specialization in the European Union: Underexplored Role of the Periphery at NUTS-3 Level

Pith reviewed 2026-05-15 21:22 UTC · model grok-4.3

classification 💻 cs.DL cs.AI
keywords AI research · regional specialization · NUTS-3 · bibliometrics · European Union · citation impact · peripheral regions · machine learning

The pith

Peripheral NUTS-3 regions lead in relative AI specialization across the EU.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines AI research distribution in 781 European NUTS-3 regions from 2015 to 2024 using Clarivate InCites data and Citation Topics classification at macro and meso levels. It calculates Relative Specialization Index and Relative Citation Impact to show that major hubs like Paris, Warszawa, and Madrid lead in absolute publication counts, yet peripheral regions in Eastern Europe and Spain achieve the highest relative specialization, with Granada and Vilniaus apskritis also displaying strong citation visibility. The work identifies a weak relationship between specialization and impact, uncovering varied regional profiles including high-specialization low-visibility cases and high-impact low-specialization outliers such as Fyn.

Core claim

Using bibliometric data classified into Electrical Engineering, Electronics & Computer Science at the macro level and Artificial Intelligence & Machine Learning at the meso level, the study applies RSI and RCI metrics to reveal that relative AI specialization concentrates in peripheral NUTS-3 regions rather than metropolitan centers, with selected peripheral areas combining high specialization and citation performance while overall specialization correlates only weakly with impact.

What carries the argument

Relative Specialization Index (RSI) applied to meso-level AI & Machine Learning Citation Topics at the NUTS-3 scale, which quantifies how much a region's AI output deviates from the expected share based on its total research activity.
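A minimal sketch of how such an index could be computed from four publication counts. It assumes the common normalized form RSI = (A - 1) / (A + 1), where A is the ratio of the regional AI share to the reference (for example EU-wide) AI share. The page does not spell out which variant the paper uses, although the reported values below 1 in Table I are consistent with a bounded index. Function and parameter names are illustrative.

```python
def activity_index(region_ai, region_total, ref_ai, ref_total):
    """Ratio of the region's AI share to the reference AI share."""
    if region_total <= 0 or ref_total <= 0 or ref_ai <= 0:
        raise ValueError("totals and reference AI count must be positive")
    return (region_ai / region_total) / (ref_ai / ref_total)


def rsi(region_ai, region_total, ref_ai, ref_total):
    """Normalized specialization (assumed form): (A - 1) / (A + 1).

    Values lie between -1 (no AI output) and just under 1; 0 means the
    region's AI share equals the reference share.
    """
    a = activity_index(region_ai, region_total, ref_ai, ref_total)
    return (a - 1.0) / (a + 1.0)


# Made-up counts: 10% regional AI share against a 1% reference share.
example = rsi(region_ai=50, region_total=500, ref_ai=1000, ref_total=100000)
```

With these made-up counts the activity index is 10, so the normalized value is 9/11. The normalization keeps very small regions with extreme shares from producing unbounded ratios, but it does not remove the instability of the underlying share.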

Load-bearing premise

The Clarivate Citation Topics system correctly identifies AI and Machine Learning publications without significant misclassification or coverage differences across regions.

What would settle it

Re-running the RSI calculations on the same publications with an independent AI topic classification. If the reported concentration of high specialization in Eastern European and Spanish peripheral regions disappears, the central claim does not hold.
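One simple way to compare the two runs is to check how much the top-ranked regions overlap. The sketch below uses a Jaccard overlap of the two top-k sets; the metric choice, the function names, and the RSI values are all assumptions for illustration and do not come from the paper.

```python
def top_k(rsi_by_region, k):
    """Return the set of k regions with the highest RSI."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(rsi_by_region, key=rsi_by_region.get, reverse=True)
    return set(ranked[:k])


def top_k_overlap(rsi_a, rsi_b, k):
    """Jaccard overlap of two top-k sets (1.0 identical, 0.0 disjoint)."""
    a, b = top_k(rsi_a, k), top_k(rsi_b, k)
    return len(a & b) / len(a | b)


# Hypothetical RSI values under two classifications (not from the paper).
clarivate = {"R1": 0.80, "R2": 0.70, "R3": 0.10, "R4": -0.20}
alternative = {"R1": 0.75, "R2": 0.05, "R3": 0.60, "R4": -0.30}
overlap = top_k_overlap(clarivate, alternative, k=2)
```

An overlap near 1 would suggest the peripheral concentration survives the change of taxonomy; a low overlap would point to a classification artifact.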

Figures

Figures reproduced from arXiv: 2602.15249 by Carmen Gálvez, Victor Herrero-Solana.

Figure 1: Output vs. cites

TABLE I. TOP-20 RSI REGIONS

Region                  Country     RSI
Bugas                   Bulgaria    0.820
Olomoucky kraj          Czech Rep   0.801
Giessen, Landkreis      Germany     0.751
Granada                 Spain       0.712
Banskobystricky kraj    Slovakia    0.706
Jihocesky kraj          Czech Rep   0.702
Krakowski               Poland      0.692
Czestochowski           Poland      0.665
Vilniaus apskritis      Lithuania   0.647
Rzeszowski              Poland      0.641
Burgos                  Spain       0.631
Navarra                 Spain       0.630
Jaén                    Spain       0.621
Córdoba                 S…
Figure 2: Geospatial Relative Specialization Index (RSI) map
original abstract

This study examines the distribution of Artificial Intelligence (AI) research across European NUTS-3 regions during the period 2015-2024. Using bibliometric data from Clarivate InCites and the Citation Topics classification system, we analyse two hierarchical thematic levels: Electrical Engineering, Electronics & Computer Science (Macro Citation Topic 4) and Artificial Intelligence & Machine Learning (Meso Citation Topic 4.61). Relative Specialization Index (RSI) and Relative Citation Impact (RCI) indicators are calculated for 781 European NUTS-3 regions. While major metropolitan hubs such as Paris, Warszawa, and Madrid dominate in absolute publication volume, the results reveal that the highest levels of relative AI specialization are concentrated in peripheral regions, particularly in Eastern Europe and Spain. Granada and Vilniaus apskritis stand out as regions combining high specialization with strong citation visibility. The analysis further suggests a weak relationship between regional specialization and citation impact, revealing multiple regional profiles, including highly specialized regions with limited citation visibility, highly visible regions with comparatively low specialization, and diversified scientific systems combining moderate specialization with strong citation impact. Fyn emerges as an extreme case of very high citation impact despite relatively low specialization.
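The regional profiles described in the abstract can be pictured as quadrants of an RSI versus RCI chart. The sketch below is a simplification under stated assumptions: the cut points (RSI above 0, RCI above 1) and the four labels are illustrative choices, not thresholds reported by the authors, and the input values are invented.

```python
def regional_profile(rsi, rci, rsi_cut=0.0, rci_cut=1.0):
    """Assign a coarse profile from specialization and citation impact.

    rsi_cut=0.0 assumes a normalized RSI where 0 is the neutral point;
    rci_cut=1.0 assumes RCI is expressed relative to a reference average.
    """
    specialized = rsi > rsi_cut
    visible = rci > rci_cut
    if specialized and visible:
        return "specialized and visible"
    if specialized:
        return "specialized, limited visibility"
    if visible:
        return "visible, low specialization"
    return "neither"


# Hypothetical (RSI, RCI) pairs, not taken from the paper.
regions = {"R1": (0.70, 1.4), "R2": (0.65, 0.6), "R3": (-0.10, 2.5)}
profiles = {name: regional_profile(*vals) for name, vals in regions.items()}
```

A finer scheme would be needed to separate the "moderate specialization with strong citation impact" group the abstract mentions, for example by adding a middle RSI band.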

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes the geographic distribution of AI research across 781 EU NUTS-3 regions (2015-2024) using Clarivate InCites bibliometric data and the Citation Topics taxonomy. It distinguishes macro topic 4 (Electrical Engineering, Electronics & Computer Science) from meso topic 4.61 (Artificial Intelligence & Machine Learning), computes Relative Specialization Index (RSI) and Relative Citation Impact (RCI) for each region, and reports that absolute publication volume is concentrated in hubs such as Paris, Warszawa and Madrid while the highest RSI values occur in peripheral regions, notably Granada and Vilniaus apskritis. The study further finds only a weak relationship between regional specialization and citation impact and identifies several distinct regional profiles.

Significance. If the underlying classification and data handling are reliable, the work supplies a policy-relevant descriptive map at an unusually fine NUTS-3 scale, shifting emphasis from absolute output in core cities to relative specialization in the periphery. The identification of multiple profiles (high-specialization/low-visibility, high-visibility/low-specialization, and balanced) adds nuance beyond simple rankings. The scale of the analysis (781 regions) and the use of established RSI/RCI formulas are strengths, but the absence of validation or robustness checks limits the strength of the headline claim.

major comments (2)
  1. [Data and Methods] Data and Methods section: The central RSI rankings rest entirely on the accuracy of Clarivate meso-level topic 4.61 labels for AI/ML. No validation against alternative taxonomies (arXiv cs.AI, Scopus AI keywords), manual coding checks, or sensitivity tests for regional coverage or language bias is reported. Because NUTS-3 units often have low publication counts, even modest misclassification rates can produce large swings in the regional AI share that enters the RSI numerator while the EU-wide denominator remains fixed.
  2. [Results] Results and Discussion: The headline finding that peripheral regions (Granada, Vilniaus apskritis) exhibit the highest relative specialization is presented without confidence intervals, bootstrap standard errors, or robustness checks to alternative normalizations or exclusion of multi-affiliated papers. For small-count regions this omission leaves open whether the reported top ranks are stable or artifacts of classification or counting conventions.
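The small-count concern in both major comments can be illustrated with a short calculation. The sketch below assumes the normalized form RSI = (A - 1) / (A + 1) with a fixed reference share, and asks how far RSI moves if up to k papers were wrongly added to or removed from a region's AI set. All counts are invented.

```python
def rsi(region_ai, region_total, ref_share):
    """Normalized RSI (assumed form) against a fixed reference AI share."""
    a = (region_ai / region_total) / ref_share
    return (a - 1) / (a + 1)


def rsi_swing(region_ai, region_total, ref_share, k):
    """RSI range if up to k papers are misclassified in either direction."""
    low = rsi(max(region_ai - k, 0), region_total, ref_share)
    high = rsi(min(region_ai + k, region_total), region_total, ref_share)
    return low, high


# Two hypothetical regions with the same 3% AI share, reference share 1%.
small_low, small_high = rsi_swing(6, 200, 0.01, k=3)
large_low, large_high = rsi_swing(600, 20000, 0.01, k=3)
small_width = small_high - small_low
large_width = large_high - large_low
```

With three misclassified papers, the small region's RSI ranges from about 0.20 to about 0.64, while the large region's barely moves from 0.50. This is the mechanism by which rankings of low-output NUTS-3 units can shift.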
minor comments (2)
  1. The manuscript would benefit from an explicit table or appendix listing the top 10-15 regions by RSI together with their raw AI and total publication counts, RCI values, and population or researcher-base denominators used in any normalization.
  2. [Methods] Clarify in the Methods whether fractional counting was applied to multi-affiliated publications and whether any threshold on minimum publications per region was imposed before computing RSI.
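The two counting conventions raised in the second minor comment can be sketched as follows. This is an illustration only: it assumes fractional counting splits each paper equally across its distinct regions (other schemes split by author or institution), and the region labels and papers are made up.

```python
from collections import defaultdict


def count_papers(papers, fractional=True):
    """Count papers per region.

    papers: iterable of lists of region labels, one list per paper.
    fractional=True gives each distinct region 1/n of the paper;
    fractional=False (whole counting) gives each distinct region 1.
    """
    totals = defaultdict(float)
    for regions in papers:
        unique = set(regions)
        if not unique:
            continue  # skip papers with no regional affiliation
        weight = 1.0 / len(unique) if fractional else 1.0
        for region in unique:
            totals[region] += weight
    return dict(totals)


def apply_threshold(counts, min_papers):
    """Keep only regions with at least min_papers (hypothetical cut-off)."""
    return {r: c for r, c in counts.items() if c >= min_papers}


# Made-up affiliation data for three papers.
papers = [["region_a", "region_b"], ["region_a"],
          ["region_a", "region_a", "region_c"]]
frac = count_papers(papers, fractional=True)
whole = count_papers(papers, fractional=False)
kept = apply_threshold(frac, min_papers=1.0)
```

Here whole counting credits region_a with 3 papers and fractional counting with 2, and a threshold of 1 paper keeps only region_a under fractional counting. The choice of convention therefore changes both the counts and which regions are ranked at all.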

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which have helped us strengthen the robustness and transparency of our analysis. We address each major comment point by point below, indicating where revisions have been made to the manuscript.

point-by-point responses
  1. Referee: [Data and Methods] Data and Methods section: The central RSI rankings rest entirely on the accuracy of Clarivate meso-level topic 4.61 labels for AI/ML. No validation against alternative taxonomies (arXiv cs.AI, Scopus AI keywords), manual coding checks, or sensitivity tests for regional coverage or language bias is reported. Because NUTS-3 units often have low publication counts, even modest misclassification rates can produce large swings in the regional AI share that enters the RSI numerator while the EU-wide denominator remains fixed.

    Authors: We acknowledge the referee's concern regarding the reliance on the proprietary Clarivate Citation Topics taxonomy. In the revised manuscript, we have expanded the Methods section to include a dedicated discussion of the taxonomy's construction and known limitations, citing prior validation studies on similar bibliometric classifications. We performed and now report sensitivity tests by applying minimum publication thresholds (excluding regions with fewer than 5 or 10 AI publications) and confirm that the top-ranked peripheral regions (Granada, Vilniaus apskritis) remain stable. Full cross-validation with arXiv cs.AI or Scopus keyword-based definitions is not feasible due to licensing restrictions on the InCites data; this limitation is now explicitly stated in the Discussion section along with a call for future multi-source studies. revision: partial

  2. Referee: [Results] Results and Discussion: The headline finding that peripheral regions (Granada, Vilniaus apskritis) exhibit the highest relative specialization is presented without confidence intervals, bootstrap standard errors, or robustness checks to alternative normalizations or exclusion of multi-affiliated papers. For small-count regions this omission leaves open whether the reported top ranks are stable or artifacts of classification or counting conventions.

    Authors: We agree that uncertainty measures are essential for interpreting rankings in low-count regions. The revised Results section now includes bootstrap standard errors (1,000 resamples) for RSI values of the top 20 regions, demonstrating that Granada and Vilniaus apskritis retain their leading positions with non-overlapping intervals relative to core hubs. We have also added a robustness check excluding multi-affiliated papers, which yields consistent top rankings. These additions are presented in new tables and figures, directly addressing concerns about stability. revision: yes
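The bootstrap mentioned in the second response could look roughly like the sketch below. It is an assumption-laden illustration, not the authors' procedure: it resamples a region's papers with replacement, holds the reference AI share fixed, uses the normalized form RSI = (A - 1) / (A + 1), and reports a percentile interval. The data are invented.

```python
import random
import statistics


def rsi_from_share(share, ref_share):
    """Normalized RSI (assumed form) from a regional AI share."""
    a = share / ref_share
    return (a - 1) / (a + 1)


def bootstrap_rsi(is_ai_flags, ref_share, n_resamples=1000, seed=0):
    """Bootstrap standard error and 95% percentile interval for RSI.

    is_ai_flags: one 0/1 flag per paper of the region (1 = AI paper).
    """
    rng = random.Random(seed)
    n = len(is_ai_flags)
    estimates = []
    for _ in range(n_resamples):
        sample = rng.choices(is_ai_flags, k=n)  # resample with replacement
        estimates.append(rsi_from_share(sum(sample) / n, ref_share))
    estimates.sort()
    se = statistics.stdev(estimates)
    lo = estimates[int(0.025 * n_resamples)]
    hi = estimates[int(0.975 * n_resamples) - 1]
    return se, (lo, hi)


# Two hypothetical regions with the same 10% AI share, reference share 2%.
small_se, small_ci = bootstrap_rsi([1] * 3 + [0] * 27, ref_share=0.02)
large_se, large_ci = bootstrap_rsi([1] * 300 + [0] * 2700, ref_share=0.02)
```

Both regions have the same point estimate, yet the small region's interval is much wider. Non-overlapping intervals between two regions would support a stable ordering, which is the kind of evidence the response describes.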

standing simulated objections not resolved
  • Complete external validation of all Clarivate meso-topic 4.61 labels, via manual coding or direct comparison to arXiv/Scopus taxonomies across the full dataset, remains out of reach because of proprietary data access restrictions.

Circularity Check

0 steps flagged

No circularity; standard indices on external bibliometric data

full rationale

The derivation uses the standard RSI formula (based on the regional AI share relative to the EU-wide AI share) and RCI on raw publication and citation counts from Clarivate InCites Citation Topics 4.61. No parameters are fitted to the target result, no self-citations form the load-bearing premise, and no step renames or re-derives the input data. The classification taxonomy is treated as an external input; any coverage concerns are external validity issues, not circularity. The chain from data to specialization rankings is direct and non-reductive.
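For reference, one common way to write these indicators is given below. The notation is an assumption introduced here: P denotes publication counts, C citation counts, the subscript r a region, and the superscript AI the meso topic 4.61. The normalized form of RSI and this particular definition of RCI are widely used conventions, but the page does not confirm that the paper uses exactly these.

```latex
\[
A_r = \frac{P_r^{\mathrm{AI}} / P_r}{P_{\mathrm{EU}}^{\mathrm{AI}} / P_{\mathrm{EU}}},
\qquad
\mathrm{RSI}_r = \frac{A_r - 1}{A_r + 1},
\qquad
\mathrm{RCI}_r = \frac{C_r^{\mathrm{AI}} / P_r^{\mathrm{AI}}}{C_{\mathrm{EU}}^{\mathrm{AI}} / P_{\mathrm{EU}}^{\mathrm{AI}}}
\]
```

Under this assumed form the index inverts as A_r = (1 + RSI_r) / (1 - RSI_r), so an RSI of 0.820 would correspond to a regional AI share roughly ten times the reference share (1.82 / 0.18 ≈ 10.1).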

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard bibliometric assumptions and external data classification without introducing new free parameters or entities.

axioms (2)
  • domain assumption NUTS-3 regions form comparable units for measuring scientific specialization and impact
    The study analyzes 781 such regions as the base units for RSI and RCI calculations.
  • domain assumption The Citation Topics hierarchical system validly separates AI & Machine Learning from broader engineering topics
    Used to define the two thematic levels for specialization measurement.

pith-pipeline@v0.9.0 · 5523 in / 1344 out tokens · 45007 ms · 2026-05-15T21:22:53.092711+00:00 · methodology



Figure 3: Relative Specialization Index (RSI) vs. Relative Citation Impact (RCI) in regions with at least 100 papers in AI