pith. sign in

arxiv: 2509.13387 · v1 · submitted 2025-09-16 · 💻 cs.CY · cs.AI

Uncovering AI Governance Themes in EU Policies using BERTopic and Thematic Analysis

Pith reviewed 2026-05-18 16:21 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords AI governanceEU AI ActBERTopicthematic analysistrustworthy AIpolicy evolutiontopic modelingEU policies
0
0 comments X

The pith

EU AI policies have evolved from broad ethical principles toward specific regulatory requirements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies qualitative thematic analysis together with the BERTopic topic-modeling technique to a set of EU documents on AI, including the High-Level Expert Group Ethics Guidelines and the 2024 AI Act. The goal is to identify the main themes that shape AI governance and to map how those themes have changed in documents issued after 2018. A reader would care because the EU is a leading actor in setting AI standards worldwide, and seeing the internal shifts in emphasis, scope, and normativity helps make sense of the larger, still-fragmented policy landscape. The combined methods enlarge the sample size and add quantitative structure to what would otherwise be a purely manual reading of the texts.

Core claim

By running thematic analysis on key EU documents and then applying the BERTopic model to a larger collection of post-2018 EU AI policy texts, the authors extract recurring governance themes and show how the EU's overall approach has moved from high-level ethical guidance to more concrete regulatory instruments such as the AI Act.

What carries the argument

BERTopic topic modeling combined with qualitative thematic analysis performed on selected EU AI policy documents.

If this is right

  • The two methods together reveal differences in scope, emphasis, degrees of normativity, and priorities across the examined documents.
  • The larger post-2018 document sample makes visible how themes have shifted over time.
  • The resulting theme inventory offers a structured view of what the EU currently treats as central to trustworthy and safe AI.
  • Alignment or divergence between the HLEG guidelines and the AI Act can be read directly from the extracted themes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar mixed-method pipelines could be applied to AI policy texts from other jurisdictions to enable direct comparison of governance priorities.
  • If the detected shift toward regulation continues, later revisions of the AI Act may tighten requirements in areas the current themes flag as underdeveloped.
  • The theme list could serve as a baseline for tracking whether future EU documents close identified gaps in coverage of risk, accountability, or stakeholder participation.

Load-bearing premise

The chosen set of EU documents and the chosen BERTopic parameters are assumed to represent the full range of AI governance themes without important selection bias or modeling distortions.

What would settle it

Repeating the analysis on a materially different collection of EU documents or with different BERTopic parameter settings and obtaining substantially different themes would undermine the reported evolution and theme list.

Figures

Figures reproduced from arXiv: 2509.13387 by Aphra Kerr, Arjumand Younus, Dave Lewis, Delaram Golpayegani, Marta Lasek-Markey.

Figure 1
Figure 1. Figure 1: The overall methodology 3. Methodology As shown in [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Word Cloud of the identified AI governance themes from EU AI policies 4.2. Evolution of Themes in EU AI Policies Our analysis shows a shift in the AI policy discourse in the EU following the adoption of the EU AI Act. As shown in Figure 3a, the pre-AI Act thematic landscape predominately focused on ethical AI, data governance, and risks particularly those that impact vulner￾able groups. This emphasis align… view at source ↗
Figure 3
Figure 3. Figure 3: Word clouds demonstrating evolution of AI governance themes in the EU was emerging as a popular yet still unfamiliar topic. Themes from the post-AI Act era (Figure 3b) illustrate a clear shift from aspirational ethical AI toward operationalised le￾gal AI, with particular emphasis on regulatory enforcement. In addition to concrete com￾pliance tasks such as documentation and risk assessment, the AI Act and i… view at source ↗
Figure 4
Figure 4. Figure 4: Stream graphs demonstrating evolution of most and least prevalent topics over time (3) it concentrates specifically on AI governance and success criteria for the responsible development and deployment of AI. In future work, we plan to expand our analysis and investigate AI governance themes that emerged in the guidelines set forth by supranational bodies, such as UNESCO, The Organisation for Economic Co-op… view at source ↗
read the original abstract

The upsurge of policies and guidelines that aim to ensure Artificial Intelligence (AI) systems are safe and trustworthy has led to a fragmented landscape of AI governance. The European Union (EU) is a key actor in the development of such policies and guidelines. Its High-Level Expert Group (HLEG) issued an influential set of guidelines for trustworthy AI, followed in 2024 by the adoption of the EU AI Act. While the EU policies and guidelines are expected to be aligned, they may differ in their scope, areas of emphasis, degrees of normativity, and priorities in relation to AI. To gain a broad understanding of AI governance from the EU perspective, we leverage qualitative thematic analysis approaches to uncover prevalent themes in key EU documents, including the AI Act and the HLEG Ethics Guidelines. We further employ quantitative topic modelling approaches, specifically through the use of the BERTopic model, to enhance the results and increase the document sample to include EU AI policy documents published post-2018. We present a novel perspective on EU policies, tracking the evolution of its approach to addressing AI governance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript applies a mixed-methods approach combining qualitative thematic analysis on key EU documents (HLEG Ethics Guidelines and the 2024 AI Act) with BERTopic topic modeling on an expanded corpus of post-2018 EU AI policy documents. It aims to identify prevalent AI governance themes and track the evolution of the EU's policy approach, presenting this as a novel perspective on alignment, scope, and priorities across documents.

Significance. If the methodological pipeline proves robust, the work offers a scalable quantitative lens to complement existing qualitative studies of EU AI governance, potentially illuminating shifts from ethics-focused guidelines to regulatory instruments. The use of off-the-shelf tools on real policy texts is a strength, but the absence of validation steps limits claims about genuine theme evolution versus modeling artifacts.

major comments (2)
  1. [Methods] Methods section: Explicit inclusion/exclusion criteria, search strategy, database sources, and total document count for the post-2018 EU AI policy corpus are not reported. This is load-bearing for the evolution-tracking claim, as unstated selection rules risk bias in the sample used to contrast HLEG and AI Act themes.
  2. [Methods] BERTopic subsection of Methods: No values or justification are given for key hyperparameters (UMAP n_neighbors, HDBSCAN min_cluster_size, embedding model choice, or target topic count), nor are coherence/diversity metrics or sensitivity analyses provided. Topic models are known to produce unstable themes under modest parameter changes; without these checks the extracted governance themes cannot be confidently distinguished from pipeline artifacts.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'qualitative thematic analysis approaches' is vague; specifying the exact framework (e.g., Braun & Clarke reflexive thematic analysis) would improve clarity.
  2. [Results] Figure captions (if present in results): Ensure all topic-model visualizations include axis labels, legend details, and the exact hyperparameter settings used to generate them.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important areas for improving methodological transparency, which we will address in the revision to strengthen the manuscript's rigor and reproducibility.

read point-by-point responses
  1. Referee: [Methods] Methods section: Explicit inclusion/exclusion criteria, search strategy, database sources, and total document count for the post-2018 EU AI policy corpus are not reported. This is load-bearing for the evolution-tracking claim, as unstated selection rules risk bias in the sample used to contrast HLEG and AI Act themes.

    Authors: We agree that these details are essential for reproducibility and for supporting claims about theme evolution across the corpus. The original submission omitted a full description of the document collection process. In the revised manuscript, we will add a new subsection under Methods that explicitly states: the search strategy (targeted queries on EUR-Lex and the European Commission website using terms such as “artificial intelligence governance” and “AI policy” limited to post-2018), inclusion criteria (official EU policy documents, guidelines, communications, and legislative texts), exclusion criteria (non-policy documents, duplicates, non-English texts, and pre-2018 materials), database sources, and the final corpus size (45 documents). This addition will allow readers to evaluate potential selection bias and will directly support the validity of the HLEG–AI Act comparison. revision: yes

  2. Referee: [Methods] BERTopic subsection of Methods: No values or justification are given for key hyperparameters (UMAP n_neighbors, HDBSCAN min_cluster_size, embedding model choice, or target topic count), nor are coherence/diversity metrics or sensitivity analyses provided. Topic models are known to produce unstable themes under modest parameter changes; without these checks the extracted governance themes cannot be confidently distinguished from pipeline artifacts.

    Authors: We acknowledge that the absence of hyperparameter values, justifications, and validation metrics limits confidence in the extracted themes. In the revised version we will expand the BERTopic subsection to report the exact settings used (UMAP n_neighbors = 15, HDBSCAN min_cluster_size = 5, embedding model = sentence-transformers/all-MiniLM-L6-v2, target topic count = 8 determined via iterative runs and elbow inspection) together with brief justifications tied to the characteristics of policy text. We will also add coherence (C_v) and diversity scores for the final model and a short sensitivity analysis showing that the core governance themes remain stable under modest parameter perturbations. These changes will help demonstrate that the reported themes are robust rather than modeling artifacts. revision: yes

Circularity Check

0 steps flagged

No circularity: results derive from external documents via standard tools

full rationale

The paper applies BERTopic topic modeling and qualitative thematic analysis to a corpus of post-2018 EU policy documents (including the AI Act and HLEG guidelines). The extracted themes and evolution narrative are direct outputs of running these off-the-shelf methods on the selected external texts. No equations, fitted parameters, or self-citations are shown to define the target quantities in terms of themselves, and the central claim does not reduce to a renaming or construction from the modeling choices. The pipeline is self-contained against the input documents and standard libraries.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the validity of topic modeling for policy text and on the representativeness of the chosen document set; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption BERTopic can reliably surface latent governance themes from policy documents
    Invoked when the quantitative results are said to enhance the qualitative findings.

pith-pipeline@v0.9.0 · 5736 in / 1024 out tokens · 57752 ms · 2026-05-18T16:21:39.863773+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

  1. [1]

    Available from: http://data.europa.eu/eli/reg/2024/1689/oj

    Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence ...

  2. [2]

    Ethics Guidelines for Trustworthy AI

    European Commission and Directorate-General for Communications Networks, Content and Technol- ogy. Ethics Guidelines for Trustworthy AI. Publications Office of the European Union; 2019

  3. [3]

    Commission Guidelines on Prohibited Artificial Intelligence Practices Estab- lished by Regulation (EU) 2024/1689 (AI Act); 2025

    European Commission. Commission Guidelines on Prohibited Artificial Intelligence Practices Estab- lished by Regulation (EU) 2024/1689 (AI Act); 2025

  4. [4]

    In: Teo T, editor

    Clarke V , Braun V . In: Teo T, editor. Thematic Analysis. New York, NY: Springer New York; 2014. p. 1947-52. Available from:https://doi.org/10.1007/978-1-4614-5583-7_311

  5. [5]

    What do governments plan in the field of artificial intelligence? Analysing national AI strategies using NLP

    Papadopoulos T, Charalabidis Y . What do governments plan in the field of artificial intelligence? Analysing national AI strategies using NLP. Proceedings of the 13th International Conference on The- ory and Practice of Electronic Governance. 2020:100–111. Available from:https://doi.org/10. 1145/3428502.3428514

  6. [6]

    LDA-based topic mining research on China’s government data governance policy

    Yang Q. LDA-based topic mining research on China’s government data governance policy. Social Security and Administration Management. 2022;3(2)

  7. [7]

    Artificial intelligence policy frameworks in China, the European Union and the United States: An analysis based on structure topic model

    Wang S, Zhang Y , Xiao Y , Liang Z. Artificial intelligence policy frameworks in China, the European Union and the United States: An analysis based on structure topic model. Technological Forecasting and Social Change. 2025;212:123971

  8. [8]

    Ethics and Diversity in Artificial Intelligence Policies, Strategies and Initia- tives

    Roche C, Wall P, Lewis D. Ethics and Diversity in Artificial Intelligence Policies, Strategies and Initia- tives. AI and Ethics. 2023;3(4):1095-115

  9. [9]

    AI Ethics in the Public, Private, and NGO Sectors: A Review of a Global Document Collection

    Schiff D, Borenstein J, Biddle J, Laas K. AI Ethics in the Public, Private, and NGO Sectors: A Review of a Global Document Collection. IEEE Transactions on Technology and Society. 2021;2(1):31-42

  10. [10]

    Topical Review of Artificial Intelligence National Policies: A Mixed Method Anal- ysis

    Saheb T, Saheb T. Topical Review of Artificial Intelligence National Policies: A Mixed Method Anal- ysis. Technology in Society. 2023;74:102316. Available from:https://www.sciencedirect.com/ science/article/pii/S0160791X23001215

  11. [11]

    Mapping ethical artificial intelligence policy landscape: a mixed method analysis

    Saheb T, Saheb T. Mapping ethical artificial intelligence policy landscape: a mixed method analysis. Science and engineering ethics. 2024;30(2):9

  12. [12]

    Post-GPT Policy: Risk and Regu- lation in EU AI Discourse

    Kajava K, ¨Ohman E, Takagi NM, Nakajima-Wickham E, Vitiugin F. Post-GPT Policy: Risk and Regu- lation in EU AI Discourse. Proceedings of the International AAAI Conference on Web and Social Me- dia. 2025;19(1):994-1006. Available from:https://ojs.aaai.org/index.php/ICWSM/article/ view/35856

  13. [13]

    When Politicians Talk AI: Issue-Frames in Parliamentary Debates Before and After ChatGPT

    Suter V , Ma C, P ¨ohlmann G, Meckel M. When Politicians Talk AI: Issue-Frames in Parliamentary Debates Before and After ChatGPT. Policy & Internet. 2025;17(3)

  14. [14]

    Policy and investment recommendations for trustworthy AI

    Commission E, Directorate-General for Communications Networks C, Technology, on Artificial Intel- ligence HLEG. Policy and investment recommendations for trustworthy AI. Publications Office of the European Union; 2019

  15. [15]

    The Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self assessment

    European Commission and Directorate-General for Communications Networks, Content and Technol- ogy. The Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self assessment. Publica- tions Office of the European Union; 2020

  16. [16]

    Sectoral Consider- ations on the Policy and Investment Recommendations for Trustworthy Artificial Intelligence

    Commission E, Directorate-General for Communications Networks C, Technology. Sectoral Consider- ations on the Policy and Investment Recommendations for Trustworthy Artificial Intelligence. Publica- tions Office; 2020

  17. [17]

    Commission – Guidelines on the scope of the obligations for general-purpose AI models established by Regulation (EU) 2024/1689 (AI Act); 2025

    European Commission. Commission – Guidelines on the scope of the obligations for general-purpose AI models established by Regulation (EU) 2024/1689 (AI Act); 2025

  18. [18]

    General-Purpose AI Code of Practice; 2025.https://digital-strategy.ec.europa.eu/en/ policies/contents-code-gpai

  19. [19]

    Thematic analysis

    Squires V . Thematic analysis. In: Varieties of qualitative research methods: Selected contextual per- spectives. Springer; 2023. p. 463-8

  20. [20]

    Dhakal K. NVivo. Journal of the Medical Library Association: JMLA. 2022;110(2):270

  21. [21]

    The Evolution of Topic Modeling

    Churchill R, Singh L. The Evolution of Topic Modeling. ACM Comput Surv. 2022 Nov;54(10s). Available from:https://doi.org/10.1145/3507900

  22. [22]

    Latent dirichlet allocation

    Blei DM, Ng AY , Jordan MI. Latent dirichlet allocation. Journal of machine Learning research. 2003;3(Jan):993-1022. September 2025

  23. [23]

    BERTopic: Neural topic modeling with a class-based TF-IDF procedure

    Grootendorst M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure; 2022. Avail- able from:https://arxiv.org/abs/2203.05794

  24. [24]

    Recommendation on the Ethics of Artificial Intelligence; 2022

    UNESCO. Recommendation on the Ethics of Artificial Intelligence; 2022

  25. [25]

    TopicGPT: A Prompt-based Topic Modeling Framework

    Pham C, Hoyle A, Sun S, Resnik P, Iyyer M. TopicGPT: A Prompt-based Topic Modeling Framework. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computa- tional Linguistics: Human Language Technologies (V olume 1: Long Papers). 2024:2956-84