Uncovering AI Governance Themes in EU Policies using BERTopic and Thematic Analysis
Pith reviewed 2026-05-18 16:21 UTC · model grok-4.3
The pith
EU AI policies have evolved from broad ethical principles toward specific regulatory requirements.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By running thematic analysis on key EU documents and then applying the BERTopic model to a larger collection of post-2018 EU AI policy texts, the authors extract recurring governance themes and show how the EU's overall approach has moved from high-level ethical guidance to more concrete regulatory instruments such as the AI Act.
What carries the argument
BERTopic topic modeling combined with qualitative thematic analysis performed on selected EU AI policy documents.
If this is right
- The two methods together reveal differences in scope, emphasis, degrees of normativity, and priorities across the examined documents.
- The larger post-2018 document sample makes visible how themes have shifted over time.
- The resulting theme inventory offers a structured view of what the EU currently treats as central to trustworthy and safe AI.
- Alignment or divergence between the HLEG guidelines and the AI Act can be read directly from the extracted themes.
Where Pith is reading between the lines
- Similar mixed-method pipelines could be applied to AI policy texts from other jurisdictions to enable direct comparison of governance priorities.
- If the detected shift toward regulation continues, later revisions of the AI Act may tighten requirements in areas the current themes flag as underdeveloped.
- The theme list could serve as a baseline for tracking whether future EU documents close identified gaps in coverage of risk, accountability, or stakeholder participation.
Load-bearing premise
The chosen set of EU documents and the chosen BERTopic parameters are assumed to represent the full range of AI governance themes without important selection bias or modeling distortions.
What would settle it
Repeating the analysis on a materially different collection of EU documents or with different BERTopic parameter settings and obtaining substantially different themes would undermine the reported evolution and theme list.
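One way to make "substantially different themes" measurable is to compare the top words of each topic across two runs. The sketch below is illustrative only: the topic names, the word lists, and the helper names `jaccard` and `best_match_overlap` are hypothetical and do not come from the paper. It assumes each run is summarised as a mapping from a topic label to its top words.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_overlap(
    run_a: Dict[str, List[str]], run_b: Dict[str, List[str]]
) -> Dict[str, float]:
    """For each topic in run_a, report its highest overlap with any topic in run_b."""
    return {
        name: max(
            (jaccard(set(words), set(other)) for other in run_b.values()),
            default=0.0,
        )
        for name, words in run_a.items()
    }


# Hypothetical top words from a baseline run and a rerun with other settings.
baseline = {
    "risk": ["risk", "high", "system", "provider"],
    "ethics": ["ethics", "trustworthy", "human", "rights"],
}
rerun = {
    "t0": ["risk", "high", "system", "conformity"],
    "t1": ["innovation", "investment", "market", "research"],
}

scores = best_match_overlap(baseline, rerun)
print(scores)  # "risk" finds a close match, "ethics" finds none
```

A theme whose best-match overlap stays high across reruns is a candidate for a stable finding; a theme that drops to zero, as "ethics" does in this toy input, is the kind of result that would weaken the reported theme list.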
Original abstract
The upsurge of policies and guidelines that aim to ensure Artificial Intelligence (AI) systems are safe and trustworthy has led to a fragmented landscape of AI governance. The European Union (EU) is a key actor in the development of such policies and guidelines. Its High-Level Expert Group (HLEG) issued an influential set of guidelines for trustworthy AI, followed in 2024 by the adoption of the EU AI Act. While the EU policies and guidelines are expected to be aligned, they may differ in their scope, areas of emphasis, degrees of normativity, and priorities in relation to AI. To gain a broad understanding of AI governance from the EU perspective, we leverage qualitative thematic analysis approaches to uncover prevalent themes in key EU documents, including the AI Act and the HLEG Ethics Guidelines. We further employ quantitative topic modelling approaches, specifically through the use of the BERTopic model, to enhance the results and increase the document sample to include EU AI policy documents published post-2018. We present a novel perspective on EU policies, tracking the evolution of its approach to addressing AI governance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript applies a mixed-methods approach combining qualitative thematic analysis on key EU documents (HLEG Ethics Guidelines and the 2024 AI Act) with BERTopic topic modeling on an expanded corpus of post-2018 EU AI policy documents. It aims to identify prevalent AI governance themes and track the evolution of the EU's policy approach, presenting this as a novel perspective on alignment, scope, and priorities across documents.
Significance. If the methodological pipeline proves robust, the work offers a scalable quantitative lens to complement existing qualitative studies of EU AI governance, potentially illuminating shifts from ethics-focused guidelines to regulatory instruments. The use of off-the-shelf tools on real policy texts is a strength, but the absence of validation steps limits claims about genuine theme evolution versus modeling artifacts.
major comments (2)
- [Methods] Methods section: Explicit inclusion/exclusion criteria, search strategy, database sources, and total document count for the post-2018 EU AI policy corpus are not reported. This is load-bearing for the evolution-tracking claim, as unstated selection rules risk bias in the sample used to contrast HLEG and AI Act themes.
- [Methods] BERTopic subsection of Methods: No values or justification are given for key hyperparameters (UMAP n_neighbors, HDBSCAN min_cluster_size, embedding model choice, or target topic count), nor are coherence/diversity metrics or sensitivity analyses provided. Topic models are known to produce unstable themes under modest parameter changes; without these checks the extracted governance themes cannot be confidently distinguished from pipeline artifacts.
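The diversity metric requested above can be computed from the topic word lists alone. The sketch below uses one common definition, the share of unique words among the top-k words of all topics. The manuscript does not say which definition it would adopt, and the topic word lists here are hypothetical. Coherence scores such as C_v additionally need the tokenised corpus and are usually computed with an existing library, so they are not sketched here.

```python
from typing import List


def topic_diversity(topics: List[List[str]], top_k: int = 10) -> float:
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; values near 0 mean
    the topics largely repeat the same vocabulary.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical top words for three topics; "risk" and "human" each appear twice.
topics = [
    ["risk", "high", "system", "provider"],
    ["ethics", "trustworthy", "human", "rights"],
    ["risk", "human", "oversight", "transparency"],
]

diversity = topic_diversity(topics, top_k=4)
print(round(diversity, 3))  # 10 unique words out of 12
```

Reporting this value next to each hyperparameter setting would show whether a change in settings produces redundant topics.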
minor comments (2)
- [Abstract] Abstract: The phrase 'qualitative thematic analysis approaches' is vague; specifying the exact framework (e.g., Braun & Clarke reflexive thematic analysis) would improve clarity.
- [Results] Figure captions (if present in results): Ensure all topic-model visualizations include axis labels, legend details, and the exact hyperparameter settings used to generate them.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The comments highlight important areas for improving methodological transparency, which we will address in the revision to strengthen the manuscript's rigor and reproducibility.
- Referee: [Methods] Methods section: Explicit inclusion/exclusion criteria, search strategy, database sources, and total document count for the post-2018 EU AI policy corpus are not reported. This is load-bearing for the evolution-tracking claim, as unstated selection rules risk bias in the sample used to contrast HLEG and AI Act themes.
  Authors: We agree that these details are essential for reproducibility and for supporting claims about theme evolution across the corpus. The original submission omitted a full description of the document collection process. In the revised manuscript, we will add a new subsection under Methods that explicitly states: the search strategy (targeted queries on EUR-Lex and the European Commission website using terms such as “artificial intelligence governance” and “AI policy” limited to post-2018), inclusion criteria (official EU policy documents, guidelines, communications, and legislative texts), exclusion criteria (non-policy documents, duplicates, non-English texts, and pre-2018 materials), database sources, and the final corpus size (45 documents). This addition will allow readers to evaluate potential selection bias and will directly support the validity of the HLEG–AI Act comparison.
  Revision: yes
- Referee: [Methods] BERTopic subsection of Methods: No values or justification are given for key hyperparameters (UMAP n_neighbors, HDBSCAN min_cluster_size, embedding model choice, or target topic count), nor are coherence/diversity metrics or sensitivity analyses provided. Topic models are known to produce unstable themes under modest parameter changes; without these checks the extracted governance themes cannot be confidently distinguished from pipeline artifacts.
  Authors: We acknowledge that the absence of hyperparameter values, justifications, and validation metrics limits confidence in the extracted themes. In the revised version we will expand the BERTopic subsection to report the exact settings used (UMAP n_neighbors = 15, HDBSCAN min_cluster_size = 5, embedding model = sentence-transformers/all-MiniLM-L6-v2, target topic count = 8 determined via iterative runs and elbow inspection) together with brief justifications tied to the characteristics of policy text. We will also add coherence (C_v) and diversity scores for the final model and a short sensitivity analysis showing that the core governance themes remain stable under modest parameter perturbations. These changes will help demonstrate that the reported themes are robust rather than modeling artifacts.
  Revision: yes
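The settings quoted in the rebuttal could be wired into a BERTopic pipeline roughly as follows. This is a minimal sketch, not the authors' code. The four values taken from the rebuttal are marked in comments; `n_components`, `min_dist`, the distance metrics, and `random_state` are assumptions added here to make the example complete, and the names `TopicModelSettings` and `build_topic_model` are hypothetical. The third-party imports sit inside the function so the settings can be inspected without installing the libraries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicModelSettings:
    # Values stated in the simulated rebuttal.
    n_neighbors: int = 15
    min_cluster_size: int = 5
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    nr_topics: int = 8
    # Assumptions for illustration only; not reported in the source.
    n_components: int = 5
    min_dist: float = 0.0
    random_state: int = 42


def build_topic_model(settings: TopicModelSettings):
    """Assemble a BERTopic model from the given settings."""
    # Imported lazily: requires bertopic, umap-learn, hdbscan, sentence-transformers.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    umap_model = UMAP(
        n_neighbors=settings.n_neighbors,
        n_components=settings.n_components,
        min_dist=settings.min_dist,
        metric="cosine",
        random_state=settings.random_state,  # fixed seed for repeatable runs
    )
    hdbscan_model = HDBSCAN(
        min_cluster_size=settings.min_cluster_size,
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )
    return BERTopic(
        embedding_model=SentenceTransformer(settings.embedding_model),
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        nr_topics=settings.nr_topics,  # reduce to the target topic count
    )


settings = TopicModelSettings()
print(settings)
```

With the libraries installed, `build_topic_model(settings).fit_transform(docs)` would fit the model on `docs`, a list of text strings. Varying one field of `TopicModelSettings` at a time and refitting is one simple way to structure the promised sensitivity analysis.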
Circularity Check
No circularity: results derive from external documents via standard tools
The paper applies BERTopic topic modeling and qualitative thematic analysis to a corpus of post-2018 EU policy documents (including the AI Act and HLEG guidelines). The extracted themes and evolution narrative are direct outputs of running these off-the-shelf methods on the selected external texts. No equations, fitted parameters, or self-citations are shown to define the target quantities in terms of themselves, and the central claim does not reduce to a renaming or construction from the modeling choices. The pipeline is self-contained against the input documents and standard libraries.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: BERTopic can reliably surface latent governance themes from policy documents.