Studying Lobby Influence in the European Parliament
Pith reviewed 2026-05-24 06:38 UTC · model grok-4.3
The pith
NLP methods discover links between members of the European Parliament (MEPs) and lobbies by comparing MEP speeches with lobby position papers on the basis of semantic similarity and entailment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By comparing lobbies' position papers and MEPs' speeches on the basis of semantic similarity and entailment, the method uncovers interpretable links between individual MEPs and lobbies. These links are validated indirectly against a curated retweet dataset and publicly disclosed MEP meetings, with the best-performing method achieving an AUC of 0.77 and outperforming several baselines. Aggregate analysis of links between lobby groups and MEP political groups matches expectations from the groups' ideologies.
What carries the argument
Text comparison via semantic similarity and entailment measures applied to lobbies' position papers and MEPs' speeches to surface influence links.
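As a rough illustration of the text-comparison step, the sketch below scores every (MEP, lobby) pair by cosine similarity. It is only a minimal stand-in: the bag-of-words vectors replace whatever learned similarity and entailment models the paper actually uses, and the function names (`bow_vector`, `cosine`, `link_scores`) and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Lowercased word counts; a crude stand-in for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_scores(speeches, papers):
    """Score every (MEP, lobby) pair by the similarity of their texts."""
    scores = {}
    for mep, speech in speeches.items():
        speech_vec = bow_vector(speech)
        for lobby, paper in papers.items():
            scores[(mep, lobby)] = cosine(speech_vec, bow_vector(paper))
    return scores


# Toy, made-up texts (not taken from the paper's datasets).
speeches = {
    "mep_a": "We must cut pesticide use to protect bees and farmland.",
    "mep_b": "Data protection rules should bind large online platforms.",
}
papers = {
    "lobby_env": "Reducing pesticide use will protect bees and biodiversity.",
    "lobby_tech": "Online platforms need workable data protection rules.",
}
scores = link_scores(speeches, papers)
```

In this toy setting each MEP scores highest against the lobby whose paper shares its vocabulary, which is the kind of ranking a candidate-link list would be built from.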
If this is right
- The method supplies a scalable way to surface influence patterns across large collections of parliamentary and lobby texts.
- Aggregate link patterns can reveal how lobby groups align with different political ideologies in the parliament.
- The approach offers one route toward greater transparency in how interest groups shape legislation.
Where Pith is reading between the lines
- The same text-matching pipeline could be applied to other national or regional legislatures that publish speeches and lobby documents.
- If the links prove stable over time, they could serve as input features for models that predict voting behavior on specific bills.
- Extending the comparison to include amendments or voting records might tighten the causal connection between lobby texts and MEP actions.
Load-bearing premise
Discovered links can be validated by matching them against retweet connections and disclosed MEP meetings when no direct ground-truth dataset exists.
What would settle it
A finding that the NLP-derived links show no better-than-chance agreement with the retweet dataset or the disclosed meeting records would undermine the validation.
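To make "better than chance" concrete, the sketch below computes AUC as the probability that a randomly chosen true link (for example, a pair with a retweet or disclosed meeting) receives a higher score than a randomly chosen non-link, with ties counted as half. Chance level is 0.5. The function name `auc` and the example numbers are illustrative assumptions, not values from the paper.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up link scores and proxy labels (1 = link present in the proxy data).
informative = auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])

# Scores that carry no information about the labels sit at chance level.
uninformative = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

Here `informative` comes out at 5/6 because five of the six positive-negative pairs are ordered correctly, while `uninformative` is exactly 0.5.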
Original abstract
We present a method based on natural language processing (NLP) for studying the influence of interest groups (lobbies) in the law-making process in the European Parliament (EP). We collect and analyze novel datasets of lobbies' position papers and speeches made by members of the EP (MEPs). By comparing these texts on the basis of semantic similarity and entailment, we are able to discover interpretable links between MEPs and lobbies. In the absence of a ground-truth dataset of such links, we perform an indirect validation by comparing the discovered links with a dataset, which we curate, of retweet links between MEPs and lobbies, and with the publicly disclosed meetings of MEPs. Our best method achieves an AUC score of 0.77 and performs significantly better than several baselines. Moreover, an aggregate analysis of the discovered links, between groups of related lobbies and political groups of MEPs, corresponds to the expectations from the ideology of the groups (e.g., center-left groups are associated with social causes). We believe that this work, which encompasses the methodology, datasets, and results, is a step towards enhancing the transparency of the intricate decision-making processes within democratic institutions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an NLP method using semantic similarity and entailment between lobby position papers and MEP speeches to discover influence links in the European Parliament. In the absence of ground truth, it performs indirect validation against curated retweet links and disclosed MEP meetings, reporting an AUC of 0.77 that outperforms baselines, and shows that aggregate discovered links between lobby groups and MEP political groups align with ideological expectations (e.g., center-left groups linked to social causes).
Significance. If the method can isolate influence from topical or ideological alignment, the approach and datasets could contribute to transparency tools for EU legislative processes. The indirect validation strategy and reproducible comparison to baselines are positive elements, but the central claim that textual overlap indicates lobbying influence rather than shared policy focus remains vulnerable to alternative explanations.
major comments (2)
- [Validation and aggregate analysis sections (around the AUC results and ideology correspondence)] The indirect validation (retweet and meeting proxies) does not include controls for political group membership or topic, leaving open the possibility that the AUC 0.77 reflects ideological/topic alignment rather than influence. This is load-bearing because the paper's own aggregate analysis shows links matching ideological expectations, which is also predicted by the confound.
- [Method and results sections describing the similarity/entailment model] The claim that discovered links are interpretable as influence rests on the assumption that semantic similarity/entailment captures lobbying effects beyond independent alignment on issues; no ablation or matching procedure is described to test this (e.g., comparing within vs. across political groups).
minor comments (2)
- [Method section] Clarify the exact entailment model and similarity threshold choices in the main text rather than deferring entirely to supplementary material.
- [Abstract] The abstract states that the aggregate analysis matches 'the expectations from the ideology of the groups'; rephrase for precision so that the result reads as consistent with multiple interpretations rather than as confirmatory.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The concerns about potential confounds from ideological or topical alignment in both the validation and aggregate analysis are substantive. We address each major comment below and indicate planned revisions.
Point-by-point responses
- Referee: The indirect validation (retweet and meeting proxies) does not include controls for political group membership or topic, leaving open the possibility that the AUC 0.77 reflects ideological/topic alignment rather than influence. This is load-bearing because the paper's own aggregate analysis shows links matching ideological expectations, which is also predicted by the confound.
  Authors: We agree this is a valid limitation of the current validation strategy. Retweets and meetings were selected as they reflect observable interactions, yet without stratification the AUC may partly capture alignment. In revision we will recompute AUC scores within versus across political groups and topics, and expand the limitations discussion to note that the observed ideological correspondence in aggregate links is consistent with both influence and alignment explanations.
  Revision: yes
- Referee: The claim that discovered links are interpretable as influence rests on the assumption that semantic similarity/entailment captures lobbying effects beyond independent alignment on issues; no ablation or matching procedure is described to test this (e.g., comparing within vs. across political groups).
  Authors: The manuscript currently presents no such ablation. To test whether similarity and entailment scores capture effects beyond alignment, we will add a within-group versus across-group performance comparison in the results section. This analysis, together with an explicit statement of the underlying assumption, will be included in the revised manuscript.
  Revision: yes
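One way to picture the stratified comparison discussed in these responses is sketched below. It assumes, as one possible reading, that "within group" means recomputing AUC separately inside each MEP political group so that group membership is held fixed. The helper names, the group labels `G1` and `G2`, and all numbers are made up for illustration; nothing here reproduces the authors' planned analysis.

```python
from collections import defaultdict


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_auc(records):
    """records: (stratum, score, label) triples -> {stratum: AUC}.

    Strata lacking either positives or negatives are skipped.
    """
    by_stratum = defaultdict(lambda: ([], []))
    for stratum, score, label in records:
        by_stratum[stratum][0].append(score)
        by_stratum[stratum][1].append(label)
    result = {}
    for stratum, (scores, labels) in by_stratum.items():
        if 0 < sum(labels) < len(labels):
            result[stratum] = auc(scores, labels)
    return result


# Hypothetical data: group G1 has high scores and mostly true links,
# group G2 has low scores and mostly non-links.
records = [
    ("G1", 0.9, 1), ("G1", 0.8, 0), ("G1", 0.7, 1),
    ("G2", 0.3, 0), ("G2", 0.2, 1), ("G2", 0.1, 0),
]
pooled = auc([s for _, s, _ in records], [y for _, _, y in records])
within = stratified_auc(records)
```

In this constructed example the pooled AUC is about 0.67 while the AUC inside each group is exactly 0.5, the pattern one would expect if scores mainly track group membership rather than individual links.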
Circularity Check
No significant circularity; validation uses independent external datasets
Full rationale
The paper's core derivation computes semantic similarity and entailment scores between independently collected lobby position papers and MEP speeches to produce candidate links. Validation is performed against two separately curated external datasets (retweet links and disclosed MEP meetings) that are not generated by or fitted within the similarity model. No equations, parameters, or self-citations are shown to reduce the discovered links or the AUC metric back to the model's own outputs by construction. The reported ideological alignment of aggregate links is an observational result rather than an input used to derive the links. The chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Semantic similarity and textual entailment between lobby position papers and MEP speeches indicate influence or alignment links.
- domain assumption: Retweet links and publicly disclosed MEP-lobby meetings constitute valid indirect proxies for validating influence links.