arxiv: 2309.11381 · v1 · submitted 2023-09-20 · 💻 cs.CL · cs.CE · cs.CY · cs.SI

Studying Lobby Influence in the European Parliament

Pith reviewed 2026-05-24 06:38 UTC · model grok-4.3

classification 💻 cs.CL · cs.CE · cs.CY · cs.SI
keywords lobby influence · European Parliament · natural language processing · semantic similarity · text entailment · political influence · MEP analysis

The pith

NLP methods discover links between members of the European Parliament and lobbies by comparing MEP speeches with lobby position papers using semantic similarity and entailment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an NLP approach to study lobby influence on law-making in the European Parliament. It gathers lobbies' position papers and MEPs' speeches, then compares them using semantic similarity and entailment to identify connections between specific MEPs and lobbies. Without direct ground-truth links, validation relies on indirect checks against curated retweet networks between MEPs and lobbies plus records of disclosed meetings. The strongest model reaches an AUC of 0.77 and beats several baselines. Group-level patterns in the discovered links match what the groups' ideologies would predict, such as center-left MEP groups associating with social causes.

Core claim

By comparing lobbies' position papers and MEPs' speeches on the basis of semantic similarity and entailment, the method uncovers interpretable links between individual MEPs and lobbies. These links are validated indirectly against a curated retweet dataset and publicly disclosed MEP meetings, with the best performing method achieving an AUC of 0.77 and outperforming baselines. Aggregate analysis of links between lobby groups and MEP political groups matches expectations from the groups' ideologies.

What carries the argument

Text comparison via semantic similarity and entailment measures applied to lobbies' position papers and MEPs' speeches to surface influence links.
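
To make the comparison step concrete, here is a minimal sketch of scoring every MEP-lobby pair by text similarity. It assumes a bag-of-words cosine similarity as a simple stand-in for the paper's semantic similarity and entailment models, which this page does not specify. The function names (`bag_of_words`, `cosine_similarity`, `score_links`) and the toy texts are hypothetical and not taken from the paper's datasets.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_links(speeches, papers):
    """Return {(mep, lobby): score} for every MEP-lobby pair."""
    speech_vecs = {mep: bag_of_words(t) for mep, t in speeches.items()}
    paper_vecs = {lobby: bag_of_words(t) for lobby, t in papers.items()}
    return {
        (mep, lobby): cosine_similarity(sv, pv)
        for mep, sv in speech_vecs.items()
        for lobby, pv in paper_vecs.items()
    }


# Toy, made-up texts (assumption: not from the paper's data).
speeches = {
    "mep_a": "We must protect workers and raise the minimum wage.",
    "mep_b": "Farm subsidies keep rural agriculture competitive.",
}
papers = {
    "labour_lobby": "Raise the minimum wage and protect workers.",
    "farm_lobby": "Agriculture needs stable farm subsidies.",
}

scores = score_links(speeches, papers)
```

In this toy setup the highest-scoring lobby for each MEP is the one sharing its vocabulary. A word-overlap measure like this cannot tell agreement from disagreement on the same topic, which is presumably part of why the paper adds entailment on top of similarity.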

If this is right

  • The method supplies a scalable way to surface influence patterns across large collections of parliamentary and lobby texts.
  • Aggregate link patterns can reveal how lobby groups align with different political ideologies in the parliament.
  • The approach offers one route toward greater transparency in how interest groups shape legislation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same text-matching pipeline could be applied to other national or regional legislatures that publish speeches and lobby documents.
  • If the links prove stable over time, they could serve as input features for models that predict voting behavior on specific bills.
  • Extending the comparison to include amendments or voting records might tighten the causal connection between lobby texts and MEP actions.

Load-bearing premise

Discovered links can be validated by matching them against retweet connections and disclosed MEP meetings when no direct ground-truth dataset exists.

What would settle it

A finding that the NLP-derived links show no better-than-chance agreement with the retweet dataset or the disclosed meeting records would undermine the validation.
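
The "better-than-chance" test can be made concrete through the definition of AUC: the probability that a randomly chosen positive pair (for example, one with a retweet link) receives a higher score than a randomly chosen negative pair. The sketch below computes this directly from pairwise comparisons. The scores and labels are invented toy values, and the `auc` helper is a hypothetical illustration rather than the paper's evaluation code.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values (assumption): model scores for five MEP-lobby pairs and
# whether each pair appears in the proxy dataset (1) or not (0).
link_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
in_proxy = [1, 0, 1, 0, 0]

example_auc = auc(link_scores, in_proxy)
```

A value near 0.5 means the scores rank proxy-confirmed pairs no better than a coin flip. The reported 0.77 should be read against that baseline.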

Figures

Figures reproduced from arXiv: 2309.11381 by Antoine Magron, Aswin Suresh, Francesco Salvi, Lazar Radojevic, Matthias Grossglauser, Victor Kristof.

Figure 1: Data collection pipeline for lobbies. D1 contains all crawled PDF documents, D2 contains all English documents in D1, D3 contains the documents in D2 classified as position papers, and D4 contains the summaries of the documents in D3. … of the word ‘position’ in the URL as the label. On the manually labeled validation set of 200 PDFs, the model achieves a precision of 95% and a recall of 39% in identifying …
Figure 2: ROC curves for the Retweet dataset - Full (Top)
Figure 4: Lobby focus heatmap. Political groups are ordered
Figure 5: Lobby clusters projected on principal components.
Original abstract

We present a method based on natural language processing (NLP), for studying the influence of interest groups (lobbies) in the law-making process in the European Parliament (EP). We collect and analyze novel datasets of lobbies' position papers and speeches made by members of the EP (MEPs). By comparing these texts on the basis of semantic similarity and entailment, we are able to discover interpretable links between MEPs and lobbies. In the absence of a ground-truth dataset of such links, we perform an indirect validation by comparing the discovered links with a dataset, which we curate, of retweet links between MEPs and lobbies, and with the publicly disclosed meetings of MEPs. Our best method achieves an AUC score of 0.77 and performs significantly better than several baselines. Moreover, an aggregate analysis of the discovered links, between groups of related lobbies and political groups of MEPs, correspond to the expectations from the ideology of the groups (e.g., center-left groups are associated with social causes). We believe that this work, which encompasses the methodology, datasets, and results, is a step towards enhancing the transparency of the intricate decision-making processes within democratic institutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents an NLP method using semantic similarity and entailment between lobby position papers and MEP speeches to discover influence links in the European Parliament. In the absence of ground truth, it performs indirect validation against curated retweet links and disclosed MEP meetings, reporting an AUC of 0.77 that outperforms baselines, and shows that aggregate discovered links between lobby groups and MEP political groups align with ideological expectations (e.g., center-left groups linked to social causes).

Significance. If the method can isolate influence from topical or ideological alignment, the approach and datasets could contribute to transparency tools for EU legislative processes. The indirect validation strategy and reproducible comparison to baselines are positive elements, but the central claim that textual overlap indicates lobbying influence rather than shared policy focus remains vulnerable to alternative explanations.

major comments (2)
  1. [Validation and aggregate analysis sections (around the AUC results and ideology correspondence)] The indirect validation (retweet and meeting proxies) does not include controls for political group membership or topic, leaving open the possibility that the AUC 0.77 reflects ideological/topic alignment rather than influence. This is load-bearing because the paper's own aggregate analysis shows links matching ideological expectations, which is also predicted by the confound.
  2. [Method and results sections describing the similarity/entailment model] The claim that discovered links are interpretable as influence rests on the assumption that semantic similarity/entailment captures lobbying effects beyond independent alignment on issues; no ablation or matching procedure is described to test this (e.g., comparing within vs. across political groups).
minor comments (2)
  1. [Method section] Clarify the exact entailment model and similarity threshold choices in the main text rather than deferring entirely to supplementary material.
  2. [Abstract] The abstract states the aggregate analysis 'correspond to the expectations from the ideology of the groups'; rephrase for precision to avoid implying this is confirmatory rather than consistent with multiple interpretations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The concerns about potential confounds from ideological or topical alignment in both the validation and aggregate analysis are substantive. We address each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: The indirect validation (retweet and meeting proxies) does not include controls for political group membership or topic, leaving open the possibility that the AUC 0.77 reflects ideological/topic alignment rather than influence. This is load-bearing because the paper's own aggregate analysis shows links matching ideological expectations, which is also predicted by the confound.

    Authors: We agree this is a valid limitation of the current validation strategy. Retweets and meetings were selected as they reflect observable interactions, yet without stratification the AUC may partly capture alignment. In revision we will recompute AUC scores within versus across political groups and topics, and expand the limitations discussion to note that the observed ideological correspondence in aggregate links is consistent with both influence and alignment explanations. revision: yes

  2. Referee: The claim that discovered links are interpretable as influence rests on the assumption that semantic similarity/entailment captures lobbying effects beyond independent alignment on issues; no ablation or matching procedure is described to test this (e.g., comparing within vs. across political groups).

    Authors: The manuscript currently presents no such ablation. To test whether similarity and entailment scores capture effects beyond alignment, we will add a within-group versus across-group performance comparison in the results section. This analysis, together with an explicit statement of the underlying assumption, will be included in the revised manuscript. revision: yes
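
One possible reading of the stratified check promised in both responses is to compute AUC separately inside each MEP political group and compare it with the pooled value. The sketch below illustrates that reading only. The record layout, group names, and numbers are hypothetical, and the revised manuscript may define the comparison differently.

```python
from collections import defaultdict


def auc(scores, labels):
    """Pairwise AUC; returns None when a class is missing."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """records: iterable of (group, score, label) -> {group: AUC or None}."""
    buckets = defaultdict(lambda: ([], []))
    for group, score, label in records:
        buckets[group][0].append(score)
        buckets[group][1].append(label)
    return {g: auc(s, y) for g, (s, y) in buckets.items()}


# Hypothetical toy records: (MEP political group, link score, proxy label).
records = [
    ("group_x", 0.9, 1), ("group_x", 0.7, 0),
    ("group_x", 0.6, 1), ("group_x", 0.2, 0),
    ("group_y", 0.8, 0), ("group_y", 0.5, 1),
    ("group_y", 0.4, 0), ("group_y", 0.1, 1),
]

pooled = auc([s for _, s, _ in records], [y for _, _, y in records])
per_group = auc_by_group(records)
```

Pooled and per-group values can diverge in either direction. If a pooled AUC stays high while per-group values fall toward 0.5, the scores are mostly separating political groups rather than ranking links within them, which is the confound the referee raises.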

Circularity Check

0 steps flagged

No significant circularity; validation uses independent external datasets

Full rationale

The paper's core derivation computes semantic similarity and entailment scores between independently collected lobby position papers and MEP speeches to produce candidate links. Validation is performed against two separately curated external datasets (retweet links and disclosed MEP meetings) that are not generated by or fitted within the similarity model. No equations, parameters, or self-citations are shown to reduce the discovered links or the AUC metric back to the model's own outputs by construction. The reported ideological alignment of aggregate links is an observational result rather than an input used to derive the links. The chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions about what text overlap means and what counts as a usable proxy for influence; no free parameters, mathematical axioms, or new invented entities are introduced.

axioms (2)
  • domain assumption: Semantic similarity and textual entailment between lobby position papers and MEP speeches indicate influence or alignment links.
    Invoked to discover the links (abstract).
  • domain assumption: Retweet links and publicly disclosed MEP-lobby meetings constitute valid indirect proxies for validating influence links.
    Invoked for validation in the absence of ground truth (abstract).

pith-pipeline@v0.9.0 · 5761 in / 1460 out tokens · 28719 ms · 2026-05-24T06:38:29.552372+00:00 · methodology

