Mapping the Political Discourse in the Brazilian Chamber of Deputies: A Multi-Faceted Computational Approach
Pith reviewed 2026-05-09 21:32 UTC · model grok-4.3
The pith
Computational analysis of over 450,000 Brazilian parliamentary speeches shows a shift to shorter rhetoric, crisis-driven agenda changes, and alignments based more on region and gender than party.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a scalable framework combining diachronic stylometric analysis, contextual topic modeling, and semantic clustering applied to over 450,000 speeches from the Brazilian Chamber of Deputies uncovers a long-term stylistic shift toward shorter and more direct speeches, a legislative agenda that reorients sharply in response to national crises, and a granular map of discursive alignments in which regional and gender identities often prove more salient than formal party affiliation.
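The claim that regional and gender identities can outweigh party affiliation can be read quantitatively. The following sketch is not the authors' procedure, which this summary does not specify. It assumes each deputy is represented by a single embedding vector (for instance, an average of that deputy's speech embeddings) and compares mean within-group cosine similarity across different groupings. All function names, deputy IDs, and vectors are hypothetical toy values.

```python
import math
from itertools import combinations
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def salience_gap(embeddings, labels):
    """Mean within-group pairwise similarity minus mean overall pairwise similarity.

    embeddings: {deputy_id: vector}; labels: {deputy_id: group label}.
    A larger gap means the grouping separates deputies' discourse more sharply.
    Raises statistics.StatisticsError if no two deputies share a label.
    """
    pairs = list(combinations(sorted(embeddings), 2))
    overall = mean(cosine(embeddings[a], embeddings[b]) for a, b in pairs)
    within = [cosine(embeddings[a], embeddings[b]) for a, b in pairs
              if labels[a] == labels[b]]
    return mean(within) - overall


# Toy example: four hypothetical deputies with 2-D "embeddings".
emb = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0], "d4": [0.1, 0.9]}
party = {"d1": "A", "d2": "B", "d3": "A", "d4": "B"}
region = {"d1": "NE", "d2": "NE", "d3": "S", "d4": "S"}
print(salience_gap(emb, region) > salience_gap(emb, party))  # → True
```

On these toy vectors, grouping by region produces a larger gap than grouping by party. A real analysis would probably also need a permutation baseline and some control for unequal group sizes, because small groups can show high within-group similarity by chance.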
What carries the argument
A scalable computational framework that combines diachronic stylometric analysis to track changes in speech style, contextual topic modeling to capture agenda shifts, and semantic clustering to map discursive similarities among deputies.
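The following sketch illustrates the diachronic stylometric component under stated assumptions. It tracks one assumed feature, words per speech, by year and fits an ordinary least-squares slope. A negative slope corresponds to the "shorter speeches" trend. This summary does not list the paper's actual stylometric features. Whitespace tokenization is a crude assumption, and the corpus is a toy stand-in rather than data from the paper.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean_length(speeches):
    """speeches: iterable of (year, text). Returns {year: mean words per speech}."""
    lengths = defaultdict(list)
    for year, text in speeches:
        lengths[year].append(len(text.split()))  # crude whitespace tokenization
    return {year: mean(vals) for year, vals in sorted(lengths.items())}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Toy corpus of (year, speech text); the repeated phrase only pads word counts.
corpus = [
    (2003, "senhor presidente " * 60),
    (2003, "senhor presidente " * 50),
    (2014, "senhor presidente " * 40),
    (2025, "senhor presidente " * 20),
]
series = yearly_mean_length(corpus)
slope = ols_slope(list(series), list(series.values()))
print(series, round(slope, 2))  # slope < 0: speeches get shorter over time
```

The fit weights each year equally, regardless of how many speeches that year contains. A fuller analysis might weight years by speech counts or model individual speeches directly.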
Load-bearing premise
The diachronic stylometric analysis, contextual topic modeling, and semantic clustering accurately capture rhetorical and semantic content without substantial bias from model choices, preprocessing, or corpus selection.
What would settle it
If an independent manual annotation of a random sample of the speeches or an alternative set of models and preprocessing choices fails to recover the same long-term shortening trend, crisis-linked topic reorientations, and region/gender-dominant clusters, the central results would not hold.
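The "crisis-linked topic reorientation" criterion becomes testable if topic distributions are compared between consecutive periods. This sketch uses Jensen-Shannon divergence for the comparison. That is a swapped-in technique and not necessarily the one the authors used. The per-topic counts for each period could come from a topic model such as BERTopic; here the counts and the period labels are invented toy values.

```python
import math


def normalize(counts):
    """Turn raw per-topic speech counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded by [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def agenda_shifts(topic_counts_by_period):
    """Given [(period, counts per topic)] in time order, return
    [(period, JSD between this period and the previous one)]."""
    shifts = []
    for (_, prev), (period, cur) in zip(topic_counts_by_period,
                                        topic_counts_by_period[1:]):
        shifts.append((period, js_divergence(normalize(prev), normalize(cur))))
    return shifts


# Toy counts for three hypothetical topics across three hypothetical periods.
periods = [("P1", [50, 10, 40]), ("P2", [48, 12, 40]), ("P3", [20, 70, 10])]
shifts = agenda_shifts(periods)
largest = max(shifts, key=lambda s: s[1])
print(largest[0])  # → P3
```

A period that diverges sharply from its predecessor is a candidate agenda reorientation. Whether it coincides with a national crisis still has to be checked against an external timeline. Recomputing these shifts from an alternative topic model's output is one concrete form of the robustness check described here.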
Original abstract
Analyses of legislative behavior often rely on voting records, overlooking the rich semantic and rhetorical content of political speech. In this paper, we ask three complementary questions about parliamentary discourse: how things are said, what is being said, and who is speaking in discursively similar ways. To answer these questions, we introduce a scalable and generalizable computational framework that combines diachronic stylometric analysis, contextual topic modeling, and semantic clustering of deputies' speeches. We apply this framework to a large-scale case study of the Brazilian Chamber of Deputies, using a corpus of over 450,000 speeches from 2003 to 2025. Our results show a long-term stylistic shift toward shorter and more direct speeches, a legislative agenda that reorients sharply in response to national crises, and a granular map of discursive alignments in which regional and gender identities often prove more salient than formal party affiliation. More broadly, this work offers a robust methodology for analyzing parliamentary discourse as a multidimensional phenomenon that complements traditional vote-based approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a scalable computational framework combining diachronic stylometric analysis, contextual topic modeling, and semantic clustering to examine parliamentary discourse. Applied to a corpus of over 450,000 speeches from the Brazilian Chamber of Deputies (2003–2025), it claims to document a long-term shift toward shorter and more direct speeches, sharp reorientations in the legislative agenda during national crises, and a map of discursive alignments in which regional and gender identities are often more salient than formal party affiliation.
Significance. If the results hold, this work is significant for offering a multi-dimensional complement to vote-based legislative studies by incorporating rhetorical and semantic content at scale. The use of a large corpus, standard documented pipelines, and basic validation steps such as coherence scores and silhouette metrics are explicit strengths that support generalizability to other parliaments. The findings on identity saliency and crisis responsiveness could inform computational political science and encourage similar multi-faceted analyses elsewhere.
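The silhouette metric mentioned here has a standard definition. For each point, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of the point's own cluster and b(i) is the lowest mean distance to the members of any other cluster. The following is a dependency-free sketch of that definition; in practice `sklearn.metrics.silhouette_score` computes the same quantity. The Euclidean default is an assumption, since the summary does not say which distance the authors used. Cosine distance is common for text embeddings.

```python
import math
from statistics import mean


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_score(points, labels, dist=euclidean):
    """Mean silhouette coefficient over all points; needs at least two clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette is undefined for fewer than two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a(i): mean distance to the other members of p's own cluster
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = mean(same)
        # b(i): lowest mean distance from p to the members of any other cluster
        b = min(mean([dist(p, q) for q, l in zip(points, labels) if l == other])
                for other in clusters if other != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # two tight, distant clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters that mix the two groups
print(round(good, 3), round(bad, 3))
```

Scores near 1 indicate compact, well-separated clusters. Negative scores indicate that many points sit closer to another cluster than to their own.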
Minor comments (2)
- The abstract states the headline results without referencing the validation metrics (coherence scores, silhouette metrics) used in the three core analyses; a brief clause noting these steps would improve transparency and align the summary with the methods section.
- Several figures depicting topic evolution over time and the embedding-based clusters would benefit from additional axis labels, legends, or annotations to enhance clarity and allow readers to assess the claimed patterns without ambiguity.
Simulated Author's Rebuttal
We thank the referee for the positive and constructive summary of our manuscript, as well as the recommendation for minor revision. We are pleased that the significance of the multi-faceted computational framework, the scale of the 450k-speech corpus, and the key findings on stylistic simplification, crisis-driven agenda shifts, and the relative salience of regional/gender identities over party affiliation have been recognized. We will make the necessary minor revisions to improve clarity and presentation.
Circularity Check
No significant circularity; empirical application of standard tools
Full rationale
The manuscript presents an empirical case study applying three established, externally documented NLP pipelines (diachronic stylometrics, contextual topic modeling, and embedding-based semantic clustering) to an independent corpus of 450k+ speeches. No equations, parameters, or predictions are derived from the target results themselves; all methods are standard (with reported coherence/silhouette validation) and the claims rest on observable patterns in the data rather than self-referential definitions or fitted quantities renamed as predictions. Self-citations, if any, are non-load-bearing and do not substitute for the core analyses.