Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem

Ana P. C. Silva; Carlos H. G. Ferreira; Fabricio Murai; Geovana S. de Oliveira

arXiv: 2604.18586 · v1 · submitted 2026-03-04 · 💻 cs.CY · cs.AI · cs.CL · cs.LG · cs.SI

Pith reviewed 2026-05-15 16:11 UTC · model grok-4.3

classification 💻 cs.CY cs.AI cs.CL cs.LG cs.SI

keywords: vaccine discourse, stance detection, YouTube, Brazil, polarization, semi-supervised learning, science communication, misinformation

The pith

Semi-supervised modeling of nearly 1.4 million YouTube comments shows that science communicators and digital-native channels host most of the pro- and anti-vaccine engagement in Brazil.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a semi-supervised stance detection method, combining self-labeling and self-training, to classify nearly 1.4 million YouTube comments spanning Brazil's full immunization schedule. This approach makes it possible to track attitudes over long periods and across multiple vaccines, rather than around single events or in English-only data. Polarization rises sharply during crises such as COVID-19 but fragments afterward across vaccines and interaction styles. Science communication and digital-native outlets emerge as the central spaces where both supportive and opposing voices concentrate. The work provides evidence on how narratives circulate in a hybrid media system and identifies structural points where health communication is most vulnerable.

Core claim

Integrating stance labels from the semi-supervised framework with temporal patterns, engagement metrics, and channel types shows that polarization spikes during epidemiological crises but becomes fragmented across vaccines and interaction patterns in the post-pandemic period, with science communication and digital-native channels serving as the primary loci of both supportive and oppositional engagement.

What carries the argument

Semi-supervised stance detection framework that combines self-labeling and self-training to classify comments as pro- or anti-vaccine while integrating channel taxonomy and temporal engagement data.
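As a rough illustration of the self-training half of such a framework, the sketch below repeatedly fits a classifier on the labeled pool, pseudo-labels the unlabeled comments whose predicted confidence clears a threshold, and moves them into the labeled pool. The tiny Naive Bayes model, the `threshold` value, the function names, and the toy comments are all assumptions made for illustration; the paper's actual model and hyperparameters are not described in the text above.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict_proba(model, text):
    """Return {label: posterior probability} for one comment."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # skip words never seen during training
                # add-one (Laplace) smoothing
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}

def self_train(seed, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set with confident pseudo-labels, then refit."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(labeled)
        confident, remaining = [], []
        for text in pool:
            probs = predict_proba(model, text)
            label, p = max(probs.items(), key=lambda kv: kv[1])
            (confident if p >= threshold else remaining).append((text, label))
        if not confident:  # nothing new to learn from; stop early
            break
        labeled.extend(confident)
        pool = [text for text, _ in remaining]
    return train_nb(labeled), labeled, pool

# Hypothetical toy data (Portuguese, unaccented for simplicity).
seed = [
    ("vacina salva vidas", "pro"),
    ("vacina segura e eficaz", "pro"),
    ("vacina causa danos", "anti"),
    ("nao confio na vacina", "anti"),
]
unlabeled = [
    "a vacina salva muitas vidas",
    "nao confio nessa vacina experimental",
    "bom dia a todos",
]
model, labeled, leftover = self_train(seed, unlabeled, threshold=0.75)
print(len(labeled), leftover)
```

Comments that never reach the confidence threshold stay unlabeled rather than being forced into a class, which is one common way to limit the error propagation that the referee comments further down are concerned with.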

If this is right

Public health agencies gain a way to monitor attitude shifts across the entire immunization schedule rather than isolated vaccines.
Polarization patterns can be tracked in real time during future health crises to guide communication timing.
Science communication and digital-native channels become priority targets for both supportive messaging and countering opposition.
Fragmented post-pandemic polarization implies that uniform national strategies may be less effective than vaccine-specific approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same semi-supervised method could be tested on other languages or platforms to check whether science and digital-native channels play similar roles elsewhere.
Engagement metrics combined with stance could serve as early signals for rising misinformation around new vaccines.
Channel taxonomy suggests traditional legacy media play a secondary role, pointing to a structural shift in where health debates now occur.

Load-bearing premise

The semi-supervised stance detection framework produces accurate classifications without substantial bias from the labeling process or from YouTube comments failing to represent broader Brazilian public attitudes.

What would settle it

A manual annotation of a random sample of several thousand comments or a direct comparison against independent national surveys of vaccine attitudes would confirm or refute the accuracy of the automated stance labels.
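A minimal sketch of the first check, assuming comments are addressable by ID and that annotators simply mark each sampled automated label as correct or incorrect: draw a reproducible random sample, then report the agreement rate with a Wilson score interval. The function names, the sample size, and the counts in the usage lines are hypothetical.

```python
import math
import random

def sample_for_annotation(comment_ids, k, seed=0):
    """Draw k distinct comment IDs uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(list(comment_ids), k)

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Hypothetical numbers: 2,000 sampled comments, 1,800 automated labels
# judged correct by the annotators.
ids = sample_for_annotation(range(1_400_000), k=2000, seed=42)
low, high = wilson_interval(correct=1800, n=2000)
print(len(ids), round(low, 3), round(high, 3))
```

With a sample in the low thousands the interval is already a few percentage points wide at most, which is why a sample of that size is enough to confirm or refute a headline accuracy figure.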

Figures

Figures reproduced from arXiv: 2604.18586 by Ana P. C. Silva, Carlos H. G. Ferreira, Fabricio Murai, Geovana S. de Oliveira.

**Figure 2.** Distribution of vaccine-specific mentions over the years (z-score normalized per vaccine).
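The per-vaccine z-score normalization named in this caption can be sketched as follows. The input layout (`{vaccine: {year: count}}`), the use of the population standard deviation, and the toy counts are assumptions; the paper may normalize differently.

```python
from statistics import mean, pstdev

def zscore_per_vaccine(counts_by_vaccine):
    """Map {vaccine: {year: count}} to {vaccine: {year: z-score}}.

    Each vaccine is normalized against its own yearly mean and
    (population) standard deviation, so vaccines with very different
    mention volumes become comparable on one axis.
    """
    normalized = {}
    for vaccine, yearly in counts_by_vaccine.items():
        values = list(yearly.values())
        mu = mean(values)
        sigma = pstdev(values)
        normalized[vaccine] = {
            # a flat series has sigma == 0; report 0.0 instead of dividing by zero
            year: (count - mu) / sigma if sigma > 0 else 0.0
            for year, count in yearly.items()
        }
    return normalized

# Hypothetical yearly mention counts.
counts = {
    "covid-19": {2019: 0, 2020: 50, 2021: 100},
    "hpv": {2019: 4, 2020: 4, 2021: 4},
}
print(zscore_per_vaccine(counts))
```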

**Figure 3.** Conditional reply probabilities by stance in pre, …
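One plausible reading of "conditional reply probabilities by stance" is the probability that a reply carries a given stance, conditioned on the stance of the comment it answers. The sketch below estimates that quantity from (parent stance, reply stance) pairs; the pair format and the label names are assumptions, and the paper's exact definition may differ.

```python
from collections import Counter, defaultdict

def conditional_reply_probs(reply_pairs):
    """Estimate P(reply stance | parent stance) from (parent, reply) pairs."""
    counts = defaultdict(Counter)
    for parent_stance, reply_stance in reply_pairs:
        counts[parent_stance][reply_stance] += 1
    return {
        parent: {reply: n / sum(replies.values()) for reply, n in replies.items()}
        for parent, replies in counts.items()
    }

# Hypothetical reply pairs for a single time period.
pairs = [
    ("pro", "pro"), ("pro", "anti"), ("pro", "pro"),
    ("anti", "pro"), ("anti", "anti"), ("anti", "anti"), ("anti", "anti"),
]
print(conditional_reply_probs(pairs))
```

Running the function separately on the pairs from each time period would give one table per period, which can then be compared across periods.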

Original abstract

Vaccination remains a cornerstone of global public health, yet the COVID-19 pandemic exposed how online misinformation, political polarization, and declining institutional trust can undermine immunization efforts. Most of the prior computational studies that analyzed vaccine discourse on social platforms focus on English-language data, specific vaccines, or short time windows, impairing our understanding of long-term dynamics in high-impact, non-English contexts like Brazil, home to one of the world's most comprehensive immunization systems. We here present the largest longitudinal study of Brazil's vaccine discourse on YouTube, leveraging a semi-supervised stance detection framework that combines self-labeling and self-training to classify nearly 1.4 million comments. By integrating stance with temporal patterns, engagement metrics, and channel taxonomy (legacy media, science communicators, digital-native outlets), we map how pro- and anti-vaccine narratives evolve and circulate within a hybrid media ecosystem. Our results show that semi-supervised learning substantially improves stance classification robustness, enabling fine-grained tracking of public attitudes across Brazil's full immunization schedule. Polarization spikes during epidemiological crises, especially COVID-19, but becomes fragmented across vaccines and interaction patterns in the post-pandemic period. Notably, science communication and digital-native channels emerge as the primary loci of both supportive and oppositional engagement, revealing structural vulnerabilities in contemporary health communication. Thus, our work advances computational methods for large-scale stance modeling while offering actionable evidence for public health agencies, platform governance, and online information ecosystems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Large-scale Brazilian YouTube vaccine data is the real draw, but the semi-supervised classifier needs validation numbers before the polarization claims can be trusted.

The paper's main value is its scale: nearly 1.4 million comments tracked across Brazil's full vaccine schedule on YouTube, broken down by channel type and time. That longitudinal view of a non-English, high-coverage immunization setting fills a gap left by prior work, most of which stayed with English-language data or short time windows. Integrating stance labels with engagement metrics and a simple channel taxonomy (legacy media, science communicators, digital-native outlets) produces some clear patterns: polarization rises during COVID-19 and fragments afterward, and science communication and digital-native channels host most of the back-and-forth. Those descriptive results are the part worth keeping even if other pieces need work.

The semi-supervised method is presented as the technical step forward, but the write-up reports no held-out human gold labels, no precision or recall figures, and no ablation showing what self-training adds over the seed labels. Without those, any claim that the classifier is robust enough to support fine-grained attitude tracking or the reported polarization spikes stays untested. YouTube comments also carry their own selection bias, so the pro/anti splits may not map cleanly onto wider Brazilian opinion. The citation pattern looks standard and the logic avoids circularity.

This is useful reading for anyone doing computational work on health discourse outside English-language platforms, and for public-health teams that need country-specific online signals. It is not a methods breakthrough, but the data collection and temporal scope justify sending it to referees. A serious review would focus on adding validation metrics and checking sensitivity to labeling choices; the core descriptive contribution is solid enough to survive that process.

Referee Report

2 major / 1 minor

Summary. The manuscript presents the largest longitudinal study of Brazil's vaccine discourse on YouTube, classifying nearly 1.4 million comments via a semi-supervised stance detection framework that combines self-labeling and self-training. Integrating stance labels with temporal patterns, engagement metrics, and a channel taxonomy (legacy media, science communicators, digital-native outlets), it claims that semi-supervised learning substantially improves classification robustness, that polarization spikes during epidemiological crises (especially COVID-19) but fragments post-pandemic, and that science communication and digital-native channels are the primary loci of both supportive and oppositional engagement.

Significance. If the stance classifications prove accurate and low-bias, the work would constitute a significant contribution by providing the first large-scale, long-term mapping of vaccine attitudes in a non-English, high-impact public-health context, advancing semi-supervised methods for stance modeling while generating actionable evidence on media-ecosystem vulnerabilities for public-health agencies and platform governance.

major comments (2)

[Methods] Methods section: The central claim that semi-supervised learning (self-labeling + self-training) substantially improves stance classification robustness is unsupported by any reported held-out validation metrics. No precision, recall, or F1 scores on a manually annotated test set separate from the seed labels are provided, nor are ablation results isolating the self-training gain or inter-annotator agreement for the initial seeds. This is load-bearing for all downstream polarization and channel-taxonomy findings.
[Results] Results section: Without an error analysis or bias audit of the iterative labeling process, it is impossible to rule out systematic misclassification (e.g., differential performance on anti-vaccine comments), which would propagate into the reported spikes during COVID-19 and the post-pandemic fragmentation across vaccines and interaction patterns.
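The held-out metrics requested in these comments are straightforward to compute once a manually annotated test set exists. The sketch below derives per-class precision, recall, and F1 from parallel lists of gold and predicted labels; the label names and the toy data are hypothetical and chosen only to show how a gap between classes would surface.

```python
def per_class_metrics(gold, predicted):
    """Return {label: {"precision", "recall", "f1"}} for parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    report = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report

# Hypothetical held-out set in which anti-vaccine comments are under-detected.
gold = ["pro"] * 4 + ["anti"] * 4
predicted = ["pro", "pro", "pro", "pro", "anti", "pro", "pro", "anti"]
for label, scores in per_class_metrics(gold, predicted).items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

Reporting the scores per class rather than as a single average is what would expose the differential performance on anti-vaccine comments raised in the second major comment.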

minor comments (1)

[Abstract] Abstract and §1: The exact number of comments after filtering, the precise definition of the channel taxonomy, and the temporal window boundaries should be stated explicitly for reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments highlight important gaps in the validation of our semi-supervised stance detection pipeline and the need for greater transparency regarding potential classification biases. We address each point below and will incorporate the suggested analyses into a revised manuscript.

Referee: [Methods] Methods section: The central claim that semi-supervised learning (self-labeling + self-training) substantially improves stance classification robustness is unsupported by any reported held-out validation metrics. No precision, recall, or F1 scores on a manually annotated test set separate from the seed labels are provided, nor are ablation results isolating the self-training gain or inter-annotator agreement for the initial seeds. This is load-bearing for all downstream polarization and channel-taxonomy findings.

Authors: We acknowledge that the manuscript as submitted does not include held-out validation metrics, ablation studies, or inter-annotator agreement statistics for the seed labels. This omission weakens the support for our claim of improved robustness. In the revision we will add a new subsection to the Methods that describes the creation of a manually annotated held-out test set (distinct from the seed labels), reports inter-annotator agreement, and presents precision, recall, and F1 scores for both a supervised baseline and the full semi-supervised model. We will also include ablation experiments that isolate the contribution of the self-training stage. These additions will directly substantiate the methodological claims before the downstream polarization analyses.

revision: yes

Referee: [Results] Results section: Without an error analysis or bias audit of the iterative labeling process, it is impossible to rule out systematic misclassification (e.g., differential performance on anti-vaccine comments), which would propagate into the reported spikes during COVID-19 and the post-pandemic fragmentation across vaccines and interaction patterns.

Authors: We agree that the absence of an error analysis leaves open the possibility of systematic misclassification, particularly for anti-vaccine content. In the revised manuscript we will insert a dedicated error-analysis subsection in the Results. This will include (1) a manual review of a stratified sample of comments labeled pro- and anti-vaccine by the final model, (2) quantitative assessment of differential error rates across stance classes and time periods, and (3) discussion of how any observed biases could affect the reported temporal spikes and post-pandemic fragmentation patterns. We will also add a limitations paragraph that explicitly addresses the implications for the channel-taxonomy findings.

revision: yes
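The stratified sample described in the second response could be drawn along these lines: group the model-labeled comments by predicted stance and time period, then sample the same number from each group so that rare combinations are not swamped by common ones. The record fields (`stance`, `period`), the period names, and the per-stratum size are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, per_stratum, seed=0):
    """Sample up to per_stratum records from each (stance, period) group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["stance"], rec["period"])].append(rec)
    sample = []
    for key in sorted(strata):  # fixed order keeps the draw reproducible
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Hypothetical model output: 10 comments per stance and period.
records = [
    {"id": f"{stance}-{period}-{i}", "stance": stance, "period": period}
    for stance in ("pro", "anti")
    for period in ("pandemic", "post-pandemic")
    for i in range(10)
]
review_batch = stratified_sample(records, per_stratum=3, seed=7)
print(len(review_batch))
```

Error rates measured on such a batch apply per stratum; estimating an overall error rate from it would require reweighting each stratum by its share of the full corpus.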

Circularity Check

0 steps flagged

No significant circularity in semi-supervised stance modeling

The paper's core pipeline ingests raw YouTube comments, applies self-labeling plus self-training to produce stance labels, then derives temporal polarization, channel taxonomy, and engagement statistics from those labels. No step equates an output quantity to its own input by definition, renames a fitted parameter as a prediction, or relies on a self-citation chain to establish uniqueness. The semi-supervised process operates on new data without presupposing the polarization or fragmentation results it later reports, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no specific free parameters, axioms, or invented entities can be extracted from the text.


[1] [1]

d.].DATASUS - Informações de Saúde (TABNET)

[n. d.].DATASUS - Informações de Saúde (TABNET). https://datasus.saude.gov.br/ informacoes-de-saude-tabnet/

work page

[2] [2]

Roland P Abao, Ma Regina Justina E Estuar, Anna Angeline M Cataluña, Jelly P Aureus, and Dorothy C Mapua. 2021. Emotion analysis of comments from vaccine-related YouTube videos: Understanding the public’s response to COVID- 19 vaccination. InIEEE SNAMS. 1–7

work page 2021

[3] [3]

Malak Alsabban. 2021. Comparing two sentiment analysis approaches by under- stand the hesitancy to COVID-19 vaccine based on Twitter data in two cultures. InCompanion of ACM WebSci. 143–144

work page 2021

[4] [4]

Massih-Reza Amini, Vasilii Feofanov, Loic Pauletto, Lies Hadjadj, Emilie Devijver, and Yury Maximov. 2025. Self-training: A survey.Neurocomputing616 (2025), 128904

work page 2025

[5] [5]

Irfan Aygün, Buket Kaya, and Mehmet Kaya. 2021. Aspect based twitter senti- ment analysis on vaccination and vaccine types in covid-19 pandemic with deep learning.IEEE J-BHI26, 5 (2021), 2360–2369

work page 2021

[6] [6]

Ann J Barbier, Allen Yujie Jiang, Peng Zhang, Richard Wooster, and Daniel G Anderson. 2022. The clinical progress of mRNA vaccines and immunotherapies. Nat. Biotechnol.(2022)

work page 2022

[7] [7]

2017.Deep learning

Yoshua Bengio, Ian Goodfellow, Aaron Courville, et al. 2017.Deep learning. Vol. 1. MIT press Cambridge, MA, USA

work page 2017

[8] [8]

Guillermo Blanco, Rubén Yáñez Martínez, and Anália Lourenço. 2025. Leveraging deep learning to detect stance in Spanish tweets on COVID-19 vaccination.JAMIA open8, 1 (2025), ooaf007

work page 2025

[9] [9]

Rebecca M Casey, Jennifer B Harris, Steve Ahuka-Mundeke, Meredith G Dixon, Gabriel M Kizito, Pierre M Nsele, Grace Umutesi, Janeen Laven, Olga Kosoy, Gilson Paluku, et al. 2019. Immunogenicity of fractional-dose vaccine during a yellow fever outbreak.N. Engl. J. Med.(2019)

work page 2019

[10] [10]

Jessica Costa, Geovana Oliveira, Guilherme Fonseca, Davi Reis, Giancarlo Oliveira Teixeira, Washington Cunha, Leonardo Rocha, and Carlos HG Ferreira

work page

[11] [11]

InProceedings of the 17th ACM Web Science Conference 2025

Characterizing YouTube’s Role in Online Gambling Promotion: A Case Study of Fortune Tiger in Brazil. InProceedings of the 17th ACM Web Science Conference 2025. 42–51

work page 2025

[12] [12]

Saul Sousa da Rocha, Carlos Henrique do Vale, Carlos HG Ferreira, Glauber Dias Gonçalves, Jussara Marques de Almeida, et al . 2024. Monitorando a opinião pública sobre operações policiais no brasil via comentários de vídeos no youtube. InBrazilian Workshop on Social Network Analysis and Mining (BraSNAM). SBC, 158–171

work page 2024

[13] [13]

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLORA: efficient finetuning of quantized LLMs(NeurIPS). Article 441, 28 pages. WebSci ’26, May 26–29, 2026, Braunschweig, Germany Oliveira, et al

work page 2023

[14] [14]

Aline Dias, Richardy R Tanure, Jussara M Almeida, Helen CSC Lima, and Car- los HG Ferreira. 2024. Análise da Percepção do Uso de Cigarros Eletrônicos no Brasil por meio de Comentários no YouTube. InBrazilian Symposium on Multimedia and the Web

work page 2024

[15] [15]

Jingcheng Du, Chongliang Luo, Ross Shegog, Jiang Bian, Rachel M Cunningham, Julie A Boom, Gregory A Poland, Yong Chen, and Cui Tao. 2020. Use of deep learning to analyze social media discussions about the human papillomavirus vaccine. JAMA Netw. Open 3, 11 (2020), e2022025

[16]

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv e-prints (2024), arXiv–2407

[17]

Alessandra Fallucca, Walter Priano, Alessandro Carubia, Patrizia Ferro, Vincenzo Pisciotta, Alessandra Casuccio, and Vincenzo Restivo. 2024. Effectiveness of Catch-Up Vaccination Interventions Versus Standard or Usual Care Procedures in Increasing Adherence to Recommended Vaccinations Among Different Age Groups: Systematic Review and Meta-Analysis of Rand...

[18]

Rafael S. Medina Ferreira, Ana Paula Couto da Silva, and Fabricio Murai. 2022. Risk Perception and Misinformation in Brazilian Twitter during COVID-19 Infodemic. In IEEE SocialCom

[19]

Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychol. Bull. 76, 5 (1971), 378

[20]

Joseph L Fleiss, Bruce Levin, and Myunghee Cho Paik. 1981. The comparison of proportions from several independent samples. Statistical methods for rates and proportions (1981)

[21]

Guilherme Fonseca, Washington Cunha, Gabriel Prenassi, Marcos André Gonçalves, and Leonardo Chaves Dutra Da Rocha. 2025. Instance-selection-inspired undersampling strategies for bias reduction in small and large language models for binary text classification. In ACL. 9323–9340

[22]

Luis Guilherme G. Da Fonseca, Carlos Henrique Gomes Ferreira, and Julio Cesar Soares Dos Reis. 2024. The Role of News Source Certification in Shaping Tweet Content: Textual and Dissemination Patterns in Brazil’s 2022 Elections. In Brazilian Symp. on Inform. Syst. 1–10

[23]

Robert M Gray. 2011. Entropy and information theory. Springer

[24]

Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, and Dacheng Tao. 2024. A survey on self-supervised learning: Algorithms, applications, and future trends. IEEE TPAMI 46, 12 (2024), 9052–9071

[25]

Liang-Chin Huang, Amanda L Eiden, Long He, Augustine Annan, Siwei Wang, Jingqi Wang, Frank J Manion, Xiaoyan Wang, Jingcheng Du, Lixia Yao, et al. 2024. Natural Language Processing–Powered Real-Time Monitoring Solution for Vaccine Sentiments and Hesitancy on Social Media: System Development and Validation. JMIR Med. Inform. 12, 1 (2024), e57164

[27]

Juwon Hwang, Min-Hsin Su, Xiaoya Jiang, Ruixue Lian, Arina Tveleneva, and Dhavan Shah. 2022. Vaccine discourse during the onset of the COVID-19 pandemic: Topical structure and source patterns informing efforts to combat vaccine hesitancy. PLoS ONE 17, 7 (2022), e0271394

[28]

IBOPE. 2023. Video Audience Share Percentage in Brazil. https://kantaribopemedia.com/conteudo/relatorios/april-2023/

[29]

Florian Kunneman, Mattijs Lambooij, Albert Wong, Antal van den Bosch, and Liesbeth Mollema. 2020. Monitoring stance towards vaccination in twitter messages. BMC Med. Inform. Decis. Mak. 20, 1 (2020), 33

[30]

Marin Lahouati, Antoine De Coucy, Jean Sarlangue, and Charles Cazanave. 2020. Spread of vaccine hesitancy in France: What about YouTube™? Vaccine 38, 36 (2020), 5779–5782

[31]

J Richard Landis and Gary G Koch. 1977. The measurement of observer agreement for categorical data. Biometrics (1977), 159–174

[32]

Heidi J Larson. 2022. Defining and measuring vaccine hesitancy. Nat. Hum. Behav. 6, 12 (2022), 1609–1610

[33]

Marcelo Sartori Locatelli, Josemar Caetano, Wagner Meira Jr, and Virgilio Almeida. 2022. Characterizing vaccination movements on YouTube in the United States and Brazil. In ACM HT

[34]

Larissa Malagoli, Júlia Stancioli, Carlos HG Ferreira, Marisa Vasconcelos, Ana Paula Couto da Silva, and Jussara Almeida. 2021. Caracterização do debate no twitter sobre a vacinação contra a covid-19 no brasil. In Brazilian Workshop on Social Network Analysis and Mining (BraSNAM). SBC, 55–66

[35]

Larissa G Malagoli, Julia Stancioli, Carlos HG Ferreira, Marisa Vasconcelos, Ana Paula Couto da Silva, and Jussara M Almeida. 2021. A look into covid-19 vaccination debate on twitter. In Proceedings of the 13th ACM Web Science Conference 2021. 225–233

[36]

Chad A Melton, Olufunto A Olusanya, Nariman Ammar, and Arash Shaban-Nejad. 2021. Public sentiment analysis and topic modeling regarding COVID-19 vaccines on the Reddit social media platform: A call to action for strengthening vaccine confidence. J. Infect. Public Health (2021)

[37]

Chad A Melton, Brianna M White, Robert L Davis, Robert A Bednarczyk, and Arash Shaban-Nejad. 2022. Fine-tuned sentiment analysis of covid-19 vaccine–related social media data: Comparative study. JMIR 24, 10 (2022), e40408

[38]

Tamar Mitts, Nilima Pisharody, and Jacob Shapiro. 2022. Removal of anti-vaccine content impacts social media discourse. In ACM WebSci. 319–326

[39]

Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An, and Kazutoshi Sasahara. 2024. The impact of toxic trolling comments on anti-vaccine YouTube videos. Sci. Rep. 14, 1 (2024), 5088

[40]

Bjarke Mønsted and Sune Lehmann. 2022. Characterizing polarization in online vaccine discourse—A large-scale study. PLoS ONE 17, 2 (2022), e0263746

[41]

Gabriel P Nobre, Carlos HG Ferreira, and Jussara M Almeida. 2022. More of the same? a study of images shared on mastodon’s federated timeline. In International Conference on Social Informatics. Springer, 181–195

[42]

Geovana S Oliveira, João Pedro Lobo, Otávio Venâncio, Vinícius da F Vieira, Jussara M Almeida, Ana PC Silva, Ronan S Ferreira, and Carlos HG Ferreira. 2025. A Network-Driven Framework for Bidimensional Analysis of Information Dissemination on Social Media Platforms. Journal on Interactive Systems 16, 1 (2025), 773–794

[44]

Geovana S Oliveira, Otávio Venâncio, Vinícius Vieira, Jussara Almeida, Ana PC Silva, Ronan Ferreira, and Carlos HG Ferreira. 2024. Um framework para análise bidimensional de disseminação de informações em plataformas de mídias sociais. In Brazilian Symposium on Multimedia and the Web (WebMedia). SBC, 301–309

[45]

World Health Organization et al. 2022. Behavioural and social drivers of vaccination: tools and practical guidance for achieving high uptake. (2022)

[46]

Yang Pan, Quanyi Wang, Peng Yang, Li Zhang, Shuangsheng Wu, Yi Zhang, Ying Sun, Wei Duan, Chunna Ma, Man Zhang, et al. 2017. Influenza vaccination in preventing outbreaks in schools: A long-term ecological overview. Vaccine 35, 51 (2017), 7133–7138

[47]

Jadher Pércio, Eder Gatti Fernandes, Ethel Leonor Maciel, and Nísia Verônica Trindade de Lima. 2023. 50 years of the Brazilian National Immunization Program and the Immunization Agenda 2030. Epidemiologia e Serviços de Saúde 32 (2023), e20231009

[48]

Miftahul Qorib, Timothy Oladunni, Max Denis, Esther Ososanya, and Paul Cotae. 2023. Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset. Expert Syst. Appl. 212 (2023), 118715

[50]

Guilherme O Santos, Lucas S Vieira, Giulio Rossetti, Carlos HG Ferreira, and Gladston JP Moreira. 2025. A high-performance evolutionary multiobjective community detection algorithm. Social Network Analysis and Mining 15, 1 (2025), 110

[51]

Romy Sauvayre, Jessica Vernier, and Cédric Chauvière. 2022. An analysis of French-language tweets about COVID-19 vaccines: Supervised learning approach. JMIR Med. Inform. 10, 5 (2022), e37831

[52]

Brener Santos Silva, Eliete Albano de Azevedo Guimarães, Valéria Conceição de Oliveira, Ricardo Bezerra Cavalcante, Marta Macedo Kerr Pinheiro, Tarcísio Laerte Gontijo, Samuel Barroso Rodrigues, Ana Paula Ferreira, Humberto Ferreira de Oliveira Quites, and Ione Carvalho Pinto. 2020. National immunization program information system: implementation context ...

[53]

Melodie Yun-Ju Song and Anatoliy Gruzd. 2017. Examining sentiments and popularity of pro- and anti-vaccination videos on YouTube. In Social Media + Society. 1–8

[54]

Nadiya Straton. 2023. COVID vaccine stigma: detecting stigma across social media platforms with computational model based on deep learning.Appl. Intell. 53, 13 (2023), 16398–16423

[55]

Fahim K Sufi, Imran Razzak, and Ibrahim Khalil. 2022. Tracking anti-vax social movement using AI-based social media monitoring. IEEE-TTS 3, 4 (2022), 290–299

[56]

Richardy R Tanure, Aline M Dias, Lucas A Camelo, Jussara Almeida, Helen CSC Lima, and Carlos HG Ferreira. 2025. Caracterização do debate online sobre cigarro eletrônico no Brasil: Uma análise de tópicos de discussão no YouTube. In Brazilian Workshop on Social Network Analysis and Mining (BraSNAM). SBC, 54–64

[57]

Dayane Fumiyo Tokojima Machado, Alexandre Fioravante de Siqueira, and Leda Gitahy. 2020. Natural stings: Selling distrust about vaccines on Brazilian YouTube. Front. Comm. 5 (2020), 577941

[58]

Jia Xue, Junxiang Chen, Ran Hu, Chen Chen, Chengda Zheng, Yue Su, and Tingshao Zhu. 2020. Twitter discussions and emotions about the COVID-19 pandemic: Machine learning approach. JMIR 22, 11 (2020), e20550

[59]

Sihong Zhao, Simeng Hu, Xiaoyu Zhou, Suhang Song, Qian Wang, Hongqiu Zheng, Ying Zhang, Zhiyuan Hou, et al. 2023. The prevalence, features, influencing factors, and solutions for COVID-19 vaccine misinformation: systematic review. JPHS 9, 1 (2023), e40201

[60]

Paola Zola, Costantino Ragno, and Paulo Cortez. 2020. A Google Trends spatial clustering approach for a worldwide Twitter user geolocation. IPM 57, 6 (2020), 102312