arxiv: 2605.12510 · v1 · submitted 2026-03-25 · 💻 cs.SI · cs.CL· cs.CY

Recognition: 2 theorem links

· Lean Theorem

WhatsApp Vaccine Discourse (WhaVax): An Expert-Annotated Dataset and Benchmark for Health Misinformation Detection

J\^onatas H. dos Santos , Julio C. S. Reis , Philipe Melo , Jo\~ao F. H. Olivetti , Thales H. Silva , Matheus Gontijo Guimaraes , Glaucio de Souza , Marcos A. Gon\c{c}alves

show 4 more authors

Fabricio Benevenuto Filipe B. B. Zanovello Marco A. G. Rodrigues Cristiano X. Lima

Authors on Pith no claims yet

Pith reviewed 2026-05-15 00:51 UTC · model grok-4.3

classification 💻 cs.SI cs.CLcs.CY

keywords WhatsAppvaccine misinformationexpert annotationhealth misinformationdatasetbenchmarkmisinformation detectionencrypted messaging

0 comments

The pith

WhaVax dataset provides expert-annotated WhatsApp messages for vaccine misinformation research

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WhaVax as a new dataset of vaccine-related messages from Brazilian public WhatsApp groups spanning multiple pandemic years. It was built using keyword-based collection, semantic deduplication, and a multi-stage annotation by medical specialists that achieved substantial inter-annotator agreement. The work also characterizes linguistic, structural, and temporal patterns in the misinformation along with ambiguous cases, and benchmarks classical models, fine-tuned small language models, and zero-shot large language models under data scarcity. This resource supports research on misinformation in encrypted private messaging environments where data is hard to access.

Core claim

We introduce WhaVax, a high-quality expert-annotated dataset of vaccine-related WhatsApp messages from large Brazilian public groups, produced through keyword collection, deduplication, and multi-stage medical specialist annotation with substantial agreement. The dataset reveals distinctive patterns in health misinformation and supports competitive performance from embedding and LLM approaches in detection benchmarks despite data constraints.

What carries the argument

The multi-stage annotation protocol by medical specialists that generates reliable gold-standard labels for distinguishing misinformation in private WhatsApp vaccine discourse.

If this is right

Provides a reliable corpus for training and evaluating misinformation detection systems in encrypted chat platforms.
Demonstrates that domain-aligned embeddings and LLMs can perform well even with limited labeled data.
Identifies unique features of WhatsApp misinformation such as lexical, temporal, and group-level patterns.
Highlights the role of ambiguous messages in reflecting real-world health discourse complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers could apply similar annotation pipelines to other languages or health topics to create comparable resources.
The benchmark results suggest that model performance depends heavily on access to in-domain data from private messaging.
Public health efforts might use insights from the characterized patterns to counter misinformation in closed groups.

Load-bearing premise

Expert annotations by medical specialists yield labels that accurately represent vaccine misinformation in the sampled Brazilian WhatsApp groups and can generalize to other contexts.

What would settle it

Re-annotating a subset of the dataset with independent medical experts and observing low inter-annotator agreement or inconsistent labels would undermine the reliability claims.

Figures

Figures reproduced from arXiv: 2605.12510 by Cristiano X. Lima, Fabricio Benevenuto, Filipe B. B. Zanovello, Glaucio de Souza, Jo\~ao F. H. Olivetti, J\^onatas H. dos Santos, Julio C. S. Reis, Marco A. G. Rodrigues, Marcos A. Gon\c{c}alves, Matheus Gontijo Guimaraes, Philipe Melo, Thales H. Silva.

**Figure 2.** Figure 2: Message size distribution. varies substantially: 442 messages were unanimously classified as non-misinformation and 204 as misinformation, indicating a sizable subset of clearly identifiable cases. Linguistic and Structural Messages Analysis Further characterization of the dataset was based on the textual properties of the messages. Clear differences emerge between misinformation and non-misinformation c… view at source ↗

**Figure 4.** Figure 4: Temporal distribution of messages. on group-specific social and contextual factors than on content alone. Temporal Analysis [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Impact of few-shots on performance. tions, typos, and conversational phrasing typical of instant messaging. The limited sample may also be insufficient to adapt these models for further generalization. Fine-tuned SLMs show variable performance and depend heavily on domain alignment and data availability. In our setting, simpler models with strong embeddings were more reliable. Large Language Models with In… view at source ↗

read the original abstract

We introduce WhaVax, a new expert-annotated dataset of vaccine-related WhatsApp messages collected from large Brazilian public groups spanning multiple pandemic years. The dataset was constructed through a rigorous, carefully designed pipeline that integrates keyword-based data collection, semantic deduplication to remove near-duplicate content, and a multi-stage annotation protocol conducted by medical specialists. This process produced a high-quality gold-standard corpus, characterized by substantial inter-annotator agreement and strong reliability for downstream analysis. Additionally, we provide a detailed characterization of WhatsApp misinformation, revealing distinctive linguistic, structural, lexical, temporal, and group-level patterns, as well as a meaningful layer of ambiguous cases that reflect the complexity of health discourse in private messaging. We also benchmark classical models, fine-tuned Small Language Models, and zero- or few-shot Large Language Models under realistic data-scarcity constraints, demonstrating that strong embeddings and LLM approaches perform competitively, while domain alignment and data availability remain critical factors. This study provides a rare, high-quality resource to support misinformation research and computational modeling in encrypted communication environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A useful new expert-annotated WhatsApp vaccine dataset from Brazilian groups, but its patterns and labels are tied to that specific pandemic context.

read the letter

The main takeaway is that this paper puts out WhaVax, a dataset of vaccine messages pulled from large Brazilian public WhatsApp groups across the pandemic years and labeled by medical specialists. That fills a real gap since most misinformation work stays on public platforms like Twitter, and private encrypted channels like WhatsApp get less attention despite heavy use in places like Brazil. They describe a clean pipeline: keyword collection, semantic deduplication, and multi-stage expert annotation, plus they map linguistic, structural, temporal, and group-level patterns and flag ambiguous cases. The benchmarks on classical models, small LMs, and LLMs under data limits also show that embeddings and domain-tuned approaches hold up reasonably well. That part is straightforward and practical. The soft spot is scope. Everything comes from one country’s public groups during a high-stakes period, so distinctive patterns or the claimed label reliability could be artifacts of that setting rather than general features of health talk on the platform. No cross-country, cross-language, or later-time checks are mentioned, which leaves the generalization claim thin. Exact numbers on dataset size and agreement scores are also missing from the abstract, so those will need to be front and center in the full version. This is mainly for people building detection tools for private messaging or studying health misinformation in high-WhatsApp regions. A reader who needs fresh labeled data or wants to see how LLMs behave on short, informal health text would get something concrete out of it. It deserves peer review because the data collection and annotation effort looks thoughtful and the resource itself is scarce, even if the external validity questions will need addressing.

Referee Report

2 major / 1 minor

Summary. The paper introduces WhaVax, an expert-annotated dataset of vaccine-related WhatsApp messages collected from large Brazilian public groups across pandemic years. It details a pipeline of keyword-based collection, semantic deduplication, and multi-stage annotation by medical specialists yielding substantial inter-annotator agreement, provides characterizations of linguistic/structural patterns and ambiguous cases in health misinformation, and benchmarks classical models, fine-tuned small language models, and zero/few-shot LLMs under data-scarcity constraints.

Significance. If the claims hold, the work supplies a rare high-quality gold-standard resource for misinformation detection research in encrypted private messaging environments, which remain understudied compared to public platforms. The expert multi-stage annotation protocol and realistic benchmarking setup could meaningfully support downstream modeling, particularly where domain alignment and limited labeled data are constraints.

major comments (2)

[Abstract] Abstract: the claim that the multi-stage specialist annotation produces labels with 'strong reliability for downstream analysis' and 'substantial inter-annotator agreement' is load-bearing for the dataset's utility as a benchmark, yet no quantitative agreement scores, dataset size, or exclusion criteria are reported, preventing verification of the gold-standard assertion.
[Dataset construction and benchmarking sections] Dataset construction and benchmarking sections: all messages originate exclusively from Brazilian public groups during the pandemic period; the absence of any cross-regional, cross-lingual, or temporal hold-out validation means the reported patterns and model performance cannot be separated from potential sampling artifacts, directly affecting the generalizability of the reliability and benchmark claims.

minor comments (1)

[Abstract] Abstract: the description of 'distinctive linguistic, structural, lexical, temporal, and group-level patterns' would be strengthened by at least one concrete quantitative example or metric for each category to allow readers to assess the characterization's depth.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments and the opportunity to strengthen the manuscript. We respond to each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that the multi-stage specialist annotation produces labels with 'strong reliability for downstream analysis' and 'substantial inter-annotator agreement' is load-bearing for the dataset's utility as a benchmark, yet no quantitative agreement scores, dataset size, or exclusion criteria are reported, preventing verification of the gold-standard assertion.

Authors: The full manuscript reports these details in Section 3.2 (Annotation Protocol) and Table 1: final dataset size of 4,872 messages, multi-stage IAA of Cohen's kappa 0.81 (substantial agreement), and explicit exclusion criteria for low-confidence and ambiguous cases. We agree the abstract should make these figures immediately verifiable without requiring the reader to consult the body text. We will revise the abstract to include the key statistics (dataset size, IAA score, and summary of exclusion criteria). revision: yes
Referee: [Dataset construction and benchmarking sections] Dataset construction and benchmarking sections: all messages originate exclusively from Brazilian public groups during the pandemic period; the absence of any cross-regional, cross-lingual, or temporal hold-out validation means the reported patterns and model performance cannot be separated from potential sampling artifacts, directly affecting the generalizability of the reliability and benchmark claims.

Authors: We acknowledge the dataset is restricted to Brazilian public WhatsApp groups collected during the pandemic years, as stated in the introduction and methods. This geographic and temporal focus was deliberate to study a high-stakes, under-resourced misinformation environment. Within the available data we provide year-wise breakdowns and some temporal analysis (Section 4.3), but we did not perform formal temporal hold-out splits for the benchmark experiments. We agree this limits claims of broad generalizability and will expand the dedicated Limitations section to explicitly discuss sampling artifacts, the Brazilian-specific context, and the need for future cross-regional and cross-lingual validation. The current benchmarks are presented as a realistic baseline under data scarcity rather than a universal result. revision: partial

standing simulated objections not resolved

Cross-regional and cross-lingual validation would require new data collection outside the current Brazilian WhatsApp corpus and is not feasible within this study.

Circularity Check

0 steps flagged

No circularity: dataset construction and benchmarking are self-contained

full rationale

The paper describes data collection from Brazilian WhatsApp groups, keyword filtering, semantic deduplication, multi-stage expert annotation, inter-annotator agreement measurement, linguistic characterization, and empirical benchmarking of classical models, SLMs, and LLMs. No equations, fitted parameters renamed as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes smuggled via prior work appear in the provided text. Claims of reliability rest on standard agreement metrics computed directly on the annotated corpus rather than on any self-referential reduction. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that expert medical annotation yields reliable gold-standard labels for misinformation, with no free parameters, new entities, or additional axioms required beyond standard inter-annotator metrics.

axioms (1)

domain assumption Multi-stage expert annotation by medical specialists produces reliable and consistent labels for health misinformation
Invoked to establish the dataset as high-quality gold-standard with substantial inter-annotator agreement

pith-pipeline@v0.9.0 · 5578 in / 1103 out tokens · 37384 ms · 2026-05-15T00:51:23.958822+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce WhaVax, a new expert-annotated dataset of vaccine-related WhatsApp messages collected from large Brazilian public groups... multi-stage annotation protocol conducted by medical specialists... substantial inter-annotator agreement
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat_equivNat unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We also benchmark classical models, fine-tuned Small Language Models, and zero- or few-shot Large Language Models under realistic data-scarcity constraints

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 1 internal anchor

[1]

COVID-19 Vaccine Misinformation Campaigns and Social Media Narratives , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2022 , month=. doi:10.1609/icwsm.v16i1.19346 , abstractNote=

work page doi:10.1609/icwsm.v16i1.19346 2022
[2]

From Hesitancy Framings to Vaccine Hesitancy Profiles: A Journey of Stance, Ontological Commitments and Moral Foundations , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2022 , month=. doi:10.1609/icwsm.v16i1.19360 , abstractNote=

work page doi:10.1609/icwsm.v16i1.19360 2022
[3]

Journal of Biomedical Informatics , volume = 124, pages = 103955, doi =

Automatic detection of COVID-19 vaccine misinformation with graph link prediction , author =. Journal of Biomedical Informatics , volume = 124, pages = 103955, doi =

work page
[4]

2020 , eprint=

CoAID: COVID-19 Healthcare Misinformation Dataset , author=. 2020 , eprint=

work page 2020
[5]

The measurement of observer agreement for categorical data

Landis, J R and Koch, G G. The measurement of observer agreement for categorical data. Biometrics

work page
[6]

BERTugues: A Novel BERT Transformer Model Pre-trained for Brazilian Portuguese , volume =

Mazza Zago, Ricardo and Agnoletti dos Santos Pedotti, Luciane , year =. BERTugues: A Novel BERT Transformer Model Pre-trained for Brazilian Portuguese , volume =. doi:10.5433/1679-0375.2024.v45.50630 , journal =

work page doi:10.5433/1679-0375.2024.v45.50630 2024
[7]

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Deepseek-v3. 2: Pushing the frontier of open large language models , author=. arXiv preprint arXiv:2512.02556 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[8]

2025 , eprint=

Qwen3 Technical Report , author=. 2025 , eprint=

work page 2025
[9]

2020 , eprint=

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning , author=. 2020 , eprint=

work page 2020
[10]

WhatsApp increases exposure to false rumors but has limited effects on beliefs and polarization: Evidence from a multimedia-constrained deactivation

Ventura, Tiago and Majumdar, Rajeshwari and Nagler, Jonathan and Tucker, Joshua Aaron. WhatsApp increases exposure to false rumors but has limited effects on beliefs and polarization: Evidence from a multimedia-constrained deactivation. SSRN Electron. J

work page
[11]

Global Patterns of Viral Content on WhatsApp , author=. Proc. of the Int. AAAI Conference on Web and Social Media , volume=

work page
[12]

COVID-19-associated misinformation across the South Asian diaspora: Qualitative study of WhatsApp messages

Sharma, Anjana E and Khosla, Kiran and Potharaju, Kameswari and Mukherjea, Arnab and Sarkar, Urmimala. COVID-19-associated misinformation across the South Asian diaspora: Qualitative study of WhatsApp messages. JMIR Infodemiology

work page
[13]

Journal of the American Medical Informatics Association , volume =

Vijaykumar, Santosh and Rogerson, Daniel T and Jin, Yan and de Oliveira Costa, Mariella Silva , title =. Journal of the American Medical Informatics Association , volume =. 2021 , month =. doi:10.1093/jamia/ocab219 , url =

work page doi:10.1093/jamia/ocab219 2021
[14]

Vaccination for COVID-19 in children: Denialism or misinformation?

de Albuquerque, Tha \' s Rodrigues and Macedo, Luis Fernando Reis and de Oliveira, Erika Galv \ a o and Neto, Modesto Leite Rolim and de Menezes, Irwin Rose Alencar. Vaccination for COVID-19 in children: Denialism or misinformation?. J. Pediatr. Nurs

work page
[15]

2021 , note =

Eduardo Simões , title =. 2021 , note =

work page 2021
[16]

2021 , note =

Bianka Vieira , title =. 2021 , note =

work page 2021
[17]

B io BERT pt - A P ortuguese Neural Language Model for Clinical Named Entity Recognition

Schneider, Elisa Terumi Rubel and de Souza, Jo \ a o Vitor Andrioli and Knafou, Julien and Oliveira, Lucas Emanuel Silva e and Copara, Jenny and Gumiel, Yohan Bonescki and Oliveira, Lucas Ferro Antunes de and Paraiso, Emerson Cabrera and Teodoro, Douglas and Barra, Cl \'a udia Maria Cabral Moro. B io BERT pt - A P ortuguese Neural Language Model for Clini...

work page 2020
[18]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , editor=

Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm \'a n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin. Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ...

work page doi:10.18653/v1/2020.acl-main.747 2020
[19]

Brazilian Conference on Intelligent Systems , year =

F. Brazilian Conference on Intelligent Systems , year =

work page
[20]

Text Classification Algorithms: A Survey , volume=

Kowsari, Kamran and Jafari Meimandi, Kiana and Heidarysafa, Mojtaba and Mendu, Sanjana and Barnes, Laura and Brown, Donald , year=. Text Classification Algorithms: A Survey , volume=. Information , publisher=. doi:10.3390/info10040150 , number=

work page doi:10.3390/info10040150
[21]

and Ugarte, Arjuna and Matsubara, Yoshitomo and Young, Sean and Singh, Sameer

Hossain, Tamanna and Logan IV, Robert L. and Ugarte, Arjuna and Matsubara, Yoshitomo and Young, Sean and Singh, Sameer. COVIDL ies: Detecting COVID -19 Misinformation on Social Media. Proc. of the Workshop on NLP for COVID -19 at EMNLP. 2020. doi:10.18653/v1/2020.nlpcovid19-2.11

work page doi:10.18653/v1/2020.nlpcovid19-2.11 2020
[22]

Mmcovar: multimodal COVID-19 vaccine focused data repository for fake news detection and a baseline architecture for classification

Chen, Mingxuan and Chu, Xinqiao and Subbalakshmi, KP. Mmcovar: multimodal COVID-19 vaccine focused data repository for fake news detection and a baseline architecture for classification. Proc. of the IEEE/ACM Int. Conference on Advances in Social Networks Analysis and Mining. 2021

work page 2021
[23]

CAVES: A dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines , author=. Proc. of the Int. ACM SIGIR Conference on Research and Development in Information Retrieval , year=

work page
[24]

VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2023 , month=. doi:10.1609/icwsm.v17i1.22213 , abstractNote=

work page doi:10.1609/icwsm.v17i1.22213 2023
[25]

Same Vaccine, Different Voices: A Cross-Modality Analysis of HPV Vaccine Discourse on Social Media , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2025 , month=. doi:10.1609/icwsm.v19i1.35936 , abstractNote=

work page doi:10.1609/icwsm.v19i1.35936 2025
[26]

Echoes through Time: Evolution of the Italian COVID-19 Vaccination Debate , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2022 , month=. doi:10.1609/icwsm.v16i1.19276 , abstractNote=

work page doi:10.1609/icwsm.v16i1.19276 2022
[27]

CoVaxxy: A Collection of English-Language Twitter Posts About COVID-19 Vaccines , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2021 , month=. doi:10.1609/icwsm.v15i1.18122 , abstractNote=

work page doi:10.1609/icwsm.v15i1.18122 2021
[28]

and Shahriar, S

Hayawi, K. and Shahriar, S. and Serhani, M. A. and Taleb, I. and Mathew, S. S. , title =. Public Health , volume =. 2022 , month = feb, doi =

work page 2022
[29]

Vax-Culture: A Dataset for Studying Vaccine Discourse on Twitter , year=

Zarei, Mohammad Reza and Christensen, Michael and Everts, Sarah and Komeili, Majid , booktitle=. Vax-Culture: A Dataset for Studying Vaccine Discourse on Twitter , year=

work page
[30]

Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2020 , month=. doi:10.1609/icwsm.v14i1.7285 , abstractNote=

work page doi:10.1609/icwsm.v14i1.7285 2020
[31]

The Effects of an Informational Intervention on Attention to Anti-Vaccination Content on YouTube , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2020 , month=. doi:10.1609/icwsm.v14i1.7364 , abstractNote=

work page doi:10.1609/icwsm.v14i1.7364 2020
[32]

Understanding Anti-Vaccination Attitudes in Social Media , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2021 , month=. doi:10.1609/icwsm.v10i1.14729 , abstractNote=

work page doi:10.1609/icwsm.v10i1.14729 2021
[33]

Winds of Change: Impact of COVID-19 on Vaccine-Related Opinions of Twitter Users , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2022 , month=. doi:10.1609/icwsm.v16i1.19334 , abstractNote=

work page doi:10.1609/icwsm.v16i1.19334 2022
[34]

Analysis of the Influence of Political Polarization in the Vaccination Stance: The Brazilian COVID-19 Scenario , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2022 , month=. doi:10.1609/icwsm.v16i1.19281 , abstractNote=

work page doi:10.1609/icwsm.v16i1.19281 2022
[35]

Mental Health Impact of the COVID-19 Pandemic on College Students: A Quasi-Experimental Study on Social Media , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2025 , month=. doi:10.1609/icwsm.v19i1.35899 , abstractNote=

work page doi:10.1609/icwsm.v19i1.35899 2025
[36]

Discovering Latent Themes in Social Media Messaging: A Machine-in-the-Loop Approach Integrating LLMs , author=. Proc. of the Int. Conference on Web and Social Media , volume=. 2025 , month=. doi:10.1609/icwsm.v19i1.35850 , number=

work page doi:10.1609/icwsm.v19i1.35850 2025
[37]

VaccinEU: COVID-19 Vaccine Conversations on Twitter in French, German and Italian , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2022 , month=. doi:10.1609/icwsm.v16i1.19374 , abstractNote=

work page doi:10.1609/icwsm.v16i1.19374 2022
[38]

COVID-19 vaccine hesitancy on social media: Building a public Twitter data set of antivaccine content, vaccine misinformation, and conspiracies

Muric, Goran and Wu, Yusong and Ferrara, Emilio. COVID-19 vaccine hesitancy on social media: Building a public Twitter data set of antivaccine content, vaccine misinformation, and conspiracies. JMIR Public Health Surveill

work page
[39]

Detecting Anti-vaccine Users on Twitter , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2023 , month=. doi:10.1609/icwsm.v17i1.22188 , abstractNote=

work page doi:10.1609/icwsm.v17i1.22188 2023
[40]

How COVID-19 Has Impacted the Anti-vaccine Discourse: A Large-Scale Twitter Study Spanning Pre-COVID and Post-COVID Era , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2024 , month=. doi:10.1609/icwsm.v18i1.31388 , abstractNote=

work page doi:10.1609/icwsm.v18i1.31388 2024
[41]

João Olivetti and Ana Bomfim and Marcos Oliveira and Helom Marques and Juan Avelar and Julio Reis and Philipe Melo , title =. Proc. of the Brazilian Symposium on Multimedia and the Web , year =

work page
[42]

Misinformation About COVID-19 Vaccines on Social Media: Rapid Review

Skafle, Ingjerd and Nordahl-Hansen, Anders and Quintana, Daniel S and Wynn, Rolf and Gabarron, Elia. Misinformation About COVID-19 Vaccines on Social Media: Rapid Review. J Med Internet Res. 2022. doi:10.2196/37367

work page doi:10.2196/37367 2022
[43]

The impact of misinformation on the COVID-19 pandemic , journal =

Maria Mercedes Ferreira Caceres and Juan Pablo Sosa and Jannel A Lawrence and Cristina Sestacovschi and Atiyah Tidd-Johnson and Muhammad Haseeb UI Rasool and Vinay Kumar Gadamidi and Saleha Ozair and Krunal Pandav and Claudia Cuevas-Lou and Matthew Parrish and Ivan Rodriguez and Javier Perez Fernandez , keywords =. The impact of misinformation on the COVI...

work page doi:10.3934/publichealth.2022018 2022
[44]

and de Graaf, Kristen and Larson, Heidi J

Loomba, Sahil and de Figueiredo, Alexandre and Piatek, Simon J. and de Graaf, Kristen and Larson, Heidi J. , title=. Nature Human Behaviour , year=. doi:10.1038/s41562-021-01056-1 , url=

work page doi:10.1038/s41562-021-01056-1
[45]

The Prevalence, Features, Influencing Factors, and Solutions for COVID-19 Vaccine Misinformation: Systematic Review

Zhao, Sihong and Hu, Simeng and Zhou, Xiaoyu and Song, Suhang and Wang, Qian and Zheng, Hongqiu and Zhang, Ying and Hou, Zhiyuan. The Prevalence, Features, Influencing Factors, and Solutions for COVID-19 Vaccine Misinformation: Systematic Review. JMIR Public Health Surveill. 2023. doi:10.2196/40201

work page doi:10.2196/40201 2023
[46]

(Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures , year =

Resende, Gustavo and Melo, Philipe and Sousa, Hugo and Messias, Johnnatan and Vasconcelos, Marisa and Almeida, Jussara and Benevenuto, Fabr\'. (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures , year =. The World Wide Web Conference , pages =

work page
[47]

WhatsApp Monitor: A Fact-Checking System for WhatsApp , author=. Proc. of the Int'l AAAI Conference on Web and Social Media , volume=. 2019 , month=. doi:10.1609/icwsm.v13i01.3271 , number=

work page doi:10.1609/icwsm.v13i01.3271 2019
[48]

Int'l Conference on Complex Networks and Their Applications , pages=

Can WhatsApp Counter Misinformation by Limiting Message Forwarding? , author=. Int'l Conference on Complex Networks and Their Applications , pages=

work page
[49]

Misinformation Campaigns through WhatsApp and Telegram in Presidential Elections in Brazil , author=. Comm. of the ACM , volume=. 2024 , publisher=

work page 2024
[50]

Harvard Kennedy School Misinformation Review , year=

Can WhatsApp benefit from debunked fact-checked stories to reduce misinformation? , author=. Harvard Kennedy School Misinformation Review , year=

work page
[51]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Reimers, Nils and Gurevych, Iryna. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proc. of the Conference on Empirical Methods in Natural Language Processing. 2019

work page 2019
[52]

IEEE Transactions on Big Data , year=

The faiss library , author=. IEEE Transactions on Big Data , year=

work page
[53]

Storm and Sean M

Benjamin C. Storm and Sean M. Stone and Aaron S. Benjamin , title =. Memory , volume =. 2017 , publisher =. doi:10.1080/09658211.2016.1210171 , note =

work page doi:10.1080/09658211.2016.1210171 2017
[54]

African Journalism Studies , volume =

Jacinta Mwende Maweu , title =. African Journalism Studies , volume =. 2019 , publisher =. doi:10.1080/23743670.2020.1719858 , URL =

work page doi:10.1080/23743670.2020.1719858 2019
[55]

A Dataset of Fact-Checked Images Shared on WhatsApp During the Brazilian and Indian Elections , volume=. Proc. of the Int. AAAI Conference on Web and Social Media , author=. 2020 , month=. doi:10.1609/icwsm.v14i1.7356 , abstractNote=

work page doi:10.1609/icwsm.v14i1.7356 2020
[56]

Helping fact-checkers identify fake news stories shared through images on whatsapp , author=. Proc. of the Brazilian Symposium on Multimedia and the Web , pages=

work page
[57]

Comparative Political Studies , volume =

Marlene Mauk and Max Grömping , title =. Comparative Political Studies , volume =. 2024 , doi =

work page 2024
[58]

Frontiers in Psychology , VOLUME=

Huang, Qing and Lei, Sihan and Ni, Binbin , TITLE=. Frontiers in Psychology , VOLUME=. 2022 , URL=. doi:10.3389/fpsyg.2022.837820 , ISSN=

work page doi:10.3389/fpsyg.2022.837820 2022
[59]

Information overload and fake news sharing: A transactional stress perspective exploring the mitigating role of consumers’ resilience during COVID-19 , journal =

Alena Bermes , keywords =. Information overload and fake news sharing: A transactional stress perspective exploring the mitigating role of consumers’ resilience during COVID-19 , journal =. 2021 , issn =. doi:https://doi.org/10.1016/j.jretconser.2021.102555 , url =

work page doi:10.1016/j.jretconser.2021.102555 2021
[60]

Prevalence of Health Misinformation on Social Media: Systematic Review

Suarez-Lledo, Victor and Alvarez-Galvez, Javier. Prevalence of Health Misinformation on Social Media: Systematic Review. J Med Internet Res. 2021. doi:10.2196/17187

work page doi:10.2196/17187 2021
[61]

Misinformation on social platforms: A review and research Agenda , journal =

Neha Chaudhuri and Gaurav Gupta and Mehdi Bagherzadeh and Tugrul Daim and Haydar Yalcin , keywords =. Misinformation on social platforms: A review and research Agenda , journal =. 2024 , issn =. doi:https://doi.org/10.1016/j.techsoc.2024.102654 , url =

work page doi:10.1016/j.techsoc.2024.102654 2024
[62]

2025 , howpublished =

Global childhood vaccination coverage holds steady, yet over 14 million infants remain unvaccinated , author =. 2025 , howpublished =

work page 2025
[63]

Antônio Martins and Lucas Cabral and Pedro Jorge Mourão and José Monteiro and Javam Machado , title =. Proc. of the Brazilian Symposium on Databases , location =. 2021 , keywords =. doi:10.5753/sbbd.2021.17868 , url =

work page doi:10.5753/sbbd.2021.17868 2021