pith. sign in

arxiv: 2604.09626 · v1 · submitted 2026-03-18 · 💻 cs.CY · cs.LG· cs.SI

Investigating Vaccine Buyer's Remorse: Post-Vaccination Decision Regret in COVID-19 Social Media Using Politically Diverse Human Annotation

Pith reviewed 2026-05-15 08:25 UTC · model grok-4.3

classification 💻 cs.CY cs.LGcs.SI
keywords vaccine regretCOVID-19 vaccinationsocial media analysisYouTube corpusdecision regretvaccine hesitancyLLM detectionpublic discourse
0
0 comments X

The pith

Vaccine buyer's remorse appears in under 2% of COVID-19 discourse but clusters in skeptic communities via personal health stories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper curates a large YouTube corpus of COVID-19 vaccination discussions and builds a benchmark subset of regret expressions annotated by a politically diverse human panel. Large language models are then used to detect and classify posts showing vaccine buyer's remorse, distinguishing first-person accounts from secondhand ones and cataloging cited reasons such as adverse health events. The central finding is that such remorse remains rare overall yet concentrates in specific communities and relies mostly on direct personal narratives. These patterns matter for public health because they show the actual scale of regret and its dominant triggers rather than assuming broad dissatisfaction.

Core claim

The authors establish that vaccine buyer's remorse appears in only less than 2% of public discourse on COVID-19 vaccination. It is disproportionately concentrated in vaccine-skeptic influencer communities and is predominantly expressed through first-person narratives citing adverse health events. The study also measures differences between personal and vicarious experiences and checks for biases across different LLMs used for detection.

What carries the argument

A curated YouTube news corpus on COVID-19 vaccination paired with politically diverse human annotations that serve as ground truth for LLM-based identification of regret posts.

If this is right

  • Public health messaging can prioritize the small share of cases tied to adverse events instead of assuming widespread regret.
  • First-person stories of health issues should receive specific attention in communication to address the main form of expressed remorse.
  • Quantifying vicarious versus direct regret shows how personal experiences may amplify online compared with shared accounts.
  • Evaluating LLM detection biases on this topic highlights the need for calibration when monitoring politicized health discussions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The annotated dataset could be cross-checked against official adverse event reporting systems to test whether self-reported health issues align with documented outcomes.
  • The same curation and diverse-annotation approach could be applied to study decision regret around other interventions such as treatments or preventive measures.
  • Concentration in particular online communities suggests that broad messaging may be less effective than community-specific outreach.
  • Politically balanced annotation panels offer a practical way to reduce slant when analyzing sensitive or polarized social media topics.

Load-bearing premise

The curated YouTube corpus and LLM-based identification accurately reflect the true prevalence and nature of vaccine regret without major biases from platform selection or model limitations.

What would settle it

A large-scale representative survey of vaccinated individuals that directly asks about regret, its reasons, and personal versus observed experiences would confirm or contradict the social-media prevalence and concentration patterns.

Figures

Figures reproduced from arXiv: 2604.09626 by Ashiqur R. KhudaBukhsh, Ashutosh Kumar, Miles Stanley, Soumyajit Datta.

Figure 1
Figure 1. Figure 1: Percentage of regret comments across news sources and influencer categories. [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overall Distribution of Relationships in Vicarious Regret Comments [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Distribution of Regret Reasons by Source Type. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Temporal Distribution of Regret Comments over time across zones [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Distribution of Regret Reasons by Narrative Perspective. [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Prompt used for the Llama-3.1 "Expert Reasoner" model and zero-shot prompting. [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Prompt used for the political group-specific models [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Few shot prompt [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Prompt used for the "Reasons for Regret" extraction task. [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Prompt used for the "Relationship to Author" extraction task. [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗
read the original abstract

A significant gap exists in datasets regarding post-COVID-19 vaccination experiences, particularly ``vaccine buyer's remorse''. Understanding the prevalence and nature of vaccine regret, whether based on personal or vicarious experiences, is vital for addressing vaccine hesitancy and refining public health communication. In this paper, we curate a novel dataset from a large YouTube news corpus capturing COVID-19 vaccination experiences, and construct a benchmark subset focused on vaccine regret, annotated by a politically diverse panel to account for the subjective and often politicized nature of the topic. We utilize large language models (LLMs) to identify posts expressing vaccine regret, analyze the reasons behind this regret, and quantify its occurrence in both first and second-person accounts. This paper aims to (1) quantify the prevalence of vaccine regret; (2) identify common reasons for this sentiment; (3) analyze differences between first-person and vicarious experiences; and (4) assess potential biases introduced by different LLMs. We find that while vaccine buyer's remorse appears in only $<2\%$ of public discourse, it is disproportionately concentrated in vaccine-skeptic influencer communities and is predominantly expressed through first-person narratives citing adverse health events.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper curates a novel YouTube news corpus on COVID-19 vaccination experiences, builds a politically diverse human-annotated benchmark subset for vaccine regret, and applies LLMs to detect regret expressions. It reports that vaccine buyer's remorse occurs in <2% of the discourse, is disproportionately concentrated in vaccine-skeptic influencer communities, and is mostly conveyed via first-person narratives citing adverse health events. The work also examines differences between first- and second-person accounts and potential LLM biases.

Significance. If the core methodological gaps are closed, the study supplies a useful annotated dataset and quantitative evidence on the low overall prevalence yet community-specific concentration of vaccine regret. The politically diverse annotation panel is a clear strength for a politicized topic. The findings could help calibrate public-health messaging, provided the prevalence and concentration claims rest on validated detection.

major comments (3)
  1. [Methods (LLM-based identification and benchmark construction)] The manuscript states that a human-annotated benchmark subset was created but reports neither precision, recall, F1, nor a confusion matrix for the LLM regret classifier on that subset. Because the headline <2% prevalence figure is produced by this classifier, the absence of these metrics makes the absolute prevalence and the 'disproportionate concentration' claims impossible to evaluate.
  2. [Corpus curation and data collection] No sampling frame, channel-selection criteria, or coverage statistics are supplied for the curated YouTube corpus. Without these, it is impossible to determine whether the corpus over-samples high-engagement skeptic channels, which would directly affect both the <2% prevalence estimate and the claim of disproportionate concentration.
  3. [Results (prevalence quantification)] The <2% prevalence result is presented without stating the exact regret-detection threshold, prompt template, or exclusion rules applied to the LLM output. Sensitivity of the headline figure to these choices is therefore unknown.
minor comments (1)
  1. [Abstract] The abstract mentions 'politically diverse human annotation' but does not report inter-annotator agreement statistics (e.g., Fleiss' kappa or pairwise agreement).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that greater methodological transparency is required and will revise the manuscript to address the concerns about LLM evaluation metrics, corpus documentation, and detection parameters. These changes will strengthen the interpretability of the prevalence and concentration findings without altering the core claims.

read point-by-point responses
  1. Referee: The manuscript states that a human-annotated benchmark subset was created but reports neither precision, recall, F1, nor a confusion matrix for the LLM regret classifier on that subset. Because the headline <2% prevalence figure is produced by this classifier, the absence of these metrics makes the absolute prevalence and the 'disproportionate concentration' claims impossible to evaluate.

    Authors: We agree that performance metrics are essential for validating the classifier. In the revised manuscript we will report precision, recall, F1-score, and a confusion matrix for the LLM regret detector on the human-annotated benchmark subset. This will allow direct assessment of the reliability of the <2% prevalence estimate and the concentration claims. revision: yes

  2. Referee: No sampling frame, channel-selection criteria, or coverage statistics are supplied for the curated YouTube corpus. Without these, it is impossible to determine whether the corpus over-samples high-engagement skeptic channels, which would directly affect both the <2% prevalence estimate and the claim of disproportionate concentration.

    Authors: We acknowledge the omission. The revised methods section will specify the sampling frame, explicit channel-selection criteria (including subscriber thresholds, content focus, and efforts to achieve political diversity), and available coverage statistics. We will also discuss potential selection effects and how they were mitigated. revision: yes

  3. Referee: The <2% prevalence result is presented without stating the exact regret-detection threshold, prompt template, or exclusion rules applied to the LLM output. Sensitivity of the headline figure to these choices is therefore unknown.

    Authors: We will add the exact prompt templates, classification threshold, and exclusion rules to the methods. We will also include a sensitivity analysis showing how the prevalence estimate varies with reasonable changes to these parameters, confirming robustness of the <2% figure. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical data-driven analysis with no derivations or fitted inputs

full rationale

The paper is a self-contained empirical study that curates a YouTube corpus, obtains human annotations from a politically diverse panel on a benchmark subset, and applies LLMs to classify vaccine regret. No equations, parameters, or predictions are derived; prevalence (<2%), concentration in skeptic communities, and first-person narrative patterns are computed directly from the annotated outputs. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear. The derivation chain consists only of explicit data collection and classification steps that do not reduce to their inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on assumptions about data representativeness and the reliability of LLM and human labeling processes.

free parameters (1)
  • Regret detection threshold
    Implicit threshold used by LLMs to classify posts as expressing regret.
axioms (2)
  • domain assumption YouTube comments represent a valid sample of public discourse on vaccination.
    Used to generalize prevalence to 'public discourse'.
  • domain assumption Politically diverse annotation mitigates bias in subjective labeling.
    Basis for the benchmark subset construction.

pith-pipeline@v0.9.0 · 5540 in / 1106 out tokens · 56250 ms · 2026-05-15T08:25:24.012471+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 7 internal anchors

  1. [1]

    J. Bai, S. Bai, Y . Chu, Z. Cui, K. Dang, X. Deng, Y . Fan, W. Ge, Y . Han, F. Huang, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023

  2. [2]

    Becerra-Perez, M

    M.-M. Becerra-Perez, M. Menear, S. Turcotte, M. Labrecque, and F. Légaré. More primary care patients regret health decisions if they experienced decisional conflict in the consultation: a secondary analysis of a multicenter descriptive study.BMC Family Practice, 17(1):156, 2016

  3. [3]

    J. C. Brehaut, A. M. O’Connor, T. J. Wood, T. F. Hack, L. Siminoff, E. Gordon, and D. Feldman-Stewart. Validation of a decision regret scale.Medical decision making, 23(4):281–292, 2003. 10 INVESTIGATING V ACCINE BUYER’S REMORSEA PREPRINT

  4. [4]

    Brown, B

    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners.NeurIPS, 33:1877–1901, 2020

  5. [5]

    E. K. Brunson. The impact of social networks on parents’ vaccination decisions.Pediatrics, 131(5):e1397–e1404, 2013

  6. [6]

    K. S. Clemens, K. Faasse, W. Tan, B. Colagiuri, L. Colloca, R. Webster, L. Vase, E. Jason, and A. L. Geers. Social communication pathways to COVID-19 vaccine side-effect expectations and experience.Journal of Psychosomatic Research, 164:111081, 2023

  7. [7]

    Crowl, S

    L. Crowl, S. Dutta, A. R. KhudaBukhsh, E. Severnini, and D. S. Nagin. Measuring criticism of the police in the local news media using large language models.Proceedings of the National Academy of Sciences, 122(9):e2418821122, 2025

  8. [8]

    Davidson, D

    T. Davidson, D. Warmsley, M. Macy, and I. Weber. Automated hate speech detection and the problem of offensive language. InProceedings of the international AAAI conference on web and social media, volume 11, pages 512–515, 2017

  9. [9]

    I. J. B. do Nascimento, A. B. Pizarro, J. M. Almeida, N. Azzopardi-Muscat, M. A. Gonçalves, M. Björklund, and D. Novillo-Ortiz. Infodemics and health misinformation: a systematic review of reviews.Bulletin of the World Health Organization, 100(8):544–561, 2022

  10. [10]

    A. J. Dolman, T. Fraser, C. Panagopoulos, D. P. Aldrich, and D. Kim. Opposing views: associations of political polarization, political party affiliation, and social trust with covid-19 vaccination intent and receipt.Journal of Public Health, 45(1):36–39, 2023

  11. [11]

    Dutta, D

    S. Dutta, D. Pandita, T. C. Weerasooriya, M. Zampieri, C. M. Homan, and A. R. KhudaBukhsh. ARTICLE: annotator reliability through in-context learning. InAAAI-25, Association for the Advancement of Artificial Intelligence, pages 14230–14237. AAAI Press, 2025

  12. [12]

    Gao and R

    L. Gao and R. Huang. Detecting online hate speech using context aware models. In R. Mitkov and G. Angelova, editors,RANLP 2017, pages 260–266. INCOMA Ltd., 2017

  13. [13]

    The Llama 3 Herd of Models

    A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

  14. [14]

    L. He, S. Omranian, S. McRoy, and K. Zheng. Using large language models for sentiment analysis of health- related social media data: empirical evaluation and practical tips.Journal of the American Medical Informatics Association, 2024. Working paper/Preprint

  15. [15]

    E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen. Lora: Low-rank adaptation of large language models. arxiv 2021.arXiv preprint arXiv:2106.09685, 10, 2021

  16. [16]

    GPT-4o System Card

    A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford, et al. Gpt-4o system card.arXiv preprint arXiv:2410.21276, 2024

  17. [17]

    Islam and D

    T. Islam and D. Goldwasser. Understanding covid-19 vaccine campaign on Facebook using minimal supervision. In2022 IEEE-Big Data, pages 585–595. IEEE, 2022

  18. [18]

    Islam and D

    T. Islam and D. Goldwasser. Discovering latent themes in social media messaging: A machine-in-the-loop approach integrating LLMs. InICWSM, volume 19, pages 859–884, 2025

  19. [19]

    Islam and D

    T. Islam and D. Goldwasser. Uncovering latent arguments in social media messaging by employing llms-in-the- loop strategy. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 7397–7429, 2025

  20. [20]

    D. Jain, S. Rai, J. Mittal, A. Andy, A. M. Buttenheim, and S. C. Guntuku. Twitter reveals spatio-temporal variation in vaccine concerns in sub-saharan africa.medRxiv, pages 2025–08, 2025

  21. [21]

    A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de Las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed. Mistral 7b.ArXiv, abs/2310.06825, 2023

  22. [23]

    A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, D. S. Chaplot, D. de Las Casas, E. B. Hanna, F. Bressand, G. Lengyel, G. Bour, G. Lample, L. R. Lavaud, L. Saulnier, M.-A. Lachaux, P. Stock, S. Subramanian, S. Yang, S. Antoniak, T. L. Scao, T. Gervet, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed. Mixtral of experts.ArXiv, abs/24...

  23. [24]

    A. R. KhudaBukhsh, R. Sarkar, M. S. Kamlet, and T. Mitchell. We don’t speak the same language: Interpreting polarization through machine translation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 14893–14901, 2021

  24. [25]

    A. R. KhudaBukhsh, R. Sarkar, M. S. Kamlet, and T. M. Mitchell. Fringe news networks: Dynamics of US news viewership following the 2020 presidential election. InWebSci ’22: 14th ACM Web Science Conference 2022, pages 269–278. ACM, 2022

  25. [26]

    Y . Li, D. Viswaroopan, W. He, J. Li, X. Zuo, H. Xu, and C. Tao. Enhancing relation extraction for COVID-19 vaccine shot-adverse event associations with large language models.Research Square, 2025. Preprint

  26. [27]

    E. N. Line, S. Jaramillo, M. Goldwater, and Z. Horne. Anecdotes impact medical decisions even when presented with statistical information or decision aids.Cognitive Research: Principles and Implications, 9(1):51, 2024

  27. [28]

    Y . Liu, Y . Wang, A. Sun, X. Meng, J. Li, and J. Guo. A dual-channel framework for sarcasm recognition by detecting sentiment conflict. InFindings of the Association for Computational Linguistics: NAACL 2022, pages 1670–1680, 2022

  28. [29]

    The llama 3 herd of models

    Llama Team, AI @ Meta. The llama 3 herd of models. Technical report, Meta, 2024. Technical Report

  29. [30]

    C. Luo, W. Jiang, H.-X. Chen, and T.-H. Tung. Post-vaccination adverse reactions, decision regret, and willingness to pay for the booster dose of COVID-19 vaccine among healthcare workers: A mediation analysis.Human Vaccines & Immunotherapeutics, 18(6):e2146964, 2022

  30. [31]

    Mistral small 3.2 24b instruct (2506)

    MistralAI. Mistral small 3.2 24b instruct (2506). https://huggingface.co/mistralai/Mistral-Small-3. 2-24B-Instruct-2506, 2025

  31. [32]

    Mittal, T

    S. Mittal, T. Chawla, and A. R. KhudaBukhsh. You must be a trump supporter: Political identity projections on the social web. InSocial Networks Analysis and Mining, ASONAM 2024, pages 391–404. Springer, 2024

  32. [33]

    M. L. Pacheco, T. Islam, M. Mahajan, A. Shor, M. Yin, L. Ungar, and D. Goldwasser. A holistic framework for analyzing the covid-19 vaccine debate.arXiv preprint arXiv:2205.01817, 2022

  33. [34]

    M. L. Pacheco, T. Islam, L. Ungar, M. Yin, and D. Goldwasser. Interactive concept learning for uncovering latent themes in large text collections.arXiv preprint arXiv:2305.05094, 2023

  34. [35]

    Pandita, T

    D. Pandita, T. C. Weerasooriya, S. Dutta, S. Luger, T. Ranasinghe, A. R. KhudaBukhsh, M. Zampieri, and C. Homan. Rater cohesion and quality from a vicarious perspective. InFindings of the Association for Computa- tional Linguistics: EMNLP 2024, pages 5149–5162, 2024

  35. [36]

    Pofcher, C

    J. Pofcher, C. M. Homan, R. Sell, and A. R. KhudaBukhsh. Hope vs. hate: Understanding user interactions with lgbtq+ news content in mainstream us news media through the lens of hope speech. InEMNLP 2025, pages 19873–19899, 2025

  36. [37]

    Portelli, S

    B. Portelli, S. Scaboro, R. Tonino, E. Chersoni, E. Santus, and G. Serra. Monitoring user opinions and side effects on COVID-19 vaccines in the twittersphere: Infodemiology study of tweets.Journal of Medical Internet Research, 24(5):e35115, 2022

  37. [38]

    N. K. Sehgal, S. Rai, M. Tonneau, A. K. Agarwal, J. Cappella, M. Kornides, L. Ungar, A. Buttenheim, and S. C. Guntuku. Conversations with ai chatbots increase short-term vaccine intentions but do not outperform standard public health messaging.arXiv preprint arXiv:2504.20519, 2025

  38. [39]

    T. T. Shimabukuro, M. Nguyen, D. Martin, and F. DeStefano. Safety monitoring in the vaccine adverse event reporting system (V AERS).Vaccine, 33(36):4398–4405, 2015

  39. [40]

    D. Sileo. tasksource: A large collection of NLP tasks with a structured dataset preprocessing framework. In LREC-COLING 2024, pages 15655–15684, May 2024

  40. [41]

    Souvatzi, M

    E. Souvatzi, M. Katsikidou, A. Arvaniti, S. Plakias, A. Tsiakiri, and M. Samakouri. Trust in healthcare, medical mistrust, and health outcomes in times of health crisis: A narrative review.Societies, 14(12):269, 2024

  41. [42]

    Tayhan, E

    A. Tayhan, E. B. Tayhan, and D. ¸ S. Büyük. Nursing and midwifery students’ COVID-19 vaccine regrets and future vaccination intentions: A mixed methods study.Nursing & Health Sciences, 27:e70039, 2025

  42. [43]

    G. Team, A. Kamath, J. Ferret, S. Pathak, N. Vieillard, R. Merhej, S. Perrin, T. Matejovicova, A. Ramé, M. Rivière, et al. Gemma 3 technical report.arXiv preprint arXiv:2503.19786, 2025

  43. [44]

    Wawrzuta, M

    D. Wawrzuta, M. Jaworski, J. Gotlib, and M. Panczyk. What arguments against covid-19 vaccines run on facebook in poland: content analysis of comments.Vaccines, 9(5):481, 2021

  44. [45]

    T. C. Weerasooriya, S. Dutta, T. Ranasinghe, M. Zampieri, C. Homan, and A. R. KhudaBukhsh. Vicarious offense and noise audit of offensive speech classifiers: Unifying human and machine disagreement on what is offensive. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, pages 11648–11668, 2023. 12 INVESTI...

  45. [46]

    Wiegand, J

    M. Wiegand, J. Ruppenhofer, and T. Kleinbauer. Detection of abusive language: the problem of biased datasets. InNAACL-HLT, pages 602–608, 2019

  46. [47]

    L. Yin, M. Han, and X. Nie. Unlocking blended emotions and underlying drivers: A deep dive into COVID-19 vaccination insights on twitter across digital and physical realms in new york, using ChatGPT.Urban Science, 8(4):222, 2024

  47. [48]

    C. H. Yoo and A. R. KhudaBukhsh. Auditing and robustifying covid-19 misinformation datasets via anticontent sampling. InProceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 15260–15268, 2023

  48. [49]

    Zeelenberg and J

    M. Zeelenberg and J. Beattie. Consequences of regret aversion 2: Additional evidence for effects of feedback on decision making.Organizational Behavior and Human Decision Processes, 72(1):63–78, 1997

  49. [50]

    J. Zhou, L. Zhang, M. Li, B. D. Horne, and M. De Choudhury. Ai as we describe it: How large language models and their applications in health are represented across channels of public discourse.arXiv preprint arXiv:2511.03174, 2025

  50. [51]

    Ziems, W

    C. Ziems, W. Held, O. Shaikh, J. Chen, Z. Zhang, and D. Yang. Can large language models transform computational social science?Comput. Linguistics, 50(1):237–291, 2024

  51. [52]

    wishes? \w+ (never|hadn’t)

    D. Zimmermann, A. Klee, and K. Kaspar. Political news on instagram: influencer versus traditional magazine and the role of their expertise in consumers’ credibility perceptions and news engagement.Frontiers in Psychology, 14:1257994, 2023. 7 Supplementary Information A Data Collection and Filtering Details Full List of YouTube Channels Table 6 and Table 7...

  52. [53]

    GoalThe purpose of this project is to carefully read user comments about vaccines and classify them based on three key pieces of information: who the comment is about, their vaccination status, and their feelings about their decision

  53. [54]

    I got the shot and I feel fine

    The Annotation TaskFor each comment you are shown, you will answer a series of up to three questions. Please note that some questions will only appear based on your answer to the previous question. Question 1: Who is the subject of the comment?This question asks you to identify the main person or group being discussed in the comment. •self:The author of t...

  54. [55]

    unspecified

    Final Reminders • Prioritization:If a comment mentions multiple subjects, prioritize the subject who was vaccinated and expresses regret. • Uncertainty:When in doubt, choose the "unspecified" or "unclear" option. It is better to choose an unclear option than to guess an incorrect label. • Focus on Text:Base your judgment only on the text provided. Do not ...

  55. [56]

    self"‘ or ‘

    **Regret Scope:** ‘regret‘ = 1 may ONLY be assigned to a specific subject (‘"self"‘ or ‘"other"‘) that has taken the vaccine (‘vaccinated‘ = 1). For general statements about groups, regret MUST be 0

  56. [58]

    self"‘ or ‘

    **Vaccination Status:** ‘vaccinated‘ = 1 may ONLY be assigned to a specific subject (‘"self"‘ or ‘"other"‘) that has taken the vaccine. **JSON Schema & Values:** - ‘subject‘: "self" | "other" | "unspecified" - ‘vaccinated‘: 0 | 1 - ‘regret‘: 0 | 1 **Regret Definitions:** - 1: A specific subject (‘self‘/‘other‘) explicitly states regret, warns others based...

  57. [59]

    For general statements about groups, regret MUST be 0

    **Regret Scope:** ‘regret‘ = 1 may ONLY be assigned to a specific subject that has taken the vaccine. For general statements about groups, regret MUST be 0

  58. [60]

    **Regret Perspective:** The regret must be the subject’s own reported feeling, not the commenter’s projected opinion. **JSON Schema:** - ‘regret‘: 0 | 1 **Regret Definitions:** - 1: A specific subject explicitly states regret, warns others based on their outcome, or describes severe negative health results from the vaccine. - 0: The subject doesn’t expres...

  59. [61]

    self\"‘ or ‘\

    **Regret Scope:** ‘regret‘ = 1 may ONLY be assigned to a specific subject (‘\"self\"‘ or ‘\"other\"‘) that has taken the vaccine (‘vaccinated‘ = 1). For general statements about groups, regret MUST be 0

  60. [62]

    **Regret Perspective:** The regret must be the subject’s own reported feeling, not the commenter’s projected opinion

  61. [63]

    self\"‘ or ‘\

    **Vaccination Status:** ‘vaccinated‘ = 1 may ONLY be assigned to a specific subject (‘\"self\"‘ or ‘\"other\"‘) that has taken the vaccine. **JSON Schema & Values:** - ‘subject‘: "self" | "other" | "unspecified" - ‘vaccinated‘: 0 | 1 - ‘regret‘: 0 | 1 **Regret Definitions:** - 1: A specific subject (‘self‘/‘other‘) explicitly states regret, warns others b...

  62. [64]

    my friend regrets getting the vaccine. her period every 2 weeks isn’t normal

    **Comment:** "my friend regrets getting the vaccine. her period every 2 weeks isn’t normal." **JSON:** {"subject": "other", "vaccinated": 1, "regret": 1}

  63. [65]

    My brother got his second dose last week

    **Comment:** "My brother got his second dose last week." **JSON:** {"subject": "other", "vaccinated": 1, "regret": 0}

  64. [66]

    don’t get the vaccine, ever since i got it i have felt tired and had rashes

    **Comment:** "don’t get the vaccine, ever since i got it i have felt tired and had rashes." **JSON:** {"subject": "self", "vaccinated": 1, "regret": 1}

  65. [67]

    I wonder if the people getting the vaccine will regret it in 10 years when they have serious side effects

    **Comment:** "I wonder if the people getting the vaccine will regret it in 10 years when they have serious side effects" **JSON:** {"subject": "unspecified", "vaccinated": 0, "regret": 0}

  66. [68]

    cnn should interview the people who got the vaccine and regret it after serious side effect

    **Comment:** "cnn should interview the people who got the vaccine and regret it after serious side effect" **JSON:** {"subject": "unspecified", "vaccinated": 0, "regret": 0} Now, analyze the following user comment and provide only the JSON output without any commentary. **Comment:** {comment} Figure 8: Few shot prompt Table 12: Examples of Pipeline Classi...

  67. [70]

    reason_for_regret

    The JSON object must contain a single key: ‘"reason_for_regret"‘

  68. [71]

    Adverse_Health_Event

    The value for this key must be ONE of the following exact strings: * ‘"Adverse_Health_Event"‘ * ‘"Perceived_Coercion"‘ * ‘"Lack_of_Efficacy"‘ * ‘"Shift_in_Beliefs"‘ * ‘"Vague_or_Unspecified"‘ --- **CATEGORY DEFINITIONS & RULES:** * **‘"Adverse_Health_Event"‘** * Assign this category if the regret is linked to any negative physical health outcome. * **Look...

  69. [72]

    Your output MUST be ONLY a single, raw JSON object

  70. [73]

    relationship_to_author

    The JSON object must contain a single key: ‘"relationship_to_author"‘

  71. [74]

    Spouse_or_Partner

    The value for this key must be ONE of the following exact strings: * ‘"Spouse_or_Partner"‘ * ‘"Family_Member"‘ * ‘"Friend"‘ * ‘"Health_Care_Provider"‘ * ‘"Public_Figure"‘ * ‘"Other_Acquaintance"‘ * ‘"Unspecified"‘ --- **CATEGORY DEFINITIONS & RULES:** * **‘"Spouse_or_Partner"‘** * Assign this for a spouse or romantic partner. * **Look for:** "husband," "w...