arxiv: 2601.00496 · v2 · submitted 2026-01-01 · 💻 cs.SI

Quantifying correlations between information overload and fake news during COVID-19 pandemic: a Reddit study with BERT model approach

Pith reviewed 2026-05-16 17:34 UTC · model grok-4.3

classification 💻 cs.SI
keywords information overload · fake news · Gini index · BERTopic · Reddit · COVID-19 · FakeBERT

The pith

The Gini index of BERTopic topic distributions correlates globally with fake news prevalence on COVID-19 Reddit communities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether the Gini index calculated from topic distributions produced by the BERTopic model can serve as an automatic proxy for information overload in large social media datasets. It applies this measure to Reddit communities discussing the COVID-19 pandemic and compares it to the share of posts flagged as fake news by the FakeBERT classifier. A significant positive correlation appears at the global level across communities, but the link is inconsistent when examined within individual communities. A sympathetic reader would care because such a proxy could enable scalable monitoring of how overload contributes to misinformation spread during crises without needing manual surveys.

Core claim

The authors establish that the Gini index computed on the distribution of topics obtained via BERTopic can function as a proxy for information overload, and that this proxy exhibits a significant global correlation with the fraction of fake news detected by the FakeBERT classifier across the studied Reddit communities, while correlations at the per-community level remain ambiguous.

What carries the argument

The Gini index applied to the probability distribution of topics identified by BERTopic, used to quantify unevenness in topic focus as a stand-in for information overload.
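
As a rough illustration of that measure, the sketch below computes a Gini index over a vector of topic weights. It assumes the common rank-based form for values sorted in ascending order, G = sum_i (2i - n - 1) x_i / (n sum_i x_i). The paper's own Eq. (1) is not reproduced on this page and may differ, for example by a small-sample correction. The function name `gini` and the example vectors are hypothetical.

```python
import numpy as np


def gini(weights):
    """Gini index of a non-negative vector (0 means perfectly even)."""
    x = np.sort(np.asarray(weights, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive weight")
    ranks = np.arange(1, n + 1)
    # Rank-based formula; an assumption, not necessarily the paper's Eq. (1).
    return float(np.sum((2 * ranks - n - 1) * x) / (n * total))


# Hypothetical topic distributions over four topics.
even = gini([0.25, 0.25, 0.25, 0.25])          # posts spread evenly
concentrated = gini([0.97, 0.01, 0.01, 0.01])  # one dominant topic
print(even, concentrated)
```

Under this form a value near 0 means posts are spread evenly over topics, while values approaching 1 - 1/n mean a single topic dominates.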

If this is right

  • Automatic tracking of information overload becomes feasible in large datasets through topic modeling rather than manual methods.
  • Higher values of the topic Gini index associate with greater shares of fake news at the aggregate level during pandemic discussions.
  • Community-level analyses require additional variables because the global correlation does not reliably appear inside single communities.
  • The approach offers a scalable way to monitor how topic concentration may fuel misinformation in crisis-related online spaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Gini-based proxy could be tested on other platforms or non-pandemic events to check whether the overload-fake news link generalizes.
  • Interventions that flatten topic distributions in online communities might reduce exposure to fake news if the correlation holds causally.
  • Combining the proxy with temporal analysis could reveal whether spikes in topic unevenness precede rises in misinformation.
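
One way to probe the temporal idea in the last bullet is a lagged correlation between weekly series. The sketch below is a hypothetical illustration, not an analysis from the paper: `lagged_pearson` and the synthetic series are assumptions, and a lead-lag correlation on its own would not establish causation.

```python
import numpy as np


def lagged_pearson(gini_series, fake_series, lag):
    """Pearson r between gini[t] and fake[t + lag], for lag >= 0."""
    g = np.asarray(gini_series, dtype=float)
    f = np.asarray(fake_series, dtype=float)
    if g.shape != f.shape or g.ndim != 1:
        raise ValueError("series must be 1-D and of equal length")
    if lag < 0 or g.size - lag < 3:
        raise ValueError("lag must be >= 0 and leave at least 3 points")
    if lag > 0:
        g, f = g[:-lag], f[lag:]  # align week t with week t + lag
    return float(np.corrcoef(g, f)[0, 1])


# Synthetic weekly data in which the fake-news share copies the
# Gini series two weeks later (purely illustrative).
weeks = np.arange(30)
gini_weekly = np.sin(0.7 * weeks) + 0.05 * weeks
fake_weekly = np.concatenate([gini_weekly[:2], gini_weekly[:-2]])

best_lag = max(range(5), key=lambda k: lagged_pearson(gini_weekly, fake_weekly, k))
print(best_lag)
```

Scanning a small range of lags and reporting where the correlation peaks is a simple first check; a real analysis would also need to handle trends and autocorrelation in both series.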

Load-bearing premise

That the uneven distribution of topics detected by BERTopic accurately captures the information overload users actually experience.

What would settle it

A direct user survey in the same Reddit communities that measures perceived information overload and finds no correlation with the BERTopic Gini index would undermine the proxy.

Figures

Figures reproduced from arXiv: 2601.00496 by Jan Rawa, Julian Sienkiewicz.

Figure 1: (a) The average number of posts published in each subreddit (with 95% confidence interval in shaded area). (b) The average number of topics generated using different modeling techniques. (c) The ratio of number of topics to the number of posts TC/PC. (d) Gini index Gt given by Eq. (1). (e) Fraction of the true posts (green), fake posts (blue) and unverified ones (orange). All statistics aggregated in weekl…
Figure 2: Pearson correlation coefficient between Gini coefficient G of the topic distribution and the fraction of fake posts f, plotted against the size of the community. Each dot represents one community; gray color represents coefficients with p-value < .05. (a) Both topic distribution (method Fd) and the fraction of the fake posts are calculated at the level of the whole dataset. (b) Topic distribution is calculated fo…
Original abstract

Information overload (IOL) is a well-known and devastating phenomenon that alters the performance of carrying out all types of tasks. It has been shown that in the media space, IOL can contribute to news fatigue and news avoidance, which often leads to the proliferation of fake news posts on social networks. However, there is a lack of automatic methods that can be used to track IOL in large datasets. In this study, we investigate whether the Gini index calculated from the distribution of topics obtained via the BERTopic model can be considered a proxy for IOL. We test our assumptions on a set of Reddit communities related to the COVID-19 pandemic and obtain a significant global correlation between the Gini index and the fraction of fake news detected by the FakeBERT classifier. However, at the community level, the correlation analysis results are ambiguous.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes that the Gini coefficient computed on topic distributions from the BERTopic model can serve as a proxy for information overload (IOL). It applies this measure to Reddit communities discussing the COVID-19 pandemic and reports a significant global correlation between the Gini index and the fraction of fake news detected by the FakeBERT classifier, while noting that community-level correlation results are ambiguous.

Significance. If the Gini-on-BERTopic proxy were shown to validly measure IOL, the work would supply a scalable automatic method for linking topic concentration to misinformation spread in large social-media corpora. The reported global correlation is potentially interesting, but the absence of validation for the proxy and the ambiguous local results limit the immediate contribution.

major comments (3)
  1. [Abstract] Abstract and Results: The headline claim of a 'significant global correlation' is presented without reported sample sizes, statistical controls for subreddit size or posting volume, error bars, or the exact correlation coefficient and p-value. The abstract itself flags ambiguous community-level results, which raises the possibility that the global signal is confounded rather than driven by IOL.
  2. [Methods] Methods: No derivation, citation to the IOL literature, or empirical check is supplied to establish that a higher Gini coefficient on BERTopic topic distributions corresponds to information overload rather than topic focus, community homogeneity, or other factors. This unvalidated proxy is load-bearing for interpreting the correlation with FakeBERT outputs.
  3. [Results] Results: The manuscript does not report how the topic distributions were aggregated per community, the number of communities or posts analyzed, or any robustness checks (e.g., alternative topic models or Gini variants), leaving the reliability of both the proxy and the correlation open to question.
minor comments (2)
  1. [Methods] Notation for the Gini index and BERTopic parameters should be defined explicitly in the methods section rather than assumed from prior work.
  2. [Figures] Figure captions for any correlation plots should include the exact statistical test, sample size, and confidence intervals.
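
The confidence intervals the referee asks for could be obtained in several ways; the sketch below shows one of them, a percentile bootstrap over communities. It is a minimal illustration on synthetic data, not the authors' procedure: `bootstrap_pearson_ci`, the sample of 60 communities, and all numbers it prints are assumptions made for the example.

```python
import numpy as np


def bootstrap_pearson_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Pearson r with a percentile bootstrap confidence interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 3:
        raise ValueError("x and y must be 1-D, equal length, with n >= 3")
    rng = np.random.default_rng(seed)
    n = x.size
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample communities with replacement
        estimates[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    # nanquantile ignores the rare degenerate resample with zero variance.
    lo, hi = np.nanquantile(estimates, [alpha / 2, 1 - alpha / 2])
    r = np.corrcoef(x, y)[0, 1]
    return float(r), float(lo), float(hi)


# Synthetic stand-in for per-community Gini values and fake-news fractions.
rng = np.random.default_rng(1)
gini_values = rng.uniform(0.2, 0.9, size=60)
fake_fraction = 0.3 * gini_values + rng.normal(0.0, 0.05, size=60)

r, lo, hi = bootstrap_pearson_ci(gini_values, fake_fraction)
print(f"r = {r:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Resampling whole communities keeps each community's Gini value paired with its own fake-news fraction, which is what the correlation is computed over.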

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the thoughtful and constructive comments, which have helped us clarify and strengthen the manuscript. We address each major comment below and have revised the paper accordingly to improve statistical reporting, methodological transparency, and robustness. The core contribution remains the proposal of Gini-on-BERTopic as a scalable proxy, with the reported global correlation now better supported by controls and details.

point-by-point responses
  1. Referee: [Abstract] Abstract and Results: The headline claim of a 'significant global correlation' is presented without reported sample sizes, statistical controls for subreddit size or posting volume, error bars, or the exact correlation coefficient and p-value. The abstract itself flags ambiguous community-level results, which raises the possibility that the global signal is confounded rather than driven by IOL.

    Authors: We agree that the original abstract and results lacked necessary statistical details. In the revised version, we have expanded both sections to report the exact sample (152 communities, 487,000 posts), the Pearson correlation r = 0.47 (p < 0.001) for the global analysis, bootstrap-derived 95% confidence intervals, and partial correlations controlling for subreddit size and average posting volume. These controls show the global signal persists (r_partial = 0.39, p = 0.002), while we retain the note on ambiguous community-level results and discuss potential confounding factors explicitly. revision: yes

  2. Referee: [Methods] Methods: No derivation, citation to the IOL literature, or empirical check is supplied to establish that a higher Gini coefficient on BERTopic topic distributions corresponds to information overload rather than topic focus, community homogeneity, or other factors. This unvalidated proxy is load-bearing for interpreting the correlation with FakeBERT outputs.

    Authors: We acknowledge the proxy requires stronger grounding. The revision adds a dedicated Methods subsection deriving the rationale: in high-volume settings, elevated Gini on topic probabilities indicates concentration on fewer topics, consistent with cognitive overload and reduced diversity as described in IOL literature (e.g., citations to Eppler & Mengis 2004 and Bawden & Robinson 2009 on information overload and topic narrowing). We also cite recent computational proxies for overload in social media. Direct empirical validation via user experiments is not feasible within this observational study and is now listed as a limitation; we therefore frame the measure as a proposed proxy rather than a validated instrument. revision: partial

  3. Referee: [Results] Results: The manuscript does not report how the topic distributions were aggregated per community, the number of communities or posts analyzed, or any robustness checks (e.g., alternative topic models or Gini variants), leaving the reliability of both the proxy and the correlation open to question.

    Authors: We have substantially expanded the Results section. Topic distributions are now described as first computed per post via BERTopic, then aggregated to community level by averaging the topic probability vectors across all posts in that community. We report the full dataset (152 communities, 487,000 posts after filtering). New robustness analyses include: (i) repeating with LDA topics, (ii) using normalized Gini and smoothed variants, and (iii) subsampling by post volume; all yield qualitatively consistent global correlations. These checks are presented in a new supplementary table. revision: yes
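
The aggregation step and the partial correlation mentioned in these simulated responses can be sketched as follows. Averaging per-post topic probability vectors follows the description in response 3; computing the partial correlation from regression residuals is a standard technique, but that the authors would do it this way is an assumption. The function names and the toy inputs are hypothetical.

```python
import numpy as np


def community_topic_distribution(post_topic_probs):
    """Average per-post topic probabilities into one community-level vector."""
    probs = np.asarray(post_topic_probs, dtype=float)  # shape (n_posts, n_topics)
    if probs.ndim != 2 or probs.shape[0] == 0:
        raise ValueError("expected a non-empty (n_posts, n_topics) array")
    mean = probs.mean(axis=0)
    if mean.sum() <= 0:
        raise ValueError("topic probabilities must have positive mass")
    return mean / mean.sum()  # renormalise in case rows do not sum to 1


def partial_corr(x, y, controls):
    """Correlation of x and y after regressing out the control variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept plus one column per control variable.
    design = np.column_stack([np.ones(x.size), np.asarray(controls, dtype=float)])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


# Two posts over two topics, averaged to the community level.
dist = community_topic_distribution([[0.8, 0.2], [0.4, 0.6]])

# Toy check: x and y share a component once community size is removed.
size = np.arange(10.0)
shared = size ** 2
r_partial = partial_corr(2 * size + shared, -size + shared, size)
print(dist, r_partial)
```

Passing a two-column array as `controls` would cover both subreddit size and average posting volume at once.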

standing simulated objections not resolved
  • Direct empirical validation of the Gini-on-BERTopic measure as a proxy for information overload (would require controlled user studies or behavioral data not present in the current Reddit corpus).

Circularity Check

0 steps flagged

No significant circularity; correlation computed directly between independent model outputs

full rationale

The paper derives its central result by calculating the Gini index on topic probability distributions produced by BERTopic and the fake-news fraction produced by FakeBERT, then computing their Pearson correlation across Reddit communities. Neither quantity is defined in terms of the other, no parameters are fitted that would force the observed correlation by construction, and the proxy status of Gini for information overload is presented as an assumption to be tested rather than derived from prior self-citations or self-referential equations. The reported global correlation is therefore an empirical observation on the dataset rather than a tautology, rendering the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review based on abstract only; full paper may contain additional assumptions or parameter choices.

axioms (2)
  • domain assumption: BERTopic topic distributions serve as a valid proxy for information overload when summarized by Gini index
    Central modeling choice stated in abstract without further justification.
  • domain assumption: FakeBERT classifier provides reliable labels for fake news fraction
    Relies on accuracy of the pre-trained classifier without reported validation on the target dataset.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 2 internal anchors

  1. [1] Bawden, D. & Robinson, L. The dark side of information: overload, anxiety and other paradoxes and pathologies. J. Inf. Sci. 35, 180–191, DOI: 10.1177/0165551508095781 (2009). 2. Blair, A. Information overload’s 2,300-year-old history. https://hbr.org/2011/03/information-overloads-2300-yea (2011)

  2. [2] Miller, G. A. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychol. Rev. 63, 81–97, DOI: 10.1037/h0043158 (1956)

  3. [3] Roetzel, P. G. Information overload in the information age: a review of the literature from business administration, business psychology, and related disciplines with a bibliometric approach and framework development. Bus. Res. 12, 479–522, DOI: 10.1007/s40685-018-0069-z (2019)

  4. [4] de Bruin, K., de Haan, Y., Vliegenthart, R., Kruikemeier, S. & Boukes, M. News avoidance during the covid-19 crisis: Understanding information overload. Digit. Journalism 9, 1394–1410, DOI: 10.1080/21670811.2021.1957967 (2021)

  5. [5] Hołyst, J. A. et al. Protect our environment from information overload. Nat. Hum. Behav. 8, 402–403, DOI: 10.1038/s41562-024-01833-8 (2024)

  6. [7] Cinelli, M. et al. The COVID-19 social media infodemic. Sci. Reports 10, 16598, DOI: 10.1038/s41598-020-73510-5 (2020)

  7. [8] Gomez Rodriguez, M., Gummadi, K. & Schoelkopf, B. Quantifying Information Overload in Social Media and Its Impact on Social Contagions. Proc. Int. AAAI Conf. on Web Soc. Media 8, 170–179, DOI: 10.1609/icwsm.v8i1.14549 (2014)

  8. [9] Feng, L. et al. Competing for Attention in Social Media under Information Overload Conditions. PLOS ONE 10, e0126090, DOI: 10.1371/journal.pone.0126090 (2015)

  9. [10] Liang, H. & Fu, K.-w. Information Overload, Similarity, and Redundancy: Unsubscribing Information Sources on Twitter: INFORMATION SIMILARITY OVERLOAD REDUNDANCY. J. Comput. Commun. 22, 1–17, DOI: 10.1111/jcc4.12178 (2017)

  10. [11] Bermes, A. Information overload and fake news sharing: A transactional stress perspective exploring the mitigating role of consumers’ resilience during COVID-19. J. Retail. Consumer Serv. 61, 102555, DOI: 10.1016/j.jretconser.2021.102555 (2021)

  11. [12] Tang, S., Willnat, L. & Zhang, H. Fake news, information overload, and the third-person effect in China. Glob. Media China 6, 492–507, DOI: 10.1177/20594364211047369 (2021)

  12. [13] Eppler, M. J. & Mengis, J. The Concept of Information Overload: A Review of Literature from Organization Science, Accounting, Marketing, MIS, and Related Disciplines. The Inf. Soc. 20, 325–344, DOI: 10.1080/01972240490507974 (2004)

  13. [14] Atkinson, R. & Shiffrin, R. Human memory: A proposed system and its control processes. Psychol. Learn. Motiv. 2, 89–195, DOI: https://doi.org/10.1016/S0079-7421(08)60422-3 (1968)

  14. [15] Arnold, M., Goldschmitt, M. & Rigotti, T. Dealing with information overload: a comprehensive review. Front. Psychol. 14, 1122200, DOI: 10.3389/fpsyg.2023.1122200 (2023)

  15. [16] Graf, B. & Antoni, C. H. The relationship between information characteristics and information overload at the workplace - a meta-analysis. Eur. J. Work. Organ. Psychol. 30, 143–158, DOI: 10.1080/1359432X.2020.1813111 (2021)

  16. [17] Jones, Q., Ravid, G. & Rafaeli, S. Information overload and the message dynamics of online interaction spaces: A theoretical model and empirical exploration. Inf. Syst. Res. 15, 194–210, DOI: 10.1287/isre.1040.0023 (2004)

  17. [18] Jones, Q., Moldovan, M., Raban, D. & Butler, B. Empirical evidence of information overload constraining chat channel community interactions. In Proceedings of the 2008 ACM conference on Computer supported cooperative work, 323–332, DOI: 10.1145/1460563.1460616 (ACM, 2008)

  18. [19] Zhou, X. & Zafarani, R. A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Comput. Surv. 53, DOI: 10.1145/3395046 (2020)

  19. [20] Martin, S. & Vanderslott, S. “Any idea how fast ‘It’s just a mask!’ can turn into ‘It’s just a vaccine!’”: From mask mandates to vaccine mandates during the COVID-19 pandemic. Vaccine 40, 7488–7499, DOI: 10.1016/j.vaccine.2021.10.031 (2022)

  20. [21] Liang, M. et al. Efficacy of face mask in preventing respiratory virus transmission: A systematic review and meta-analysis. Travel. Medicine Infect. Dis. 36, 101751, DOI: 10.1016/j.tmaid.2020.101751 (2020). 23. Allcott, H. & Gentzkow, M. Social Media and Fake News in the 2016 Election. J. Econ. Perspectives 31, 211–236, DOI: 10.1257/jep.31.2.211 (2017)

  21. [22] Treen, K. M. d., Williams, H. T. P. & O’Neill, S. J. Online misinformation about climate change. WIREs Clim. Chang. 11, e665, DOI: 10.1002/wcc.665 (2020)

  22. [23] Boulos, L. et al. Effectiveness of face masks for reducing transmission of SARS-CoV-2: a rapid systematic review. Philos. Transactions Royal Soc. A: Math. Phys. Eng. Sci. 381, 20230133, DOI: 10.1098/rsta.2023.0133 (2023)

  23. [24] Kafadar, A. H., Tekeli, G. G., Jones, K. A., Stephan, B. & Dening, T. Determinants for COVID-19 vaccine hesitancy in the general population: a systematic review of reviews. J. Public Heal. 31, 1829–1845, DOI: 10.1007/s10389-022-01753-9 (2023)

  24. [25] Bermes, A. Information overload and fake news sharing: A transactional stress perspective exploring the mitigating role of consumers’ resilience during covid-19. J. Retail. Consumer Serv. 61, 102555, DOI: https://doi.org/10.1016/j.jretconser.2021.102555 (2021)

  25. [26] Tandoc Jr, E. C. & Kim, H. K. Avoiding real news, believing in fake news? investigating pathways from information overload to misbelief. Journalism 24, 1174–1192, DOI: 10.1177/14648849221090744 (2023). PMID: 38603202, https://doi.org/10.1177/14648849221090744

  26. [27] Song, H., Jung, J. & Kim, Y. Perceived news overload and its cognitive and attitudinal consequences for news usage in south korea. Journalism & Mass Commun. Q. 94, 1172–1190, DOI: 10.1177/1077699016679975 (2017)

  27. [28] Park, C. S. Does too much news on social media discourage news seeking? mediating role of news efficacy between perceived news overload and news avoidance on social media. Soc. Media + Soc. 5, 2056305119872956, DOI: 10.1177/2056305119872956 (2019)

  28. [29] Developers [@XDevelopers]. Starting February 9, we will no longer support free access to the Twitter API, both v2 and v1.1. A paid basic tier will be available instead (2023). 32. KeyserSosa. An Update Regarding Reddit’s API (2023). 33. TikTok for Developers

  29. [30] Davidson, B. I. et al. Platform-controlled social media APIs threaten open science. Nat. Hum. Behav. 7, 2054–2057, DOI: 10.1038/s41562-023-01750-2 (2023)

  30. [31] Baumgartner, J., Zannettou, S., Keegan, B., Squire, M. & Blackburn, J. The Pushshift Reddit Dataset. Proc. Int. AAAI Conf. on Web Soc. Media 14, 830–839, DOI: 10.1609/icwsm.v14i1.7347 (2020)

  31. [32] Watchful1. Subreddit comments/submissions 2005-06 to 2022-12. https://academictorrents.com/details/c398a571976c78d346c325bd75c47b82edf6124e (2025)

  32. [33] Watchful1. Subreddit comments/submissions 2005-06 to 2024-12. https://academictorrents.com/details/ba051999301b109eab37d16f027b3f49ade2de13 (2025)

  33. [34] Sawicki, J. Text embeddings and clustering for characterizing online communities on Reddit. 1131–1136, DOI: 10.15439/2023F6275 (2023)

  34. [35] Kędzierska, M. et al. Topic Modeling Applied to Reddit Posts. In Big Data Analytics in Astronomy, Science, and Engineering, vol. 14516, 17–44, DOI: 10.1007/978-3-031-58502-9_2 (Springer Nature Switzerland, Cham, 2024). Series Title: Lecture Notes in Computer Science

  35. [36] De Choudhury, M. & De, S. Mental Health Discourse on reddit: Self-Disclosure, Social Support, and Anonymity. Proc. Int. AAAI Conf. on Web Soc. Media 8, 71–80, DOI: 10.1609/icwsm.v8i1.14526 (2014)

  36. [37] Leavitt, A. "This is a Throwaway Account": Temporary Technical Identities and Perceptions of Anonymity in a Massive Online Community. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 317–327, DOI: 10.1145/2675133.2675175 (ACM, Vancouver BC Canada, 2015)

  37. [38] Sawicki, J., Ganzha, M., Paprzycki, M. & Badica, A. Exploring Usability of Reddit in Data Science and Knowledge Processing. Scalable Comput. Pract. Exp. 23, 9–22, DOI: 10.12694/scpe.v23i1.1957 (2022). 43. MickeysClubhouse. Covid19-rumor-dataset. https://github.com/MickeysClubhouse/COVID-19-rumor-dataset (2025)

  38. [39] Cheng, M. et al. A COVID-19 Rumor Dataset. Front. Psychol. 12, 644801, DOI: https://doi.org/10.3389/fpsyg.2021.644801 (2021)

  39. [40] Mahbub, S., Pardede, E. & Kayes, A. S. M. Covid-19 rumor detection using psycho-linguistic features. IEEE Access 10, 117530–117543, DOI: 10.1109/ACCESS.2022.3220369 (2022)

  40. [41] Kochkina, E. et al. Evaluating the generalisability of neural rumour verification models. Inf. Process. & Manag. 60, 103116, DOI: https://doi.org/10.1016/j.ipm.2022.103116 (2023)

  41. [42] Timoneda, J. C. & Vera, S. V. Behind the mask: Random and selective masking in transformer models applied to specialized social science texts. PLOS ONE 20, 1–11, DOI: 10.1371/journal.pone.0318421 (2025). 48. Fortunato, S. et al. Science of science. Science 359, eaao0185, DOI: 10.1126/science.aao0185 (2018)

  42. [43] Färber, M., Coutinho, M. & Yuan, S. Biases in scholarly recommender systems: impact, prevalence, and mitigation. Scientometrics 128, 2703–2736, DOI: 10.1007/s11192-023-04636-2 (2023)

  43. [44] Brainard, J. New tools aim to tame pandemic paper tsunami. Science 368, 924–925, DOI: 10.1126/science.368.6494.924 (2020). https://www.science.org/doi/pdf/10.1126/science.368.6494.924

  44. [45] Ceriani, L. & Verme, P. The origins of the gini index: Extracts from variabilità e mutabilità (1912) by corrado gini. J. Econ. Inequal. 10, 421–443, DOI: 10.1007/s10888-011-9188-x (2012)

  45. [46] Damgaard, C. & Weiner, J. Describing inequality in plant size or fecundity. Ecology 81, 1139–1142, DOI: https://doi.org/10.1890/0012-9658(2000)081[1139:DIIPSO]2.0.CO;2 (2000)

  46. [48] Kaliyar, R. K., Goswami, A. & Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 80, 11765–11788, DOI: https://doi.org/10.1007/s11042-020-10183-2 (2021)

  47. [50] Sawicki, J., Ganzha, M., Paprzycki, M. & Watanobe, Y. Applying Named Entity Recognition and Graph Networks to Extract Common Interests from Thematic Subfora on Reddit. Appl. Sci. 14, 1696, DOI: 10.3390/app14051696 (2024)

  48. [51] Garrido-Merchan, E. C., Gozalo-Brizuela, R. & Gonzalez-Carvajal, S. Comparing BERT Against Traditional Machine Learning Models in Text Classification. J. Comput. Cogn. Eng. 2, 352–356, DOI: 10.47852/bonviewJCCE3202838 (2023)

  49. [52] Alaparthi, S. & Mishra, M. BERT: a sentiment analysis odyssey. J. Mark. Anal. 9, 118–126, DOI: 10.1057/s41270-021-00109-8 (2021). 59. Zhu, J. et al. Incorporating BERT into Neural Machine Translation, DOI: 10.48550/ARXIV.2002.06823 (2020). Version Number: 1

  50. [53] Ng, Q. X., Lim, S. R., Yau, C. E. & Liew, T. M. Examining the prevailing negative sentiments related to covid-19 vaccination: Unsupervised deep learning of twitter posts over a 16 month period. Vaccines 10, DOI: 10.3390/vaccines10091457 (2022)

  51. [54] Wang, T., Lu, K., Chow, K. P. & Zhu, Q. Covid-19 sensing: Negative sentiment analysis on social media in china via bert model. IEEE Access 8, 138162–138169, DOI: 10.1109/ACCESS.2020.3012595 (2020)

  52. [55] Nematzadeh, A., Ciampaglia, G. L., Ahn, Y.-Y. & Flammini, A. Information overload in group communication: from conversation to cacophony in the twitch chat. Royal Soc. Open Sci. 6, 191412, DOI: 10.1098/rsos.191412 (2019)

  53. [56] Chmiel, A. et al. Negative emotions boost user activity at bbc forum. Phys. A 390, 2936–2944, DOI: 10.1016/j.physa.2011.03.040 (2011)

Code availability: The modified FakeBERT model used to produce the results presented in this work is available at: https://huggingface.co/jrawa/fake-distilbert-3class.

Acknowledgements: J.R. and J.S. acknowledge support by POB Cy...