FinMoji: A Framework for Emoji-driven Sentiment Analysis in Financial Social Media

Ahmed Mahrous; Roberto Di Pietro

arxiv: 2605.09469 · v1 · submitted 2026-05-10 · 💻 cs.CL

Pith reviewed 2026-05-12 03:10 UTC · model grok-4.3

classification 💻 cs.CL

keywords financial sentiment analysis · emoji · StockTwits · sentiment classification · machine learning · computational efficiency · social media · market prediction

The pith

Emojis alone can classify financial sentiment from StockTwits posts at an F1 score of about 0.75, with far lower computational cost than text-inclusive models and with some emoji pairs exceeding 90 percent accuracy for bullish or bearish predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether emojis function as compact, standalone signals of investor sentiment on the StockTwits platform. It trains logistic regression and transformer models on a balanced set of roughly 528,000 emoji-containing posts to compare emoji-only performance against combined text-and-emoji approaches. Emoji models reach lower accuracy but run much faster, which matters for real-time market monitoring. The work also shows that financial emoji usage differs statistically from general social media and that particular emojis and pairs carry strong directional signals. These findings support building lighter, domain-specific tools for gauging market mood.
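To make the emoji-only setup concrete, here is a minimal sketch of a bag-of-emoji logistic regression trained with plain gradient descent. Everything in it is an assumption for illustration: the toy posts and labels, the `extract_emojis` heuristic (a few Unicode code-point ranges, not an exhaustive emoji detector), and the learning rate and epoch count. The paper's actual features and training procedure are not described in this summary.

```python
import math

def extract_emojis(text):
    """Heuristic: keep characters in a few common emoji code-point ranges."""
    return [ch for ch in text
            if 0x1F300 <= ord(ch) <= 0x1FAFF or 0x2600 <= ord(ch) <= 0x27BF]

def featurize(text, vocab):
    """Bag-of-emoji counts over a fixed vocabulary; all other text is ignored."""
    counts = [0.0] * len(vocab)
    for e in extract_emojis(text):
        if e in vocab:
            counts[vocab[e]] += 1.0
    return counts

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(posts, labels, vocab, lr=0.5, epochs=200):
    """Batch gradient descent on the logistic loss (1 = bullish, 0 = bearish)."""
    w = [0.0] * len(vocab)
    b = 0.0
    X = [featurize(p, vocab) for p in posts]
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Toy, hand-written posts (hypothetical, not from the StockTwits corpus).
posts = ["$ABC to the moon 🚀🚀", "loading up 📈🚀", "diamond hands 💎📈",
         "this is going down 📉", "puts printing 🐻📉", "bear market 🐻"]
labels = [1, 1, 1, 0, 0, 0]
vocab = {e: i for i, e in
         enumerate(sorted({e for p in posts for e in extract_emojis(p)}))}
w, b = train(posts, labels, vocab)
```

The feature vector has one entry per distinct emoji, so the model stays tiny compared with a text tokenizer plus transformer, which is the kind of cost gap the paper points to.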

Core claim

Using a balanced dataset of about 528,000 emoji-containing StockTwits posts, emoji-only models achieve an F1 score of approximately 0.75, lower than the 0.88 of text-emoji combined models but at far lower computational cost. Certain emojis and emoji pairs exhibit strong predictive power for market sentiment, reaching over 90 percent accuracy in predicting bullish or bearish trends. The study further reveals large statistical differences in emoji usage between financial and general social media contexts, indicating that domain-specific models are required.
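For readers less familiar with the metric, the short sketch below shows how an F1 score is computed from confusion counts. The counts are hypothetical and were chosen only so that the results line up with the approximate 0.75 and 0.88 figures; they are not the paper's confusion matrices.

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts chosen so that precision = recall = 0.75.
emoji_only = f1_from_counts(tp=75, fp=25, fn=25)
# Hypothetical counts chosen so that precision = recall = 0.88.
combined = f1_from_counts(tp=88, fp=12, fn=12)
```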

What carries the argument

Emoji-only and emoji-plus-text sentiment classifiers built with logistic regression and transformers, together with per-emoji and per-pair accuracy analysis on the StockTwits corpus.
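A per-emoji and per-pair analysis of this kind can be sketched as a simple counting exercise. The version below is an assumption about how such a table might be built, not the paper's procedure: the `min_count` cutoff, the toy posts, and the choice to treat pairs as unordered are all hypothetical. A real evaluation would build the table on training data and score it on held-out posts.

```python
from collections import Counter
from itertools import combinations

def signal_table(posts, min_count=2):
    """posts: iterable of (emoji_list, label), label in {'bullish', 'bearish'}.

    Returns {key: (majority_label, majority_share, count)} where key is a
    1-tuple (single emoji) or a 2-tuple (unordered pair, sorted by code point).
    Keys seen in fewer than min_count posts are dropped.
    """
    bullish = Counter()
    total = Counter()
    for emojis, label in posts:
        unique = sorted(set(emojis))
        keys = [(e,) for e in unique] + list(combinations(unique, 2))
        for key in keys:
            total[key] += 1
            if label == "bullish":
                bullish[key] += 1
    table = {}
    for key, n in total.items():
        if n < min_count:
            continue
        share = bullish[key] / n
        majority = "bullish" if share >= 0.5 else "bearish"
        table[key] = (majority, max(share, 1.0 - share), n)
    return table

# Toy posts (hypothetical).
posts = [
    (["🚀", "📈"], "bullish"),
    (["🚀", "📈"], "bullish"),
    (["🚀"], "bullish"),
    (["🚀", "📉"], "bearish"),
    (["📉", "🐻"], "bearish"),
    (["📉", "🐻"], "bearish"),
]
table = signal_table(posts)
```

The frequency cutoff matters: a pair that appears once is trivially "100 percent accurate", so any high per-pair figure is only meaningful alongside its count.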

If this is right

Emoji-only models enable sentiment tracking in high-frequency trading or other latency-sensitive settings where full text processing is too slow.
A small set of high-accuracy emojis and pairs can be used as lightweight filters or features in any market-monitoring pipeline.
Financial sentiment systems must be trained on platform-specific data because emoji distributions differ markedly from those in non-financial social media.
Lower data and compute requirements make emoji-based methods practical for resource-constrained environments.
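The point about differing emoji distributions can be made measurable with an effect-size statistic on a platform-by-emoji contingency table. The sketch below uses Cramér's V as one common choice; this summary does not say which statistic the paper reports, so treat the choice, the three emoji categories, and the counts as assumptions.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes every row and every column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts for three emoji categories (hearts, money, other).
general = [500, 50, 450]
financial = [50, 400, 550]
v = cramers_v([general, financial])
```

A value near 0 means the two platforms use the categories in the same proportions, and a value near 1 means usage is almost fully determined by the platform.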

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These signals could be fused with price or volume data to create hybrid early-warning indicators for market moves.
A compact financial emoji lexicon derived from the high-accuracy pairs could be ported to other short-form financial text sources.
Periodic retraining on fresh StockTwits data would be needed to detect shifts in emoji meaning during different market regimes.
The same methodology could be applied to emoji use in corporate or regulatory social media to gauge institutional sentiment.

Load-bearing premise

The emoji patterns and accuracy levels found in the 528,000 StockTwits posts will hold for investor sentiment on other platforms or in future time periods.

What would settle it

Running the same emoji-only models on a new, balanced dataset drawn from a different financial social media platform or from a later period and obtaining F1 scores below 0.65 or emoji-pair accuracies below 70 percent for trend prediction.

Figures

Figures reproduced from arXiv: 2605.09469 by Ahmed Mahrous, Roberto Di Pietro.

**Figure 2.** Data Filtration Process. This series of pie charts illustrates the progressive stages of data filtering from the initial dataset to…

**Figure 3.** Demographic Breakdown of StockTwits Users. This figure illustrates the demographic composition of StockTwits users, high…

**Figure 4.** Methodology at a Glance. Pre-processing (yellow) generates three datasets containing only text, only emojis, and text with…

**Figure 5.** Emoji Usage in our Dataset. (a) Bar charts illustrating the percentage of posts containing various numbers of emojis, com…

**Figure 6.** Emoji Clouds for Financial Microblogs. (a) Bullish (green…

**Figure 7.** Emoji clouds of Twitter (blue/left, now X) and of StockTwits (purple/right). Noticeable differences in emoji usage between the two platforms are immediately apparent. For instance, Twitter’s cloud contains more heart-related emojis, while StockTwits’ contains more money-related emojis. Different emojis are also much more prevalent on one platform tha…

**Figure 8.** Sentiment Score of Individual Emojis. This bar chart displays the proportion of bullish (green) and bearish (red) posts associ…

**Figure 9.** Sentiment Score of Emoji Pairs. This bar chart displays the proportion of bullish (green) and bearish (red) posts associated…

**Figure 10.** Emoji Count versus Sentiment and Frequency. This chart displays the percentage of bullish (green) and bearish (red) posts…

**Figure 11.** Confusion Matrices of Logistic Regression Models. This figure displays confusion matrices for logistic regression models…

**Figure 12.** Impact of Training Sample Size on Logistic Regression Model Accuracy. This figure illustrates the relationship between…

**Figure 13.** Confusion Matrices of Transformer-based Twitter-RoBERTa Models. This figure displays confusion matrices for Twitter…

**Figure 14.** Impact of Training Sample Size on Transformer-based Twitter-RoBERTa Model Accuracy. This figure illustrates the relation…

Original abstract

This paper explores the use of emojis in financial sentiment analysis, focusing on the social media platform StockTwits. Emojis, increasingly prevalent in digital communication, have potential as compact indicators of investor sentiment, which can be critical for predicting market trends. Our study examines whether emojis alone can serve as reliable proxies for financial sentiment and how they compare with traditional text-based analysis. We conduct a series of experiments using logistic regression and transformer models. We further analyze the performance, computational efficiency, and data requirements of emoji-based versus text-based sentiment classification. Using a balanced dataset of about 528,000 emoji-containing StockTwits posts, we find that emoji-only models achieve F1 approximately 0.75, lower than text-emoji combined models, which achieve F1 approximately 0.88, but with far lower computational cost. This is a useful feature in time-sensitive settings such as high-frequency trading. Furthermore, certain emojis and emoji pairs exhibit strong predictive power for market sentiment, demonstrating over 90 percent accuracy in predicting bullish or bearish trends. Finally, our research reveals large statistical differences in emoji usage between financial and general social media contexts, stressing the need for domain-specific sentiment analysis models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Emoji-only models hit usable F1 on a filtered StockTwits set while text boosts it further, but the balancing and emoji-only restriction make the headline numbers hard to trust outside that slice.

The core finding is that emoji-only logistic regression and transformer runs reach roughly 0.75 F1 on 528k balanced emoji-containing StockTwits posts, text-plus-emoji climbs to 0.88, and a handful of emoji pairs alone clear 90 percent accuracy on bullish versus bearish labels. The paper also documents measurable differences in emoji patterns between financial and general social media. That comparison is the actual new piece: prior work has looked at emojis in sentiment analysis, but not at this scale or with this direct finance-versus-general contrast on data from similar-style platforms.

The efficiency note is useful too: lower compute for the emoji-only path is a practical point for real-time monitoring. The authors run both classical and modern models, which is straightforward and lets them show the trade-off clearly. The dataset size itself is respectable for this kind of study.

The main weakness is the data construction. Restricting to emoji-containing posts and then balancing the labels selects exactly the cases where emojis are already present and informative, so the reported F1 and accuracy figures are likely optimistic relative to the full unfiltered StockTwits stream. No checks against non-emoji posts or different time windows appear in the abstract, so the stress-test concern about representativeness stands. The write-up also gives headline metrics without train-test details, baseline comparisons, or any mention of how the bullish/bearish labels were obtained or validated. Those gaps make it difficult to judge whether the 0.75 and 0.88 numbers would replicate or transfer.

This is a narrow but cleanly scoped empirical paper. Readers working on social-media signals for trading or domain-adapted sentiment models could extract the numbers as a reference point. It is not methodologically novel, yet the scale and the finance-specific comparison give it enough substance that a referee should see it. I would send it to review and ask for the missing splits, full-stream baselines, and a short robustness section on the filtering choices.
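The balancing concern in the letter can be illustrated with a small worked example. Suppose, hypothetically, that a classifier keeps the same recall and false-positive rate when moved from a balanced test set to a stream where one class makes up only 20 percent of posts. The per-class F1 for that minority class then drops even though the classifier itself has not changed. The rates and the 80/20 split are assumptions, and the sketch covers only the balancing step, not the emoji-only filter.

```python
def class_f1(prevalence, recall, false_positive_rate):
    """F1 for one class, given its prevalence and fixed error rates."""
    tp = recall * prevalence
    fp = false_positive_rate * (1.0 - prevalence)
    precision = tp / (tp + fp)
    return 2 * precision * recall / (precision + recall)

# Same hypothetical classifier, two different class mixes.
balanced = class_f1(prevalence=0.5, recall=0.75, false_positive_rate=0.25)
skewed = class_f1(prevalence=0.2, recall=0.75, false_positive_rate=0.25)
```

Under these assumptions `balanced` is 0.75 and `skewed` is 6/11, roughly 0.55, which is why a balanced-set F1 says little on its own about performance on the raw stream.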

Referee Report

4 major / 2 minor

Summary. The manuscript introduces FinMoji, a framework for emoji-driven sentiment analysis on the StockTwits financial social media platform. Using logistic regression and transformer models on a balanced dataset of approximately 528,000 emoji-containing posts, it claims emoji-only models achieve F1 scores of approximately 0.75 while text-emoji combined models reach approximately 0.88. Certain emojis and emoji pairs are reported to predict bullish or bearish trends with over 90% accuracy, and large statistical differences in emoji usage are found between financial and general social media contexts, supporting the need for domain-specific models. The work also notes computational efficiency advantages of emoji-only approaches for time-sensitive applications such as high-frequency trading.

Significance. If the empirical results prove robust, the paper offers concrete evidence that emojis can function as compact, low-cost proxies for investor sentiment in financial social media. This has potential practical value for real-time market monitoring and high-frequency trading due to reduced computational demands. The findings on domain-specific emoji usage patterns also contribute to understanding context-dependent sentiment signals, which could inform future work on specialized NLP models for finance.

major comments (4)

[Dataset Construction] Dataset section: The construction of the balanced 528,000-post dataset is restricted to emoji-containing StockTwits posts with no reported analysis of selection effects, comparison to the full unfiltered stream, non-emoji posts, or different time windows. This directly impacts the generalizability of the F1 scores (~0.75 and ~0.88) and the >90% accuracy claims for specific emojis/pairs.
[Experimental Methodology] Experimental Methodology: No details are provided on train/validation/test splits, hyperparameter selection, baseline models, training procedures for the logistic regression and transformer models, or statistical tests supporting the reported F1 scores and accuracy figures. These omissions are load-bearing for assessing the reliability of the central performance claims.
[Results and Analysis] Results section: The assertion that certain emojis and emoji pairs achieve over 90% accuracy for bullish/bearish prediction lacks specification of the exact evaluation protocol (e.g., held-out test set, frequency thresholds, or per-emoji breakdowns), preventing verification of robustness.
[Results and Analysis] Computational Efficiency discussion: The claim of far lower computational cost for emoji-only models (useful for high-frequency trading) is stated qualitatively in the abstract but without quantitative metrics such as training/inference times, memory usage, or hardware specifications to support the comparison.

minor comments (2)

[Abstract] Abstract: Approximate phrasing ('approximately 0.75', 'approximately 0.88', 'over 90 percent') without exact values, confidence intervals, or standard deviations reduces precision; reporting exact metrics would strengthen the presentation.
[Related Work] Related Work: Additional citations to prior emoji sentiment analysis studies (both general and financial) would better situate the domain-specific contributions and novelty.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important areas for improving the clarity and rigor of our manuscript. We address each major comment point by point below.

Referee: [Dataset Construction] Dataset section: The construction of the balanced 528,000-post dataset is restricted to emoji-containing StockTwits posts with no reported analysis of selection effects, comparison to the full unfiltered stream, non-emoji posts, or different time windows. This directly impacts the generalizability of the F1 scores (~0.75 and ~0.88) and the >90% accuracy claims for specific emojis/pairs.

Authors: We agree that the restriction to emoji-containing posts requires further justification and analysis to support generalizability claims. The dataset was constructed this way to isolate emoji-driven signals, but we will revise the Dataset section to include: (1) the proportion of emoji posts in the full StockTwits stream during the collection period, (2) basic comparisons of post length, user activity, and sentiment distribution between emoji and non-emoji posts, and (3) explicit details on the time window (e.g., start and end dates). These additions will better contextualize potential selection effects. revision: yes
Referee: [Experimental Methodology] Experimental Methodology: No details are provided on train/validation/test splits, hyperparameter selection, baseline models, training procedures for the logistic regression and transformer models, or statistical tests supporting the reported F1 scores and accuracy figures. These omissions are load-bearing for assessing the reliability of the central performance claims.

Authors: We acknowledge these critical omissions and will expand the Experimental Methodology section substantially. Revisions will specify the train/validation/test split ratios and stratification method, hyperparameter search procedures (grid search ranges for logistic regression; learning rate, batch size, and epochs for transformers), baseline models (e.g., majority-class and text-only variants), full training details (optimizer, loss function, early stopping), and statistical tests (e.g., bootstrap confidence intervals or McNemar's test for F1 comparisons). revision: yes
Referee: [Results and Analysis] Results section: The assertion that certain emojis and emoji pairs achieve over 90% accuracy for bullish/bearish prediction lacks specification of the exact evaluation protocol (e.g., held-out test set, frequency thresholds, or per-emoji breakdowns), preventing verification of robustness.

Authors: The >90% figures were obtained on the held-out test set for emojis and pairs meeting a minimum frequency threshold to ensure statistical reliability. We will revise the Results section to add a dedicated table or subsection providing per-emoji and per-pair breakdowns, the exact frequency cutoff used, confirmation of held-out evaluation, and any additional robustness checks (e.g., performance stratified by market conditions). revision: yes
Referee: [Results and Analysis] Computational Efficiency discussion: The claim of far lower computational cost for emoji-only models (useful for high-frequency trading) is stated qualitatively in the abstract but without quantitative metrics such as training/inference times, memory usage, or hardware specifications to support the comparison.

Authors: We agree that qualitative statements alone are insufficient. The revised manuscript will include quantitative benchmarks in a new subsection or table, reporting training and inference times, peak memory usage, and hardware details (CPU/GPU specifications) for emoji-only versus combined models under identical conditions. This will provide concrete support for the efficiency claims. revision: yes
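The simulated rebuttal mentions bootstrap confidence intervals as one way to support the F1 comparisons. A minimal percentile-bootstrap sketch is given below. The labels and predictions are synthetic, and the resample count, seed, and 95 percent level are assumptions; nothing here reflects the authors' actual procedure.

```python
import random

def f1_score(y_true, y_pred):
    """Binary F1 with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_f1_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample test items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return scores[lo_idx], scores[hi_idx]

# Synthetic test set: 200 items, 75 percent of each class predicted correctly.
y_true = [1] * 100 + [0] * 100
y_pred = [1] * 75 + [0] * 25 + [1] * 25 + [0] * 75
point = f1_score(y_true, y_pred)
lo, hi = bootstrap_f1_ci(y_true, y_pred)
```

Reporting the interval next to the point estimate would also address the minor comment about approximate phrasing in the abstract.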

Circularity Check

0 steps flagged

No circularity: purely empirical ML evaluation on held-out data

The paper conducts standard supervised learning experiments (logistic regression and transformers) on a collected and balanced dataset of 528k emoji-containing StockTwits posts, reporting F1 scores and per-emoji accuracies measured on held-out test data. No equations, derivations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. All performance claims (emoji-only F1 ~0.75, combined ~0.88, certain emojis >90% accuracy) are direct empirical measurements rather than reductions to the inputs by construction. The study is self-contained against external benchmarks and contains no derivation chain that could be circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard NLP assumptions plus one key domain assumption about emojis as sentiment carriers in finance; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Emojis convey consistent and domain-specific sentiment signals usable as proxies for investor mood in financial social media.
Invoked throughout to justify emoji-only classification and the comparison to general social media.

pith-pipeline@v0.9.0 · 5507 in / 1355 out tokens · 64985 ms · 2026-05-12T03:10:01.539737+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel unclear

emoji-only models achieve F1 approximately 0.75, lower than text-emoji combined models, which achieve F1 approximately 0.88

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page 1951
[60]

Mohammad Shiri, Oleksii Dubovyk, Golbarg Roghaniaraghi, and Sampath Jayarathna. 2023. Meme it Up: Patterns of Emoji Usage on Twitter. In 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI) . IEEE, New York City, NY, USA, 205–210

work page 2023
[61]

SimilarWeb. 2023. StockTwits Website Traffic Statistics. SimilarWeb. https://www.similarweb.com/website/stocktwits.com/#geography Accessed on October 28, 2025

work page 2023
[62]

Ivan Smirnov. 2017. The digital flynn effect: Complexity of posts on social media increases over time. InInternational Conference on Social Informatics. Springer, 24–30

work page 2017
[63]

Nickolay Smirnov. 1948. Table for estimating the goodness of fit of empirical distributions. Annals of Mathematical Statistics 19, 2 (1948), 279–281

work page 1948
[64]

Spärck Jones

K. Spärck Jones. 1972. A Statistical Interpretation of Term Specificity and Its Application in Retrieval. Journal of Documentation 28, 1 (1972), 11––21

work page 1972
[65]

Tim Koornstra Stephan Akkerman. 2023. FinTwitBERT: A Specialized Language Model for Financial Tweets. https://github.com/TimKoornstra/ FinTwitBERT. Accessed October 28, 2025

work page 2023
[66]

Domonkos F Vamossy. 2021. Investor emotions and earnings announcements. Journal of Behavioral and Experimental Finance 30 (2021), 100474

work page 2021
[67]

A Maurits van der Veen and Erik Bleich. 2025. The advantages of lexicon-based sentiment analysis in an age of machine learning. PloS one 20, 1 (2025), e0313092

work page 2025
[68]

Moritz Wilksch and Olga Abramova. 2023. PyFin-sentiment: Towards a machine-learning-based model for deriving sentiment from financial tweets. International Journal of Information Management Data Insights 3, 1 (2023), #100171

work page 2023
[69]

Giulio Zhou, Sydelle De Souza, Ella Markham, Oghenetekevwe Kwakpovwe, and Sumin Zhao. 2024. Semantics and Sentiment: Cross-lingual Variations in Emoji Use. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP) . Association for FinMoji: A Framework for Emoji-driven Sentiment Analysis in Financial Social Media 31...

work page doi:10.18653/v1/2024.emnlp-main.1041 2024
[70]

Xiaorui Zuo, Yao-Tsung Chen, and Wolfgang Härdle. 2024. Emoji Driven Crypto Assets Market Reactions. arXiv preprint arXiv:2402.10481. , 29 pages

work page arXiv 2024

[1]

Matteo Aquilina, Eric Budish, and Peter O’Neill. 2022. Quantifying the high-frequency trading “arms race”. The Quarterly Journal of Economics 137, 1 (2022), 493–564

[2]

Dogu Araci. 2019. FinBERT: Financial sentiment analysis with pre-trained language models. 10 pages

[3]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 30. Curran Associates, Inc., Red Hook, NY, USA, 11 pages

[4]

Francesco Barbieri, Jose Camacho-Collados, Luis Espinosa-Anke, and Leonardo Neves. 2020. TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification. In Proceedings of Findings of EMNLP. Association for Computational Linguistics, Stroudsburg, PA, USA, 1644–1650

[5]

Usman Bashir, Umar Nawaz Kayani, Shoaib Khan, Ali Polat, Muntazir Hussain, and Ahmet Faruk Aysan. 2024. Investor sentiment and stock price crash risk: The mediating role of analyst herding. Computers in Human Behavior Reports 13 (2024), 100371

[6]

Hadis Bashiri and Hassan Naderi. 2024. Comprehensive review and comparative analysis of transformer models in sentiment analysis. Knowledge and Information Systems 66, 12 (2024), 7305–7361

[7]

Vance W Berger and YanYan Zhou. 2014. Kolmogorov–Smirnov test: Overview. Wiley StatsRef: Statistics Reference Online (2014)

[8]

Mohamed Reda Bouadjenek, Scott Sanner, and Ga Wu. 2023. A user-centric analysis of social media for stock market prediction. ACM Transactions on the Web 17, 2 (2023), 1–22

[9]

Keith Broni. 2021. Emoji Use at All-Time High. Emojipedia Blog. https://blog.emojipedia.org/emoji-use-at-all-time-high/ Accessed on October 28, 2025

[10]

Keith Broni. 2023. 10 Years of Emojipedia. Emojipedia Blog. https://blog.emojipedia.org/10-years-of-emojipedia-10-years-of-record-breaking-emoji-popularity/ Accessed on October 28, 2025

[11]

Jason Brownlee. 2021. Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost. Machine Learning Mastery. https://machinelearningmastery.com/gradient-boosting-with-scikit-learn-xgboost-lightgbm-and-catboost/ Accessed on October 28, 2025

[12]

Jeremy Burge. 2017. 5 Billion Emojis Sent Daily on Messenger. Emojipedia Blog. https://blog.emojipedia.org/5-billion-emojis-sent-daily-on-messenger/ Accessed on October 28, 2025

[13]

Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2018. NTUSD-Fin: a market sentiment dictionary for financial social media data applications. In Proceedings of the 1st Financial Narrative Processing Workshop (FNP 2018). European Language Resources Association (ELRA), Paris, France, 37–43

[14]

Cathy Yi-Hsuan Chen, Li Guo, and Thomas Renault. 2019. What Makes Cryptocurrencies Special? Investor Sentiment and Return Predictability. Investor Sentiment and Return Predictability (2019), 36 pages

[15]

Yihua Chen, Xingchen Yang, Hannah Howman, and Ruth Filik. 2024. Individual differences in emoji comprehension: Gender, age, and culture. PLOS ONE 19, 2 (2024), e0297379

[16]

Z. Chen, X. Lu, W. Ai, H. Li, Q. Mei, and X. Liu. 2018. Through a gender lens: Learning usage patterns of emojis from large-scale Android users. In Proc. World Wide Web Conference. ACM, New York, NY, 763–772

[17]

Emre Cicekyurt and Gokhan Bakal. 2025. Enhancing Sentiment Analysis in Stock Market Tweets Through BERT-Based Knowledge Transfer. Computational Economics 48 (2025), 1–23. Issue 1

[18]

Harald Cramér. 1946. Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ, USA, Chapter 21. The two-dimensional case, 282

[19]

Jennifer Daniel. 2021. Emoji Frequency. Unicode Consortium. https://home.unicode.org/emoji/emoji-frequency/ Accessed on October 28, 2025

[20]

Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. 2020. BERTweet: A pre-trained language model for English Tweets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Stroudsburg, PA, USA, 9–14

[21]

Sadettin Demirel, Elif Kahraman, and Uğur Gündüz. 2024. A text mining analysis of the change in status of the Hagia Sophia on Twitter: the political discourse and its reflections on the public opinion. Atlantic Journal of Communication 32, 1 (2024), 63–90

[22]

Sadettin Demirel, Elif Kahraman-Gokalp, and Uğur Gündüz. 2025. From optimism to concern: Unveiling sentiments and perceptions surrounding ChatGPT on Twitter. International Journal of Human–Computer Interaction 41, 12 (2025), 7292–7314

[23]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805v2. 16 pages

[24]

Thomas Dimson. 2015. Emojineering Part 1: Machine Learning for Emoji Trends. Instagram Engineering Blog. https://instagram-engineering.com/emojineering-part-1-machine-learning-for-emoji-trends-7f5f9cb979ad Accessed on October 28, 2025

[25]

Kelvin Du, Frank Xing, Rui Mao, and Erik Cambria. 2024. Financial sentiment analysis: Techniques and applications. ACM Computing Surveys 56, 9 (2024), 1–42

[26]

European Parliament and the Council of the European Union. 2016. Regulation (EU) 2016/679 (General Data Protection Regulation). https://gdpr-info.eu/. Official Journal of the European Union (2016), L119/1–88

[27]

Ramiro H Gálvez and Agustín Gravano. 2017. Assessing the usefulness of online message board mining in automatic stock prediction systems. Journal of Computational Science 19 (2017), 43–56

[28]

Nadia Mushtaq Gardazi, Ali Daud, Muhammad Kamran Malik, Amal Bukhari, Tariq Alsahfi, and Bader Alshemaimri. 2025. BERT applications in natural language processing: a review. Artificial Intelligence Review 58, 6 (2025), 1–49

[29]

Felix Gers, Jürgen Schmidhuber, and Fred Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation 12, 10 (2000), 2451–2471

[30]

Uğur Gündüz and Sadettin Demirel. 2025. Metaverse-related perceptions and sentiments on Twitter: evidence from text mining and network analysis. Electronic Commerce Research 25, 3 (2025), 1453–1483

[31]

Sharath Chandra Guntuku, Mingyang Li, Louis Tay, and Lyle H. Ungar. 2019. Studying Cultural Differences in Emoji Usage across the East and the West. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. AAAI, Westminster, UK, 226–235

[32]

Sharath Chandra Guntuku, Mingyang Li, Louis Tay, and Lyle H. Ungar. 2019. Studying Cultural Differences in Emoji Usage across the East and the West. In International AAAI Conference on Web and Social Media (ICWSM). AAAI, Westminster, UK, 226–235

[33]

Ahmed Hazourli. 2022. FinBERT: a pretrained language model for financial text mining. International Joint Conference on Artificial Intelligence Organization 2 (2022), 4513–4519

[34]

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735–1780

[35]

Clayton Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 8. AAAI, Washington, D.C., USA, 216–225. Issue 1

[36]

Wen jun Gu, Yi hao Zhong, Shi zun Li, Chang song Wei, Li ting Dong, Zhuo yue Wang, and Chao Yan. 2024. Predicting stock prices with finbert-lstm: Integrating news sentiment analysis. In Proceedings of the 2024 8th International Conference on Cloud and Big Data Computing. ACM, New York City, NY, USA, 67–72

[37]

Elif Kahraman, Sadettin Demirel, and Uğur Gündüz. 2025. COVID-19 vaccines in twitter ecosystem: Analyzing perceptions and attitudes by sentiment and text analysis method. Journal of Public Health 33, 5 (2025), 965–979

[38]

Elif Kahraman-Gokalp, Sadettin Demirel, and Uğur Gündüz. 2024. Exploring the surge of negativity during the COVID-19 pandemic: computational text and sentiment analysis across eight newsrooms’ tweets. Atlantic Journal of Communication 32, 2 (2024), 298–324

[39]

Mayank Kejriwal, Qile Wang, Hongyu Li, and Lu Wang. 2021. An empirical study of emoji usage on Twitter in linguistic and national contexts. Online Social Networks and Media 24 (2021), 100149

[40]

Feyza Duman Keles, Pruthuvi Mahesakya Wijewardena, and Chinmay Hegde. 2023. On the computational complexity of self-attention. In International Conference on Algorithmic Learning Theory. PMLR, Westminster, UK, 597–619

[41]

Amit Khan, Dipankar Majumdar, and Bikromadittya Mondal. 2025. Sentiment analysis of emoji fused reviews using machine learning and Bert. Scientific Reports 15, 1 (2025), 7538

[42]

Mikolaj Kulakowski and Flavius Frasincar. 2023. Sentiment Classification of Cryptocurrency-Related Social Media Posts. IEEE Intelligent Systems 38, 4 (2023), 5–9

[43]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692v1. 13 pages

[44]

Tim Loughran and Bill McDonald. 2011. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance 66, 1 (2011), 35–65

[45]

Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, and Jose Camacho-Collados. 2022. TimeLMs: Diachronic language models from Twitter. arXiv preprint arXiv:2202.03829. 10 pages

[46]

Manish Barath Mahendran, Aswin Kumar Gokul, Poornima Lakshmi, and S Pavithra. 2025. Comparative Advances in Financial Sentiment Analysis: A Review of BERT, FinBert, and Large Language Models. In 2025 3rd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT). IEEE, New York City, NY, USA, 39–45

[47]

Nader Mahmoudi, Paul Docherty, and Pablo Moscato. 2018. Deep neural networks understand investors better. Decision Support Systems 112 (2018), 23–34

[48]

Nader Mahmoudi, Łukasz P Olech, and Paul Docherty. 2022. A comprehensive study of domain-specific emoji meanings in sentiment classification. Computational Management Science 19, 2 (2022), 159–197

[49]

Ahmed Mahrous, Jens Schneider, and Roberto Di Pietro. 2023. The Role of Emojis in Sentiment Analysis of Financial Microblogs. In 2023 Fourth International Conference on Intelligent Data Science Technologies and Applications (IDSTA). IEEE, New York City, NY, USA, 76–84

[50]

Henry Mann and Donald Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Annals of Mathematical Statistics 18, 1 (1947), 50–60

[51]

B.W. Matthews. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure 405, 2 (1975), 442–451

[52]

Sergey Nasekin and Cathy Yi-Hsuan Chen. 2020. Deep learning-based cryptocurrency sentiment construction. Digital Finance 2, 1 (2020), 39–67

[53]

Hibaq Omar and Lester Allan Lasrado. 2023. Uncover Social Media Interactions On Cryptocurrencies Using Social Set Analysis (SSA). Procedia Computer Science 219 (2023), 161–169

[54]

Thomas Renault. 2020. Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages. Digital Finance 2, 1-2 (2020), 1–13

[55]

Felix Reschke and Jan-Oliver Strych. 2024. Emojis and stock returns. Review of Behavioral Finance 16, 2 (2024), 223–233

[56]

Alexander Robertson, Farhana Ferdousi Liza, Dong Nguyen, Barbara McGillivray, and Scott A. Hale. 2021. Semantic Journeys: Quantifying Change in Emoji Meaning from 2012–2018. 10 pages

[57]

Sara Rosenthal, Noura Farra, and Preslav Nakov. 2019. SemEval-2017 task 4: Sentiment analysis in Twitter. arXiv preprint arXiv:1912.00741. 17 pages

[58]

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 5 pages

[59]

Claude E Shannon. 1951. Prediction and entropy of printed English. Bell System Technical Journal 30, 1 (1951), 50–64

[60]

Mohammad Shiri, Oleksii Dubovyk, Golbarg Roghaniaraghi, and Sampath Jayarathna. 2023. Meme it Up: Patterns of Emoji Usage on Twitter. In 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI). IEEE, New York City, NY, USA, 205–210

[61]

SimilarWeb. 2023. StockTwits Website Traffic Statistics. SimilarWeb. https://www.similarweb.com/website/stocktwits.com/#geography Accessed on October 28, 2025

[62]

Ivan Smirnov. 2017. The digital Flynn effect: Complexity of posts on social media increases over time. In International Conference on Social Informatics. Springer, 24–30

[63]

Nickolay Smirnov. 1948. Table for estimating the goodness of fit of empirical distributions. Annals of Mathematical Statistics 19, 2 (1948), 279–281

[64]

K. Spärck Jones. 1972. A Statistical Interpretation of Term Specificity and Its Application in Retrieval. Journal of Documentation 28, 1 (1972), 11–21

[65]

Stephan Akkerman and Tim Koornstra. 2023. FinTwitBERT: A Specialized Language Model for Financial Tweets. https://github.com/TimKoornstra/FinTwitBERT. Accessed October 28, 2025

[66]

Domonkos F Vamossy. 2021. Investor emotions and earnings announcements. Journal of Behavioral and Experimental Finance 30 (2021), 100474

[67]

A Maurits van der Veen and Erik Bleich. 2025. The advantages of lexicon-based sentiment analysis in an age of machine learning. PLOS ONE 20, 1 (2025), e0313092

[68]

Moritz Wilksch and Olga Abramova. 2023. PyFin-sentiment: Towards a machine-learning-based model for deriving sentiment from financial tweets. International Journal of Information Management Data Insights 3, 1 (2023), 100171

[69]

Giulio Zhou, Sydelle De Souza, Ella Markham, Oghenetekevwe Kwakpovwe, and Sumin Zhao. 2024. Semantics and Sentiment: Cross-lingual Variations in Emoji Use. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for ... doi:10.18653/v1/2024.emnlp-main.1041

[70]

Xiaorui Zuo, Yao-Tsung Chen, and Wolfgang Härdle. 2024. Emoji Driven Crypto Assets Market Reactions. arXiv preprint arXiv:2402.10481. 29 pages
