
arxiv: 2603.05917 · v3 · pith:TRU7OJY3 · submitted 2026-03-06 · 💻 cs.LG · cs.AI · q-fin.ST

Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis

Pith reviewed 2026-05-21 11:21 UTC · model grok-4.3

classification: 💻 cs.LG · cs.AI · q-fin.ST
keywords: stock market prediction, node transformer, BERT sentiment analysis, graph neural networks, financial time series forecasting, S&P 500 stocks, attention mechanisms, inter-stock dependencies

The pith

A node transformer on a market graph fused with BERT sentiment reaches 0.80 percent MAPE for one-day stock forecasts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tries to establish that modeling stocks as nodes in a graph and fusing BERT-derived sentiment from social media into a node transformer produces more accurate short-term price predictions than ARIMA and LSTM baselines. The graph edges encode company relationships such as sectoral affiliations, correlated price movements, and supply chains, while the transformer handles both temporal evolution and cross-stock influences. Sentiment adds behavioral context, and its reported benefit is largest around earnings announcements. A reader would care because even modest gains in forecast precision matter for decisions in markets filled with noise and hidden interconnections.

Core claim

The paper claims that representing the stock market as a graph with stocks as nodes and edges for sectoral affiliations, correlated price movements, and supply chain connections, then processing historical data with a node transformer while fusing sentiment extracted by a fine-tuned BERT model through attention mechanisms, yields superior one-day-ahead forecasts. Experiments on 20 S&P 500 stocks from January 1982 to March 2025 produce a mean absolute percentage error of 0.80 percent, compared with 1.20 percent for ARIMA and 1.00 percent for LSTM. Sentiment analysis accounts for a 10 percent overall error reduction and a 25 percent reduction during earnings announcements. The graph architecture adds a further 15 percent improvement by capturing inter-stock dependencies.
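The headline MAPE figures imply relative error reductions that the summary does not state explicitly. The short calculation below derives them from the reported numbers only. The function name `relative_reduction` is illustrative.

The abstract does not say which reference configuration the separate 10 and 15 percent component figures are measured against. These headline ratios should therefore not be read as the sum of those components.

```python
def relative_reduction(baseline_mape: float, model_mape: float) -> float:
    """Percentage by which model_mape undercuts baseline_mape."""
    return 100.0 * (baseline_mape - model_mape) / baseline_mape


# One-day-ahead MAPE values (percent) as reported in the abstract.
MODEL_MAPE = 0.80
BASELINES = {"ARIMA": 1.20, "LSTM": 1.00}

for name, baseline in BASELINES.items():
    print(f"{name}: {relative_reduction(baseline, MODEL_MAPE):.1f}% lower MAPE")
# prints "ARIMA: 33.3% lower MAPE" then "LSTM: 20.0% lower MAPE"
```

The same arithmetic can be applied to the high-volatility figures (1.50 percent against baselines of 1.60 to 2.10 percent). It gives reductions of roughly 6 to 29 percent.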

What carries the argument

Node transformer architecture applied to a graph of stocks whose edges encode sectoral, correlation, and supply-chain relationships, with attention-based fusion of BERT sentiment features from social media posts.
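The edge construction described above can be illustrated with a hypothetical sketch. It builds an unweighted, undirected adjacency as the union of the three relation types. This is not the authors' method: the abstract does not say whether edges are weighted, typed, or thresholded.

The following are all assumptions made for illustration:
- the function name `build_adjacency`;
- the `corr_threshold` cutoff of 0.5;
- the use of absolute correlation.

```python
from itertools import combinations


def build_adjacency(tickers, sector, corr, supply_pairs, corr_threshold=0.5):
    """Union-of-relations adjacency over stocks (hypothetical construction).

    tickers: list of stock identifiers (graph nodes)
    sector: dict ticker -> sector label
    corr: dict (a, b) -> return correlation, computed from past data only
    supply_pairs: iterable of (supplier, customer) tuples
    corr_threshold: assumed cutoff for a "correlated price movement" edge
    """
    adj = {t: set() for t in tickers}
    supply_links = {frozenset(pair) for pair in supply_pairs}
    for a, b in combinations(tickers, 2):
        same_sector = sector[a] == sector[b]
        rho = corr.get((a, b), corr.get((b, a), 0.0))
        correlated = abs(rho) >= corr_threshold
        supplied = frozenset((a, b)) in supply_links
        if same_sector or correlated or supplied:
            # Edges are undirected, so record both directions.
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Illustrative placeholder stocks; not the paper's 20 tickers.
tickers = ["A", "B", "C", "D"]
sector = {"A": "tech", "B": "tech", "C": "energy", "D": "energy"}
corr = {("A", "C"): 0.7, ("B", "D"): 0.1}
supply_pairs = [("D", "A")]
adj = build_adjacency(tickers, sector, corr, supply_pairs)
```

In this toy graph, A links to B (sector), C (correlation) and D (supply chain). B and D stay unlinked because their correlation of 0.1 falls below the assumed cutoff. A richer design would keep the three relation types as separate weighted edge channels.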

If this is right

  • The model achieves a mean absolute percentage error of 0.80 percent for one-day-ahead predictions on the 20 tested S&P 500 stocks.
  • Sentiment integration reduces overall prediction error by 10 percent and by 25 percent during earnings announcements.
  • The graph-based architecture contributes an additional 15 percent improvement by capturing inter-stock dependencies.
  • Directional accuracy reaches 65 percent. During high-volatility periods, MAPE stays at 1.50 percent, against 1.60 to 2.10 percent for the baselines.
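The two headline metrics can be pinned down with a minimal sketch. MAPE follows its standard definition. The abstract does not define the reference point for directional accuracy. This sketch assumes direction is measured against the previous close and counts zero moves as misses. Both choices are assumptions, as are the function names.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need non-empty sequences of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def directional_accuracy(prev_close, actual, predicted):
    """Percent of days where predicted and realized moves share a sign.

    Direction is measured against the previous close (an assumption);
    days with a zero realized or predicted move count as misses.
    """
    if not actual or not (len(prev_close) == len(actual) == len(predicted)):
        raise ValueError("need non-empty sequences of equal length")
    hits = 0
    for base, a, p in zip(prev_close, actual, predicted):
        if (a - base) * (p - base) > 0:
            hits += 1
    return 100.0 * hits / len(actual)


prev_close = [98.0, 201.0]
actual = [100.0, 200.0]
predicted = [99.0, 202.0]
print(mape(actual, predicted))
print(directional_accuracy(prev_close, actual, predicted))  # 50.0
```

Both days in the example miss by 1 percent, so MAPE is 1 percent. Only the first day's predicted direction matches the realized move.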

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-plus-sentiment structure could be adapted to other asset classes, such as commodities, by substituting appropriate relationship edges.
  • Real-time updating of the graph with fresh supply-chain data might support live trading systems that react to changing company linkages.
  • Testing the model on multi-day horizons would show how long the relational and sentiment signals continue to provide value.

Load-bearing premise

The constructed graph edges and the social media sentiment scores supply genuine forward-looking predictive information rather than reflecting past patterns or introducing data leakage.
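One way sentiment scores can leak future information is through misaligned timestamps. A post written after the close of day t must not feed the forecast issued at that close. The sketch below shows a leakage-safe alignment under stated assumptions. None of these details come from the paper:
- timestamps are in exchange-local time;
- the cutoff is a 16:00 close;
- weekend posts roll to Monday, and exchange holidays are ignored;
- scores are averaged per day.

The helper names `origin_date` and `daily_sentiment` are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta

# Assumed cutoff: US regular-session close, with timestamps already in
# exchange-local time. Exchange holidays are not handled in this sketch.
MARKET_CLOSE = time(16, 0)


def origin_date(posted_at: datetime) -> date:
    """Earliest forecast origin (a trading day's close) that may use a post."""
    d = posted_at.date()
    if posted_at.time() >= MARKET_CLOSE:
        d += timedelta(days=1)  # posted after the close: next session
    while d.weekday() >= 5:  # Saturday=5, Sunday=6: roll to Monday
        d += timedelta(days=1)
    return d


def daily_sentiment(posts):
    """Mean post-level score per (ticker, origin date).

    posts: iterable of (ticker, posted_at, score), where score is assumed to
    come from a sentiment classifier and lie in [-1, 1].
    """
    buckets = defaultdict(list)
    for ticker, posted_at, score in posts:
        buckets[(ticker, origin_date(posted_at))].append(score)
    return {key: sum(v) / len(v) for key, v in buckets.items()}


posts = [
    ("A", datetime(2024, 3, 1, 10, 0), 0.5),   # Friday, before the close
    ("A", datetime(2024, 3, 1, 17, 0), -1.0),  # Friday, after the close
    ("A", datetime(2024, 3, 4, 9, 0), 0.0),    # Monday morning
]
features = daily_sentiment(posts)
```

A feature keyed to origin date d is available at the close of d and would be used to forecast the next trading day. In this example, the Friday after-close post is therefore grouped with Monday's posts, not Friday's.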

What would settle it

Retraining and testing the model on data from April 2025 onward, an out-of-sample period with no overlap with the original 1982-2025 dataset. This would verify whether the 10 percent error reduction from sentiment and the 15 percent gain from the graph structure still appear.

Figures

Figures reproduced from arXiv: 2603.05917 by Hussein Al Osman, Mahtab Haj Ali, Mohammad Al Ridhawi.

Figure 1: Feature engineering pipeline. Raw OHLCV data is processed through …
Figure 2: System architecture. Price data, volume, and technical indicators are …
Figure 3: Single transformer layer architecture. Input passes through multi-head …
Figure 4: BERT sentiment extraction pipeline. Raw social media posts are …
Figure 5: Adaptive fusion mechanism. The weighting coefficient …
Original abstract

Stock market prediction presents considerable challenges for investors, financial institutions, and policymakers operating in complex market environments characterized by noise, non-stationarity, and behavioral dynamics. Traditional forecasting methods, including fundamental analysis and technical indicators, often fail to capture the intricate patterns and cross-sectional dependencies inherent in financial markets. This paper presents an integrated framework combining a node transformer architecture with BERT-based sentiment analysis for stock price forecasting. The proposed model represents the stock market as a graph structure where individual stocks form nodes and edges capture relationships including sectoral affiliations, correlated price movements, and supply chain connections. A fine-tuned BERT model extracts sentiment information from social media posts and combines it with quantitative market features through attention-based fusion mechanisms. The node transformer processes historical market data while capturing both temporal evolution and cross-sectional dependencies among stocks. Experiments conducted on 20 S&P 500 stocks spanning January 1982 to March 2025 demonstrate that the integrated model achieves a mean absolute percentage error (MAPE) of 0.80% for one-day-ahead predictions, compared to 1.20% for ARIMA and 1.00% for LSTM. The inclusion of sentiment analysis reduces prediction error by 10% overall and 25% during earnings announcements, while the graph-based architecture contributes an additional 15% improvement by capturing inter-stock dependencies. Directional accuracy reaches 65% for one-day forecasts. Statistical validation through paired t-tests confirms the significance of these improvements (p < 0.05 for all comparisons). The model maintains lower error during high-volatility periods, achieving MAPE of 1.50% while baseline models range from 1.60% to 2.10%.
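The abstract reports paired t-tests but not what was paired. A common choice is per-day forecast losses of two models on the same test days. The sketch below computes a paired statistic on those loss differentials and takes a two-sided p-value from a normal approximation, which is only reasonable for long test periods. For short samples, a t-distribution (for example SciPy's `scipy.stats.ttest_rel`) is the appropriate reference. The function name `paired_test` and the toy error values are illustrative.

With one-step-ahead forecasts and serially uncorrelated differentials, this statistic is essentially the Diebold-Mariano statistic [28] for forecast horizon h = 1. Autocorrelated differentials would call for a HAC variance estimate instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_test(model_errors, baseline_errors):
    """Paired test on per-day loss differentials (large-sample normal approx.).

    d_t = baseline_errors[t] - model_errors[t], so a positive statistic means
    the model's errors are smaller on average. Returns (statistic, two-sided p).
    """
    if len(model_errors) != len(baseline_errors) or len(model_errors) < 2:
        raise ValueError("need at least two paired observations")
    d = [b - m for m, b in zip(model_errors, baseline_errors)]
    se = stdev(d) / sqrt(len(d))
    if se == 0:
        raise ValueError("loss differential has zero variance")
    stat = mean(d) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Made-up per-day absolute percentage errors, for illustration only.
model = [0.5, 0.6, 0.7, 0.8]
baseline = [1.0, 1.2, 1.1, 1.3]
stat, p = paired_test(model, baseline)
```

The four-day example only demonstrates the mechanics. A real comparison would use every day in the held-out test period.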

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an integrated node transformer architecture combined with BERT-based sentiment analysis for one-day-ahead stock price prediction. Stocks are represented as nodes in a graph with edges encoding sectoral affiliations, correlated price movements, and supply chain connections. A fine-tuned BERT extracts sentiment from social media, fused via attention mechanisms with quantitative features. On 20 S&P 500 stocks from January 1982 to March 2025, the model reports a MAPE of 0.80%, outperforming ARIMA (1.20%) and LSTM (1.00%), with sentiment reducing error by 10% overall (25% during earnings) and the graph contributing an additional 15% improvement; directional accuracy is 65% and paired t-tests show p < 0.05 significance.

Significance. If the central results hold after addressing potential leakage, the work would offer a concrete demonstration of how graph transformers can capture cross-sectional dependencies while incorporating behavioral signals from NLP, which could inform more robust forecasting approaches in noisy, non-stationary financial settings.

major comments (2)
  1. Abstract: The description of graph edges that 'capture relationships including ... correlated price movements' provides no detail on whether correlations are computed over the full 1982-2025 window or via rolling windows using only data available at each forecast origin. If the former, future price information contaminates the graph for earlier dates, directly violating the one-day-ahead setup and rendering the reported 15% graph contribution and overall MAPE claims unreliable.
  2. Abstract: No information is given on training/validation/test splits, hyperparameter tuning, or procedures to control for look-ahead bias in either the graph construction or the sentiment data alignment. In non-stationary financial series, these omissions make it impossible to determine whether the 0.80% MAPE, the 10%/25% sentiment reductions, or the t-test results reflect out-of-sample predictive power rather than in-sample fitting.
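The leakage concern in the first major comment can be made concrete. A correlation edge used at forecast origin t should be computed from a trailing window ending at t, never from the full 1982-2025 sample.

A minimal sketch follows. It assumes daily return series and a fixed window length. The helper names `pearson` and `rolling_corr_at` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rolling_corr_at(returns_a, returns_b, t, window):
    """Correlation usable at forecast origin t: indices t-window+1 .. t only.

    Returns None until a full window of history exists, so nothing after
    day t can influence the edge used to forecast day t+1.
    """
    if t + 1 < window:
        return None
    lo = t - window + 1
    return pearson(returns_a[lo:t + 1], returns_b[lo:t + 1])


ret_a = [1.0, 2.0, 3.0, 4.0, 100.0]
ret_b = [2.0, 4.0, 6.0, 8.0, -50.0]
print(rolling_corr_at(ret_a, ret_b, t=3, window=4))  # 1.0: day 4 is ignored
print(rolling_corr_at(ret_a, ret_b, t=2, window=4))  # None: too little history
```

A full-sample correlation of the same two series would be pulled sharply toward negative values by the day-4 observation. That future information would then leak into edges used at earlier forecast origins.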
minor comments (2)
  1. Abstract: The fusion mechanism is described only at a high level ('attention-based fusion mechanisms'); adding a brief equation or diagram would clarify how sentiment embeddings are combined with node features.
  2. The abstract reports results on 20 stocks but does not specify selection criteria or whether results generalize beyond this small sample; a sensitivity table across different stock subsets would strengthen the claims.
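The first minor comment asks for an explicit fusion equation. The Figure 5 caption mentions an adaptive weighting coefficient, so one plausible reading is a learned scalar gate.

The sketch below implements such a gate:

alpha = sigmoid(w · [h; s] + b)
fused = alpha · h + (1 − alpha) · s

Here h is the market embedding and s the sentiment embedding. This is an assumed form, not the paper's confirmed mechanism: the authors describe "attention-based fusion", which could instead be multi-head cross-attention. The function name `gated_fusion` is hypothetical.

```python
from math import exp


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def gated_fusion(h_market, s_sentiment, w, b):
    """Convex combination of a market embedding and a sentiment embedding.

    alpha = sigmoid(w . [h; s] + b)
    fused = alpha * h + (1 - alpha) * s
    w has length 2 * d and b is a scalar; both would be learned in practice.
    """
    if len(h_market) != len(s_sentiment) or len(w) != 2 * len(h_market):
        raise ValueError("dimension mismatch")
    concat = list(h_market) + list(s_sentiment)
    alpha = sigmoid(sum(wi * xi for wi, xi in zip(w, concat)) + b)
    fused = [alpha * h + (1.0 - alpha) * s for h, s in zip(h_market, s_sentiment)]
    return fused, alpha


fused, alpha = gated_fusion([1.0, 0.0], [0.0, 1.0], w=[0.0] * 4, b=0.0)
print(alpha, fused)  # 0.5 [0.5, 0.5]
```

With zero weights the gate sits at 0.5 and blends both inputs equally. A large positive bias pushes alpha toward 1, so the output relies almost entirely on market features. This is how a learned gate could down-weight sentiment on quiet days and up-weight it around earnings.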

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their valuable comments, which help improve the clarity and rigor of our manuscript regarding potential data leakage and experimental details. We address each concern below and will incorporate the necessary revisions.

Point-by-point responses
  1. Referee: Abstract: The description of graph edges that 'capture relationships including ... correlated price movements' provides no detail on whether correlations are computed over the full 1982-2025 window or via rolling windows using only data available at each forecast origin. If the former, future price information contaminates the graph for earlier dates, directly violating the one-day-ahead setup and rendering the reported 15% graph contribution and overall MAPE claims unreliable.

    Authors: We appreciate this observation on a potential source of look-ahead bias. Upon review, the original manuscript lacked explicit specification of the correlation calculation window. In the revised version, we will clarify that edge correlations are derived from rolling windows using only past data up to the forecast date. This ensures compliance with the one-day-ahead protocol and preserves the validity of the reported improvements from the graph structure. Additional details and justification will be added to the methods section. revision: yes

  2. Referee: Abstract: No information is given on training/validation/test splits, hyperparameter tuning, or procedures to control for look-ahead bias in either the graph construction or the sentiment data alignment. In non-stationary financial series, these omissions make it impossible to determine whether the 0.80% MAPE, the 10%/25% sentiment reductions, or the t-test results reflect out-of-sample predictive power rather than in-sample fitting.

    Authors: We agree that these procedural details are crucial for assessing the robustness of results in financial forecasting. We will revise the paper to include a dedicated subsection on the experimental protocol, specifying the chronological train-validation-test split, the use of walk-forward validation for hyperparameter selection, and safeguards against look-ahead bias in graph and sentiment feature construction. This will demonstrate that our evaluations are strictly out-of-sample. revision: yes
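The protocol promised in the second response can be sketched as an expanding-window walk-forward split over chronologically ordered observations. Each fold tunes on the block immediately after its training data and reports only on the block after that. The fold sizes, the expanding (rather than sliding) window, and the function name `walk_forward_splits` are assumptions for illustration.

```python
def walk_forward_splits(n_obs, initial_train, val_size, test_size, step=None):
    """Yield chronological (train, val, test) index ranges, expanding window.

    Every fold trains on the past, tunes on the block right after it, and
    tests on the block after that, so no fold looks ahead in time.
    """
    if step is None:
        step = test_size
    if step <= 0:
        raise ValueError("step must be positive")
    test_start = initial_train + val_size
    while test_start + test_size <= n_obs:
        train = range(0, test_start - val_size)
        val = range(test_start - val_size, test_start)
        test = range(test_start, test_start + test_size)
        yield train, val, test
        test_start += step


for train, val, test in walk_forward_splits(10, initial_train=4, val_size=2, test_size=2):
    print(list(train), list(val), list(test))
```

With 10 observations this yields two folds: [0-3] / [4, 5] / [6, 7], then [0-5] / [6, 7] / [8, 9]. Under this scheme, reported metrics such as MAPE would be averaged over the test blocks only.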

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper is an empirical ML study that trains a node transformer plus BERT model on historical stock data and reports test-set performance metrics such as MAPE and percentage improvements. No mathematical derivation chain, equations, or first-principles results are present that reduce to self-definition or to fitted inputs by construction. The graph construction and sentiment fusion are described as model components whose outputs are evaluated on held-out forecasts; these steps do not collapse into the input data by the paper's own statements. Standard self-citations, if any, are not load-bearing for the central empirical claims. The reported results therefore remain self-contained experimental outcomes rather than tautological restatements of the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Since only the abstract is available, specific free parameters and axioms cannot be exhaustively listed from the full methods. The entries below are inferred from the described approach.

free parameters (2)
  • Transformer attention weights and BERT fine-tuning parameters
    Numerous parameters optimized during training on the historical stock and sentiment data.
  • Graph edge weights for inter-stock relationships
    Derived from correlations and affiliations, likely involving thresholds or scaling factors chosen to fit the data.
axioms (2)
  • domain assumption: Historical price movements and social media sentiment contain exploitable patterns for future price prediction
    Fundamental premise underlying the entire forecasting framework.
  • domain assumption: The defined graph structure accurately represents relevant dependencies between stocks
    Invoked when constructing edges for sectoral, correlation, and supply chain relations.

pith-pipeline@v0.9.0 · 5842 in / 1689 out tokens · 89061 ms · 2026-05-21T11:21:01.384893+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    node transformer architecture... graph structure where individual stocks form nodes and edges capture relationships including sectoral affiliations, correlated price movements, and supply chain connections... BERT-based sentiment analysis

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  [1] E. F. Fama, “Efficient capital markets: A review of theory and empirical work,” The Journal of Finance, vol. 25, no. 2, pp. 383–417, 1970.

  [2] J. Zhang, Y. Teng, and W. Chen, “Stock market prediction via deep learning techniques: A survey,” arXiv preprint arXiv:2212.12717, 2022.

  [3] I. W. R. Martin and S. Nagel, “Market efficiency in the age of big data,” Journal of Financial Economics, vol. 145, no. 1, pp. 154–177, 2022.

  [4] I. Goldstein, “Information in financial markets and its real effects,” Review of Finance, vol. 27, no. 1, pp. 1–32, 2023.

  [5] D. Kahneman and A. Tversky, “Prospect theory: An analysis of decision under risk,” Econometrica, vol. 47, no. 2, pp. 263–291, 1979.

  [6] J. J. Murphy, Technical analysis of the financial markets: A comprehensive guide to trading methods and applications. New York Institute of Finance, 1999.

  [7] T. Fischer and C. Krauss, “Deep learning with long short-term memory networks for financial market predictions,” European Journal of Operational Research, vol. 270, no. 2, pp. 654–669, 2018.

  [8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, vol. 30, 2017, pp. 5998–6008.

  [9] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4–24, 2020.

  [10] Q. Wu, W. Zhao, Z. Li, D. P. Wipf, and J. Yan, “Nodeformer: A scalable graph structure learning transformer for node classification,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 27387–27401.

  [11] P. R. Low and E. Sakk, “Comparison between autoregressive integrated moving average and long short-term memory models for stock price prediction,” IAES International Journal of Artificial Intelligence, vol. 12, no. 4, pp. 1828–1835, 2023.

  [12] S. T. Wahyudi, “The ARIMA model for the Indonesia stock price,” International Journal of Economics and Management, vol. 11, pp. 223–236, 2017.

  [13] R. K. Nayak, D. Mishra, and A. K. Rath, “A naïve SVM-KNN based stock market trend reversal analysis for Indian benchmark indices,” Applied Soft Computing, vol. 35, pp. 670–680, 2015.

  [14] M. Siddique and D. Panda, “Prediction of stock index of Tata Steel using hybrid machine learning based optimization techniques,” International Journal of Recent Technology and Engineering, vol. 8, pp. 3186–3193, 2019.

  [15] S. Basak, S. Kar, S. Saha, L. Khaidem, and S. R. Dey, “Predicting the direction of stock market prices using tree-based classifiers,” The North American Journal of Economics and Finance, vol. 47, pp. 552–567, 2019.

  [16] S. K. Chandar, “Convolutional neural network for stock trading using technical indicators,” Automated Software Engineering, vol. 29, pp. 1–14, 2022.

  [17] M. Wen, P. Li, L. Zhang, and Y. Chen, “Stock market trend prediction using high-order information of time series,” IEEE Access, vol. 7, pp. 28299–28308, 2019.

  [18] L. Di Persio and O. Honchar, “Recurrent neural networks approach to the financial forecast of Google assets,” International Journal of Mathematics and Computers in Simulation, vol. 11, pp. 7–13, 2017.

  [19] M. A. Hossain, R. Karim, R. Thulasiram, N. Bruce, and Y. Wang, “Hybrid deep learning model for stock price prediction,” IEEE Symposium Series on Computational Intelligence, pp. 1837–1844, 2018.

  [20] Y. Xu, L. Chhim, B. Zheng et al., “Stacked deep learning structure with bidirectional long-short term memory for stock market prediction,” in Communications in Computer and Information Science, vol. 1265, 2020, pp. 447–460.

  [21] M. Liu, H. Sheng, N. Zhang et al., “A new deep network model for stock price prediction,” in International Conference on Machine Learning for Cyber Security, 2022, pp. 413–426.

  [22] W. Chen, M. Jiang, W.-G. Zhang, and Z. Chen, “A novel graph convolutional feature based convolutional neural network for stock trend prediction,” Information Sciences, vol. 556, pp. 67–94, 2021.

  [23] C. Wang, H. Liang, B. Wang, X. Cui, and Y. Xu, “MG-Conv: A spatiotemporal multi-graph convolutional neural network for stock market index trend prediction,” Computers and Electrical Engineering, vol. 103, p. 108285, 2022.

  [24] S. Li and Z. Qian, “A financial forecasting model based on transformer architecture,” IEEE International Conference on Big Data, pp. 5384–5386, 2019.

  [25] W. Lu, J. Li, J. Wang et al., “A CNN-BiLSTM-AM method for stock price prediction,” Neural Computing and Applications, vol. 33, pp. 4741–4753, 2021.

  [26] K. Cortis, A. Freitas, T. Daudert, M. Huerlimann, M. Zarrouk, S. Handschuh, and B. Davis, “SemEval-2017 task 5: Fine-grained sentiment analysis on financial microblogs and news,” in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 519–535.

  [27] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, 2019, pp. 4171–4186.

  [28] F. X. Diebold and R. S. Mariano, “Comparing predictive accuracy,” Journal of Business & Economic Statistics, vol. 13, no. 3, pp. 253–263, 1995.

  [29] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proceedings of the International Conference on Learning Representations, 2017.

  [30] Y. Nejatbakhsh and M. Aliasgari, “Enhancing stock market prediction with hybrid deep learning: Integrating LSTM, transformer attention, federated learning, and sentiment analysis,” IEEE Access, vol. 14, pp. 3926–3942, 2025.

  [31] D. Le, S. Rajasegarar, W. Luo, T. T. Nguyen, N. Vo, Q. Nguyen, and M. Angelova, “EGCN: Entropy-based graph convolutional network for anomalous pattern detection and forecasting in real estate markets,” PLoS ONE, vol. 20, no. 10, p. e0334141, 2025.

  [32] D. Le, S. Rajasegarar, W. Luo, T. T. Nguyen, and M. Angelova, “Enhancing real estate prediction with entropy-based pattern analysis and economic sentiment integration,” Engineering Computations, pp. 1–24, 2025.

Mohammad Al Ridhawi received the B.A.Sc. degree in computer engineering and the M.Sc. degree in digital transformation and innovation (machine lea...