Visual Chart Representations for Cryptocurrency Regime Prediction: A Systematic Deep Learning Study

Dustin M. Haggett

arxiv: 2605.00875 · v1 · submitted 2026-04-25 · 💻 cs.CV · cs.AI

Visual Chart Representations for Cryptocurrency Regime Prediction: A Systematic Deep Learning Study

Dustin M. Haggett This is my paper

Pith reviewed 2026-05-09 21:00 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords candlestick chartscryptocurrencyregime predictionconvolutional neural networksimage classificationfinancial time seriesmarket regimesdeep learning

0 comments

The pith

A simple four-layer CNN on raw candlestick charts predicts cryptocurrency market regimes with 0.892 AUC-ROC and beats larger pretrained models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests multiple ways to convert financial charts into images and feed them to deep learning models for classifying bull, bear, or sideways regimes in crypto and stock markets. It compares raw candlestick images against Gramian Angular Field encodings, different chart components, four network architectures, and the effect of ImageNet pretraining across eight experiments on Bitcoin, Ethereum, and S&P 500 data from 2018-2024. The central finding is that a lightweight custom CNN on low-resolution price-only charts delivers the best results while more elaborate setups underperform. A reader would care because this suggests that the visual patterns used by technical traders can be automated effectively with modest computational resources rather than requiring heavy vision models.

Core claim

The paper establishes that a 4-layer CNN trained on raw candlestick chart images using only price data at 128x128 resolution achieves the highest regime classification performance of 0.892 AUC-ROC on the tested datasets, surpassing ResNet18, EfficientNet-B0, Vision Transformer, and multi-channel GAF representations, with transfer learning providing a 4-16% improvement and GradCAM confirming attention on relevant chart features.

What carries the argument

Raw candlestick chart images processed by a lightweight 4-layer convolutional neural network for three-class regime classification.

If this is right

Price-only charts at 128x128 resolution outperform both higher resolutions and encodings that add volume or GAF transformations.
A small custom CNN exceeds the accuracy of larger architectures even after ImageNet transfer learning.
GradCAM visualizations show the model focuses on candlestick patterns consistent with human chart reading.
The performance advantage holds across Bitcoin, Ethereum, and S&P 500 over the 2018-2024 period.
Transfer learning from natural images still adds value despite the domain shift to financial charts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The result raises the possibility that domain-specific simple models can replace general-purpose vision transformers for many financial image tasks.
If labels can be produced in real time without lookahead, the method could support live trading signals.
Extending the same pipeline to other timeframes or asset classes would test whether the simplicity advantage persists.
The finding invites direct comparison against non-visual time-series models on the identical regime labels.

Load-bearing premise

The visual patterns visible in past charts remain reliably predictive of future regimes when labels are assigned without using future information.

What would settle it

Testing the same 4-layer CNN on 2025 data or a new asset class and observing AUC-ROC below 0.75 would indicate the patterns do not generalize.

Figures

Figures reproduced from arXiv: 2605.00875 by Dustin M. Haggett.

**Figure 2.** Figure 2: Chart representation methods. Top row: candlestick variants (basic, [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 4.** Figure 4: Experiment 3: Chart components impact. Price-only candlesticks [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: GradCAM visualization showing model attention for bull (top [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: ROC curve (left) and Precision-Recall curve (right) for baseline model. [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: Confusion matrix for baseline model. The model correctly identifies [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: Complete experimental results dashboard summarizing all eight experiments. Each subplot shows accuracy (blue) and AUC-ROC (green) for different [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

read the original abstract

Technical traders have long relied on visual analysis of candlestick charts to identify market patterns and predict price movements. While deep learning has achieved remarkable success in image classification, its application to financial chart images remains underexplored. This paper presents a systematic study comparing different visual representations for cryptocurrency regime prediction. We evaluate three image encoding methods (raw candlestick charts, Gramian Angular Fields, and multi-channel GAF), five chart component configurations, four neural network architectures (CNN, ResNet18, EfficientNet-B0, and Vision Transformer), and the impact of ImageNet transfer learning. Through eight controlled experiments on Bitcoin, Ethereum, and S&P 500 data spanning 2018-2024, we identify optimal configurations for visual regime classification. Our results show that a simple 4-layer CNN on raw candlestick charts achieves 0.892 AUC-ROC, outperforming larger pretrained models. Surprisingly, simpler representations (price-only charts, 128x128 resolution) consistently outperform more complex alternatives. We provide interpretability analysis using GradCAM and demonstrate that transfer learning improves performance by 4-16% despite the domain gap between natural images and financial charts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper conducts a systematic comparison of visual encodings (raw candlestick charts, Gramian Angular Fields, multi-channel GAF), chart components, and neural architectures (4-layer CNN, ResNet18, EfficientNet-B0, ViT) with and without ImageNet pretraining for classifying cryptocurrency market regimes (bull/bear/sideways) on 2018-2024 BTC, ETH, and S&P 500 data. It reports that a simple 4-layer CNN on raw 128x128 price-only charts attains the highest AUC-ROC of 0.892, outperforming larger models, and that simpler representations consistently beat more complex ones; GradCAM interpretability is also provided.

Significance. If the regime labels are defined strictly from post-window price movements and all splits are strictly chronological with no leakage, the result would demonstrate that minimal visual CNNs can extract predictive regime signals from raw charts more effectively than transfer learning or sophisticated encodings, offering a practical baseline for financial image-based classification and questioning the necessity of large pretrained models in this domain.

major comments (2)

[§3] §3 (Data Preparation and Labeling): The procedure for assigning bull/bear/sideways regime labels to each chart window is not specified (no return threshold, horizon length, or smoothing method is given). Because the central 0.892 AUC-ROC claim rests on these labels being free of lookahead bias, the absence of an explicit rule prevents verification that the reported performance reflects genuine out-of-sample predictability rather than label contamination from future ticks.
[§4] §4 (Experimental Protocol): No description is provided of the train/test splitting strategy (e.g., walk-forward, purged cross-validation, or strict chronological ordering with no overlapping windows). In non-stationary price series, any non-temporal split or overlap would directly inflate the AUC values cited in the abstract and results, undermining the ranking of the 4-layer CNN over larger architectures.

minor comments (2)

[Table 2] Table 2 and Figure 3: axis labels and legend entries for the eight experiments are inconsistent in naming the image resolutions and channel configurations; this makes it difficult to map the reported AUC numbers back to the exact configurations described in the text.
[§5.3] §5.3 (GradCAM): the heatmaps are shown only for a single example per class; adding quantitative overlap metrics with known technical patterns would strengthen the interpretability claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thorough review and valuable feedback on our manuscript. We address each major comment below and will incorporate the necessary clarifications in the revised version.

read point-by-point responses

Referee: [§3] §3 (Data Preparation and Labeling): The procedure for assigning bull/bear/sideways regime labels to each chart window is not specified (no return threshold, horizon length, or smoothing method is given). Because the central 0.892 AUC-ROC claim rests on these labels being free of lookahead bias, the absence of an explicit rule prevents verification that the reported performance reflects genuine out-of-sample predictability rather than label contamination from future ticks.

Authors: We thank the referee for pointing out this lack of detail. We agree that the labeling procedure must be fully specified to allow readers to assess the absence of lookahead bias. In the revised manuscript, we will expand §3 to explicitly describe the regime labeling method, including the exact return thresholds, the horizon length used for labeling, and any smoothing applied. The labels are assigned based solely on price movements following the end of each chart window, ensuring no future information leaks into the input features or labels. revision: yes
Referee: [§4] §4 (Experimental Protocol): No description is provided of the train/test splitting strategy (e.g., walk-forward, purged cross-validation, or strict chronological ordering with no overlapping windows). In non-stationary price series, any non-temporal split or overlap would directly inflate the AUC values cited in the abstract and results, undermining the ranking of the 4-layer CNN over larger architectures.

Authors: We appreciate this comment on the experimental protocol. The current manuscript does not detail the splitting strategy sufficiently. We will revise §4 to clearly state that all data splits are strictly chronological, using a walk-forward validation approach with non-overlapping windows to prevent any temporal leakage. This ensures that training data always precedes test data, which is critical for non-stationary financial time series. We will also include a diagram or pseudocode illustrating the split to enhance clarity. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ML evaluation on historical data

full rationale

The paper reports AUC-ROC results from training CNNs, ResNets, EfficientNets and ViTs on candlestick chart images to classify bull/bear/sideways regimes. No equations, derivations, parameter fits reused as predictions, or self-citation chains appear in the provided text. All performance numbers are obtained from controlled experiments on 2018-2024 BTC/ETH/S&P500 data; the study is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. Full methods section would be required to audit hyperparameters, loss functions, or labeling rules.

pith-pipeline@v0.9.0 · 5499 in / 1129 out tokens · 28032 ms · 2026-05-09T21:00:07.378514+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

[1]

Nison,Japanese Candlestick Charting Techniques, 2nd ed

S. Nison,Japanese Candlestick Charting Techniques, 2nd ed. Prentice Hall Press, 2001

work page 2001
[2]

ImageNet classifica- tion with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classifica- tion with deep convolutional neural networks,” inAdvances in Neural Information Processing Systems, 2012, pp. 1097–1105

work page 2012
[3]

Using deep learning neural networks and candlestick chart representation to predict stock market,

R. M. I. Kusuma, T.-T. Ho, W.-C. Kao, Y .-Y . Ou, and K.-L. Hua, “Using deep learning neural networks and candlestick chart representation to predict stock market,”arXiv preprint arXiv:1903.12258, 2019

work page arXiv 1903
[4]

Encoding candlesticks as images for pattern classification using convolutional neural networks,

J.-H. Chen and Y .-C. Tsai, “Encoding candlesticks as images for pattern classification using convolutional neural networks,”Financial Innovation, vol. 6, no. 1, pp. 1–19, 2020

work page 2020
[5]

Long short-term memory,

S. Hochreiter and J. Schmidhuber, “Long short-term memory,”Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997

work page 1997
[6]

Deep learning with long short-term memory networks for financial market predictions,

T. Fischer and C. Krauss, “Deep learning with long short-term memory networks for financial market predictions,”European Journal of Oper- ational Research, vol. 270, no. 2, pp. 654–669, 2018

work page 2018
[7]

arXiv preprint arXiv:2001.08317 , year=

N. Wu, B. Green, X. Ben, and S. O’Banion, “Deep transformer mod- els for time series forecasting: The influenza prevalence case,”arXiv preprint arXiv:2001.08317, 2020

work page arXiv 2001
[8]

DPP: Deep predictor for price movement from candlestick charts,

C.-C. Hung and Y .-J. Chen, “DPP: Deep predictor for price movement from candlestick charts,”PLOS ONE, vol. 16, no. 6, p. e0252404, 2021

work page 2021
[9]

Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,

Z. Wang and T. Oates, “Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,” inWorkshops at the AAAI Conference on Artificial Intelligence, 2015

work page 2015
[10]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiyet al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[11]

From pixels to predictions: Spectrogram and vision transformer for better time series forecasting,

A. Zeng, M. Chen, L. Zhang, and Q. Xu, “From pixels to predictions: Spectrogram and vision transformer for better time series forecasting,” inACM International Conference on AI in Finance, 2023

work page 2023
[12]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 770–778

work page 2016
[13]

EfficientNet: Rethinking model scaling for con- volutional neural networks,

M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for con- volutional neural networks,” inInt. Conf. Machine Learning, 2019, pp. 6105–6114

work page 2019
[14]

Grad-CAM: Visual explanations from deep networks via gradient-based localization,

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” inProc. IEEE Int. Conf. Computer Vision, 2017, pp. 618–626

work page 2017

[1] [1]

Nison,Japanese Candlestick Charting Techniques, 2nd ed

S. Nison,Japanese Candlestick Charting Techniques, 2nd ed. Prentice Hall Press, 2001

work page 2001

[2] [2]

ImageNet classifica- tion with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classifica- tion with deep convolutional neural networks,” inAdvances in Neural Information Processing Systems, 2012, pp. 1097–1105

work page 2012

[3] [3]

Using deep learning neural networks and candlestick chart representation to predict stock market,

R. M. I. Kusuma, T.-T. Ho, W.-C. Kao, Y .-Y . Ou, and K.-L. Hua, “Using deep learning neural networks and candlestick chart representation to predict stock market,”arXiv preprint arXiv:1903.12258, 2019

work page arXiv 1903

[4] [4]

Encoding candlesticks as images for pattern classification using convolutional neural networks,

J.-H. Chen and Y .-C. Tsai, “Encoding candlesticks as images for pattern classification using convolutional neural networks,”Financial Innovation, vol. 6, no. 1, pp. 1–19, 2020

work page 2020

[5] [5]

Long short-term memory,

S. Hochreiter and J. Schmidhuber, “Long short-term memory,”Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997

work page 1997

[6] [6]

Deep learning with long short-term memory networks for financial market predictions,

T. Fischer and C. Krauss, “Deep learning with long short-term memory networks for financial market predictions,”European Journal of Oper- ational Research, vol. 270, no. 2, pp. 654–669, 2018

work page 2018

[7] [7]

arXiv preprint arXiv:2001.08317 , year=

N. Wu, B. Green, X. Ben, and S. O’Banion, “Deep transformer mod- els for time series forecasting: The influenza prevalence case,”arXiv preprint arXiv:2001.08317, 2020

work page arXiv 2001

[8] [8]

DPP: Deep predictor for price movement from candlestick charts,

C.-C. Hung and Y .-J. Chen, “DPP: Deep predictor for price movement from candlestick charts,”PLOS ONE, vol. 16, no. 6, p. e0252404, 2021

work page 2021

[9] [9]

Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,

Z. Wang and T. Oates, “Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,” inWorkshops at the AAAI Conference on Artificial Intelligence, 2015

work page 2015

[10] [10]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiyet al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010

[11] [11]

From pixels to predictions: Spectrogram and vision transformer for better time series forecasting,

A. Zeng, M. Chen, L. Zhang, and Q. Xu, “From pixels to predictions: Spectrogram and vision transformer for better time series forecasting,” inACM International Conference on AI in Finance, 2023

work page 2023

[12] [12]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 770–778

work page 2016

[13] [13]

EfficientNet: Rethinking model scaling for con- volutional neural networks,

M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for con- volutional neural networks,” inInt. Conf. Machine Learning, 2019, pp. 6105–6114

work page 2019

[14] [14]

Grad-CAM: Visual explanations from deep networks via gradient-based localization,

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” inProc. IEEE Int. Conf. Computer Vision, 2017, pp. 618–626

work page 2017