
arxiv: 2505.03451 · v2 · submitted 2025-05-06 · 💻 cs.CR · cs.AI

Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis

Pith reviewed 2026-05-22 16:47 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords quishing · QR code detection · phishing · machine learning · pixel analysis · structural features · cybersecurity

The pith

Machine learning models detect quishing by examining QR code structure and pixel patterns without decoding content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that quishing attacks can be identified by applying machine learning directly to the visual structure of QR codes rather than first extracting the encoded data. This matters because decoding a QR code risks exposing a user to a malicious link or other harmful payload, and because QR codes can carry non-URL information such as Wi-Fi credentials. By training classifiers on pixel-level and structural features from a generated dataset of phishing and benign codes, the work shows that, on that dataset, these visual traits alone carry enough signal to separate the two classes. The authors then prune the feature set and report that performance holds, and even improves slightly (AUC from 0.9106 to 0.9133), after uninformative pixels are removed.

Core claim

The authors build a framework that classifies QR codes as phishing or benign by feeding structural and pixel features, extracted from a generated dataset, into machine-learning models, without ever decoding the payload. The strongest model (XGBoost) reaches an AUC of 0.9106, which rises to 0.9133 after feature pruning, and feature-importance analysis shows that QR structural patterns correlate strongly with phishing labels.

What carries the argument

Pixel-pattern and structural feature extraction from QR codes, used as input to classifiers such as XGBoost on a dataset of labeled phishing and benign examples.
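A minimal sketch of the per-pixel representation, assuming each QR code has already been rendered, binarized, and resized to a common square resolution (the paper's exact preprocessing is not given here). The function name `flatten_qr` and the toy image are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def flatten_qr(grid: Sequence[Sequence[int]]) -> List[int]:
    """Flatten a square, binarized QR image into a per-pixel feature vector.

    Pixels are read in row-major order (1 = dark, 0 = light), so every image
    with the same side length yields a vector of the same length, and pixel
    (r, c) always lands at index r * side + c.
    """
    side = len(grid)
    if side == 0 or any(len(row) != side for row in grid):
        raise ValueError("expected a non-empty square image")
    features: List[int] = []
    for row in grid:
        for pixel in row:
            if pixel not in (0, 1):
                raise ValueError("expected binarized pixels (0 or 1)")
            features.append(pixel)
    return features


# Hypothetical 3x3 toy image; real inputs would be full-size QR renders.
toy = [[1, 0, 1],
       [0, 1, 0],
       [1, 1, 0]]
print(flatten_qr(toy))  # [1, 0, 1, 0, 1, 0, 1, 1, 0]
```

Because index r * side + c always refers to the same pixel, a classifier's per-feature importances can be mapped back onto the image, which is presumably how importance maps like those in the Figures section are produced.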

If this is right

  • Structural features of QR codes carry a strong statistical association with phishing labels.
  • Removing non-informative pixels from the feature vector maintains or slightly raises detection performance.
  • Detection remains possible for QR codes that encode Wi-Fi, payment, or other non-URL data.
  • Multiple standard classifiers can be trained for the task, with gradient-boosted trees (XGBoost) performing best.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • A detector of this kind could be inserted into QR-scanning apps as an early filter before any payload is processed.
  • Attackers may need to alter how they generate QR codes to avoid leaving detectable structural signatures.
  • The same pixel-analysis pipeline could be applied to other static visual encodings that carry hidden data.
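The first extension above amounts to an ordering constraint: score the image first, and decode only if the score is low. A minimal sketch, assuming a hypothetical `score_fn` (a trained pixel-based classifier returning a phishing probability) and a hypothetical `decode_fn` (the app's existing decoder); the 0.5 threshold is likewise an arbitrary placeholder.

```python
from typing import Callable, Optional, Sequence


def gated_scan(
    pixels: Sequence[int],
    score_fn: Callable[[Sequence[int]], float],
    decode_fn: Callable[[Sequence[int]], str],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the decoded payload, or None if the image is flagged.

    The phishing score is computed from pixels alone, so a flagged code is
    rejected before its payload is ever decoded or opened.
    """
    if score_fn(pixels) >= threshold:
        return None  # blocked: decode_fn is never called
    return decode_fn(pixels)


# Stand-ins for a trained classifier and a real QR decoder (both hypothetical).
def fake_score(pixels: Sequence[int]) -> float:
    return sum(pixels) / len(pixels)  # toy score: fraction of dark pixels


def fake_decode(pixels: Sequence[int]) -> str:
    return "https://example.com"


print(gated_scan([0, 0, 0, 1], fake_score, fake_decode))  # https://example.com
print(gated_scan([1, 1, 1, 0], fake_score, fake_decode))  # None
```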

Load-bearing premise

The generated collection of phishing and benign QR codes is representative enough of real attacks that visible pixel patterns alone can separate malicious codes from safe ones.

What would settle it

Testing the trained model on QR codes taken directly from documented real-world quishing incidents and measuring whether the AUC falls substantially below 0.85.
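The proposed test reduces to computing ROC AUC on real-world quishing samples and comparing it with 0.85. A minimal, dependency-free sketch of that computation using the rank (Mann-Whitney) formulation of AUC; the function name and toy inputs are illustrative, and a real evaluation would use the trained model's scores on the documented incidents.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its rank interpretation.

    Equals the probability that a randomly chosen phishing code (label 1)
    receives a higher score than a randomly chosen benign code (label 0),
    counting ties as one half. O(n_pos * n_neg), fine for a sketch.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one phishing and one benign sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels/scores, not results from the paper.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)         # 0.75
print(auc < 0.85)  # True: would count as a substantial drop
```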

Figures

Figures reproduced from arXiv: 2505.03451 by Ali Chehab, Fouad Trad.

Figure 1. Ten samples of the generated QR codes.
Accompanying text: Because QR codes adhere to a standardized, fixed structure, each QR code image can be reliably flattened into a uniform array, with every pixel treated as an individual feature. This consistency enables effective analysis and comparison across models in a QR code setting. Hyperparameter tuning was performed using a randomized search with 10-fold cross-validation on the training…
Figure 2. Feature importance of the top 3 models.
Accompanying text: The results reveal that a significant portion of the QR code remains unused in the prediction process. Large black regions in the feature importance maps indicate that the majority of pixels contribute little to no information for distinguishing phishing from benign QR codes. This suggests that quishing detection is primarily influenced by specific regions of the QR …
Figure 3. Features taken into account when using XGBoost.
Figure 4. Features not taken into account when using XGBoost.
Accompanying text (4.3 Experiment 3: Feature Selection): In this experiment, we performed feature selection based on the most important features identified by each of the three best-performing models: Random Forest, LightGBM, and XGBoost. We then retrained all models using only these selected features and compared their performance to their original versions without feature selection. The results, presented in …
Figure 5. Feature importance distribution.
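The importance maps in Figures 2 to 4 can be understood as per-pixel importance vectors reshaped back onto the QR image grid. A minimal sketch of that reshaping, assuming row-major flattening and a hypothetical near-zero cut-off `eps` for "unused" pixels; the paper does not state how its maps were rendered, and the helper names are illustrative.

```python
from typing import List, Sequence, Tuple


def importance_map(importances: Sequence[float], side: int) -> List[List[float]]:
    """Reshape a row-major per-pixel importance vector into a side x side grid."""
    if len(importances) != side * side:
        raise ValueError("vector length must equal side * side")
    return [list(importances[r * side:(r + 1) * side]) for r in range(side)]


def unused_pixels(importances: Sequence[float], side: int,
                  eps: float = 1e-12) -> List[Tuple[int, int]]:
    """(row, col) of pixels whose importance is at most eps.

    These are the cells that would render black in an importance heat map.
    """
    if len(importances) != side * side:
        raise ValueError("vector length must equal side * side")
    return [(i // side, i % side) for i, v in enumerate(importances) if v <= eps]


# Toy 2x2 importance vector (hypothetical values).
imp = [0.0, 0.2, 0.0, 0.8]
print(importance_map(imp, 2))                 # [[0.0, 0.2], [0.0, 0.8]]
print(unused_pixels(imp, 2))                  # [(0, 0), (1, 0)]
print(len(unused_pixels(imp, 2)) / len(imp))  # 0.5, i.e. half the pixels unused
```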
Original abstract

The rise of QR code-based phishing ("Quishing") poses a growing cybersecurity threat, as attackers increasingly exploit QR codes to bypass traditional phishing defenses. Existing detection methods predominantly focus on URL analysis, which requires the extraction of the QR code payload, and may inadvertently expose users to malicious content. Moreover, QR codes can encode various types of data beyond URLs, such as Wi-Fi credentials and payment information, making URL-based detection insufficient for broader security concerns. To address these gaps, we propose the first framework for quishing detection that directly analyzes QR code structure and pixel patterns without extracting the embedded content. We generated a dataset of phishing and benign QR codes and used it to train and evaluate multiple machine learning models, including Logistic Regression, Decision Trees, Random Forest, Naïve Bayes, LightGBM, and XGBoost. Our best-performing model (XGBoost) achieves an AUC of 0.9106, demonstrating the feasibility of QR-centric detection. Through feature importance analysis, we identify key visual patterns correlated with phishing labels and refine our feature set by removing non-informative pixels, improving performance to an AUC of 0.9133 with a reduced feature space. Our findings reveal that the structural features of QR codes correlate strongly with phishing risk. This work establishes a foundation for quishing mitigation and highlights the potential of direct QR analysis as a critical layer in modern phishing defenses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce the first framework for detecting quishing attacks by directly analyzing QR code structure and pixel patterns without extracting the embedded content. A generated dataset of phishing and benign QR codes is used to train models including Logistic Regression, Decision Trees, Random Forest, Naive Bayes, LightGBM, and XGBoost; the best model (XGBoost) achieves an AUC of 0.9106, which improves to 0.9133 after pruning non-informative pixels via feature importance analysis. The work concludes that structural features of QR codes correlate strongly with phishing risk and establishes a foundation for content-agnostic quishing mitigation.

Significance. If the results hold, the work offers a novel content-agnostic detection layer that avoids exposing users to malicious payloads and extends to non-URL QR data types. The multi-model evaluation and feature-refinement step provide concrete empirical support for the feasibility of pixel-based quishing detection, which could complement existing URL-centric defenses.

major comments (2)
  1. [Dataset generation procedure (Methods/Experimental section)] The manuscript provides no details on how phishing QR codes were synthesized, including whether URL lengths, error-correction levels, versions, or mask patterns were matched to the benign set. This is load-bearing for the central AUC claims (0.9106 and 0.9133) because the model may exploit controllable generation artifacts rather than learning generalizable visual signatures of malicious intent.
  2. [Experimental evaluation] Dataset size, train-test split ratios, cross-validation method, and any baseline comparisons (e.g., against URL-based detectors) are not reported. These omissions prevent assessment of whether the reported performance is robust or merely an artifact of the self-generated collection.
minor comments (1)
  1. [Abstract] The abstract encodes "Naïve Bayes" with a LaTeX accent escape, but elsewhere the name is rendered as "Naive Bayes"; use one spelling consistently throughout.
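The first major comment can be made concrete with one fact from the QR standard (ISO/IEC 18004): a version-v symbol is 17 + 4v modules on a side, from 21 × 21 at version 1 to 177 × 177 at version 40, and the version required grows with payload length and error-correction level. If phishing and benign codes were generated with systematically different URL lengths or error-correction levels, their versions, and hence their pixel layouts, would differ for reasons unrelated to intent. A minimal sketch of a per-class version tally, assuming the module count of each generated image is known; the helper names and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def modules_per_side(version: int) -> int:
    """Side length, in modules, of a QR symbol of the given version (1-40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 17 + 4 * version


def version_from_modules(side: int) -> int:
    """Invert modules_per_side; reject sizes that are not valid QR symbols."""
    version, remainder = divmod(side - 17, 4)
    if remainder or not 1 <= version <= 40:
        raise ValueError(f"{side} is not a valid QR module count")
    return version


def version_histogram(samples: Iterable[Tuple[str, int]]) -> Dict[str, Dict[int, int]]:
    """Count QR versions per class from (label, modules_per_side) pairs."""
    hist: Dict[str, Counter] = defaultdict(Counter)
    for label, side in samples:
        hist[label][version_from_modules(side)] += 1
    return {label: dict(counts) for label, counts in hist.items()}


# Hypothetical sample: longer phishing payloads would skew to higher versions.
print(modules_per_side(1), modules_per_side(40))  # 21 177
print(version_histogram([("benign", 21), ("benign", 25), ("phishing", 29)]))
# {'benign': {1: 1, 2: 1}, 'phishing': {3: 1}}
```

A markedly different version distribution between the two classes would suggest the matching the referee asks about is missing.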

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have prepared point-by-point responses to the major comments and will revise the paper accordingly to improve clarity and reproducibility.

Point-by-point responses
  1. Referee: [Dataset generation procedure (Methods/Experimental section)] The manuscript provides no details on how phishing QR codes were synthesized, including whether URL lengths, error-correction levels, versions, or mask patterns were matched to the benign set. This is load-bearing for the central AUC claims (0.9106 and 0.9133) because the model may exploit controllable generation artifacts rather than learning generalizable visual signatures of malicious intent.

    Authors: We agree that detailed information on dataset synthesis is necessary to support the validity of the reported AUC values and to demonstrate that the model captures meaningful patterns rather than generation artifacts. In the revised manuscript, we will expand the Methods section with a full description of the generation process for both phishing and benign QR codes. This will include the specific parameters used (URL lengths, error-correction levels, versions, and mask patterns) and how they were controlled or matched across the two classes to ensure fair comparison. revision: yes

  2. Referee: [Experimental evaluation] Dataset size, train-test split ratios, cross-validation method, and any baseline comparisons (e.g., against URL-based detectors) are not reported. These omissions prevent assessment of whether the reported performance is robust or merely an artifact of the self-generated collection.

    Authors: We acknowledge these omissions limit the ability to evaluate robustness. In the revised manuscript, we will report the full dataset size, the exact train-test split ratios, the cross-validation procedure, and associated performance metrics. For baseline comparisons, we note that our content-agnostic pixel-based approach is designed to avoid payload extraction and thus differs in scope from URL-based detectors; however, we will add a discussion of this distinction and include any feasible comparative analysis or rationale for the chosen evaluation strategy. revision: yes
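What such a report would need to pin down (number of folds, shuffling, seed, and whether folds are stratified by label) can be seen in a minimal sketch of plain shuffled k-fold splitting. The text accompanying Figure 1 mentions 10-fold cross-validation on the training set, so k = 10 is the default here; the function name and seed are hypothetical, and stratification is deliberately omitted rather than assumed.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) for plain, shuffled k-fold cross-validation.

    Every sample index appears in exactly one validation fold; fold sizes
    differ by at most one when n is not divisible by k.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split reproducible
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(23, k=10))
print([len(val) for _, val in folds])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```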

Circularity Check

0 steps flagged

No circularity: empirical ML evaluation on generated dataset

Full rationale

The paper describes generating a dataset of phishing and benign QR codes, extracting pixel and structural features, and training standard classifiers (XGBoost achieves an AUC of 0.9106, refined to 0.9133). The reported metrics are direct empirical results on held-out portions of this dataset, not derived quantities that reduce to a fitted parameter or a self-referential definition by construction. No equations appear in the provided text, and no self-citations are invoked to justify uniqueness or load-bearing premises. The central claim is therefore an independent empirical observation on the authors' own data split, which qualifies as self-contained under the audit's evaluation criteria.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central performance claim rests on the representativeness of the synthetically generated phishing QR codes and the assumption that pixel-level visual features are sufficient to separate classes.

free parameters (1)
  • Pixel feature selection threshold
    Non-informative pixels were removed after importance analysis to reach the final AUC of 0.9133.
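The free parameter above is the cut-off that decides which pixels count as non-informative. A minimal sketch of threshold-based pruning, assuming per-pixel importances from an already trained model; the threshold value and helper names are hypothetical, since the text does not report the cut-off the authors used.

```python
from typing import List, Sequence


def prune_features(importances: Sequence[float], threshold: float) -> List[int]:
    """Indices of features whose importance is strictly above threshold."""
    return [i for i, v in enumerate(importances) if v > threshold]


def select_columns(rows: Sequence[Sequence[int]],
                   keep: Sequence[int]) -> List[List[int]]:
    """Keep only the selected feature columns of each flattened QR image."""
    return [[row[i] for i in keep] for row in rows]


# Hypothetical importances for a 4-pixel image and a threshold of 0.05.
keep = prune_features([0.0, 0.3, 0.01, 0.5], threshold=0.05)
print(keep)                                                # [1, 3]
print(select_columns([[1, 0, 1, 1], [0, 1, 0, 0]], keep))  # [[0, 1], [1, 0]]
```

Because the cut-off is chosen after inspecting importances, tuning it inside the cross-validation folds rather than against the final test split would keep the reported AUC from absorbing that choice.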

pith-pipeline@v0.9.0 · 5773 in / 1171 out tokens · 66883 ms · 2026-05-22T16:47:31.936778+00:00 · methodology


