Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis
Pith reviewed 2026-05-22 16:47 UTC · model grok-4.3
The pith
Machine learning models detect quishing by examining QR code structure and pixel patterns without decoding content.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents a framework that classifies QR codes as phishing or benign by feeding structural and pixel features, extracted from a generated dataset, into machine-learning models without ever decoding the payload. The strongest model reaches an AUC of 0.9106, rising to 0.9133 after feature pruning, and feature-importance analysis shows that QR structural patterns correlate strongly with phishing labels.
What carries the argument
Pixel-pattern and structural feature extraction from QR codes, used as input to classifiers such as XGBoost on a dataset of labeled phishing and benign examples.
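As a rough illustration of what "every pixel treated as an individual feature" implies, the sketch below computes the feature-vector length for the version 13 symbols the paper passage mentions and flattens a module matrix row by row. The helper names `modules_per_side` and `flatten_matrix` are hypothetical, and the checkerboard is a stand-in for a real QR matrix, not the authors' data.

```python
from typing import List, Sequence


def modules_per_side(version: int) -> int:
    """Side length in modules: 21 for version 1, plus 4 per further version."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 17 + 4 * version


def flatten_matrix(matrix: Sequence[Sequence[bool]]) -> List[int]:
    """Row-major flattening: one 0/1 feature per module (dark = 1)."""
    side = len(matrix)
    if any(len(row) != side for row in matrix):
        raise ValueError("QR matrix must be square")
    return [int(cell) for row in matrix for cell in row]


side = modules_per_side(13)
# Placeholder checkerboard standing in for a real module matrix (assumption).
toy = [[(r + c) % 2 == 0 for c in range(side)] for r in range(side)]
features = flatten_matrix(toy)
print(side, len(features))  # 69 4761
```

With a box size of 1 and a border of 0, one pixel corresponds to one module, so each version 13 code would yield 4761 binary features under these assumptions. In practice the matrix would presumably come from the `qrcode` library rather than a placeholder.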
If this is right
- Structural features of QR codes carry a strong statistical association with phishing labels.
- Removing non-informative pixels from the feature vector maintains or slightly raises detection performance.
- Detection remains possible for QR codes that encode Wi-Fi, payment, or other non-URL data.
- Multiple standard classifiers can be trained for the task, with gradient-boosted trees performing well.
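The pruning step can be pictured with a simpler proxy than the paper's feature-importance ranking: dropping pixels that never change across the dataset. For a fixed QR version, function patterns such as the finder and timing patterns are identical in every symbol, so those pixels cannot help a classifier. The sketch below is an assumption-laden stand-in (the function name `prune_constant_pixels` and the toy data are hypothetical), not the authors' procedure.

```python
from typing import List, Sequence, Tuple


def prune_constant_pixels(
    rows: Sequence[Sequence[int]],
) -> Tuple[List[int], List[List[int]]]:
    """Return (kept column indices, pruned rows), dropping single-valued columns."""
    if not rows:
        return [], []
    width = len(rows[0])
    # A column is informative only if it takes more than one value.
    kept = [j for j in range(width) if len({row[j] for row in rows}) > 1]
    pruned = [[row[j] for j in kept] for row in rows]
    return kept, pruned


# Toy data: columns 0 and 3 never change, like fixed function patterns.
X = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
kept, X_pruned = prune_constant_pixels(X)
print(kept)      # [1, 2]
print(X_pruned)  # [[0, 1], [1, 0], [0, 0]]
```

An importance-based filter, as described in the paper, would likely remove more pixels than this, since a pixel can vary and still carry no signal about the label.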
Where Pith is reading between the lines
- A detector of this kind could be inserted into QR-scanning apps as an early filter before any payload is processed.
- Attackers may need to alter how they generate QR codes to avoid leaving detectable structural signatures.
- The same pixel-analysis pipeline could be applied to other static visual encodings that carry hidden data.
Load-bearing premise
A generated collection of phishing and benign QR codes is representative enough of real attacks that their visible pixel patterns alone can separate malicious from safe codes.
What would settle it
Testing the trained model on QR codes taken directly from documented real-world quishing incidents and measuring whether the AUC falls substantially below 0.85.
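For readers who want to run such a check, AUC can be computed directly from labels and model scores as the fraction of phishing/benign pairs in which the phishing example scores higher, with ties counted as half. The sketch below uses made-up labels and scores purely for illustration; the function name `roc_auc` and the data are assumptions, and the 0.85 threshold is the one named above.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = phishing, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 4))  # 0.8889
print(auc < 0.85)     # False
```

The pairwise loop is quadratic in the number of examples, which is fine for a small set of documented incidents; a rank-based implementation would be preferable for large test sets.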
Original abstract
The rise of QR code-based phishing ("Quishing") poses a growing cybersecurity threat, as attackers increasingly exploit QR codes to bypass traditional phishing defenses. Existing detection methods predominantly focus on URL analysis, which requires the extraction of the QR code payload, and may inadvertently expose users to malicious content. Moreover, QR codes can encode various types of data beyond URLs, such as Wi-Fi credentials and payment information, making URL-based detection insufficient for broader security concerns. To address these gaps, we propose the first framework for quishing detection that directly analyzes QR code structure and pixel patterns without extracting the embedded content. We generated a dataset of phishing and benign QR codes and we used it to train and evaluate multiple machine learning models, including Logistic Regression, Decision Trees, Random Forest, Naïve Bayes, LightGBM, and XGBoost. Our best-performing model (XGBoost) achieves an AUC of 0.9106, demonstrating the feasibility of QR-centric detection. Through feature importance analysis, we identify key visual patterns correlated with phishing labels and refine our feature set by removing non-informative pixels, improving performance to an AUC of 0.9133 with a reduced feature space. Our findings reveal that the structural features of QR code correlate strongly with phishing risk. This work establishes a foundation for quishing mitigation and highlights the potential of direct QR analysis as a critical layer in modern phishing defenses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce the first framework for detecting quishing attacks by directly analyzing QR code structure and pixel patterns without extracting the embedded content. A generated dataset of phishing and benign QR codes is used to train models including Logistic Regression, Decision Trees, Random Forest, Naive Bayes, LightGBM, and XGBoost; the best model (XGBoost) achieves an AUC of 0.9106, which improves to 0.9133 after pruning non-informative pixels via feature importance analysis. The work concludes that structural features of QR codes correlate strongly with phishing risk and establishes a foundation for content-agnostic quishing mitigation.
Significance. If the results hold, the work offers a novel content-agnostic detection layer that avoids exposing users to malicious payloads and extends to non-URL QR data types. The multi-model evaluation and feature-refinement step provide concrete empirical support for the feasibility of pixel-based quishing detection, which could complement existing URL-centric defenses.
major comments (2)
- [Dataset generation procedure (Methods/Experimental section)] The manuscript provides no details on how phishing QR codes were synthesized, including whether URL lengths, error-correction levels, versions, or mask patterns were matched to the benign set. This is load-bearing for the central AUC claims (0.9106 and 0.9133) because the model may exploit controllable generation artifacts rather than learning generalizable visual signatures of malicious intent.
- [Experimental evaluation] Dataset size, train-test split ratios, cross-validation method, and any baseline comparisons (e.g., against URL-based detectors) are not reported. These omissions prevent assessment of whether the reported performance is robust or merely an artifact of the self-generated collection.
minor comments (1)
- [Abstract] The abstract uses the LaTeX form 'Naïve Bayes' but the rendered text appears as 'Naive Bayes'; ensure consistent spelling throughout.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We have prepared point-by-point responses to the major comments and will revise the paper accordingly to improve clarity and reproducibility.
- Referee: [Dataset generation procedure (Methods/Experimental section)] The manuscript provides no details on how phishing QR codes were synthesized, including whether URL lengths, error-correction levels, versions, or mask patterns were matched to the benign set. This is load-bearing for the central AUC claims (0.9106 and 0.9133) because the model may exploit controllable generation artifacts rather than learning generalizable visual signatures of malicious intent.
  Authors: We agree that detailed information on dataset synthesis is necessary to support the validity of the reported AUC values and to demonstrate that the model captures meaningful patterns rather than generation artifacts. In the revised manuscript, we will expand the Methods section with a full description of the generation process for both phishing and benign QR codes. This will include the specific parameters used (URL lengths, error-correction levels, versions, and mask patterns) and how they were controlled or matched across the two classes to ensure fair comparison.
  Revision: yes
- Referee: [Experimental evaluation] Dataset size, train-test split ratios, cross-validation method, and any baseline comparisons (e.g., against URL-based detectors) are not reported. These omissions prevent assessment of whether the reported performance is robust or merely an artifact of the self-generated collection.
  Authors: We acknowledge that these omissions limit the ability to evaluate robustness. In the revised manuscript, we will report the full dataset size, the exact train-test split ratios, the cross-validation procedure, and associated performance metrics. For baseline comparisons, we note that our content-agnostic pixel-based approach is designed to avoid payload extraction and thus differs in scope from URL-based detectors; however, we will add a discussion of this distinction and include any feasible comparative analysis or rationale for the chosen evaluation strategy.
  Revision: yes
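Since the cross-validation procedure is not reported, the following is only a generic illustration of what a class-balanced split could look like, not the authors' protocol. The function name `stratified_folds`, the fold count, the seed, and the toy label counts are all assumptions.

```python
import random
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds, keeping class proportions roughly equal."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy labels: 10 phishing (1) and 20 benign (0) codes.
labels = [1] * 10 + [0] * 20
folds = stratified_folds(labels, k=5)
for fold in folds:
    phishing = sum(labels[i] for i in fold)
    print(len(fold), phishing)  # each fold: 6 samples, 2 phishing
```

Each fold would serve once as the held-out set while the remaining folds are used for training. Reporting the mean and spread of AUC across folds would address part of the referee's robustness concern.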
Circularity Check
No circularity: empirical ML evaluation on generated dataset
The paper describes generating a dataset of phishing and benign QR codes, extracting pixel and structural features, and training standard classifiers (XGBoost achieving AUC 0.9106, refined to 0.9133). The reported performance metrics are direct empirical results on held-out portions of this dataset rather than any derived quantity that reduces to a fitted parameter or self-referential definition by construction. No equations appear in the provided text, and no self-citations are invoked to justify uniqueness or load-bearing premises. The central claim remains an independent empirical observation on the authors' own data split, qualifying as self-contained against external benchmarks per the evaluation criteria.
Axiom & Free-Parameter Ledger
free parameters (1)
- Pixel feature selection threshold
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We generated a dataset of phishing and benign QR codes... used the 'qrcode' Python library... selected version 13... error correction level was set to 'low'... box size was set to 1... border to 0... flattened into a uniform array, with every pixel treated as an individual feature."
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: J_uniquely_calibrated_via_higher_derivative (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Our best-performing model (XGBoost) achieves an AUC of 0.9106... feature importance analysis... removing non-informative pixels, improving performance to an AUC of 0.9133"
What the tags mean
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.