PHISHREV: A Hybrid Machine Learning and Post-Hoc Non-monotonic Reasoning Framework for Context-Aware Phishing Website Classification
Pith reviewed 2026-05-07 16:16 UTC · model grok-4.3
The pith
A hybrid framework uses answer set programming to revise machine learning predictions for more consistent phishing website detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The PHISHREV framework combines machine learning classifiers with non-monotonic reasoning via Answer Set Programming to perform context-aware decision refinement. The post-hoc reasoning layer incorporates expert knowledge to revise classifier predictions through formal belief revisions. Experimental results indicate that the reasoning module modifies 5.08% of classifier outputs, leading to improved decision consistency, and that new domain knowledge can be incorporated into the reasoning layer in O(n) time without retraining the model.
What carries the argument
The post-hoc non-monotonic reasoning layer using Answer Set Programming (ASP) that performs formal belief revision on the outputs of the machine learning classifier.
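The override pattern behind this layer can be pictured with a minimal Python sketch. It is not the paper's ASP encoding: it only imitates the non-monotonic idea in plain Python (keep the classifier's label by default, and withdraw it when an expert rule gives a reason to). The feature names `on_allowlist`, `domain_age_days`, and `has_login_form`, the two rules, and the priority order are all hypothetical.

```python
from typing import Callable, Optional

# A rule inspects a site's context and returns an overriding label,
# or None when it has nothing to say. Both rules below are hypothetical.
Rule = Callable[[dict], Optional[str]]

def allowlisted_is_legitimate(site: dict) -> Optional[str]:
    return "legitimate" if site.get("on_allowlist") else None

def young_domain_with_login_is_phishing(site: dict) -> Optional[str]:
    # A missing age is treated as "old" so that the rule does not fire.
    if site.get("domain_age_days", 10_000) < 30 and site.get("has_login_form"):
        return "phishing"
    return None

# Assumed priority: earlier rules win over later ones.
RULES: list[Rule] = [allowlisted_is_legitimate, young_domain_with_login_is_phishing]

def revise(ml_label: str, site: dict, rules: list[Rule] = RULES) -> str:
    """Default: keep the classifier's label. Exception: the first rule that fires wins."""
    for rule in rules:
        verdict = rule(site)
        if verdict is not None:
            return verdict
    return ml_label

young_site = {"domain_age_days": 5, "has_login_form": True}
print(revise("legitimate", young_site))                            # phishing
print(revise("legitimate", {**young_site, "on_allowlist": True}))  # legitimate
print(revise("phishing", {"domain_age_days": 900}))                # phishing
```

The second call shows the non-monotonic behaviour: adding one fact (`on_allowlist`) withdraws the conclusion that the first call reached for the same site.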
If this is right
- The reasoning module changes 5.08% of the machine learning classifier outputs.
- Decision consistency improves after the reasoning layer is applied.
- New domain knowledge integrates into the system in linear time without any need to retrain the underlying classifier.
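The first two quantities in this list can be made concrete with a small sketch. It assumes the definition given in the authors' rebuttal, namely that decision consistency is the share of predictions that agree with the expert rules. Restricting the measure to samples where some rule fires is an extra assumption, and the labels and rule verdicts are toy values, not the paper's data.

```python
from typing import Optional

def modification_rate(before: list[str], after: list[str]) -> float:
    """Fraction of predictions whose label changed after the reasoning layer."""
    if not before:
        return 0.0
    changed = sum(b != a for b, a in zip(before, after))
    return changed / len(before)

def consistency(labels: list[str], rule_verdicts: list[Optional[str]]) -> float:
    """Share of rule-covered samples whose label matches the rule verdict.

    Samples where no rule fires (verdict is None) are excluded; this is an
    assumption made for the sketch.
    """
    covered = [(lab, v) for lab, v in zip(labels, rule_verdicts) if v is not None]
    if not covered:
        return 1.0
    return sum(lab == v for lab, v in covered) / len(covered)

# Toy values for illustration only.
ml = ["phishing", "legitimate", "legitimate", "phishing", "legitimate"]
rule_verdicts = [None, "phishing", None, "phishing", "legitimate"]

# The revised label follows the rule when one fires, else the classifier.
revised = [v if v is not None else m for m, v in zip(ml, rule_verdicts)]

print(modification_rate(ml, revised))
print(consistency(ml, rule_verdicts), consistency(revised, rule_verdicts))
```

In this toy setup the revised labels are consistent with the rules by construction, which is why a consistency gain alone says nothing about correctness against ground truth.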
Where Pith is reading between the lines
- This design could let defenders update phishing detectors quickly when new attack patterns appear by editing rules instead of gathering fresh training data.
- The same hybrid pattern might apply to other security tasks such as fraud detection where explicit domain rules can correct statistical errors.
- Explicit rule-based revisions could make the overall system more auditable than a pure black-box model.
Load-bearing premise
The expert knowledge encoded in the ASP rules accurately reflects real-world phishing contexts and revising the machine learning predictions based on this reasoning improves the actual correctness of classifications.
What would settle it
A comparison of classification accuracy on a labeled test set of phishing and legitimate websites before and after the reasoning layer is applied, to determine whether the revisions raise or lower the true positive and false positive rates.
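A minimal sketch of that comparison follows. It computes the true positive rate and false positive rate for the classifier's raw output and for the revised output on the same labeled set. The label vectors are invented toy values (1 = phishing, 0 = legitimate) and do not come from the paper.

```python
def rates(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (true positive rate, false positive rate); 1 = phishing, 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Toy data, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
ml_pred = [1, 1, 0, 0, 1, 0, 0, 0]   # classifier output before revision
revised = [1, 1, 1, 0, 0, 0, 0, 0]   # output after the reasoning layer

tpr_before, fpr_before = rates(y_true, ml_pred)
tpr_after, fpr_after = rates(y_true, revised)
print(f"TPR {tpr_before:.2f} -> {tpr_after:.2f}, FPR {fpr_before:.2f} -> {fpr_after:.2f}")
```

On real data the revised rates could move in either direction; the toy values here happen to improve both.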
Original abstract
Phishing detection systems are predominantly rely on statistical machine learning models, which often lack contextual reasoning and are vulnerable to adversarial manipulation. In this work, we propose a hybrid framework that integrates machine learning classifiers with non-monotonic reasoning using Answer Set Programming (ASP) to enable context-aware decision refinement. The proposed post-hoc reasoning layer incorporates expert knowledge to revise classifier predictions through formal belief revisions. Experimental results indicate that the reasoning module modifies 5.08\% of classifier outputs, leading to improved decision consistency. A key advantage is that new domain knowledge can be incorporated into the reasoning layer in $\mathcal{O}(n)$ time, eliminating the need for model retraining.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PHISHREV, a hybrid framework integrating machine learning classifiers with post-hoc non-monotonic reasoning via Answer Set Programming (ASP) for context-aware phishing website classification. A reasoning layer encodes expert knowledge to perform formal belief revisions on ML predictions. The central empirical claim is that this module modifies 5.08% of classifier outputs and yields improved decision consistency; a secondary claim is that new domain knowledge can be incorporated into the ASP layer in O(n) time without retraining the underlying ML model.
Significance. If the revisions can be shown to improve accuracy against ground truth (rather than merely increasing consistency with a fixed rule set), the approach would address a practical limitation of pure ML phishing detectors: the inability to rapidly incorporate new contextual knowledge without retraining. The post-hoc design and claimed linear update cost are potentially valuable for security applications where threat models evolve quickly.
major comments (2)
- [Abstract] Abstract: The claim that the reasoning module 'modifies 5.08% of classifier outputs, leading to improved decision consistency' is presented without any description of the experimental setup, datasets, baseline classifiers, statistical tests, or the precise definition and measurement of 'decision consistency'. This information is required to evaluate the central empirical result.
- [Abstract] Abstract: No before/after accuracy, precision, recall, or F1 scores on labeled data are reported, nor any comparison demonstrating that ASP revisions increase correctness relative to ground truth rather than simply aligning outputs with the non-monotonic rules. Without these metrics the claim that the hybrid system improves phishing classification cannot be assessed.
minor comments (2)
- [Abstract] Abstract contains a grammatical error: 'are predominantly rely on' should read 'predominantly rely on'.
- [Abstract] The O(n) incorporation claim is stated without defining n or describing the update mechanism in the ASP layer.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments point by point below. We agree that the abstract requires expansion for clarity and will revise it to include the requested details while preserving its conciseness. We also clarify the scope of our empirical claims and will incorporate additional metrics as noted.
- Referee: [Abstract] Abstract: The claim that the reasoning module 'modifies 5.08% of classifier outputs, leading to improved decision consistency' is presented without any description of the experimental setup, datasets, baseline classifiers, statistical tests, or the precise definition and measurement of 'decision consistency'. This information is required to evaluate the central empirical result.
  Authors: We agree that the abstract omits these details due to length constraints. The full manuscript (Section 4) specifies the experimental setup, including the phishing datasets used, baseline classifiers (e.g., standard ML models for website classification), the definition of decision consistency (as the rate of alignment between revised predictions and the ASP-encoded expert rules), and statistical validation of the 5.08% modification rate. We will revise the abstract to include a brief summary of the datasets, baselines, consistency definition, and mention of the experimental protocol to allow immediate evaluation of the central result.
  Revision: yes
- Referee: [Abstract] Abstract: No before/after accuracy, precision, recall, or F1 scores on labeled data are reported, nor any comparison demonstrating that ASP revisions increase correctness relative to ground truth rather than simply aligning outputs with the non-monotonic rules. Without these metrics the claim that the hybrid system improves phishing classification cannot be assessed.
  Authors: The manuscript's primary empirical claim concerns improved decision consistency with the non-monotonic rules rather than increased correctness against ground-truth labels. The ASP layer performs belief revision to enforce expert knowledge, which may produce outputs that differ from the original ground truth depending on rule quality; we therefore focused evaluation on consistency gains and the O(n) update property. No before/after accuracy metrics appear in the current version because they were outside the stated scope. We acknowledge that reporting these metrics would strengthen the presentation and will add before/after accuracy, precision, recall, and F1 comparisons on the labeled data in the revised manuscript, along with explicit discussion of how the revisions relate to ground truth.
  Revision: yes
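One way to relate individual revisions to ground truth is to audit each flipped label. The sketch below is an illustration under stated assumptions, not the authors' protocol: it takes binary labels (1 = phishing, 0 = legitimate), uses invented toy vectors, and the function name `audit_revisions` is hypothetical.

```python
from collections import Counter

def audit_revisions(y_true: list[int], before: list[int], after: list[int]) -> Counter:
    """Count revisions that fixed an error, introduced one, or left the label alone.

    With binary labels, every changed prediction is either corrective
    (wrong -> right) or harmful (right -> wrong).
    """
    counts: Counter = Counter()
    for truth, old, new in zip(y_true, before, after):
        if old == new:
            counts["unchanged"] += 1
        elif new == truth:
            counts["corrective"] += 1
        else:
            counts["harmful"] += 1
    return counts

# Toy data, for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
before = [0, 1, 1, 0, 1, 0]   # classifier output
after = [1, 1, 0, 1, 1, 0]    # output after the reasoning layer

counts = audit_revisions(y_true, before, after)
net_accuracy_change = (counts["corrective"] - counts["harmful"]) / len(y_true)
print(dict(counts), round(net_accuracy_change, 3))
```

The net accuracy change equals corrective minus harmful flips divided by the sample count, so a modification rate on its own cannot show whether the revisions helped.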
Circularity Check
No circularity: empirical claims and architectural properties do not reduce to inputs by construction
The paper describes a hybrid ML+ASP framework whose central results are experimental observations (reasoning module modifies 5.08% of outputs, yielding improved consistency) and a stated complexity advantage (O(n) incorporation of new knowledge without retraining). These are presented as measured outcomes and design properties rather than first-principles derivations or predictions. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the provided text. The derivation chain is therefore self-contained against external benchmarks; the skeptic concern about accuracy vs. consistency is a correctness issue, not a circularity reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Answer Set Programming can effectively model non-monotonic reasoning and belief revision for incorporating expert knowledge into ML predictions.