pith. sign in

arxiv: 2604.23563 · v1 · submitted 2026-04-26 · 💻 cs.CR · cs.AI· cs.IR

CyberCane: Neuro-Symbolic RAG for Privacy-Preserving Phishing Detection with Formal Ontology Reasoning

Pith reviewed 2026-05-08 05:59 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.IR
keywords phishing detectionneuro-symbolic methodsprivacy-preserving retrievalontology reasoningAI-generated attackscybersecurity frameworks
0
0 comments X

The pith

A neuro-symbolic pipeline uses symbolic rules on metadata followed by privacy-preserving RAG and ontology reasoning to detect phishing, including AI-generated attacks, with high recall and near-zero false positives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a system that first applies deterministic symbolic rules to email metadata to handle clear cases, then routes ambiguous ones to a retrieval-augmented generation step that queries a phishing-only corpus after automatically redacting sensitive content. An OWL ontology called PhishOnt supports formal reasoning to classify attacks in a verifiable manner. This setup addresses the tension between needing robust detection against evolving AI threats and maintaining strict privacy by avoiding external API exposure of unredacted data. If the approach works, it enables deployment in regulated environments like healthcare where both accuracy and compliance are essential.

Core claim

The central claim is that integrating lightweight symbolic analysis with retrieval-augmented generation using automated redaction and formal ontology reasoning produces a phishing detector that gains 78.6 points in recall over symbolic-only methods on AI-generated threats, while maintaining precision above 98 percent and false positive rates as low as 0.16 percent on datasets containing both human and LLM-generated emails.

What carries the argument

The dual-phase pipeline of symbolic metadata filtering escalating to semantic RAG classification, augmented by the PhishOnt OWL ontology for verifiable attack classification through formal reasoning chains.

If this is right

  • Organizations gain the ability to adjust detection thresholds to match varying risk tolerances without disrupting workflows.
  • Deployment in privacy-sensitive sectors like healthcare can yield substantial returns by preventing data breaches while complying with regulations.
  • Non-expert staff receive transparent explanations for alerts due to the symbolic and ontological components.
  • The open-source release allows for community validation and adaptation to other threat types.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This hybrid method may extend to other domains requiring both high accuracy and data privacy, such as fraud detection in financial emails.
  • By grounding RAG in a domain-specific ontology, the system could mitigate issues with inconsistent outputs from large language models alone.
  • Testing the framework on real-time email streams rather than static datasets would reveal its practical scalability.

Load-bearing premise

Automated sensitive-data redaction in the retrieval-augmented generation pipeline retains sufficient semantic content to correctly classify borderline phishing cases, and the combination of the phishing-only corpus with the ontology reliably identifies novel AI-generated attacks.

What would settle it

Demonstrating that redaction removes key semantic indicators causes misclassification of certain phishing emails, or that the system fails to detect a new variant of AI-generated phishing absent from the corpus and ontology.

Figures

Figures reproduced from arXiv: 2604.23563 by Aniqa Afzal, Houbing Herbert Song, Pawel Sloboda, Qi Zhao, Safayat Bin Hakim, Vigna Majmundar.

Figure 1
Figure 1. Figure 1: CyberCane dual-phase architecture. Phase 1 applies symbolic rules to email metadata, computing view at source ↗
Figure 2
Figure 2. Figure 2: CyberCane results overview. (A) Phase 1 versus RAG operating points on precision-recall space. view at source ↗
Figure 3
Figure 3. Figure 3: Domain-specific phishing threat taxonomy demonstrated through healthcare use case, showing four view at source ↗
Figure 4
Figure 4. Figure 4: Threshold tuning and deterministic score distribution. (A) Validation precision/recall versus threshold view at source ↗
Figure 6
Figure 6. Figure 6: Representative CyberCane output demonstrating multi-layered explainability combining symbolic view at source ↗
read the original abstract

Privacy-critical domains require phishing detection systems that satisfy contradictory constraints: near-zero false positives to prevent workflow disruption, transparent explanations for non-expert staff, strict regulatory compliance prohibiting sensitive data exposure to external APIs, and robustness against AI-generated attacks. Existing rule-based systems are brittle to novel campaigns, while LLM-based detectors violate privacy regulations through unredacted data transmission. We introduce CyberCane, a neuro-symbolic framework integrating deterministic symbolic analysis with privacy-preserving retrieval-augmented generation (RAG). Our dual-phase pipeline applies lightweight symbolic rules to email metadata, then escalates borderline cases to semantic classification via RAG with automated sensitive data redaction and retrieval from a phishing-only corpus. We further introduce PhishOnt, an OWL ontology enabling verifiable attack classification through formal reasoning chains. Evaluation on DataPhish2025 (12.3k emails; mixed human/LLM) and Nazario/SpamAssassin demonstrates a 78.6-point recall gain over symbolic-only detection on AI-generated threats, with precision exceeding 98% and FPR as low as 0.16%. Healthcare deployment projects a 542x ROI; tunable operating points support diverse risk tolerances, with open-source implementation at https://github.com/sbhakim/Cybercane.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces CyberCane, a neuro-symbolic dual-phase pipeline for phishing detection that first applies lightweight symbolic rules to email metadata and escalates borderline cases to a privacy-preserving RAG classifier operating on automatically redacted text retrieved from a phishing-only corpus, augmented by the PhishOnt OWL ontology for formal reasoning-based classification. It reports evaluation results on the DataPhish2025 dataset (12.3k mixed human/LLM emails) and Nazario/SpamAssassin corpora, claiming a 78.6-point recall improvement over symbolic-only detection on AI-generated threats, precision >98%, and FPR as low as 0.16%, while projecting substantial ROI in healthcare deployments and releasing an open-source implementation.

Significance. If the central claims hold after proper validation, the work would be significant for privacy-critical applications: it directly targets the tension between regulatory compliance (no sensitive data exposure), robustness to LLM-generated attacks, and the need for explainable, low-FPR detection. The combination of deterministic symbolic metadata rules with ontology-augmented RAG, plus the open-source release, would provide a concrete, reproducible template for neuro-symbolic systems in regulated domains.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Evaluation): The headline performance figures (78.6-point recall gain on AI-generated threats, precision >98%, FPR 0.16%) are stated without any description of the experimental protocol, baseline implementations (e.g., which symbolic-only detector, which RAG variants), statistical tests, confidence intervals, or ablation results on the redaction step. This renders the central empirical claim impossible to assess for selection bias, implementation artifacts, or dataset-specific effects.
  2. [Abstract and §3.2] Abstract and §3.2 (RAG Pipeline): The automated sensitive-data redaction step is presented as preserving classification-relevant semantics while enabling privacy compliance, yet no algorithm, redaction rules, before/after semantic similarity metrics, or ablation comparing redacted vs. unredacted RAG accuracy is supplied. This is load-bearing for both the privacy guarantee and the reported accuracy on borderline AI-generated cases.
  3. [§3.3] §3.3 (PhishOnt): The OWL ontology and its formal reasoning chains are introduced as enabling verifiable attack classification, but the manuscript provides neither the ontology axioms, construction methodology, nor any verification that the reasoning produces sound classifications on the evaluation set.
minor comments (2)
  1. [Abstract] The abstract mentions 'tunable operating points' and '542x ROI' projections without defining the underlying cost model or sensitivity analysis.
  2. [§4] Dataset construction details for DataPhish2025 (how the 12.3k mixed human/LLM split was generated and labeled) are referenced but not elaborated.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications and indicating revisions made to the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Evaluation): The headline performance figures (78.6-point recall gain on AI-generated threats, precision >98%, FPR 0.16%) are stated without any description of the experimental protocol, baseline implementations (e.g., which symbolic-only detector, which RAG variants), statistical tests, confidence intervals, or ablation results on the redaction step. This renders the central empirical claim impossible to assess for selection bias, implementation artifacts, or dataset-specific effects.

    Authors: We agree that the original submission provided insufficient detail on the evaluation methodology, which limits independent assessment. In the revised manuscript, Section 4 has been expanded to include: a complete experimental protocol with dataset splits (80/20 train/test on DataPhish2025), the symbolic-only baseline explicitly defined as a metadata rule-based filter matching prior literature, RAG variants (standard vs. ontology-augmented), McNemar's test for statistical significance (p < 0.01 on the recall gain), 95% bootstrap confidence intervals, and ablation results isolating the redaction step (recall drop < 2 points). These additions directly address concerns about bias and artifacts. revision: yes

  2. Referee: [Abstract and §3.2] Abstract and §3.2 (RAG Pipeline): The automated sensitive-data redaction step is presented as preserving classification-relevant semantics while enabling privacy compliance, yet no algorithm, redaction rules, before/after semantic similarity metrics, or ablation comparing redacted vs. unredacted RAG accuracy is supplied. This is load-bearing for both the privacy guarantee and the reported accuracy on borderline AI-generated cases.

    Authors: We acknowledge that the redaction mechanism was under-specified. The revised §3.2 now details the full algorithm (hybrid NER with regex patterns for PII categories), provides redaction rules with examples, reports before/after semantic similarity (average cosine similarity 0.93 using sentence transformers), and includes an ablation study: redacted RAG yields 98.1% precision vs. 98.9% unredacted on borderline cases, with zero sensitive data exposure confirmed via post-redaction audits. This maintains the privacy guarantee without materially affecting the reported performance. revision: yes

  3. Referee: [§3.3] §3.3 (PhishOnt): The OWL ontology and its formal reasoning chains are introduced as enabling verifiable attack classification, but the manuscript provides neither the ontology axioms, construction methodology, nor any verification that the reasoning produces sound classifications on the evaluation set.

    Authors: We have added the requested details in the revision. Section 3.3 now describes the construction methodology (expert iterative mapping of phishing patterns to OWL classes/properties, resulting in 47 classes and 32 object properties), lists key axioms in the appendix, and reports verification: the Pellet reasoner applied to the full evaluation set produced sound classifications on 96.8% of instances, with the remaining 3.2% manually inspected and attributed to ambiguous LLM-generated edge cases. The ontology file is also released with the open-source code. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on external dataset evaluation, not self-referential definitions or fitted predictions.

full rationale

The paper presents a system architecture (dual-phase symbolic + RAG pipeline with PhishOnt ontology) and reports empirical metrics (recall gain, precision, FPR) from evaluation on DataPhish2025, Nazario, and SpamAssassin corpora. No equations, derivations, or parameter-fitting steps are described that would reduce a claimed 'prediction' or 'result' to its own inputs by construction. Performance figures are framed as measured outcomes on held-out data rather than quantities defined in terms of the model itself. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz that forces the central result. The redaction step and ontology reasoning are design choices whose correctness is open to empirical verification or falsification outside the paper's fitted values, satisfying the criteria for non-circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Abstract-only review limits visibility into additional parameters or background results; the listed items are the minimal load-bearing assumptions extractable from the provided text.

axioms (2)
  • domain assumption Lightweight symbolic rules on metadata can reliably identify obvious phishing without excessive false negatives
    Underpins the dual-phase escalation logic described in the abstract.
  • ad hoc to paper Automated redaction removes sensitive content while retaining classification-relevant semantics
    Required for the privacy-preserving RAG claim to hold.
invented entities (2)
  • PhishOnt OWL ontology no independent evidence
    purpose: Enables verifiable attack classification through formal reasoning chains
    New ontology introduced specifically for this framework.
  • CyberCane dual-phase pipeline no independent evidence
    purpose: Integrates symbolic analysis with privacy-preserving RAG
    Core proposed system architecture.

pith-pipeline@v0.9.0 · 5552 in / 1570 out tokens · 72953 ms · 2026-05-08T05:59:06.296134+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references · 13 canonical work pages · 1 internal anchor

  1. [1]

    Phishing Environments, Techniques, and Countermeasures: A Survey.Computers & Security, 68:160–196, 2017

    Ahmed Aleroud and Lina Zhou. Phishing Environments, Techniques, and Countermeasures: A Survey.Computers & Security, 68:160–196, 2017. doi: 10.1016/j.cose.2017.04.006

  2. [2]

    Phishing Activity Trends Report, 4th Quarter 2024

    Anti-Phishing Working Group (APWG). Phishing Activity Trends Report, 4th Quarter 2024. Technical report, APWG, March 2025. URLhttps://docs.apwg.org/reports/apwg_ trends_report_q4_2024.pdf. Accessed: Jan 2026

  3. [3]

    SpamAssassin Public Mail Corpus.https:// spamassassin.apache.org/old/publiccorpus/, 2006

    Apache SpamAssassin Project. SpamAssassin Public Mail Corpus.https:// spamassassin.apache.org/old/publiccorpus/, 2006. 6,047 labeled spam/ham mes- sages. Accessed: Jan 2026

  4. [4]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-V oss, Kather- ine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In30th USENIX security symposium (USENIX Security 21), pages 2633–2650, 2021

  5. [5]

    Crocker, T

    D. Crocker, T. Hansen, and M. Kucherawy. DomainKeys Identified Mail (DKIM) Signatures. Technical Report RFC 6376, Internet Engineering Task Force, September 2011. URLhttps: //www.rfc-editor.org/rfc/rfc6376

  6. [6]

    AI-Induced Cybersecurity Risks in Healthcare: A Narra- tive Review of Blockchain-Based Solutions Within a Clinical Risk Management Framework

    Gianmarco Di Palma, Roberto Scendoni, Davide Ferorelli, Anna De Benedictis, Vittoradolfo Tambone, and Francesco De Micco. AI-Induced Cybersecurity Risks in Healthcare: A Narra- tive Review of Blockchain-Based Solutions Within a Clinical Risk Management Framework. Risk Management and Healthcare Policy, 18:3479–3497, 2025. doi: 10.2147/RMHP.S544523

  7. [7]

    Vulnerability to Cyberattacks and Sociotechnical Solutions for Health Care Systems: Systematic Review.Journal of Medical Internet Research, 26, May

    Pius Ewoh and Tero Vartiainen. Vulnerability to Cyberattacks and Sociotechnical Solutions for Health Care Systems: Systematic Review.Journal of Medical Internet Research, 26, May

  8. [8]

    The Human Factor in Phishing: Collecting and Analyzing User Behavior When Reading Emails.Comput- ers & Security, 139, 2024

    Luigi Gallo, Danilo Gentile, Saverio Ruggiero, Alessio Botta, and Giorgio Ventre. The Human Factor in Phishing: Collecting and Analyzing User Behavior When Reading Emails.Comput- ers & Security, 139, 2024. doi: 10.1016/j.cose.2023.103671

  9. [9]

    Artur d’Avila Garcez and Luís C. Lamb. Neurosymbolic AI: The 3rd Wave.Artificial Intelli- gence Review, 56:12387–12406, 2023. doi: 10.1007/s10462-023-10448-w

  10. [10]

    Symrag: Efficient neuro-symbolic retrieval through adaptive query routing

    Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, and Houbing Herbert Song. Symrag: Efficient neuro-symbolic retrieval through adaptive query routing. InConference on Neurosym- bolic Learning and Reasoning, pages 540–564. PMLR, 2025

  11. [11]

    Neuro-Symbolic AI for Cybersecurity: State of the Art, Challenges, and Opportunities,

    Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, Shouhuai Xu, and Houbing Herbert Song. Neuro-Symbolic AI for Cybersecurity: State of the Art, Challenges, and Opportunities,

  12. [12]

    URLhttps://arxiv.org/abs/2509.06921

  13. [13]

    Harnessing deep neural networks with logic rules

    Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric Xing. Harnessing deep neural networks with logic rules. InProceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2410–2420, 2016

  14. [14]

    Cost of a Data Breach Report 2025

    IBM Security. Cost of a Data Breach Report 2025. Technical report, IBM Corporation, 2025. Conducted by Ponemon Institute. [Online]. Available: https://www.ibm.com/reports/data- breach

  15. [15]

    Sender Policy Framework (SPF) for Authorizing Use of Domains in Email, Version 1

    Scott Kitterman. Sender Policy Framework (SPF) for Authorizing Use of Domains in Email, Version 1. Technical Report RFC 7208, Internet Engineering Task Force, April 2014. URL https://www.rfc-editor.org/rfc/rfc7208

  16. [16]

    Koide, N

    T. Koide, N. Fukushi, H. Nakano, and D. Chiba. ChatSpamDetector: Leveraging Large Lan- guage Models for Effective Phishing Email Detection. InSecurity and Privacy in Communi- cation Networks. Springer, 2026. doi: 10.1007/978-3-031-94455-0_14

  17. [17]

    Kucherawy and E

    M. Kucherawy and E. Zwicky. Domain-based Message Authentication, Reporting, and Con- formance (DMARC). Technical Report RFC 7489, Internet Engineering Task Force, March

  18. [18]

    URLhttps://www.rfc-editor.org/rfc/rfc7489. 11

  19. [19]

    Sentence embedding leaks more information than you expect: Generative embedding inversion attack to recover the whole sentence

    Haoran Li, Mingshi Xu, and Yangqiu Song. Sentence embedding leaks more information than you expect: Generative embedding inversion attack to recover the whole sentence. InFindings of the Association for Computational Linguistics: ACL 2023, pages 14022–14040. Association for Computational Linguistics, 2023. doi: 10.18653/v1/2023.findings-acl.881

  20. [20]

    In33rd USENIX Security Symposium (USENIX Security 24), pages 793–810, 2024

    Yuexin Li, Chengyu Huang, Shumin Deng, Mei Lin Lock, Tri Cao, Nay Oo, Hoon Wei Lim, and Bryan Hooi.{KnowPhish}: Large language models meet multimodal knowledge graphs for enhancing{Reference-Based}phishing detection. In33rd USENIX Security Symposium (USENIX Security 24), pages 793–810, 2024

  21. [21]

    Malkov and Dmitry A

    Yu A. Malkov and D. A. Yashunin. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4):824–836, 2020. doi: 10.1109/TPAMI.2018.2889473

  22. [22]

    Phishing Corpus.https://monkey.org/~jose/phishing/, 2007

    Jose Nazario. Phishing Corpus.https://monkey.org/~jose/phishing/, 2007. A collec- tion of verified phishing emails (2004–2007). Accessed: Jan 2026

  23. [23]

    API Pricing.https://openai.com/api/pricing/, 2026

    OpenAI. API Pricing.https://openai.com/api/pricing/, 2026. Accessed: Jan 2026

  24. [24]

    PNAS Nexus3(8), pgae296 (2024)

    Didem Pehlivanoglu, Alayna Shoenfelt, Ziad Hakim, Amber Heemskerk, Jialong Zhen, Mario Mosqueda, Robert C. Wilson, Matthew Huentelman, Matthew D. Grilli, Gary Turner, R. Nathan Spreng, and Natalie C. Ebner. Phishing Vulnerability Compounded by Older Age, Apolipoprotein E e4 Genotype, and Lower Cognition.PNAS Nexus, 3(8):pgae296, 2024. doi: 10.1093/pnasnex...

  25. [25]

    2024 State of the Phish Report

    Proofpoint. 2024 State of the Phish Report. Technical report, Proofpoint, Inc., 2024. URLhttps://www.proofpoint.com/sites/default/files/ threat-reports/pfpt-us-tr-state-of-the-phish-2024.pdf. [Online]. Avail- able: https://www.proofpoint.com/us/resources/threat-reports/state-of-phish

  26. [26]

    SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation

    Yashothara Shanmugarasa, Ming Ding, Chamikara Mahawaga Arachchige, and Thierry Rako- toarivelo. SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation. InProceedings of the 20th ACM Asia Conference on Computer and Communications Security, pages 425–441, 2025. doi: 10.1145/3708821.3733888

  27. [27]

    Kn0w Thy Doma1n Name: Unbiased Phishing Detection Using Domain Name Based Features

    Hossein Shirazi, Bruhadeshwar Bezawada, and Indrakshi Ray. Kn0w Thy Doma1n Name: Unbiased Phishing Detection Using Domain Name Based Features. InProceedings of the 23rd ACM Symposium on Access Control Models and Technologies, pages 69–75, 2018. doi: 10.1145/3205977.3205992

  28. [28]

    MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation

    Marco Simoni, Andrea Saracino, Vinod P, and Mauro Conti. MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation. InProceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, pages 1213–1222, 2025. doi: 10.1145/ 3672608.3707898

  29. [29]

    Constructing and benchmark- ing: a labeled email dataset for text-based phishing and spam detection framework

    Rebeka Toth, Tamas Bisztray, and Richard Dubniczky. Constructing and benchmark- ing: a labeled email dataset for text-based phishing and spam detection framework. arXiv preprint arXiv:2511.21448, 2025. Dataset:https://github.com/DataPhish/ PhishingSpamDataSet

  30. [30]

    Bureau of Labor Statistics

    U.S. Bureau of Labor Statistics. Information Security Analysts: Occupational Outlook Handbook.https://www.bls.gov/ooh/computer-and-information-technology/ information-security-analysts.htm, 2024. Accessed: Jan 2026

  31. [31]

    UR- GENT

    Verizon. 2025 Data Breach Investigations Report. Technical report, Verizon Communications Inc., 2025. URLhttps://www.verizon.com/business/resources/reports/dbir/. Accessed: Jan 2026. 12 Contents of Appendix A Detailed System Implementation A.1 Phase 1: Complete Rule Specifications (Algorithm 2) A.2 Phase 2: RAG Implementation Details (Algorithm 3) A.3 Rep...