
arXiv: 2511.20944 · v4 · submitted 2025-11-26 · cs.LG · cs.CR


Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection


Business Email Compromise (BEC) is a high-impact social engineering threat with extreme operational asymmetry: false negatives can trigger large financial losses, while false positives mainly incur investigation and delay costs. This paper compares two BEC detection paradigms under a cost-sensitive decision framework: (i) a semantic transformer approach (DistilBERT) for contextual language understanding, and (ii) a forensic psycholinguistic approach (CatBoost) using engineered linguistic and structural cues. We evaluate both on a hybrid dataset (N = 7,990) that combines legitimate corporate email with AI-synthesised adversarial fraud spanning 30 BEC taxonomies, including character-level Unicode obfuscations. We also report classical baselines (TF-IDF + logistic regression and character n-gram + linear SVM), an ablation study of the Smiling Assassin Score, and a sensitivity analysis of the homoglyph map. DistilBERT achieves AUC = 1.0000 and F1 = 0.9981 at 7.403 ms per email on GPU; CatBoost achieves AUC = 0.9860 and F1 = 0.9382 at 0.855 ms per email on CPU. A three-way cost-sensitive decision policy (auto-allow, auto-block, manual review) minimises expected financial loss under a false-positive-to-false-negative cost ratio of 1:5,167.
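A three-way policy of this kind can be sketched as expected-loss minimisation over a calibrated BEC probability. The sketch below is illustrative, not the paper's implementation: it assumes the abstract's 1:5,167 ratio means a missed BEC email costs 5,167 times as much as a wrongly blocked legitimate one, and it introduces a hypothetical manual-review cost (0.5 false-positive units) and a simplifying assumption that a human reviewer always decides correctly. The names `Costs`, `expected_losses`, `decide` and `thresholds` are hypothetical.

```python
from dataclasses import dataclass


# Costs are expressed in units of one false positive.
# false_negative follows the 1:5,167 ratio read from the abstract;
# manual_review is a HYPOTHETICAL value chosen only for illustration.
@dataclass(frozen=True)
class Costs:
    false_positive: float = 1.0     # wrongly blocking a legitimate email
    false_negative: float = 5167.0  # letting a BEC email through
    manual_review: float = 0.5      # assumed cost of one analyst check


def expected_losses(p_bec: float, costs: Costs) -> dict[str, float]:
    """Expected loss of each action, given a calibrated P(email is BEC)."""
    if not 0.0 <= p_bec <= 1.0:
        raise ValueError("p_bec must be a probability in [0, 1]")
    return {
        # Allowing only loses money if the email really was BEC.
        "auto-allow": p_bec * costs.false_negative,
        # Blocking only loses money if the email was legitimate.
        "auto-block": (1.0 - p_bec) * costs.false_positive,
        # Assumption: the reviewer always decides correctly, so the
        # only cost is the review itself.
        "manual-review": costs.manual_review,
    }


def decide(p_bec: float, costs: Costs = Costs()) -> str:
    """Pick the action with the lowest expected loss."""
    losses = expected_losses(p_bec, costs)
    return min(losses, key=losses.get)


def thresholds(costs: Costs = Costs()) -> tuple[float, float]:
    """Closed-form band edges: allow below the first, block above the second.

    Only meaningful when the review band is non-empty, i.e. the allow
    threshold lies below the block threshold.
    """
    allow_below = costs.manual_review / costs.false_negative
    block_above = 1.0 - costs.manual_review / costs.false_positive
    if allow_below >= block_above:
        raise ValueError("review is too expensive to ever be chosen")
    return allow_below, block_above


if __name__ == "__main__":
    for p in (0.00001, 0.01, 0.3, 0.9):
        print(p, decide(p))
    print(thresholds())
```

With these illustrative costs, an email is auto-allowed only when its BEC probability is below roughly 9.7e-5 (0.5 / 5,167), auto-blocked above 0.5, and routed to manual review in between. The extreme cost asymmetry is what pushes the allow threshold so close to zero.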
