Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection

Yaw Osei Adjei (Kwame Nkrumah University of Science , Technology , Kumasi , Ghana) , Frederick Ayivor (Independent Researcher , Fishers , Indiana , USA)

Authors on Pith no claims yet

classification 💻 cs.LG cs.CR

keywords emailachievesanalysisapproachbusinesscatboostcompromisecost-sensitive

0 comments

read the original abstract

Business Email Compromise (BEC) is a high-impact social engineering threat with extreme operational asymmetry: false negatives can trigger large financial losses, while false positives primarily incur investigation and delay costs. This paper compares two BEC detection paradigms under a cost-sensitive decision framework: (i) a semantic transformer approach (DistilBERT) for contextual language understanding, and (ii) a forensic psycholinguistic approach (CatBoost) using engineered linguistic and structural cues. We evaluate both on a hybrid dataset (N = 7,990) combining legitimate corporate email and AI-synthesised adversarial fraud generated across 30 BEC taxonomies, including character-level Unicode obfuscations. We add classical baselines (TF-IDF+LogReg and character n-gram+Linear SVM), an ablation study for the Smiling Assassin Score, and a homoglyph-map sensitivity analysis. DistilBERT achieves AUC = 1.0000 and F1 = 0.9981 at 7.403 ms per email on GPU; CatBoost achieves AUC = 0.9860 and F1 = 0.9382 at 0.855 ms on CPU. A three-way cost-sensitive decision policy (auto-allow, auto-block, manual review) optimises expected financial loss under a 1:5,167 false-negative-to-false-positive cost ratio.

This paper has not been read by Pith yet.

Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection

discussion (0)