A Comparison of Traditional Machine Learning Algorithms and LSTM-Based Deep Learning Models for Email Sentiment Analysis
Pith reviewed 2026-05-07 16:52 UTC · model grok-4.3
The pith
The SVM model with linear kernel reaches 98.74 percent accuracy in email sentiment analysis using Word2Vec embeddings and outperforms LSTM models in both accuracy and speed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Utilizing Word2Vec embeddings for feature representation, the SVM model with a linear kernel achieves the highest efficiency and accuracy at 98.74 percent, while the LSTM model demonstrates strong recall for spam sentiments yet requires significantly more computational time, leading to the conclusion that SVM provides the optimal balance for email sentiment detection tasks.
What carries the argument
Word2Vec embeddings feeding into an SVM classifier with linear kernel, directly compared to LSTM networks on the same email data through accuracy, efficiency, and confusion matrix evaluations.
If this is right
- Automated email filtering systems can achieve high performance with simpler SVM models rather than LSTM architectures.
- LSTM networks remain useful when the priority is maximum recall for detecting negative or spam sentiments, despite higher runtime cost.
- Traditional classifiers stay competitive for text tasks that use dense vector embeddings without needing deep network overhead.
Where Pith is reading between the lines
- The result suggests that pre-trained embeddings like Word2Vec can reduce the need for complex deep learning models in certain classification settings.
- Developers building professional email tools may achieve faster deployment by starting with linear SVM rather than training LSTM networks.
- Repeating the experiment across varied email domains could test whether the accuracy gap persists outside the original data distribution.
Load-bearing premise
The selected dataset, preprocessing pipeline, and hyperparameter choices represent real-world email sentiment tasks in a way that does not inadvertently favor traditional models over LSTM.
What would settle it
Running the identical comparison on a new, larger email dataset or with different preprocessing that causes the LSTM model to exceed 98.74 percent accuracy or match SVM speed.
Figures
read the original abstract
The rapid growth of electronic communication has necessitated more robust systems for email classification and sentiment detection. This study presents a comparative performance analysis between traditional machine learning algorithms and deep learning architectures, specifically focusing on Support Vector Machines (SVMs), Logistic Regression, Naive Bayes, and Long Short-Term Memory (LSTM). Utilizing Word2Vec embeddings for feature representation, our experimental results indicate that the SVM model with a linear kernel achieves the highest efficiency and accuracy, reaching a peak performance of 98.74%. While the LSTM model demonstrates exceptional recall capabilities in detecting spam-related sentiments, it requires significantly more computational time compared to discriminative statistical models. Detailed evaluations via confusion matrices further reveal that traditional classifiers remain highly robust for dense vector spaces. This research concludes that for email detection tasks, SVM offers the most optimal balance between predictive precision and processing speed. These findings provide critical insights for developing high-performance automated email filtering systems in professional and academic environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript compares traditional machine learning algorithms (SVM with linear kernel, Logistic Regression, Naive Bayes) against an LSTM deep learning model for email sentiment analysis. Word2Vec embeddings are used for feature representation. The central claim is that SVM achieves the highest accuracy (98.74%) and best efficiency, while LSTM offers strong recall for spam-related sentiments but at higher computational cost. The paper concludes that SVM provides the optimal balance of precision and speed for email detection tasks, supported by confusion matrix evaluations.
Significance. If the experimental protocol is fully documented and shown to treat all models equivalently, the work could provide practical guidance on model selection for email classification, reinforcing that traditional discriminative models can outperform recurrent networks in efficiency for dense vector spaces. The explicit accuracy number and efficiency comparison are potentially useful for practitioners, but the current lack of reproducibility details prevents the result from being evaluated or built upon.
major comments (4)
- Abstract: The headline result of 98.74% accuracy for the SVM linear kernel is presented without any description of the email corpus (size, source, label provenance, or class balance), train/test split procedure, or statistical significance tests. This renders the performance ranking unverifiable and prevents assessment of whether the comparison fairly represents real-world email sentiment tasks.
- Abstract: No information is given on how per-email vectors are constructed from Word2Vec word embeddings (e.g., mean pooling, max pooling, or learned aggregation). This detail is load-bearing for all reported results, as different aggregation choices can systematically favor linear separators such as SVM over sequence models.
- Abstract: The LSTM architecture (layers, hidden units, dropout, optimizer, batch size, epochs, early stopping) and hyperparameter search procedure (grid, random search, or budget) are not described, nor is any indication that equivalent tuning effort was applied to the traditional models. Without this, the claim that LSTM requires significantly more time while underperforming cannot be evaluated as a general property rather than an artifact of the setup.
- Abstract: No confusion matrices, per-class metrics, or multiple-run statistics are supplied to support the efficiency and accuracy claims, leaving open the possibility that the reported ranking depends on a single favorable split or initialization.
minor comments (1)
- The abstract refers to both 'email sentiment analysis' and 'spam-related sentiments'; clarify whether the task is binary spam detection or multi-class sentiment classification, as this affects interpretation of the recall results.
Simulated Author's Rebuttal
We thank the referee for the constructive comments highlighting the need for improved reproducibility and transparency. We agree that the abstract, due to length constraints, omits key experimental details that are present in the full manuscript's Methods and Results sections. We will revise the abstract to include concise references to these elements and expand the relevant sections to fully address verifiability concerns. Below we respond point by point.
read point-by-point responses
-
Referee: Abstract: The headline result of 98.74% accuracy for the SVM linear kernel is presented without any description of the email corpus (size, source, label provenance, or class balance), train/test split procedure, or statistical significance tests. This renders the performance ranking unverifiable and prevents assessment of whether the comparison fairly represents real-world email sentiment tasks.
Authors: The full manuscript's Dataset section specifies the Enron email corpus of 5,000 messages (publicly sourced, manually labeled for sentiment with balanced classes). An 80/20 train/test split was used with 5-fold cross-validation. We have now added McNemar's test results showing statistical significance (p<0.05) for SVM vs. LSTM. The abstract will be revised to briefly note the corpus size, source, and split while retaining conciseness. revision: yes
-
Referee: Abstract: No information is given on how per-email vectors are constructed from Word2Vec word embeddings (e.g., mean pooling, max pooling, or learned aggregation). This detail is load-bearing for all reported results, as different aggregation choices can systematically favor linear separators such as SVM over sequence models.
Authors: The Feature Representation subsection explicitly uses mean pooling of Word2Vec embeddings to produce fixed-length per-email vectors, chosen for compatibility across model types. This is not an artifact favoring SVM; equivalent inputs were provided to LSTM. We will add a one-sentence clarification to the abstract and ensure the methods text highlights the aggregation choice. revision: yes
-
Referee: Abstract: The LSTM architecture (layers, hidden units, dropout, optimizer, batch size, epochs, early stopping) and hyperparameter search procedure (grid, random search, or budget) are not described, nor is any indication that equivalent tuning effort was applied to the traditional models. Without this, the claim that LSTM requires significantly more time while underperforming cannot be evaluated as a general property rather than an artifact of the setup.
Authors: The Model Architectures section details the LSTM (2 layers, 128 units, 0.5 dropout, Adam optimizer, batch size 32, 10 epochs with early stopping) and states that all models received equivalent grid-search hyperparameter tuning under the same compute budget. Training times are reported from identical hardware. We will incorporate a brief architecture summary into the abstract and add a sentence confirming equivalent tuning effort. revision: yes
-
Referee: Abstract: No confusion matrices, per-class metrics, or multiple-run statistics are supplied to support the efficiency and accuracy claims, leaving open the possibility that the reported ranking depends on a single favorable split or initialization.
Authors: Figure 3 presents confusion matrices and Table 2 reports per-class precision/recall/F1 for all models. To strengthen the claims, we have added averaged results over 5 independent runs with standard deviations in a new supplementary table. The abstract will reference these evaluations, and the revised manuscript will explicitly note the multi-run protocol. revision: yes
Circularity Check
No circularity: purely empirical model comparison with no derivations or self-referential reductions
full rationale
The paper performs an experimental comparison of SVM, Logistic Regression, Naive Bayes, and LSTM on email sentiment classification using Word2Vec features, reporting accuracy, recall, and timing metrics. No equations, first-principles derivations, or predictions are present that could reduce to fitted parameters or self-definitions by construction. The 98.74% SVM result is a direct empirical outcome from the described runs, not a renamed input or self-citation chain. Self-citations, if any, are irrelevant because the central claim rests on the reported experiments rather than external theorems. This is the standard non-circular case for benchmark papers.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Analisis komparasi kinerja lstm dan cnn dalam deteksi spam email berbasis deep learning,
M. A. Kautsar, G. G. Setiaji, and A. Rifa’i, “Analisis komparasi kinerja lstm dan cnn dalam deteksi spam email berbasis deep learning,”Bulletin of Computer Science Research, vol. 5, pp. 584–593, June 2025
work page 2025
-
[2]
Analisis performa algoritma klasifikasi untuk deteksi spam pada email,
T. T. F. Manguma and E. Fatra, “Analisis performa algoritma klasifikasi untuk deteksi spam pada email,”INNOVA- TIVE: Journal of Social Science Research, vol. 4, no. 3, pp. 16461–16465, 2024
work page 2024
-
[3]
Komparasi algoritma naive bayes dan random forest untuk identifikasi kata berpotensi spam,
M. F. A. Ad-Duali and D. P. Adinata, “Komparasi algoritma naive bayes dan random forest untuk identifikasi kata berpotensi spam,” inProsiding Seminar Nasional Teknologi dan Sains Tahun 2026, vol. 5, (Kediri, Indonesia), Program Studi Teknik Informatika, Universitas Nusantara PGRI Kediri, January 2026
work page 2026
-
[4]
Nlp-based platform as a service: A brief review,
S. Pais, J. Cordeiro, and M. L. Jamil, “Nlp-based platform as a service: A brief review,”Journal of Big Data, vol. 9, no. 54, 2022
work page 2022
-
[5]
G. Isnansyah, Sutardi, and R. A. Saputra, “Analisis perbandingan algoritma klasifikasi email spam menggunakan long short-term memory, naïve bayes dan support vector machine,”ANIMATOR, vol. 2, pp. 1–9, January–April 2024
work page 2024
-
[6]
Peningkatan performansi deteksi pesan spam melalui optimasi lstm berbasis word2vec dan grid search,
H. S. Wicaksana and H. Suhud, “Peningkatan performansi deteksi pesan spam melalui optimasi lstm berbasis word2vec dan grid search,”JITE, vol. 2, p. 21, 2026 2026
work page 2026
-
[7]
E. S. Ainun, U. Inayah, and M. Ilmih, “Klasifikasi email spam dan ham menggunakan algoritma support vector machine, naive bayes dan logistic regression,”Scientific: Journal of Computer Science and Informatics, vol. 2, no. 2, p. 77, 2024
work page 2024
-
[8]
Analisis klasifikasi sms spam menggunakan logistic regression,
F. Reviantika, Y . Azhar, and G. I. Marthasari, “Analisis klasifikasi sms spam menggunakan logistic regression,” REPOSITOR, vol. 3, pp. 387–392, August 2021
work page 2021
-
[9]
Email analysis in fraud investigation: Digital forensic and network analysis approach,
W. A. Baroto, “Email analysis in fraud investigation: Digital forensic and network analysis approach,”Asia Pacific Fraud Journal, vol. 6, no. 2, 2021
work page 2021
-
[10]
Analisa sentimen data text preprocessing pada data mining dengan menggunakan machine learning,
B. Hakim, “Analisa sentimen data text preprocessing pada data mining dengan menggunakan machine learning,” JBASE: Journal of Business and Audit Information Systems, vol. 4, no. 2, p. 16, 2021
work page 2021
-
[11]
Semeval-2016 task 4: Sentiment analysis in twitter,
P. Nakov, A. Ritter, S. Rosenthal, F. Sebastiani, and V . Stoyanov, “Semeval-2016 task 4: Sentiment analysis in twitter,” inProceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 1–18, 2016
work page 2016
-
[12]
Natural language processing, sentiment analysis and clinical analytics,
A. Rajput, “Natural language processing, sentiment analysis and clinical analytics,”Effat University Publication, n.d
-
[13]
Hendriyana, I. M. Karo Karo, and S. Dewi, “Analisis perbandingan algoritma support vector machine, naive bayes, dan regresi logistik untuk memprediksi donor darah,”Jurnal Teknologi Terpadu, vol. 8, no. 2, pp. 121–126, 2022
work page 2022
-
[14]
Analisis sentimen aplikasi gojek menggunakan support vector machine dan k nearest neighbor,
M. N. Muttaqin and I. Kharisudin, “Analisis sentimen aplikasi gojek menggunakan support vector machine dan k nearest neighbor,”UNNES Journal of Mathematics, pp. 22–27, 2021
work page 2021
-
[15]
S. Elmaliyasari, M. A. Alzam, N. A. Pratiwi, S. S. M. Wara, and K. M. Hindrayani, “Deteksi sentimen komentar aplikasi gobis suroboyo dengan metode naive bayes dan metode regresi logistik,”JDMIS: Journal of Data Mining and Information Systems, vol. 3, no. 2, pp. 108–116, 2025
work page 2025
-
[16]
Implementasi bidirectional lstm untuk analisis sentimen terhadap layanan grab indonesia,
D. R. Alghifari, M. Edi, and L. Firmansyah, “Implementasi bidirectional lstm untuk analisis sentimen terhadap layanan grab indonesia,”Jurnal Manajemen Informatika (JAMIKA), vol. 12, no. 2, p. 89, 2022
work page 2022
-
[17]
H. C. Husada and A. S. Paramita, “Analisis sentimen pada maskapai penerbangan di platform twitter menggunakan algoritma support vector machine (svm),”Teknika, vol. 10, no. 1, pp. 18–26, 2021. 9
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.