Comparison of Classical Machine Learning Approaches on Bangla Textual Emotion Analysis
Pith reviewed 2026-05-24 20:12 UTC · model grok-4.3
The pith
Support vector machines with radial basis function kernels classify six emotions in Bangla Facebook comments with 52.98 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors gathered a corpus of Bangla Facebook comments and annotated it for six emotions. They compared five classical machine learning techniques and found that SVM with a non-linear radial-basis function kernel gave the highest performance with 52.98% average accuracy and 0.3324 macro F1 score.
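The gap between 52.98% accuracy and a macro F1 of 0.3324 is what one would expect if some emotion classes are recognized far better than others, since macro F1 weights every class equally regardless of its size. The sketch below computes both metrics on a small made-up label set; the labels and predictions are illustrative assumptions, not data from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical, imbalanced toy data: "anger" dominates and "fear" is never predicted.
y_true = ["anger"] * 6 + ["sadness"] * 2 + ["fear"] * 2
y_pred = ["anger"] * 6 + ["anger", "sadness"] + ["anger", "anger"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")   # 0.700
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")   # 0.489
```

In this toy case the classifier looks reasonable by accuracy but scores zero F1 on one class, which pulls the macro average down.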
What carries the argument
A support vector machine classifier with a radial basis function (RBF) kernel, trained on combinations of features derived from the annotated Bangla text corpus.
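For readers unfamiliar with the kernel, the standard RBF form is k(x, z) = exp(-gamma * ||x - z||^2): similarity is 1 for identical feature vectors and decays toward 0 as they move apart. A minimal sketch follows; the `gamma` value and the example vectors are assumptions chosen for illustration, since the page does not report the paper's hyperparameters or feature dimensions.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Standard RBF (Gaussian) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical 2-dimensional feature vectors (real text features would be much larger).
same = rbf_kernel([1.0, 0.0], [1.0, 0.0])   # identical vectors: similarity 1
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0])  # squared distance 2, gamma 0.5: exp(-1)
print(same, apart)
```

A larger `gamma` makes the similarity fall off faster with distance, which gives the SVM a more flexible, more local decision boundary.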
Load-bearing premise
The manual annotations of the Facebook comments correctly capture the emotions expressed and the collected corpus is representative of Bangla emotional language use.
What would settle it
A new experiment that re-annotates the same or similar Bangla comments with different annotators and retrains the models to check if SVM with RBF still achieves the top accuracy and F1 scores.
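A re-annotation study of this kind would typically also report how far the new labels agree with the original ones. Cohen's kappa is one common chance-corrected agreement statistic for two annotators; using it here is an assumption, as the page does not say which agreement measure, if any, the paper reports. The annotations below are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labelled independently
    # according to their own label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six comments by two annotators.
original = ["anger", "anger", "sadness", "fear", "anger", "sadness"]
reannotated = ["anger", "sadness", "sadness", "fear", "anger", "anger"]
print(f"kappa = {cohens_kappa(original, reannotated):.3f}")  # 0.455
```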
Original abstract
Detecting emotions from text is an extension of simple sentiment polarity detection. Instead of considering only positive or negative sentiments, emotions are conveyed using more tangible manner; thus, they can be expressed as many shades of gray. This paper manifests the results of our experimentation for fine-grained emotion analysis on Bangla text. We gathered and annotated a text corpus consisting of user comments from several Facebook groups regarding socio-economic and political issues, and we made efforts to extract the basic emotions (sadness, happiness, disgust, surprise, fear, anger) conveyed through these comments. Finally, we compared the results of the five most popular classical machine learning techniques namely Naive Bayes, Decision Tree, k-Nearest Neighbor (k-NN), Support Vector Machine (SVM) and K-Means Clustering with several combinations of features. Our best model (SVM with a non-linear radial-basis function (RBF) kernel) achieved an overall average accuracy score of 52.98% and an F1 score (macro) of 0.3324
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper collects and annotates a corpus of Bangla Facebook comments for six basic emotions (sadness, happiness, disgust, surprise, fear, anger) and compares five classical ML approaches—Naive Bayes, Decision Tree, k-NN, SVM, and K-Means—across feature combinations. The best reported result is SVM with RBF kernel at 52.98% accuracy and 0.3324 macro-F1.
Significance. If the experimental details hold, the work supplies a new annotated resource and baseline numbers for Bangla emotion detection, a low-resource setting where such comparisons remain useful. The inclusion of both supervised classifiers and unsupervised clustering broadens the scope, though the modest absolute performance underscores the inherent difficulty of fine-grained emotion classification.
major comments (2)
- [Abstract] The reported accuracy (52.98%) and macro-F1 (0.3324) are presented without any mention of corpus size, class distribution, number of annotators, inter-annotator agreement, train/test split, or validation procedure. These omissions are load-bearing for the central empirical claim, as it is impossible to determine whether the scores exceed chance level for a 6-class problem or are reproducible.
- [Methodology/Results] (Inferred from the listed methods.) K-Means is an unsupervised clustering algorithm, yet the paper evaluates it on labeled emotion data. The manuscript must specify the cluster-to-label mapping procedure (e.g., majority vote, Hungarian assignment) used to compute accuracy and F1; without this, the direct comparison to the supervised models is not interpretable.
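The chance-level concern in the first major comment can be made concrete. With six perfectly balanced classes, uniform guessing yields 1/6 (about 16.7%) accuracy, which 52.98% clearly exceeds. A classifier that always predicts the largest class, however, scores that class's share of the corpus, and this share is unknown here. The sketch below uses hypothetical class counts (not the paper's) to show both baselines.

```python
def uniform_chance(num_classes):
    """Expected accuracy of uniform random guessing on balanced classes."""
    return 1 / num_classes


def majority_baseline(class_counts):
    """Accuracy and macro F1 of always predicting the most frequent class."""
    total = sum(class_counts.values())
    share = max(class_counts.values()) / total
    # The majority class has recall 1 and precision `share`, so F1 = 2p / (1 + p);
    # every other class has F1 = 0.
    macro_f1 = (2 * share / (1 + share)) / len(class_counts)
    return share, macro_f1


# Hypothetical, skewed class distribution over the six emotions.
counts = {"anger": 500, "happiness": 200, "sadness": 150,
          "disgust": 80, "surprise": 40, "fear": 30}

base_acc, base_f1 = majority_baseline(counts)
print(f"uniform chance    = {uniform_chance(6):.4f}")  # 0.1667
print(f"majority accuracy = {base_acc:.4f}")           # 0.5000
print(f"majority macro F1 = {base_f1:.4f}")            # 0.1111
```

Under a skew like this one, an accuracy near 53% would barely beat the trivial baseline, while a macro F1 of 0.33 would still be well above it. This is why the class distribution matters for interpreting the reported scores.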
minor comments (2)
- [Abstract] 'This paper manifests the results' is nonstandard; 'reports' or 'presents' is clearer.
- [Abstract] The abstract states that 'several combinations of features' were tested but does not enumerate them (e.g., unigrams, TF-IDF, n-grams, or lexical resources). Adding this list would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate the suggested clarifications in the revised version.
Point-by-point responses
- Referee: [Abstract] The reported accuracy (52.98%) and macro-F1 (0.3324) are presented without any mention of corpus size, class distribution, number of annotators, inter-annotator agreement, train/test split, or validation procedure. These omissions are load-bearing for the central empirical claim, as it is impossible to determine whether the scores exceed chance level for a 6-class problem or are reproducible.
  Authors: We agree that the abstract would benefit from these contextual details to help readers immediately interpret the results. The full manuscript describes the Facebook comment corpus, the annotation process for the six emotions, and the train/test splits with validation. We will revise the abstract to briefly note the corpus size, the 6-class setup, and the evaluation procedure so that the reported accuracy and macro-F1 can be assessed against chance performance.
  Revision: yes
- Referee: [Methodology/Results] (Inferred from the listed methods.) K-Means is an unsupervised clustering algorithm, yet the paper evaluates it on labeled emotion data. The manuscript must specify the cluster-to-label mapping procedure (e.g., majority vote, Hungarian assignment) used to compute accuracy and F1; without this, the direct comparison to the supervised models is not interpretable.
  Authors: We acknowledge that the cluster-to-label mapping must be stated explicitly. After running K-Means on the feature vectors, each cluster was assigned the emotion label with the highest frequency among its members according to the ground-truth annotations (majority vote). We will add a clear description of this procedure in the methodology section of the revised manuscript to make the unsupervised results directly comparable to the supervised classifiers.
  Revision: yes
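The majority-vote mapping described in the second response could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: the cluster ids and labels are made up, and ties are broken by whichever label was seen first in the cluster.

```python
from collections import Counter, defaultdict


def majority_vote_mapping(cluster_ids, true_labels):
    """Map each cluster id to the most frequent ground-truth label among its members."""
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster][label] += 1
    # most_common(1) returns the top label; ties go to the first label encountered.
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in members.items()}


# Hypothetical K-Means output (cluster ids) and gold emotion labels for 8 comments.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
gold = ["anger", "anger", "sadness", "sadness", "sadness", "fear", "anger", "fear"]

mapping = majority_vote_mapping(clusters, gold)
predicted = [mapping[c] for c in clusters]
acc = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
print(mapping)   # {0: 'anger', 1: 'sadness', 2: 'fear'}
print(acc)       # 0.75
```

Two caveats apply to any mapping of this kind. Several clusters can map to the same emotion, so some classes may never be predicted. And if the mapping is derived from the same examples that are then scored, the ground-truth labels leak into the evaluation, which flatters K-Means relative to the supervised models unless the mapping is fitted on training data only.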
Circularity Check
No significant circularity
Full rationale
The paper is a standard empirical ML comparison: a new Bangla emotion corpus is collected and annotated, then five classical methods (Naive Bayes, Decision Tree, k-NN, SVM, K-Means) are trained with feature combinations and evaluated via accuracy/F1 on held-out data. The reported 52.98% accuracy and 0.3324 macro-F1 for SVM-RBF are direct experimental measurements, not quantities that reduce to fitted inputs by construction. No equations, derivations, self-citation chains, uniqueness theorems, or ansatzes appear; the annotation premise is an explicit modeling assumption rather than a hidden circular step. This is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The six basic emotions (sadness, happiness, disgust, surprise, fear, anger) are the appropriate and exhaustive categories for the emotional content in the collected comments.