Improving Facial Emotion Recognition through Dataset Merging and Balanced Training Strategies
Pith reviewed 2026-05-09 23:58 UTC · model grok-4.3
The pith
Merging the CK+, FER+, and KDEF datasets plus augmentation and weighted sampling lets a deep CNN classify seven basic facial emotions at 82% accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By merging CK+, FER+, and KDEF to enlarge the training set, then applying online and offline augmentation together with random weighted sampling, the deep convolutional network reaches 82% accuracy on the seven basic emotions. The paper presents this result as evidence that these steps reduce the performance penalty caused by class imbalance.
What carries the argument
Dataset merging across CK+, FER+, and KDEF together with online/offline augmentation and random weighted sampling to correct remaining class imbalance before training a deep convolutional network.
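The abstract does not say which augmentations are applied or how many offline copies are generated, so the following is only a minimal Python sketch under stated assumptions. The helper names (`offline_copies_per_image`, `random_hflip`), the "match the largest class" target, and the choice of a horizontal flip are illustrative and are not taken from the paper.

```python
import math
import random
from collections import Counter

def offline_copies_per_image(labels, target=None):
    """Offline augmentation budget (assumed policy): for each class, how many
    augmented copies per original image are needed to reach `target` images.
    By default the target is the size of the largest class."""
    counts = Counter(labels)
    target = target or max(counts.values())
    return {c: max(0, math.ceil(target / n) - 1) for c, n in counts.items()}

def random_hflip(image, rng, p=0.5):
    """Online augmentation (assumed transform): mirror a row-major image,
    given as a list of rows, with probability p at load time."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

# Toy, made-up class counts; not the merged dataset's real distribution.
labels = ["happy"] * 900 + ["fear"] * 300 + ["disgust"] * 100
plan = offline_copies_per_image(labels)

rng = random.Random(0)
flipped = random_hflip([[1, 2], [3, 4]], rng, p=1.0)
```

Offline copies are written to disk once before training, while the online transform is re-drawn every time an image is loaded, so the two steps play different roles even when they use the same operations.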
If this is right
- The merged dataset supplies more examples per emotion class, supporting more stable feature learning inside the convolutional network.
- Random weighted sampling and augmentation together shrink the accuracy gap between majority and minority emotion classes.
- Overall classification reaches 82% on the seven basic emotions, outperforming training on any single source dataset.
- The combination of merging and balancing directly mitigates the data imbalance problem that otherwise degrades facial emotion recognition.
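How random weighted sampling narrows the gap between majority and minority classes can be illustrated with a small standard-library sketch. It assumes inverse-class-frequency weights, which is a common choice but is not confirmed by the abstract; the function names and the toy 90/10 label list are hypothetical.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-sample weight = 1 / (number of samples in that sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

def weighted_sample_indices(labels, num_samples, seed=0):
    """Draw indices with replacement, proportionally to the per-sample weights."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)

# Toy, made-up imbalance: 90 majority samples and 10 minority samples.
labels = ["happy"] * 90 + ["disgust"] * 10
idx = weighted_sample_indices(labels, num_samples=10_000)
drawn = Counter(labels[i] for i in idx)
```

With these weights each class carries the same total sampling mass, so the drawn batches are roughly class-balanced even though the underlying data is not. The minority images are simply repeated more often, which is why sampling is usually paired with augmentation.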
Where Pith is reading between the lines
- The same merging-plus-balancing recipe could be tested on other multi-class image tasks that suffer from uneven label distributions.
- Real-time systems in human-computer interaction might achieve more reliable emotion detection by adopting this data-preparation pipeline rather than solely changing model depth.
- Cross-dataset label noise remains a hidden variable; explicit consistency checks on merged labels would be a natural next measurement.
- If the 82% figure generalizes, incremental gains may now come more from refining the balancing weights than from further dataset growth.
Load-bearing premise
That emotion labels remain consistent and compatible when the three datasets are merged, and that augmentation plus weighted sampling improves generalization without introducing new biases or overfitting.
What would settle it
Training the same network on each dataset separately without merging or balancing and finding accuracy well below 82%, or testing the merged model on a completely independent dataset such as AffectNet and observing a large drop in performance.
Original abstract
In this paper, a deep learning framework based on deep convolutional networks is proposed for automatic facial emotion recognition. In order to increase the generalization ability and the robustness of the method, the dataset size is increased by merging three publicly available facial emotion datasets: CK+, FER+ and KDEF. Despite the increase in dataset size, the minority classes still suffer from an insufficient number of training samples, leading to data imbalance. The data imbalance problem is minimized by online and offline augmentation techniques and random weighted sampling. Experimental results demonstrate that the proposed method can recognize the seven basic emotions with 82% accuracy. The results demonstrate the effectiveness of the proposed approach in tackling the challenges of data imbalance and improving classification performance in facial emotion recognition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a deep convolutional network framework for recognizing seven basic facial emotions. To boost generalization and robustness, it merges the CK+, FER+, and KDEF datasets and mitigates resulting class imbalance via online/offline augmentation plus random weighted sampling. Experimental results are reported to reach 82% accuracy, which the authors interpret as evidence that the merging and balancing strategies successfully address data imbalance and improve classification performance.
Significance. If the 82% figure were shown to be robust against baselines, label harmonization checks, and standard validation protocols, the work would offer a straightforward empirical recipe for enlarging training sets in facial emotion recognition while controlling imbalance. The approach itself is conventional, so its value would lie mainly in the concrete performance lift rather than in novel methodology.
major comments (3)
- [Abstract] Abstract and Experimental results section: the central claim of 82% accuracy is presented without any baseline comparisons, cross-validation protocol, or error analysis. Without these, it is impossible to determine whether the reported accuracy reflects genuine improvement from dataset merging and balancing or simply the result of training on a larger but noisier pool.
- [Methods] Methods / Dataset merging description: no label-mapping table, inter-annotator agreement statistics, or cross-dataset consistency check is supplied for aligning the seven emotion classes across CK+ (FACS-coded posed expressions), FER+ (crowd-sourced in-the-wild labels), and KDEF (actor-intended posed expressions). Systematic label mismatches would directly undermine the accuracy figure.
- [Experimental results] Experimental results: the claim that augmentation and weighted sampling improve generalization is not supported by ablation studies or held-out test details. Standard augmentation cannot correct label noise; therefore the 82% result may conflate recognition performance with annotation artifacts.
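The label-mapping table requested in the Methods comment could take a form like the following sketch. Everything in it is an assumption made for illustration: the canonical seven-class taxonomy, the raw label strings attributed to each dataset, and the decision to leave contempt unmapped are not taken from the paper, and the helper names are hypothetical.

```python
# Assumed canonical taxonomy; the paper does not list its seven classes here.
CANONICAL = ("anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral")

# Illustrative raw-label strings per source; these are NOT the paper's table.
LABEL_MAP = {
    "CK+": {"anger": "anger", "disgust": "disgust", "fear": "fear",
            "happy": "happiness", "sadness": "sadness",
            "surprise": "surprise", "neutral": "neutral"},
    "FER+": {"anger": "anger", "disgust": "disgust", "fear": "fear",
             "happiness": "happiness", "sadness": "sadness",
             "surprise": "surprise", "neutral": "neutral"},
    "KDEF": {"angry": "anger", "disgusted": "disgust", "afraid": "fear",
             "happy": "happiness", "sad": "sadness",
             "surprised": "surprise", "neutral": "neutral"},
}

def harmonize(source, raw_label):
    """Return the canonical label, or None when the raw label has no
    counterpart in the target taxonomy (for example, contempt)."""
    return LABEL_MAP[source].get(raw_label)

def check_mapping(label_map, canonical):
    """Return {source: sorted missing canonical classes}; an empty dict means
    every source covers every canonical class. Unknown targets raise."""
    gaps = {}
    for source, mapping in label_map.items():
        unknown = set(mapping.values()) - set(canonical)
        if unknown:
            raise ValueError(f"{source} maps to unknown classes: {sorted(unknown)}")
        missing = set(canonical) - set(mapping.values())
        if missing:
            gaps[source] = sorted(missing)
    return gaps

gaps = check_mapping(LABEL_MAP, CANONICAL)
```

A check of this kind only verifies that the table is complete. It says nothing about whether a posed "fear" in one dataset and a crowd-labeled "fear" in another depict comparable expressions, which is the substance of the referee's concern.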
minor comments (1)
- [Abstract] The abstract states the use of 'deep convolutional networks' but never specifies the exact architecture, input resolution, or training hyperparameters.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.
- Referee: [Abstract] Abstract and Experimental results section: the central claim of 82% accuracy is presented without any baseline comparisons, cross-validation protocol, or error analysis. Without these, it is impossible to determine whether the reported accuracy reflects genuine improvement from dataset merging and balancing or simply the result of training on a larger but noisier pool.
  Authors: We acknowledge the absence of explicit baseline comparisons and detailed validation protocols in the current version. The 82% accuracy was obtained on the merged dataset using the proposed balancing strategies. In the revision we will add baseline results from models trained on each individual dataset (CK+, FER+, KDEF) separately, specify the cross-validation protocol (5-fold stratified), and include a confusion matrix plus per-class precision/recall for error analysis. These additions will allow direct assessment of whether the performance lift stems from merging and balancing.
  revision: yes
- Referee: [Methods] Methods / Dataset merging description: no label-mapping table, inter-annotator agreement statistics, or cross-dataset consistency check is supplied for aligning the seven emotion classes across CK+ (FACS-coded posed expressions), FER+ (crowd-sourced in-the-wild labels), and KDEF (actor-intended posed expressions). Systematic label mismatches would directly undermine the accuracy figure.
  Authors: We will insert a label-mapping table in the revised Methods section that explicitly shows the correspondence of the seven emotion categories across the three datasets. All datasets use the same seven-class taxonomy, and mapping followed the original annotations. Inter-annotator agreement statistics are unavailable for FER+ in its public release, so we cannot generate new figures; we will instead add a brief discussion of each dataset's labeling provenance and known variability. We will also describe the manual sample verification performed during merging to check label consistency.
  revision: partial
- Referee: [Experimental results] Experimental results: the claim that augmentation and weighted sampling improve generalization is not supported by ablation studies or held-out test details. Standard augmentation cannot correct label noise; therefore the 82% result may conflate recognition performance with annotation artifacts.
  Authors: We will add ablation experiments that isolate the contribution of offline augmentation, online augmentation, and random weighted sampling. The test-set split (20% held-out, stratified by class and dataset source) will be detailed. While we agree that augmentation cannot remove label noise, the weighted sampling was introduced precisely to improve minority-class generalization on the merged data; we will discuss the potential influence of annotation artifacts and note that the multi-source training provides a form of robustness check.
  revision: yes
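The 20% held-out split stratified by class and dataset source, as described in the rebuttal, might look like the sketch below. It is an assumption-laden illustration rather than the authors' procedure: the function name, the `(sample_id, label, source)` record layout, and the toy stratum sizes are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_holdout(samples, test_fraction=0.2, seed=0):
    """Split (sample_id, label, source) records so that every (label, source)
    stratum contributes about `test_fraction` of its items to the test set.
    Very small strata may round down to zero test items."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample_id, label, source in samples:
        strata[(label, source)].append(sample_id)
    train, test = [], []
    for key in sorted(strata):  # sorted keys keep the split reproducible
        ids = strata[key]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test

# Toy, made-up stratum sizes; not the real dataset composition.
sizes = {("happy", "FER+"): 50, ("happy", "KDEF"): 10,
         ("fear", "FER+"): 20, ("fear", "KDEF"): 10}
samples = [(f"{label}-{source}-{i}", label, source)
           for (label, source), n in sizes.items() for i in range(n)]
train_ids, test_ids = stratified_holdout(samples)
```

Stratifying on the source as well as the class keeps the test set from being dominated by the largest dataset, which matters when the three sources differ as much as posed and in-the-wild images do.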
- Inter-annotator agreement statistics for FER+, which are not provided in the original dataset release and cannot be retroactively computed without re-annotating the data.
Circularity Check
No circularity: purely empirical ML pipeline with held-out evaluation
The paper describes a standard supervised learning workflow: merge three public datasets (CK+, FER+, KDEF), apply online/offline augmentation and weighted sampling to address class imbalance, train a deep CNN, and report accuracy on held-out test data. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text or abstract. The 82% accuracy figure is obtained by direct training and evaluation rather than by algebraic reduction to the input data or prior author results. This is the expected non-circular outcome for an empirical computer-vision experiment.
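A single held-out accuracy figure can hide weak minority-class performance, which is why per-class metrics are worth computing alongside it. The sketch below is a plain-Python illustration on made-up predictions; the function name and the toy labels are hypothetical, and none of the numbers relate to the paper's results.

```python
def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} computed from paired label lists.
    A class with no predictions (or no true samples) gets 0.0 for that metric."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[c] = (precision, recall)
    return report

# Toy, made-up predictions: one of the two minority samples is missed.
y_true = ["happy"] * 8 + ["disgust"] * 2
y_pred = ["happy"] * 8 + ["happy", "disgust"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_precision_recall(y_true, y_pred)
```

In this toy case overall accuracy is high while recall on the minority class is only one half, which is the kind of gap that an aggregate number alone would not reveal.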