pith. sign in

arxiv: 2401.11877 · v2 · submitted 2024-01-22 · 💻 cs.CV

Assessing the Efficacy of Deep Learning Approaches for Facial Expression Recognition in Individuals with Intellectual Disabilities

Pith reviewed 2026-05-24 04:33 UTC · model grok-4.3

classification 💻 cs.CV
keywords facial expression recognitiondeep learningintellectual disabilitiesconvolutional neural networksuser-specific trainingattention mapssocial robotics
0
0 comments X

The pith

Deep learning models for facial expression recognition require user-specific training to handle the distinct expressions of individuals with intellectual disabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates 12 convolutional neural networks trained on datasets that either exclude or include individuals with intellectual disabilities. Performance metrics and attention map analysis show clear differences in how expressions manifest and how models focus on facial regions between the two groups, as well as variations among individuals with intellectual disabilities. These distinctions lead to the conclusion that general training data is insufficient and that models must instead be adapted through tailored, user-specific methodologies to recognize expressions accurately in this population. The work matters for applications like social robots in homes or care settings that aim to interpret emotional states across diverse users.

Core claim

Examination of outcomes from training convolutional neural networks on an ensemble of datasets without individuals with intellectual disabilities versus a dataset featuring such individuals reveals significant distinctions in facial expressions, demonstrating the need for tailored user-specific training methodologies that enable models to effectively address the unique expressions of each user.

What carries the argument

Set of 12 convolutional neural networks trained in different approaches including mixed datasets, evaluated through performance and attention map variations to highlight expression differences.

Load-bearing premise

The assumption that observed performance differences and attention map variations are caused by intellectual disability status rather than confounding factors such as dataset composition, age, lighting, or labeling differences between the compared datasets.

What would settle it

Retraining the same 12 networks on new datasets matched for age, lighting, and labeling but differing only by intellectual disability status, then checking if performance gaps and attention map differences disappear.

Figures

Figures reproduced from arXiv: 2401.11877 by Cristina Manresa-Yee, F. Xavier Gaya-Morey, Jose M. Buades-Rubio, Silvia Ramis.

Figure 1
Figure 1. Figure 1: Considerations for the scenarios: each user has video clips for each [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 4
Figure 4. Figure 4: Average per-class F1 score obtained on Google FE-Test by the [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 3
Figure 3. Figure 3: Accuracy on the Google FE-Test dataset of the [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Accuracy of the different training scenarios on MuDERI, by network. We have added the average results obtained by the 15 trainings on FER-DB5 [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: F1 score of the different training scenarios on MuDERI, by network. We have added the average results obtained by the 15 trainings on FER-DB5 [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Computed heat maps for the different trainings on FEER-DB5 tested on Google FE-Test and MuDERI, and for the training and test on MuDERI, [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
read the original abstract

Facial expression recognition has gained significance as a means of imparting social robots with the capacity to discern the emotional states of users. The use of social robotics includes a variety of settings, including homes, nursing homes or daycare centers, serving to a wide range of users. Remarkable performance has been achieved by deep learning approaches, however, its direct use for recognizing facial expressions in individuals with intellectual disabilities has not been yet studied in the literature, to the best of our knowledge. To address this objective, we train a set of 12 convolutional neural networks in different approaches, including an ensemble of datasets without individuals with intellectual disabilities and a dataset featuring such individuals. Our examination of the outcomes, both the performance and the important image regions for the models, reveals significant distinctions in facial expressions between individuals with and without intellectual disabilities, as well as among individuals with intellectual disabilities. Remarkably, our findings show the need of facial expression recognition within this population through tailored user-specific training methodologies, which enable the models to effectively address the unique expressions of each user.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper trains 12 CNNs for facial expression recognition, comparing an ensemble of non-ID datasets against an ID-specific dataset. It reports performance differences and variations in attention maps, concluding that these indicate unique expressions in individuals with intellectual disabilities and that user-specific training methodologies are required to address them effectively.

Significance. If the performance and attention-map differences can be causally attributed to intellectual-disability status rather than dataset confounders, the work would usefully highlight limitations of off-the-shelf FER models for inclusive robotics applications. The study fills a literature gap on this population, but the current experimental design does not isolate the claimed causal factor.

major comments (2)
  1. [Methods / Experimental setup] The central claim that distinctions require user-specific training rests on attributing performance gaps and attention-map differences to ID status. The experimental comparison (ensemble of non-ID datasets vs. ID dataset) provides no matching, stratification, or covariate adjustment for age, lighting, camera angle, labeling protocol, or expression distribution; without such controls the attribution cannot be isolated and the recommendation for tailored methodologies is not supported.
  2. [Results] Results section: no quantitative metrics (accuracy, F1, confusion matrices), dataset sizes, statistical tests, or model-configuration details are supplied to allow verification that the reported distinctions are reliable or larger than would be expected from the listed confounders.
minor comments (2)
  1. [Abstract] Abstract and introduction should explicitly state the sizes and sources of the 12 datasets and the ID corpus so readers can assess comparability.
  2. [Methods] Clarify how the 12 models were selected and whether hyper-parameters were tuned separately on each corpus; this affects interpretation of the performance comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make.

read point-by-point responses
  1. Referee: [Methods / Experimental setup] The central claim that distinctions require user-specific training rests on attributing performance gaps and attention-map differences to ID status. The experimental comparison (ensemble of non-ID datasets vs. ID dataset) provides no matching, stratification, or covariate adjustment for age, lighting, camera angle, labeling protocol, or expression distribution; without such controls the attribution cannot be isolated and the recommendation for tailored methodologies is not supported.

    Authors: We agree that the experimental design does not include explicit controls or adjustments for the potential confounders mentioned. The datasets used are existing collections that reflect practical scenarios in which such models would be deployed. While this limits causal attribution to ID status alone, the consistent differences observed across multiple models and attention maps support our conclusion that off-the-shelf models may not suffice. In the revised manuscript, we will expand the discussion to acknowledge these limitations and emphasize that the recommendation for user-specific training is based on observed performance gaps rather than strict causal isolation. revision: yes

  2. Referee: [Results] Results section: no quantitative metrics (accuracy, F1, confusion matrices), dataset sizes, statistical tests, or model-configuration details are supplied to allow verification that the reported distinctions are reliable or larger than would be expected from the listed confounders.

    Authors: We acknowledge that the current version of the manuscript lacks sufficient quantitative metrics, dataset sizes, statistical tests, and detailed model configurations in the Results section. We will revise the manuscript to include accuracy, F1 scores, confusion matrices, dataset sizes, statistical tests, and full model details to allow proper verification. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ML evaluation with independent dataset comparisons

full rationale

The paper trains 12 CNNs on an ensemble of non-ID datasets versus an ID dataset, then reports performance metrics and attention maps. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the load-bearing claims. The central recommendation for user-specific training follows directly from the observed experimental differences rather than reducing to any input by construction. This is a standard empirical study whose results are externally falsifiable via replication on the cited datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only view yields no explicit free parameters, invented entities, or non-standard axioms beyond routine computer vision assumptions. No evidence of ad-hoc inventions or fitted constants central to the claim.

axioms (1)
  • domain assumption Convolutional neural networks can extract discriminative facial features from images for expression classification
    Standard assumption invoked when training CNNs on facial images; location is implicit in the choice of 12 CNN architectures.

pith-pipeline@v0.9.0 · 5730 in / 1206 out tokens · 19020 ms · 2026-05-24T04:33:41.191554+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

  1. [1]

    The bodily expressive action stimulus test (beast). construction and validation of a stimulus basis for measuring perception of whole body expression of emotions,

    B. De Gelder and J. Van den Stock, “The bodily expressive action stimulus test (beast). construction and validation of a stimulus basis for measuring perception of whole body expression of emotions,” Frontiers in Psychology , vol. 2, 2011. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00181

  2. [2]

    Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements,

    L. F. Barrett, R. Adolphs, S. Marsella, A. M. Martinez, and S. D. Pollak, “Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements,” Psychological Science in the Public Interest, vol. 20, no. 1, pp. 1–68, 2019

  3. [3]

    Guest editorial cogni- tive agents and robots for human-centered systems,

    A. Di Nuovo, G. Acampora, and M. Schlesinger, “Guest editorial cogni- tive agents and robots for human-centered systems,” IEEE Transactions on Cognitive and Developmental Systems , vol. 9, no. 1, pp. 1–4, 2017

  4. [4]

    Conversational affective social robots for ageing and dementia support,

    M. R. Lima, M. Wairagkar, M. Gupta, F. Rodriguez y Baena, P. Bar- naghi, D. J. Sharp, and R. Vaidyanathan, “Conversational affective social robots for ageing and dementia support,” IEEE Transactions on Cognitive and Developmental Systems , vol. 14, no. 4, pp. 1378–1397, 2022

  5. [5]

    A brief review of facial emotion recognition based on visual information,

    B. C. Ko, “A brief review of facial emotion recognition based on visual information,” Sensors, vol. 18, no. 2, 2018

  6. [6]

    A survey on human face expression recognition techniques,

    I. Revina and W. S. Emmanuel, “A survey on human face expression recognition techniques,” Journal of King Saud University - Computer and Information Sciences , vol. 33, no. 6, pp. 619–628, 2021

  7. [7]

    Deep facial expression recognition: A survey,

    S. Li and W. Deng, “Deep facial expression recognition: A survey,” IEEE Transactions on Affective Computing , vol. 13, no. 3, pp. 1195– 1215, 2022

  8. [8]

    An argument for basic emotions,

    P. Ekman, “An argument for basic emotions,” Cognition and Emotion , vol. 6, no. 3-4, pp. 169–200, 1992

  9. [9]

    Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements,

    L. F. Barrett, R. Adolphs, S. Marsella, A. M. Martinez, and S. D. Pollak, “Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements,” Psychological Science in the Public Interest, vol. 20, no. 1, pp. 1–68, 2019

  10. [10]

    Emotional expression in psychiatric conditions: New tech- nology for clinicians,

    K. Grabowski, A. Rynkiewicz, A. Lassalle, S. Baron-Cohen, B. Schuller, N. Cummins, A. Baird, J. Podg ´orska-Bednarz, A. Pienia ¸ ˙zek, and I. Łucka, “Emotional expression in psychiatric conditions: New tech- nology for clinicians,” Psychiatry and Clinical Neurosciences , vol. 73, no. 2, pp. 50–62, 2019

  11. [11]

    Adaptive user interface design and analysis using emotion recognition through facial expressions and body posture from an rgb-d sensor,

    S. Medjden, N. Ahmed, and M. Lataifeh, “Adaptive user interface design and analysis using emotion recognition through facial expressions and body posture from an rgb-d sensor,” PLoS ONE , vol. 15, no. 7, p. e0235908, 2020

  12. [12]

    Using a social robot to evaluate facial expressions in the wild,

    S. Ramis, J. M. Buades, and F. J. Perales, “Using a social robot to evaluate facial expressions in the wild,” Sensors, vol. 20, no. 23, 2020

  13. [13]

    Automatic analysis of facial expressions: the state of the art,

    M. Pantic and L. J. M. Rothkrantz, “Automatic analysis of facial expressions: the state of the art,” IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 22, no. 12, pp. 1424–1445, 2000

  14. [14]

    Automatic facial expression analysis: a survey,

    B. Fasel and J. Luettin, “Automatic facial expression analysis: a survey,” Pattern Recognition, vol. 36, no. 1, pp. 259–275, 2003

  15. [15]

    The impact of contextual information on the emotion recognition of children with an intellectual disability,

    G. Murray, K. McKenzie, A. Murray, K. Whelan, J. Cossar, K. Murray, and J. Scotland, “The impact of contextual information on the emotion recognition of children with an intellectual disability,” Journal of Applied Research in Intellectual Disabilities , vol. 32, no. 1, pp. 152–158, 2019. [Online]. Available: https://onlinelibrary.wiley.com/doi/ abs/10.11...

  16. [16]

    The expression and assessment of emotions and internal states in individuals with severe or profound intellectual disabilities,

    D. Adams and C. Oliver, “The expression and assessment of emotions and internal states in individuals with severe or profound intellectual disabilities,” Clinical Psychology Review , vol. 31, no. 3, pp. 293– 306, 2011. [Online]. Available: https://www.sciencedirect.com/science/ article/pii/S0272735811000080

  17. [17]

    World Health Organization (WHO), International Classification of Func- tioning, Disability and Health (ICF) , 2018

  18. [18]

    Facial emotion recognition in intellectual disabilities,

    R. H. Zaja and J. Rojahn, “Facial emotion recognition in intellectual disabilities,” Current Opinion in Psychiatry , vol. 21, no. 5, 2008. [Online]. Available: https://journals.lww.com/co-psychiatry/ fulltext/2008/09000/facial emotion recognition in intellectual.3.aspx

  19. [19]

    Teaching Children With Mild to Moderate Intellectual Disabilities to Select and Produce Facial Expressions of Emotion Using Modelling and Feedback,

    T. Rayworth, “Teaching Children With Mild to Moderate Intellectual Disabilities to Select and Produce Facial Expressions of Emotion Using Modelling and Feedback,” Ph.D. dissertation, Edith Cowan University, 2997

  20. [20]

    Facial emotional expressions of adults with mental retardation,

    F. L. Wilczenski, “Facial emotional expressions of adults with mental retardation,” Education and Training in Mental Retardation , vol. 26, no. 3, pp. 319–324, 1991. [Online]. Available: http: //www.jstor.org/stable/23878619

  21. [21]

    Facial emotion recognition using deep learning: review and insights,

    W. Mellouk and W. Handouzi, “Facial emotion recognition using deep learning: review and insights,” Procedia Computer Science, vol. 175, pp. 689–694, 2020, the 17th International Conference on Mobile Systems and Pervasive Computing (MobiSPC),The 15th International Conference on Future Networks and Communications (FNC),The 10th International Conference on S...

  22. [22]

    Automatic facial expression recognition for the interaction of individuals with multiple disabilities,

    C. Campomanes- ´Alvarez and B. R. Campomanes- ´Alvarez, “Automatic facial expression recognition for the interaction of individuals with multiple disabilities,” in 2021 International Conference on Applied Artificial Intelligence (ICAPAI), 2021, pp. 1–6

  23. [23]

    Recognition of behaviour patterns for people with profound intellectual and multiple disabilities,

    E. Dovgan, J. Vali ˇc, G. Slapni ˇcar, and M. Lu ˇstrek, “Recognition of behaviour patterns for people with profound intellectual and multiple disabilities,” in Adjunct Proceedings of the 2021 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2021 ACM International Symposium on Wearable Computers, ser. UbiComp...

  24. [24]

    Facial expression in adults with Down’s Syndrome

    M. C. Smith and D. G. Dodson, “Facial expression in adults with Down’s Syndrome.” US, pp. 602–608, 1996

  25. [25]

    Emotion recognition in individuals with down syndrome: A convolutional neural network-based algorithm proposal,

    N. Paredes, E. Caicedo-Bravo, and B. Bacca, “Emotion recognition in individuals with down syndrome: A convolutional neural network-based algorithm proposal,” Symmetry, vol. 15, no. 7, p. 1435, Jul. 2023. [Online]. Available: http://dx.doi.org/10.3390/sym15071435

  26. [26]

    The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression,

    P. Lucey, J. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, “The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression,” 07 2010, pp. 94–101

  27. [27]

    A 3d facial expression database for facial behavior research,

    L. Yin, X. Wei, Y . Sun, J. Wang, and M. Rosato, “A 3d facial expression database for facial behavior research,” vol. 2006, 05 2006, pp. 211–216

  28. [28]

    The Japanese Female Facial Expression (JAFFE) Dataset,

    M. Lyons, M. Kamachi, and J. Gyoba, “The Japanese Female Facial Expression (JAFFE) Dataset,” Apr. 1998, The images are provided at 10 no cost for non- commercial scientific research only. If you agree to the conditions listed below, you may request access to download

  29. [29]

    Warsaw set of emotional facial expression pictures: A validation study of facial display photographs,

    M. Olszanowski, G. Pochwatko, K. Kuklinski, M. Scibor-Rylski, P. Lewinski, and R. Ohme, “Warsaw set of emotional facial expression pictures: A validation study of facial display photographs,” Frontiers in Psychology, vol. 5, 12 2014

  30. [30]

    A novel approach to cross dataset studies in facial expression recognition,

    S. Ramis, J. M. Buades, F. J. Perales, and C. Manresa-Yee, “A novel approach to cross dataset studies in facial expression recognition,” Multimedia Tools Appl., vol. 81, no. 27, p. 39507–39544, nov 2022

  31. [31]

    Muderi: Mul- timodal database for emotion recognition among intellectually disabled individuals,

    J. Shukla, M. Barreda- ´Angeles, J. Oliver, and D. Puig, “Muderi: Mul- timodal database for emotion recognition among intellectually disabled individuals,” 11 2016

  32. [32]

    A contrario detection of faces: A case example,

    J.-L. Lisani, S. Ramis, and F. J. Perales, “A contrario detection of faces: A case example,” SIAM Journal on Imaging Sciences , vol. 10, no. 4, pp. 2091–2118, 2017

  33. [33]

    300 faces in-the-wild challenge: The first facial landmark localization challenge,

    C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “300 faces in-the-wild challenge: The first facial landmark localization challenge,” 12 2013, pp. 397–403

  34. [34]

    Imagenet classifica- tion with deep convolutional neural networks,

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classifica- tion with deep convolutional neural networks,” in Advances in Neural Information Processing Systems , F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., vol. 25. Curran Associates, Inc., 2012

  35. [35]

    Very deep convolutional networks for large-scale image recognition,

    K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2015

  36. [36]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2015

  37. [37]

    Rethinking the inception architecture for computer vision,

    C. Szegedy, V . Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” 2015

  38. [38]

    Xception: Deep learning with depthwise separable convolu- tions,

    F. Chollet, “Xception: Deep learning with depthwise separable convolu- tions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

  39. [39]

    Searching for mobilenetv3,

    A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y . Zhu, R. Pang, V . Vasudevan, Q. V . Le, and H. Adam, “Searching for mobilenetv3,” 2019

  40. [40]

    Efficientnetv2: Smaller models and faster training,

    M. Tan and Q. V . Le, “Efficientnetv2: Smaller models and faster training,” 2021

  41. [41]

    Deep learning for real-time robust facial expression recognition on a smartphone,

    I. Song, H.-J. Kim, and P. B. Jeon, “Deep learning for real-time robust facial expression recognition on a smartphone,” in 2014 IEEE International Conference on Consumer Electronics (ICCE) , 2014, pp. 564–567

  42. [42]

    A deep-learning approach to facial expression recognition with candid images,

    W. Li, M. Li, Z. Su, and Z. Zhu, “A deep-learning approach to facial expression recognition with candid images,” in 2015 14th IAPR International Conference on Machine Vision Applications (MVA). IEEE, 2015, pp. 279–282

  43. [43]

    Analysis of Gender Differences in Facial Expression Recognition Based on Deep Learn- ing Using Explainable Artificial Intelligence,

    C. Manresa-Yee, S. Ramis, and J. M. Buades, “Analysis of Gender Differences in Facial Expression Recognition Based on Deep Learn- ing Using Explainable Artificial Intelligence,” International Journal of Interactive Multimedia and Artificial Intelligence (In press)

  44. [44]

    ”why should i trust you?

    M. T. Ribeiro, S. Singh, and C. Guestrin, “”why should i trust you?”: Explaining the predictions of any classifier,” 2016

  45. [45]

    Slic superpixels compared to state-of-the-art superpixel methods,

    R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. S ¨usstrunk, “Slic superpixels compared to state-of-the-art superpixel methods,”IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 34, no. 11, pp. 2274–2282, 2012

  46. [46]

    Explainable facial expression recognition for people with intellectual disabilities,

    S. Ramis, C. Manresa-Yee, J. M. Buades-Rubio, and F. X. Gaya-Morey, “Explainable facial expression recognition for people with intellectual disabilities,” in XXIII International Conference on Human Computer In- teraction (Interaccion 2023). Lleida, Spain: Association for Computing Machinery, September 2023. VII. B IOGRAPHY SECTION F. Xavier Gaya-Morey F. ...