arxiv: 2606.08612 · v1 · pith:3N3PARQ6new · submitted 2026-06-07 · 💻 cs.CV

Facial Expression Recognition in the Deep Learning Era: A Systematic Multi-Criteria Review of Methods, Models, Datasets, Performance, Challenges, and Future Research Directions

Pith reviewed 2026-06-27 18:48 UTC · model grok-4.3

classification 💻 cs.CV
keywords Facial Expression Recognition · Deep Learning · Survey · Taxonomy · Datasets · Performance Evaluation · In-the-wild Conditions · Facial Affect Recognition

The pith

This survey organizes deep learning FER literature via a seven-axis taxonomy spanning tasks, modalities, architectures, and applications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper traces facial expression recognition from early handcrafted features through five phases to current attention, vision-language, and foundation-model methods. It supplies a multi-criteria taxonomy that dissects the field along recognition task, input modality, pre-processing, network architecture, learning strategy, acquisition setting, and application domain. The review then compares representative methods on public benchmarks, catalogs datasets with their annotation schemes, and outlines open challenges. A sympathetic reader would value the structured map for locating where progress has occurred and where gaps remain in real-world conditions.

Core claim

The authors deliver a systematic review of deep learning-based facial expression recognition linked to the broader facial affect recognition domain. They describe five evolutionary phases, introduce a seven-axis taxonomy for literature analysis, provide per-criterion comparisons under in-the-wild settings, compile a task-organized dataset catalog, report quantitative performance tables for state-of-the-art models, and discuss current limitations together with future directions.

What carries the argument

A multi-criteria taxonomy that classifies the literature along seven complementary axes: recognition task, input modality, face pre-processing pipeline, network architecture, learning strategy, acquisition setting, and application domain.

If this is right

  • Critical strengths and limitations of each taxonomy category become visible under in-the-wild conditions.
  • Public datasets receive a unified, task-organized catalog with annotation schemes and evaluation protocols.
  • Quantitative performance tables allow direct comparison of representative state-of-the-art methods on common benchmarks.
  • Open challenges and future research directions are identified from the gaps across the seven axes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Researchers could use the seven axes as a checklist when designing new models to ensure coverage of under-explored combinations such as micro-expression tasks with vision-language architectures.
  • The taxonomy may help practitioners select methods suited to specific acquisition settings or application domains rather than relying on single-axis surveys.
  • Linking FER explicitly to the wider FAR domain suggests possible transfer of techniques between categorical expression recognition and dimensional or action-unit estimation tasks.
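
A minimal sketch of how the seven axes could serve as a checklist, assuming each paper is tagged with one value per axis. The class name `TaxonomyEntry`, the helper `unexplored_pairs`, and every tag value in the sample list are hypothetical illustrations, not taken from the survey's tables.

```python
from collections import Counter
from dataclasses import dataclass, fields
from itertools import product


@dataclass(frozen=True)
class TaxonomyEntry:
    """One paper tagged along the seven axes named in the survey."""
    recognition_task: str
    input_modality: str
    pre_processing: str
    network_architecture: str
    learning_strategy: str
    acquisition_setting: str
    application_domain: str


# Axis names, derived from the dataclass so the two cannot drift apart.
AXES = tuple(f.name for f in fields(TaxonomyEntry))


def unexplored_pairs(entries, axis_a, axis_b):
    """Return value pairs of (axis_a, axis_b) that no tagged paper covers."""
    seen = Counter((getattr(e, axis_a), getattr(e, axis_b)) for e in entries)
    values_a = sorted({getattr(e, axis_a) for e in entries})
    values_b = sorted({getattr(e, axis_b) for e in entries})
    return [pair for pair in product(values_a, values_b) if seen[pair] == 0]


# Hypothetical tags for illustration only.
papers = [
    TaxonomyEntry("categorical", "static RGB", "alignment", "transformer",
                  "supervised", "in-the-wild", "general"),
    TaxonomyEntry("micro-expression", "video", "alignment", "CNN",
                  "supervised", "lab", "general"),
    TaxonomyEntry("categorical", "video", "alignment", "vision-language",
                  "prompt tuning", "in-the-wild", "driving"),
]

gaps = unexplored_pairs(papers, "recognition_task", "network_architecture")
for task, arch in gaps:
    print(f"no tagged paper combines {task} with {arch}")
```

With these sample tags, the pair of micro-expression task and vision-language architecture shows up as uncovered, which is the kind of gap the first bullet describes. A real checklist would need the survey's actual category lists per axis.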

Load-bearing premise

The assumption that the chosen papers and seven-axis taxonomy together give complete, unbiased coverage of the field without missing important works or creating selection bias.

What would settle it

Discovery of a substantial set of recent deep-learning FER papers or major in-the-wild datasets whose methods or evaluation protocols fall outside all seven taxonomy axes.

Figures

Figures reproduced from arXiv: 2606.08612 by Aggelos Psiris, Georgios Th. Papadopoulos, Iraklis Varlamis, Panagiotis Sarigiannidis, Spyridon Evangelatos, Spyridon Georgiou, Thomas Lagkas, Vasileios Argyriou.

Figure 1: Key bibliometric analytics regarding the deep learning-based FER literature: a) Article types, and b) Top-15 most […]
Figure 2: Main phases in deep-learning-based facial expression recognition research and key/milestone works. Research has […]
Figure 3: Key criteria and main resulting categories of deep learning-based facial expression recognition methods.
Figure 4: Representative literature methods per recognition task: a) Categorical macro-FER (POSTER++ […]
Figure 5: Representative literature methods per input modality: a) Static 2D RGB image (MHAN […]
Figure 6: Representative literature methods incorporating different NN architecture types: a) CNNs (FLEPNet […]
Original abstract

Facial Expression Recognition (FER) has advanced rapidly over the last decade, driven by the shift from handcrafted descriptors and shallow classifiers to deep convolutional, attention-based, vision-language, and foundation-model architectures, and by the parallel growth of large-scale in-the-wild benchmarks spanning categorical, dimensional, compound, micro-expression, Action Unit (AU), and intensity-estimation tasks. Yet the deep learning-based FER landscape has so far been reviewed only along narrow task-, architecture-, or application-specific axes, leaving a holistic, systematically organized account of its recent advances missing. This survey addresses that gap with a comprehensive review of recent deep learning-based FER, explicitly linked to the wider Facial Affect Recognition (FAR) domain. Its main contributions are: a) A description of FER's evolution into five distinct phases, from handcrafted features and classical machine learning to attention-based, vision-language, and foundation-model approaches, with the key milestone works of each, b) A multi-criteria taxonomy analyzing the literature along seven complementary axes: recognition task, input modality, face pre-processing pipeline, network architecture, learning strategy, acquisition setting, and application domain, c) A per-criterion comparative analysis, with critical insights into the strengths and limitations of each category under in-the-wild conditions, d) A task-organized review of public FER datasets, with their annotation schemes, modalities, and evaluation protocols, e) A compilation of performance metrics and a per-task quantitative comparison of representative state-of-the-art methods on widely adopted benchmarks, and f) A discussion of current challenges and promising future directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims to fill a gap in the literature by providing a comprehensive, systematically organized review of recent deep learning-based Facial Expression Recognition (FER), linked to the broader Facial Affect Recognition (FAR) domain. Its contributions include: (a) a description of FER's evolution across five phases, with milestone works; (b) a multi-criteria taxonomy along seven axes (recognition task, input modality, face pre-processing pipeline, network architecture, learning strategy, acquisition setting, application domain); (c) a per-criterion comparative analysis under in-the-wild conditions; (d) a task-organized review of public datasets with annotation schemes and protocols; (e) performance metrics and quantitative comparisons of state-of-the-art methods on benchmarks; and (f) a discussion of challenges and future directions.

Significance. If the literature coverage proves complete and unbiased, the survey would offer a useful holistic reference that consolidates advances across tasks, architectures, and settings while providing quantitative benchmark comparisons. The seven-axis taxonomy and explicit linkage to FAR could help organize an otherwise fragmented field.

major comments (1)
  1. [Abstract / Contribution list] Abstract, listed contributions (a)–(f): The central claim that the survey delivers a 'comprehensive review' and 'systematically organized account' of the DL-based FER landscape is load-bearing on the completeness and lack of selection bias in the literature search. No search protocol (databases, keywords, date ranges, inclusion/exclusion rules, or assignment procedure to the seven axes) is described anywhere in the listed contributions or abstract, rendering the taxonomy coverage and dataset review untestable for omissions or overlap.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback highlighting the need for explicit documentation of the literature search process. We agree this is essential for a systematic review and will revise the manuscript accordingly to address the concern.

Point-by-point responses
  1. Referee: [Abstract / Contribution list] Abstract, listed contributions (a)–(f): The central claim that the survey delivers a 'comprehensive review' and 'systematically organized account' of the DL-based FER landscape is load-bearing on the completeness and lack of selection bias in the literature search. No search protocol (databases, keywords, date ranges, inclusion/exclusion rules, or assignment procedure to the seven axes) is described anywhere in the listed contributions or abstract, rendering the taxonomy coverage and dataset review untestable for omissions or overlap.

    Authors: We acknowledge that the current manuscript does not describe the search protocol, which is a standard requirement for systematic reviews to ensure transparency and allow evaluation of potential bias or omissions. In the revised version, we will add a dedicated 'Literature Search Methodology' subsection (likely in Section 2 or as a new Section 3) that explicitly details: the databases queried (e.g., IEEE Xplore, ACM DL, Scopus, Google Scholar, arXiv), the search keywords and Boolean combinations used, the date range (focusing on the deep learning era from ~2014 onward), inclusion/exclusion criteria (e.g., peer-reviewed papers on DL-based FER with quantitative results), the total number of papers screened and retained, and the procedure for mapping papers to the seven taxonomy axes. This addition will directly support the claims of comprehensiveness in the abstract and contributions list (a)–(f).

    revision: yes
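
As a rough illustration of what a documented search protocol might look like in machine-readable form, the sketch below records the elements the rebuttal lists and assembles a Boolean query from them. The databases and the 2014 start year follow the examples in the rebuttal; the class `SearchProtocol`, its field names, and the keyword lists are assumptions made for illustration, not the authors' actual protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchProtocol:
    """Hypothetical record of a literature search protocol."""
    databases: tuple
    topic_terms: tuple
    method_terms: tuple
    year_from: int

    def boolean_query(self) -> str:
        """Join each term group with OR, then combine the groups with AND."""
        topic = " OR ".join(f'"{t}"' for t in self.topic_terms)
        method = " OR ".join(f'"{m}"' for m in self.method_terms)
        return f"({topic}) AND ({method})"

    def in_range(self, year: int) -> bool:
        """Date-range inclusion rule: keep papers from year_from onward."""
        return year >= self.year_from


protocol = SearchProtocol(
    # Databases named as examples in the rebuttal.
    databases=("IEEE Xplore", "ACM DL", "Scopus", "Google Scholar", "arXiv"),
    # Keyword lists are assumed; the rebuttal does not state them.
    topic_terms=("facial expression recognition", "facial affect recognition"),
    method_terms=("deep learning", "neural network"),
    year_from=2014,
)

query = protocol.boolean_query()
print(query)
```

Recording the protocol as data rather than prose would let a reader rerun the same query against each database and compare the retrieved set with the survey's reference list.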

Circularity Check

0 steps flagged

No circularity: survey aggregates external results without derivations or fitted quantities

Full rationale

This paper is a literature survey whose contributions consist of describing FER evolution, proposing a seven-axis taxonomy, reviewing datasets, compiling performance metrics from published works, and discussing challenges. No equations, predictions, or first-principles derivations exist that could reduce to inputs by construction. All cited results originate from external papers; the taxonomy and organization are descriptive categorizations rather than self-definitional or fitted claims. The absence of any load-bearing self-citation chain or ansatz means the report remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey with no new mathematical derivations, fitted parameters, or postulated entities. All content is drawn from previously published papers.

pith-pipeline@v0.9.1-grok · 5873 in / 1105 out tokens · 15141 ms · 2026-06-27T18:48:15.124716+00:00 · methodology


Reference graph

Works this paper leans on

300 extracted references · 5 linked inside Pith

  1. [1]

    Constants across cultures in the face and emotion

    P. Ekman and W. V. Friesen, “Constants across cultures in the face and emotion,” Journal of personality and social psychology, vol. 17, no. 2, p. 124, 1971

  2. [2]

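    What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS)

    P. Ekman and E. L. Rosenberg, Eds., What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS). USA: Oxford University Press, 1997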

  3. [3]

    Compound facial expressions of emotion

    S. Du, Y. Tao, and A. M. Martinez, “Compound facial expressions of emotion,” Proceedings of the national academy of sciences, vol. 111, no. 15, pp. E1454–E1462, 2014

  4. [4]

    Blended emotion in-the-wild: Multi-label facial expression recognition using crowd-sourced annotations and deep locality feature learning

    S. Li and W. Deng, “Blended emotion in-the-wild: Multi-label facial expression recognition using crowd-sourced annotations and deep locality feature learning,” International Journal of Computer Vision, vol. 127, no. 6, pp. 884–906, 2019

  5. [5]

    Affectnet: A database for facial expression, valence, and arousal computing in the wild

    A. Mollahosseini, B. Hasani, and M. H. Mahoor, “Affectnet: A database for facial expression, valence, and arousal computing in the wild,” IEEE Transactions on Affective Computing, vol. 10, no. 1, pp. 18–31, 2017

  6. [6]

    Aff-wild2: Extending the aff-wild database for affect recognition

    D. Kollias and S. Zafeiriou, “Aff-wild2: Extending the aff-wild database for affect recognition,” arXiv preprint arXiv:1811.07770, 2018

  7. [7]

    Automatic analysis of facial affect: A survey of registration, representation, and recognition

    E. Sariyanidi, H. Gunes, and A. Cavallaro, “Automatic analysis of facial affect: A survey of registration, representation, and recognition,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 6, pp. 1113–1133, 2014

  8. [8]

    Toward machine emotional intelligence: Analysis of affective physiological state

    R. W. Picard, E. Vyzas, and J. Healey, “Toward machine emotional intelligence: Analysis of affective physiological state,” IEEE transactions on pattern analysis and machine intelligence, vol. 23, no. 10, pp. 1175–1191, 2001

  9. [9]

    Artificial emotional intelligence: Conventional and deep learning approach

    H. Kumar and A. Martin, “Artificial emotional intelligence: Conventional and deep learning approach,” Expert Systems with Applications, vol. 212, p. 118651, 2023

  10. [10]

    Deep learning for micro-expression recognition: A survey

    Y. Li, J. Wei, Y. Liu, J. Kauttonen, and G. Zhao, “Deep learning for micro-expression recognition: A survey,” IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2028–2046, 2022

  11. [11]

    Deep facial expression recognition: A survey

    S. Li and W. Deng, “Deep facial expression recognition: A survey,” IEEE transactions on affective computing, vol. 13, no. 3, pp. 1195–1215, 2022

  12. [12]

    Deep residual learning for image recognition

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

  13. [13]

    An image is worth 16x16 words: Transformers for image recognition at scale

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020

  14. [14]

    Attention is all you need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017

  15. [15]

    Learning transferable visual models from natural language supervision

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning, 2021, pp. 8748–8763

  16. [16]

    A survey on facial emotion recognition techniques: A state-of-the-art literature review

    F. Z. Canal, T. R. Müller, J. C. Matias, G. G. Scotton, A. R. de Sa Junior, E. Pozzebon, and A. C. Sobieranski, “A survey on facial emotion recognition techniques: A state-of-the-art literature review,” Information Sciences, vol. 582, pp. 593–617, 2022

  17. [17]

    Multiview facial expression recognition, a survey

    M. Jampour and M. Javidi, “Multiview facial expression recognition, a survey,” IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2086–2105, 2022

  18. [18]

    Graph-based facial affect analysis: A review

    Y. Liu, X. Zhang, J. Zhou, X. Li, Y. Li, and G. Zhao, “Graph-based facial affect analysis: A review,” IEEE Transactions on Affective Computing, vol. 14, no. 4, pp. 2657–2677, 2023

  19. [19]

    A comprehensive review of facial expression recognition techniques

    R. R. Adyapady and B. Annappa, “A comprehensive review of facial expression recognition techniques,” Multimedia Systems, vol. 29, no. 1, pp. 73–103, 2023

  20. [20]

    Driver’s facial expression recognition: A comprehensive survey

    I. Saadi, A. Taleb-Ahmed, A. Hadid, Y. El Hillali et al., “Driver’s facial expression recognition: A comprehensive survey,” Expert Systems with Applications, vol. 242, p. 122784, 2024

  21. [21]

    Advances in facial expression recognition: A survey of methods, benchmarks, models, and datasets

    T. Kopalidis, V. Solachidis, N. Vretos, and P. Daras, “Advances in facial expression recognition: A survey of methods, benchmarks, models, and datasets,” Information, vol. 15, no. 3, p. 135, 2024

  22. [22]

    A survey on facial expression recognition of static and dynamic emotions

    Y. Wang, S. Yan, Y. Liu, W. Song, J. Liu, Y. Chang, X. Mai, X. Hu, W. Zhang, and Z. Gan, “A survey on facial expression recognition of static and dynamic emotions,” arXiv preprint arXiv:2408.15777, 2024

  23. [23]

    Facial action coding system: a technique for the measurement of facial movement

    E. Friesen and P. Ekman, “Facial action coding system: a technique for the measurement of facial movement,” Palo Alto, vol. 3, no. 2, p. 5, 1978

  24. [24]

    The japanese female facial expression (jaffe) database

    M. J. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, and J. Budynek, “The japanese female facial expression (jaffe) database,” in Proceedings of third international conference on automatic face and gesture recognition, 1998, pp. 14–16

  25. [25]

    The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression

    P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, “The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression,” in 2010 ieee computer society conference on computer vision and pattern recognition-workshops. IEEE, 2010, pp. 94–101

  26. [26]

    A 3d facial expression database for facial behavior research

    L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato, “A 3d facial expression database for facial behavior research,” in 7th international conference on automatic face and gesture recognition (FGR06). IEEE, 2006, pp. 211–216

  27. [27]

    Facial expression recognition based on local binary patterns: A comprehensive study

    C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on local binary patterns: A comprehensive study,” Image and vision Computing, vol. 27, no. 6, pp. 803–816, 2009

  28. [28]

    A spontaneous micro-expression database: Inducement, collection and baseline

    X. Li, T. Pfister, X. Huang, G. Zhao, and M. Pietikäinen, “A spontaneous micro-expression database: Inducement, collection and baseline,” in 2013 10th IEEE International Conference and Workshops on Automatic face and gesture recognition (fg). IEEE, 2013, pp. 1–6

  29. [29]

    Casme ii: An improved spontaneous micro-expression database and the baseline evaluation

    W.-J. Yan, X. Li, S.-J. Wang, G. Zhao, Y.-J. Liu, Y.-H. Chen, and X. Fu, “Casme ii: An improved spontaneous micro-expression database and the baseline evaluation,” PloS one, vol. 9, no. 1, p. e86041, 2014

  30. [30]

    Challenges in representation learning: A report on three machine learning contests

    I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee et al., “Challenges in representation learning: A report on three machine learning contests,” in International conference on neural information processing. Springer, 2013, pp. 117–124

  31. [31]

    Training deep networks for facial expression recognition with crowd-sourced label distribution

    E. Barsoum, C. Zhang, C. C. Ferrer, and Z. Zhang, “Training deep networks for facial expression recognition with crowd-sourced label distribution,” in Proceedings of the 18th ACM international conference on multimodal interaction, 2016, pp. 279–283

  32. [32]

    Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild

    S. Li, W. Deng, and J. Du, “Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2852–2861

  33. [33]

    Deep region and multi-label learning for facial action unit detection

    K. Zhao, W.-S. Chu, and H. Zhang, “Deep region and multi-label learning for facial action unit detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3391–3399

  34. [34]

    Eac-net: Deep nets with enhancing and cropping for facial action unit detection

    W. Li, F. Abtahi, Z. Zhu, and L. Yin, “Eac-net: Deep nets with enhancing and cropping for facial action unit detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 11, pp. 2583–2596, 2018

  35. [35]

    Suppressing uncertainties for large-scale facial expression recognition

    K. Wang, X. Peng, J. Yang, S. Lu, and Y. Qiao, “Suppressing uncertainties for large-scale facial expression recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 6897–6906

  36. [36]

    Dfew: A large-scale database for recognizing dynamic facial expressions in the wild

    X. Jiang, Y. Zong, W. Zheng, C. Tang, W. Xia, C. Lu, and J. Liu, “Dfew: A large-scale database for recognizing dynamic facial expressions in the wild,” in Proceedings of the 28th ACM international conference on multimedia, 2020, pp. 2881–2889

  37. [37]

    Transfer: Learning relation-aware facial expression representations with transformers

    F. Xue, Q. Wang, and G. Guo, “Transfer: Learning relation-aware facial expression representations with transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3601–3610

  38. [38]

    Facial expression recognition with visual transformers and attentional selective fusion

    F. Ma, B. Sun, and S. Li, “Facial expression recognition with visual transformers and attentional selective fusion,” IEEE Transactions on Affective Computing, vol. 14, no. 2, pp. 1236–1248, 2021

  39. [39]

    Poster: A pyramid cross-fusion transformer network for facial expression recognition

    C. Zheng, M. Mendieta, and C. Chen, “Poster: A pyramid cross-fusion transformer network for facial expression recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3146–3155

  40. [40]

    4dme: A spontaneous 4d micro-expression dataset with multimodalities

    X. Li, S. Cheng, Y. Li, M. Behzad, J. Shen, S. Zafeiriou, M. Pantic, and G. Zhao, “4dme: A spontaneous 4d micro-expression dataset with multimodalities,” IEEE Transactions on Affective Computing, vol. 14, no. 4, pp. 3031–3047, 2022

  41. [41]

    Ferv39k: A large-scale multi-scene dataset for facial expression recognition in videos

    Y. Wang, Y. Sun, Y. Huang, Z. Liu, S. Gao, W. Zhang, W. Ge, and W. Zhang, “Ferv39k: A large-scale multi-scene dataset for facial expression recognition in videos,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 20922–20931

  42. [42]

    Multimodal prompt alignment for facial expression recognition

    F. Ma, Y. He, B. Sun, and S. Li, “Multimodal prompt alignment for facial expression recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 12581–12591

  43. [43]

    Mma-dfer: Multimodal adaptation of unimodal models for dynamic facial expression recognition in-the-wild

    K. Chumachenko, A. Iosifidis, and M. Gabbouj, “Mma-dfer: Multimodal adaptation of unimodal models for dynamic facial expression recognition in-the-wild,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 4673–4682

  44. [44]

    Libreface: An open-source toolkit for deep facial expression analysis

    D. Chang, Y. Yin, Z. Li, M. Tran, and M. Soleymani, “Libreface: An open-source toolkit for deep facial expression analysis,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2024, pp. 8205–8215

  45. [45]

    Htnet for micro-expression recognition

    Z. Wang, K. Zhang, W. Luo, and R. Sankaranarayana, “Htnet for micro-expression recognition,” Neurocomputing, vol. 602, p. 128196, 2024

  46. [46]

    Emotion-llama: Multimodal emotion recognition and reasoning with instruction tuning

    Z. Cheng, Z.-Q. Cheng, J.-Y. He, J. Sun, K. Wang, Y. Lin, Z. Lian, X. Peng, and A. Hauptmann, “Emotion-llama: Multimodal emotion recognition and reasoning with instruction tuning,” Advances in Neural Information Processing Systems, vol. 37, pp. 110805–110853, 2024

  47. [47]

    Emo-llama: Enhancing facial emotion understanding with instruction tuning

    B. Xing, Z. Yu, X. Liu, K. Yuan, Q. Ye, W. Xie, H. Yue, J. Yang, and H. Kälviäinen, “Emo-llama: Enhancing facial emotion understanding with instruction tuning,” arXiv preprint arXiv:2408.11424, 2024

  48. [48]

    Facellm: A multimodal large language model for face understanding

    H. O. Shahreza and S. Marcel, “Facellm: A multimodal large language model for face understanding,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 3677–3687

  49. [49]

    Emoverse: Enhancing multimodal large language models for affective computing via multitask learning

    A. Li, L. Xu, C. Ling, J. Zhang, and P. Wang, “Emoverse: Enhancing multimodal large language models for affective computing via multitask learning,” Neurocomputing, vol. 650, p. 130810, 2025

  50. [50]

    Poster++: A simpler and stronger facial expression recognition network

    J. Mao, R. Xu, X. Yin, Y. Chang, B. Nie, A. Huang, and Y. Wang, “Poster++: A simpler and stronger facial expression recognition network,” Pattern Recognition, vol. 157, p. 110951, 2025

  51. [51]

    Mamba-va: A mamba-based approach for continuous emotion recognition in valence-arousal space

    Y. Liang, Z. Wang, F. Liu, M. Liu, and Y. Yao, “Mamba-va: A mamba-based approach for continuous emotion recognition in valence-arousal space,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5651–5656

  52. [52]

    Facexformer: A unified transformer for facial analysis

    K. Narayan, V. VS, R. Chellappa, and V. M. Patel, “Facexformer: A unified transformer for facial analysis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 11369–11382

  53. [53]

    Mol: Joint estimation of micro-expression, optical flow, and landmark via transformer-graph-style convolution

    Z. Shao, Y. Cheng, F. Li, Y. Zhou, X. Lu, Y. Xie, and L. Ma, “Mol: Joint estimation of micro-expression, optical flow, and landmark via transformer-graph-style convolution,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  54. [54]

    Gradient-based learning applied to document recognition

    Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 2002

  55. [55]

    Classifying emotions and engagement in online learning based on a single facial expression recognition neural network

    A. V. Savchenko, L. V. Savchenko, and I. Makarov, “Classifying emotions and engagement in online learning based on a single facial expression recognition neural network,” IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2132–2143, 2022

  56. [56]

    Long short-term memory

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997

  57. [57]

    Bp4d-spontaneous: a high-resolution spontaneous 3d dynamic facial expression database

    X. Zhang, L. Yin, J. F. Cohn, S. Canavan, M. Reale, A. Horowitz, P. Liu, and J. M. Girard, “Bp4d-spontaneous: a high-resolution spontaneous 3d dynamic facial expression database,” Image and Vision Computing, vol. 32, no. 10, pp. 692–706, 2014

  58. [58]

    Disfa: A spontaneous facial action intensity database

    S. M. Mavadati, M. H. Mahoor, K. Bartlett, P. Trinh, and J. F. Cohn, “Disfa: A spontaneous facial action intensity database,” IEEE Transactions on Affective Computing, vol. 4, no. 2, pp. 151–160, 2013

  59. [59]

    Samm: A spontaneous micro-facial movement dataset

    A. K. Davison, C. Lansley, N. Costen, K. Tan, and M. H. Yap, “Samm: A spontaneous micro-facial movement dataset,” IEEE transactions on affective computing, vol. 9, no. 1, pp. 116–129, 2016

  60. [60]

    Learn from all: Erasing attention consistency for noisy label facial expression recognition

    Y. Zhang, C. Wang, X. Ling, and W. Deng, “Learn from all: Erasing attention consistency for noisy label facial expression recognition,” in European Conference on Computer Vision. Springer, 2022, pp. 418–434

  61. [61]

    Transformer-augmented network with online label correction for facial expression recognition

    F. Ma, B. Sun, and S. Li, “Transformer-augmented network with online label correction for facial expression recognition,” IEEE Transactions on Affective Computing, vol. 15, no. 2, pp. 593–605, 2023

  62. [62]

    Abaw: Valence-arousal estimation, expression recognition, action unit detection & multi-task learning challenges

    D. Kollias, “Abaw: Valence-arousal estimation, expression recognition, action unit detection & multi-task learning challenges,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 2328–2336

  63. [63]

    Mer-clip: Au-guided vision-language alignment for micro-expression recognition

    S. Liu, X. Mao, S. Zhao, P. Li, T. Xu, and E. Chen, “Mer-clip: Au-guided vision-language alignment for micro-expression recognition,” IEEE Transactions on Affective Computing, 2025

  64. [64]

    Deep structured learning for facial action unit intensity estimation

    R. Walecki, O. Rudovic, V. Pavlovic, B. Schuller, and M. Pantic, “Deep structured learning for facial action unit intensity estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5709–5718

  65. [65]

    Facial expression recognition from near-infrared videos,

    G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen, “Facial expression recognition from near-infrared videos,” Image and Vision Computing, vol. 29, no. 9, pp. 607–619, 2011

  66. [66]

    CTIFERK: A thermal infrared facial expression recognition model with Kolmogorov–Arnold networks for smart classrooms,

    H. Wang, L. Zhang, G. Yang, and J. Liu, “CTIFERK: A thermal infrared facial expression recognition model with Kolmogorov–Arnold networks for smart classrooms,” Symmetry, vol. 17, no. 6, p. 864, 2025

  67. [67]

    Abaw: Valence-arousal estimation, expression recognition, action unit detection & emotional reaction intensity estimation challenges,

    D. Kollias, P. Tzirakis, A. Baird, A. Cowen, and S. Zafeiriou, “Abaw: Valence-arousal estimation, expression recognition, action unit detection & emotional reaction intensity estimation challenges,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5889–5898

  68. [68]

    Emotake: Exploring drivers’ emotion for takeover behavior prediction,

    Y. Gu, Y. Weng, Y. Wang, M. Wang, G. Zhuang, J. Huang, X. Peng, L. Luo, and F. Ren, “Emotake: Exploring drivers’ emotion for takeover behavior prediction,” IEEE Transactions on Affective Computing, vol. 15, no. 4, pp. 2112–2127, 2024

  69. [69]

    Residual multimodal transformer for expression-EEG fusion continuous emotion recognition,

    X. Jin, J. Xiao, L. Jin, and X. Zhang, “Residual multimodal transformer for expression-EEG fusion continuous emotion recognition,” CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1290–1304, 2024

  70. [70]

    Joint face detection and alignment using multitask cascaded convolutional networks,

    K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016

  71. [71]

    RetinaFace: Single-shot multi-level face localisation in the wild,

    J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou, “RetinaFace: Single-shot multi-level face localisation in the wild,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5203–5212

  72. [72]

    One millisecond face alignment with an ensemble of regression trees,

    V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867–1874

  73. [73]

    How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3d facial landmarks),

    A. Bulat and G. Tzimiropoulos, “How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3d facial landmarks),” in IEEE International Conference on Computer Vision, 2017, pp. 1021–1030

  74. [74]

    Contrast limited adaptive histogram equalization,

    K. Zuiderveld, “Contrast limited adaptive histogram equalization,” Graphics Gems IV, pp. 474–485, 1994

  75. [75]

    AutoAugment: Learning augmentation strategies from data,

    E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, “AutoAugment: Learning augmentation strategies from data,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 113–123

  76. [76]

    CutMix: Regularization strategy to train strong classifiers with localizable features,

    S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. Yoo, “CutMix: Regularization strategy to train strong classifiers with localizable features,” in IEEE/CVF International Conference on Computer Vision, 2019, pp. 6023–6032

  77. [77]

    Less is more: Micro-expression recognition from video using apex frame,

    S.-T. Liong, J. See, K. Wong, and R. C.-W. Phan, “Less is more: Micro-expression recognition from video using apex frame,” Signal Processing: Image Communication, vol. 62, pp. 82–92, 2018

  78. [78]

    Eulerian video magnification for revealing subtle changes in the world,

    H.-Y. Wu, M. Rubinstein, E. Shih, J. Guttag, F. Durand, and W. Freeman, “Eulerian video magnification for revealing subtle changes in the world,” in ACM Transactions on Graphics, vol. 31, no. 4, 2012, pp. 1–8

  79. [79]

    Joint 3D face reconstruction and dense alignment with position map regression network,

    Y. Feng, F. Wu, X. Shao, Y. Wang, and X. Zhou, “Joint 3D face reconstruction and dense alignment with position map regression network,” in European Conference on Computer Vision, 2018, pp. 534–551

  80. [80]

    Electroencephalography signal processing: A comprehensive review and analysis of methods and techniques,

    A. Chaddad, Y. Wu, R. Kateb, and A. Bouridane, “Electroencephalography signal processing: A comprehensive review and analysis of methods and techniques,” Sensors, vol. 23, no. 14, p. 6434, 2023
