Facial Expression Recognition in the Deep Learning Era: A Systematic Multi-Criteria Review of Methods, Models, Datasets, Performance, Challenges, and Future Research Directions

Aggelos Psiris; Georgios Th. Papadopoulos; Iraklis Varlamis; Panagiotis Sarigiannidis; Spyridon Evangelatos; Spyridon Georgiou; Thomas Lagkas; Vasileios Argyriou

arxiv: 2606.08612 · v1 · pith:3N3PARQ6new · submitted 2026-06-07 · 💻 cs.CV


Spyridon Georgiou, Aggelos Psiris, Spyridon Evangelatos, Thomas Lagkas, Vasileios Argyriou, Panagiotis Sarigiannidis, Iraklis Varlamis, Georgios Th. Papadopoulos

Pith reviewed 2026-06-27 18:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords Facial Expression Recognition · Deep Learning · Survey · Taxonomy · Datasets · Performance Evaluation · In-the-wild Conditions · Facial Affect Recognition


The pith

This survey organizes deep learning FER literature via a seven-axis taxonomy spanning tasks, modalities, architectures, and applications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper traces facial expression recognition from early handcrafted features through five phases to current attention, vision-language, and foundation-model methods. It supplies a multi-criteria taxonomy that dissects the field along recognition task, input modality, pre-processing, network architecture, learning strategy, acquisition setting, and application domain. The review then compares representative methods on public benchmarks, catalogs datasets with their annotation schemes, and outlines open challenges. A sympathetic reader would value the structured map for locating where progress has occurred and where gaps remain in real-world conditions.

Core claim

The authors deliver a systematic review of deep learning-based facial expression recognition linked to the broader facial affect recognition domain. They describe five evolutionary phases, introduce a seven-axis taxonomy for literature analysis, provide per-criterion comparisons under in-the-wild settings, compile a task-organized dataset catalog, report quantitative performance tables for state-of-the-art models, and discuss current limitations together with future directions.

What carries the argument

A multi-criteria taxonomy that classifies the literature along seven complementary axes: recognition task, input modality, face pre-processing pipeline, network architecture, learning strategy, acquisition setting, and application domain.

If this is right

Critical strengths and limitations of each taxonomy category become visible under in-the-wild conditions.
Public datasets receive a unified, task-organized catalog with annotation schemes and evaluation protocols.
Quantitative performance tables allow direct comparison of representative state-of-the-art methods on common benchmarks.
Open challenges and future research directions are identified from the gaps across the seven axes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Researchers could use the seven axes as a checklist when designing new models to ensure coverage of under-explored combinations such as micro-expression tasks with vision-language architectures.
The taxonomy may help practitioners select methods suited to specific acquisition settings or application domains rather than relying on single-axis surveys.
Linking FER explicitly to the wider FAR domain suggests possible transfer of techniques between categorical expression recognition and dimensional or action-unit estimation tasks.
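One way to apply the seven axes as a checklist is to tag each paper with one value per axis and then look for value combinations that no tagged paper covers. The sketch below is illustrative only: `TaxonomyEntry`, `uncovered_pairs`, the axis values, and the three sample entries are hypothetical placeholders, not structures or data taken from the survey.

```python
from dataclasses import dataclass, fields
from itertools import product


@dataclass(frozen=True)
class TaxonomyEntry:
    """One tagged paper: a single value on each of the seven axes."""
    recognition_task: str
    input_modality: str
    preprocessing: str
    architecture: str
    learning_strategy: str
    acquisition_setting: str
    application_domain: str


def uncovered_pairs(entries, axis_a, axis_b, values_a, values_b):
    """Return (a, b) value pairs on two axes that no tagged entry covers."""
    seen = {(getattr(e, axis_a), getattr(e, axis_b)) for e in entries}
    return [pair for pair in product(values_a, values_b) if pair not in seen]


# Hypothetical placeholder entries, not real papers from the survey.
entries = [
    TaxonomyEntry("categorical", "static-rgb", "alignment", "cnn",
                  "supervised", "in-the-wild", "generic"),
    TaxonomyEntry("categorical", "static-rgb", "alignment", "vision-language",
                  "supervised", "in-the-wild", "generic"),
    TaxonomyEntry("micro-expression", "video", "optical-flow", "cnn",
                  "supervised", "lab", "generic"),
]

gaps = uncovered_pairs(
    entries,
    "recognition_task", "architecture",
    ["categorical", "micro-expression"],
    ["cnn", "vision-language"],
)
print(gaps)
```

With these placeholder entries, the only uncovered pair is micro-expression recognition with a vision-language architecture, which mirrors the kind of under-explored combination mentioned above. A real checklist would need the survey's own category names for each axis.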

Load-bearing premise

The assumption that the chosen papers and seven-axis taxonomy together give complete, unbiased coverage of the field without missing important works or creating selection bias.

What would settle it

Discovery of a substantial set of recent deep-learning FER papers or major in-the-wild datasets whose methods or evaluation protocols fall outside all seven taxonomy axes.

Figures

Figures reproduced from arXiv: 2606.08612 by Aggelos Psiris, Georgios Th. Papadopoulos, Iraklis Varlamis, Panagiotis Sarigiannidis, Spyridon Evangelatos, Spyridon Georgiou, Thomas Lagkas, Vasileios Argyriou.

**Figure 1.** Key bibliometric analytics regarding the deep learning-based FER literature: a) Article types, and b) Top-15 most [PITH_FULL_IMAGE:figures/full_fig_p004_1.png]

**Figure 2.** Main phases in deep-learning-based facial expression recognition research and key/milestone works. Research has [PITH_FULL_IMAGE:figures/full_fig_p006_2.png]

**Figure 3.** Key criteria and main resulting categories of deep learning-based facial expression recognition methods. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]

**Figure 4.** Representative literature methods per recognition task: a) Categorical macro-FER (POSTER++ [ [PITH_FULL_IMAGE:figures/full_fig_p015_4.png]

**Figure 5.** Representative literature methods per input modality: a) Static 2D RGB image (MHAN [ [PITH_FULL_IMAGE:figures/full_fig_p019_5.png]

**Figure 6.** Representative literature methods incorporating different NN architecture types: a) CNNs (FLEPNet [ [PITH_FULL_IMAGE:figures/full_fig_p029_6.png]

read the original abstract

Facial Expression Recognition (FER) has advanced rapidly over the last decade, driven by the shift from handcrafted descriptors and shallow classifiers to deep convolutional, attention-based, vision-language, and foundation-model architectures, and by the parallel growth of large-scale in-the-wild benchmarks spanning categorical, dimensional, compound, micro-expression, Action Unit (AU), and intensity-estimation tasks. Yet the deep learning-based FER landscape has so far been reviewed only along narrow task-, architecture-, or application-specific axes, leaving a holistic, systematically organized account of its recent advances missing. This survey addresses that gap with a comprehensive review of recent deep learning-based FER, explicitly linked to the wider Facial Affect Recognition (FAR) domain. Its main contributions are: a) A description of FER's evolution into five distinct phases, from handcrafted features and classical machine learning to attention-based, vision-language, and foundation-model approaches, with the key milestone works of each, b) A multi-criteria taxonomy analyzing the literature along seven complementary axes: recognition task, input modality, face pre-processing pipeline, network architecture, learning strategy, acquisition setting, and application domain, c) A per-criterion comparative analysis, with critical insights into the strengths and limitations of each category under in-the-wild conditions, d) A task-organized review of public FER datasets, with their annotation schemes, modalities, and evaluation protocols, e) A compilation of performance metrics and a per-task quantitative comparison of representative state-of-the-art methods on widely adopted benchmarks, and f) A discussion of current challenges and promising future directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a standard organizing survey of DL-based FER that adds a seven-axis taxonomy and five-phase framing to existing literature, but the missing search protocol makes the 'comprehensive' claim hard to verify.


The paper's main contribution is pulling FER work into five historical phases and a seven-axis taxonomy covering task, modality, preprocessing, architecture, learning strategy, setting, and application. It also reviews datasets by task and compiles performance numbers from the cited papers on common benchmarks. That kind of map can be handy for someone trying to get oriented in the subfield.

It does the usual survey things competently: it links FER to the broader FAR area, breaks down strengths and limits of different categories under in-the-wild conditions, and lists challenges plus future directions. The quantitative comparisons are drawn directly from the referenced results rather than new experiments.

The soft spot is the absence of any documented literature search protocol: no databases, keywords, date ranges, or inclusion rules are given. Without that, the claim of systematic and complete coverage stays untestable, which is a real issue for a review that positions itself as filling a holistic gap. The taxonomy axes look reasonable on paper, but they could overlap or leave gaps, and the paper does not check for either.

This is for readers who need a reference overview rather than a new technical result. It shows clear engagement with the literature and would benefit from peer review once the methodology section is added; the current version is too thin on that point to stand as a definitive account.

Referee Report

1 major / 0 minor

Summary. The paper claims to fill a gap in the literature by providing a comprehensive, systematically organized review of recent deep learning-based Facial Expression Recognition (FER), linked to the broader Facial Affect Recognition domain. Its contributions include: (a) an evolution of FER into five phases with milestone works; (b) a multi-criteria taxonomy along seven axes (recognition task, input modality, face pre-processing pipeline, network architecture, learning strategy, acquisition setting, application domain); (c) per-criterion comparative analysis under in-the-wild conditions; (d) a task-organized review of public datasets with annotation schemes and protocols; (e) performance metrics and quantitative comparisons of SOTA methods on benchmarks; and (f) discussion of challenges and future directions.

Significance. If the literature coverage proves complete and unbiased, the survey would offer a useful holistic reference that consolidates advances across tasks, architectures, and settings while providing quantitative benchmark comparisons. The seven-axis taxonomy and explicit linkage to FAR could help organize an otherwise fragmented field.

major comments (1)

[Abstract / Contribution list] Abstract, listed contributions (a)–(f): The central claim that the survey delivers a 'comprehensive review' and 'systematically organized account' of the DL-based FER landscape is load-bearing on the completeness and lack of selection bias in the literature search. No search protocol (databases, keywords, date ranges, inclusion/exclusion rules, or assignment procedure to the seven axes) is described anywhere in the listed contributions or abstract, rendering the taxonomy coverage and dataset review untestable for omissions or overlap.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback highlighting the need for explicit documentation of the literature search process. We agree this is essential for a systematic review and will revise the manuscript accordingly to address the concern.


Referee: [Abstract / Contribution list] Abstract, listed contributions (a)–(f): The central claim that the survey delivers a 'comprehensive review' and 'systematically organized account' of the DL-based FER landscape is load-bearing on the completeness and lack of selection bias in the literature search. No search protocol (databases, keywords, date ranges, inclusion/exclusion rules, or assignment procedure to the seven axes) is described anywhere in the listed contributions or abstract, rendering the taxonomy coverage and dataset review untestable for omissions or overlap.

Authors: We acknowledge that the current manuscript does not describe the search protocol, which is a standard requirement for systematic reviews to ensure transparency and allow evaluation of potential bias or omissions. In the revised version, we will add a dedicated 'Literature Search Methodology' subsection (likely in Section 2 or as a new Section 3) that explicitly details: the databases queried (e.g., IEEE Xplore, ACM DL, Scopus, Google Scholar, arXiv), the search keywords and Boolean combinations used, the date range (focusing on the deep learning era from ~2014 onward), inclusion/exclusion criteria (e.g., peer-reviewed papers on DL-based FER with quantitative results), the total number of papers screened and retained, and the procedure for mapping papers to the seven taxonomy axes. This addition will directly support the claims of comprehensiveness in the abstract and contributions list (a)–(f). revision: yes
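The elements the rebuttal promises to document can be pictured as a small, machine-readable protocol record. The following is a minimal sketch under stated assumptions: the `SearchProtocol` class and its field names are hypothetical, the keyword groups are invented for illustration, and only the database names, the approximate 2014 start year, and the inclusion criteria echo the examples given in the response above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchProtocol:
    """Hypothetical record of a literature search protocol."""
    databases: tuple
    keyword_groups: tuple  # terms are OR-ed within a group, groups are AND-ed
    start_year: int
    inclusion_rules: tuple

    def boolean_query(self) -> str:
        """Build a Boolean query string from the keyword groups."""
        clauses = [
            "(" + " OR ".join(f'"{term}"' for term in group) + ")"
            for group in self.keyword_groups
        ]
        return " AND ".join(clauses)


protocol = SearchProtocol(
    # Databases named as examples in the rebuttal.
    databases=("IEEE Xplore", "ACM DL", "Scopus", "Google Scholar", "arXiv"),
    # Assumed keywords; the rebuttal does not list the actual terms.
    keyword_groups=(
        ("facial expression recognition", "facial affect recognition"),
        ("deep learning", "neural network"),
    ),
    start_year=2014,
    inclusion_rules=("peer-reviewed", "DL-based FER", "reports quantitative results"),
)

query = protocol.boolean_query()
print(query)
```

Recording the protocol in a structured form like this would let a reader rerun the same query against each database and compare the retrieved set with the papers the survey retained.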

Circularity Check

0 steps flagged

No circularity: survey aggregates external results without derivations or fitted quantities


This paper is a literature survey whose contributions consist of describing FER evolution, proposing a seven-axis taxonomy, reviewing datasets, compiling performance metrics from published works, and discussing challenges. No equations, predictions, or first-principles derivations exist that could reduce to inputs by construction. All cited results originate from external papers; the taxonomy and organization are descriptive categorizations rather than self-definitional or fitted claims. The absence of any load-bearing self-citation chain or ansatz means the report remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey with no new mathematical derivations, fitted parameters, or postulated entities. All content is drawn from previously published papers.



Reference graph

Works this paper leans on

300 extracted references · 5 linked inside Pith

[1]

Constants across cultures in the face and emotion

P. Ekman and W. V . Friesen, “Constants across cultures in the face and emotion.”Journal of personality and social psychology, vol. 17, no. 2, p. 124, 1971. 48

1971
[2]

Ekman and E

P. Ekman and E. L. Rosenberg, Eds.,What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS). USA: Oxford University Press, 1997

1997
[3]

Compound facial expressions of emotion,

S. Du, Y . Tao, and A. M. Martinez, “Compound facial expressions of emotion,”Proceedings of the national academy of sciences, vol. 111, no. 15, pp. E1454– E1462, 2014

2014
[4]

Blended emotion in-the-wild: Multi-label facial expression recognition using crowd- sourced annotations and deep locality feature learning,

S. Li and W. Deng, “Blended emotion in-the-wild: Multi-label facial expression recognition using crowd- sourced annotations and deep locality feature learning,” International Journal of Computer Vision, vol. 127, no. 6, pp. 884–906, 2019

2019
[5]

Af- fectnet: A database for facial expression, valence, and arousal computing in the wild,

A. Mollahosseini, B. Hasani, and M. H. Mahoor, “Af- fectnet: A database for facial expression, valence, and arousal computing in the wild,”IEEE Transactions on Affective Computing, vol. 10, no. 1, pp. 18–31, 2017

2017
[6]

Aff-wild2: Extending the aff-wild database for affect recognition,

D. Kollias and S. Zafeiriou, “Aff-wild2: Extending the aff-wild database for affect recognition,”arXiv preprint arXiv:1811.07770, 2018

arXiv 2018
[7]

Automatic analysis of facial affect: A survey of registration, repre- sentation, and recognition,

E. Sariyanidi, H. Gunes, and A. Cavallaro, “Automatic analysis of facial affect: A survey of registration, repre- sentation, and recognition,”IEEE transactions on pat- tern analysis and machine intelligence, vol. 37, no. 6, pp. 1113–1133, 2014

2014
[8]

Toward machine emotional intelligence: Analysis of affective physiolog- ical state,

R. W. Picard, E. Vyzas, and J. Healey, “Toward machine emotional intelligence: Analysis of affective physiolog- ical state,”IEEE transactions on pattern analysis and machine intelligence, vol. 23, no. 10, pp. 1175–1191, 2001

2001
[9]

Artificial emotional intelli- gence: Conventional and deep learning approach,

H. Kumar and A. Martin, “Artificial emotional intelli- gence: Conventional and deep learning approach,”Ex- pert Systems with Applications, vol. 212, p. 118651, 2023

2023
[10]

Deep learning for micro-expression recognition: A survey,

Y . Li, J. Wei, Y . Liu, J. Kauttonen, and G. Zhao, “Deep learning for micro-expression recognition: A survey,” IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2028–2046, 2022

2028
[11]

Deep facial expression recognition: A survey,

S. Li and W. Deng, “Deep facial expression recognition: A survey,”IEEE transactions on affective computing, vol. 13, no. 3, pp. 1195–1215, 2022

2022
[12]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inIEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

2016
[13]

An image is worth 16x16 words: Transformers for image recognition at scale,

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weis- senborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Min- derer, G. Heigold, S. Gellyet al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

Pith/arXiv arXiv 2010
[14]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in neural infor- mation processing systems, vol. 30, 2017

2017
[15]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” inInternational Conference on Machine Learning, 2021, pp. 8748–8763

2021
[16]

A survey on facial emotion recognition techniques: A state-of-the-art literature review,

F. Z. Canal, T. R. M ¨uller, J. C. Matias, G. G. Scotton, A. R. de Sa Junior, E. Pozzebon, and A. C. Sobieranski, “A survey on facial emotion recognition techniques: A state-of-the-art literature review,”Information Sciences, vol. 582, pp. 593–617, 2022

2022
[17]

Multiview facial expression recognition, a survey,

M. Jampour and M. Javidi, “Multiview facial expression recognition, a survey,”IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2086–2105, 2022

2086
[18]

Graph-based facial affect analysis: A review,

Y . Liu, X. Zhang, J. Zhou, X. Li, Y . Li, and G. Zhao, “Graph-based facial affect analysis: A review,”IEEE Transactions on Affective Computing, vol. 14, no. 4, pp. 2657–2677, 2023

2023
[19]

A comprehensive re- view of facial expression recognition techniques,

R. R. Adyapady and B. Annappa, “A comprehensive re- view of facial expression recognition techniques,”Mul- timedia Systems, vol. 29, no. 1, pp. 73–103, 2023

2023
[20]

Driver’s facial expression recognition: A comprehen- sive survey,

I. Saadi, A. Taleb-Ahmed, A. Hadid, Y . El Hillaliet al., “Driver’s facial expression recognition: A comprehen- sive survey,”Expert Systems with Applications, vol. 242, p. 122784, 2024

2024
[21]

Advances in facial expression recognition: A survey of methods, benchmarks, models, and datasets,

T. Kopalidis, V . Solachidis, N. Vretos, and P. Daras, “Advances in facial expression recognition: A survey of methods, benchmarks, models, and datasets,”Infor- mation, vol. 15, no. 3, p. 135, 2024

2024
[22]

A survey on facial expression recognition of static and dynamic emo- tions,

Y . Wang, S. Yan, Y . Liu, W. Song, J. Liu, Y . Chang, X. Mai, X. Hu, W. Zhang, and Z. Gan, “A survey on facial expression recognition of static and dynamic emo- tions,”arXiv preprint arXiv:2408.15777, 2024

arXiv 2024
[23]

Facial action coding system: a technique for the measurement of facial movement,

E. Friesen and P. Ekman, “Facial action coding system: a technique for the measurement of facial movement,” Palo Alto, vol. 3, no. 2, p. 5, 1978

1978
[24]

The japanese female facial expression (jaffe) database,

M. J. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, and J. Budynek, “The japanese female facial expression (jaffe) database,” inProceedings of third international conference on automatic face and gesture recognition, 1998, pp. 14–16

1998
[25]

The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion- specified expression,

P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, “The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion- specified expression,” in2010 ieee computer society conference on computer vision and pattern recognition- workshops. IEEE, 2010, pp. 94–101

2010
[26]

A 3d facial expression database for facial behavior research,

L. Yin, X. Wei, Y . Sun, J. Wang, and M. J. Rosato, “A 3d facial expression database for facial behavior research,” in7th international conference on automatic face and gesture recognition (FGR06). IEEE, 2006, pp. 211–216

2006
[27]

Facial expres- sion recognition based on local binary patterns: A com- prehensive study,

C. Shan, S. Gong, and P. W. McOwan, “Facial expres- sion recognition based on local binary patterns: A com- prehensive study,”Image and vision Computing, vol. 27, no. 6, pp. 803–816, 2009

2009
[28]

A spontaneous micro-expression database: Induce- ment, collection and baseline,

X. Li, T. Pfister, X. Huang, G. Zhao, and M. Pietik¨ainen, “A spontaneous micro-expression database: Induce- ment, collection and baseline,” in2013 10th IEEE Inter- national Conference and Workshops on Automatic face and gesture recognition (fg). IEEE, 2013, pp. 1–6

2013
[29]

Casme ii: An improved spontaneous 49 micro-expression database and the baseline evaluation,

W.-J. Yan, X. Li, S.-J. Wang, G. Zhao, Y .-J. Liu, Y .-H. Chen, and X. Fu, “Casme ii: An improved spontaneous 49 micro-expression database and the baseline evaluation,” PloS one, vol. 9, no. 1, p. e86041, 2014

2014
[30]

Challenges in representation learning: A report on three machine learning contests,

I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y . Tang, D. Thaler, D.-H. Leeet al., “Challenges in representation learning: A report on three machine learning contests,” inInter- national conference on neural information processing. Springer, 2013, pp. 117–124

2013
[31]

Training deep networks for facial expression recog- nition with crowd-sourced label distribution,

E. Barsoum, C. Zhang, C. C. Ferrer, and Z. Zhang, “Training deep networks for facial expression recog- nition with crowd-sourced label distribution,” inPro- ceedings of the 18th ACM international conference on multimodal interaction, 2016, pp. 279–283

2016
[32]

Reliable crowdsourcing and deep locality-preserving learning for expression recog- nition in the wild,

S. Li, W. Deng, and J. Du, “Reliable crowdsourcing and deep locality-preserving learning for expression recog- nition in the wild,” inProceedings of the IEEE confer- ence on computer vision and pattern recognition, 2017, pp. 2852–2861

2017
[33]

Deep region and multi-label learning for facial action unit detection,

K. Zhao, W.-S. Chu, and H. Zhang, “Deep region and multi-label learning for facial action unit detection,” in Proceedings of the IEEE Conference on Computer Vi- sion and Pattern Recognition, 2016, pp. 3391–3399

2016
[34]

Eac-net: Deep nets with enhancing and cropping for facial action unit detection,

W. Li, F. Abtahi, Z. Zhu, and L. Yin, “Eac-net: Deep nets with enhancing and cropping for facial action unit detection,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 11, pp. 2583–2596, 2018

2018
[35]

Sup- pressing uncertainties for large-scale facial expression recognition,

K. Wang, X. Peng, J. Yang, S. Lu, and Y . Qiao, “Sup- pressing uncertainties for large-scale facial expression recognition,” inProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, 2020, pp. 6897–6906

2020
[36]

Dfew: A large-scale database for recog- nizing dynamic facial expressions in the wild,

X. Jiang, Y . Zong, W. Zheng, C. Tang, W. Xia, C. Lu, and J. Liu, “Dfew: A large-scale database for recog- nizing dynamic facial expressions in the wild,” inPro- ceedings of the 28th ACM international conference on multimedia, 2020, pp. 2881–2889

2020
[37]

Transfer: Learning relation-aware facial expression representations with transformers,

F. Xue, Q. Wang, and G. Guo, “Transfer: Learning relation-aware facial expression representations with transformers,” inProceedings of the IEEE/CVF Interna- tional Conference on Computer Vision, 2021, pp. 3601– 3610

2021
[38]

Facial expression recog- nition with visual transformers and attentional selec- tive fusion,

F. Ma, B. Sun, and S. Li, “Facial expression recog- nition with visual transformers and attentional selec- tive fusion,”IEEE Transactions on Affective Computing, vol. 14, no. 2, pp. 1236–1248, 2021

2021
[39]

Poster: A pyra- mid cross-fusion transformer network for facial expres- sion recognition,

C. Zheng, M. Mendieta, and C. Chen, “Poster: A pyra- mid cross-fusion transformer network for facial expres- sion recognition,” inProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, 2023, pp. 3146–3155

2023
[40]

4dme: A spontaneous 4d micro-expression dataset with multimodalities,

X. Li, S. Cheng, Y . Li, M. Behzad, J. Shen, S. Zafeiriou, M. Pantic, and G. Zhao, “4dme: A spontaneous 4d micro-expression dataset with multimodalities,”IEEE Transactions on Affective Computing, vol. 14, no. 4, pp. 3031–3047, 2022

2022
[41]

Ferv39k: A large-scale multi- scene dataset for facial expression recognition in videos,

Y . Wang, Y . Sun, Y . Huang, Z. Liu, S. Gao, W. Zhang, W. Ge, and W. Zhang, “Ferv39k: A large-scale multi- scene dataset for facial expression recognition in videos,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 20 922–20 931

2022
[42]

Multimodal prompt alignment for facial expression recognition,

F. Ma, Y . He, B. Sun, and S. Li, “Multimodal prompt alignment for facial expression recognition,” inPro- ceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 12 581–12 591

2025
[43]

Mma- dfer: Multimodal adaptation of unimodal models for dynamic facial expression recognition in-the-wild,

K. Chumachenko, A. Iosifidis, and M. Gabbouj, “Mma- dfer: Multimodal adaptation of unimodal models for dynamic facial expression recognition in-the-wild,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 4673–4682

2024
[44]

Libreface: An open-source toolkit for deep facial ex- pression analysis,

D. Chang, Y . Yin, Z. Li, M. Tran, and M. Soleymani, “Libreface: An open-source toolkit for deep facial ex- pression analysis,” inProceedings of the IEEE/CVF winter conference on applications of computer vision, 2024, pp. 8205–8215

2024
[45]

Htnet for micro-expression recognition,

Z. Wang, K. Zhang, W. Luo, and R. Sankaranarayana, “Htnet for micro-expression recognition,”Neurocom- puting, vol. 602, p. 128196, 2024

2024
[46]

Emotion- llama: Multimodal emotion recognition and reasoning with instruction tuning,

Z. Cheng, Z.-Q. Cheng, J.-Y . He, J. Sun, K. Wang, Y . Lin, Z. Lian, X. Peng, and A. Hauptmann, “Emotion- llama: Multimodal emotion recognition and reasoning with instruction tuning,”Advances in Neural Informa- tion Processing Systems, vol. 37, pp. 110 805–110 853, 2024

2024
[47]

Emo-llama: Enhancing facial emotion understanding with instruction tuning,

B. Xing, Z. Yu, X. Liu, K. Yuan, Q. Ye, W. Xie, H. Yue, J. Yang, and H. K ¨alvi¨ainen, “Emo-llama: Enhancing facial emotion understanding with instruction tuning,” arXiv preprint arXiv:2408.11424, 2024

arXiv 2024
[48]

Facellm: A multimodal large language model for face understanding,

H. O. Shahreza and S. Marcel, “Facellm: A multimodal large language model for face understanding,” inPro- ceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 3677– 3687

2025
[49]

Emo- verse: Enhancing multimodal large language models for affective computing via multitask learning,

A. Li, L. Xu, C. Ling, J. Zhang, and P. Wang, “Emo- verse: Enhancing multimodal large language models for affective computing via multitask learning,”Neurocom- puting, vol. 650, p. 130810, 2025

2025
[50]

Poster++: A simpler and stronger facial expression recognition network,

J. Mao, R. Xu, X. Yin, Y . Chang, B. Nie, A. Huang, and Y . Wang, “Poster++: A simpler and stronger facial expression recognition network,”Pattern Recognition, vol. 157, p. 110951, 2025

2025
[51]

Mamba- va: A mamba-based approach for continuous emotion recognition in valence-arousal space,

Y . Liang, Z. Wang, F. Liu, M. Liu, and Y . Yao, “Mamba- va: A mamba-based approach for continuous emotion recognition in valence-arousal space,” inProceedings of the Computer Vision and Pattern Recognition Con- ference, 2025, pp. 5651–5656

2025
[52]

Facexformer: A unified transformer for facial anal- ysis,

K. Narayan, V . VS, R. Chellappa, and V . M. Patel, “Facexformer: A unified transformer for facial anal- ysis,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 11 369– 11 382

2025
[53]

Mol: Joint estimation of micro-expression, optical flow, and landmark via transformer-graph-style convolution,

Z. Shao, Y . Cheng, F. Li, Y . Zhou, X. Lu, Y . Xie, and L. Ma, “Mol: Joint estimation of micro-expression, optical flow, and landmark via transformer-graph-style convolution,”IEEE Transactions on Pattern Analysis 50 and Machine Intelligence, 2025

2025
[54]

Gradient-based learning applied to document recog- nition,

Y . LeCun, L. Bottou, Y . Bengio, and P. Haffner, “Gradient-based learning applied to document recog- nition,”Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 2002

2002
[55]

Classifying emotions and engagement in online learn- ing based on a single facial expression recognition neu- ral network,

A. V . Savchenko, L. V . Savchenko, and I. Makarov, “Classifying emotions and engagement in online learn- ing based on a single facial expression recognition neu- ral network,”IEEE Transactions on Affective Comput- ing, vol. 13, no. 4, pp. 2132–2143, 2022

2022
[56]

Long short-term memory,

S. Hochreiter and J. Schmidhuber, “Long short-term memory,”Neural computation, vol. 9, no. 8, pp. 1735– 1780, 1997

1997
[57]

Bp4d- spontaneous: a high-resolution spontaneous 3d dynamic facial expression database,

X. Zhang, L. Yin, J. F. Cohn, S. Canavan, M. Reale, A. Horowitz, P. Liu, and J. M. Girard, “Bp4d- spontaneous: a high-resolution spontaneous 3d dynamic facial expression database,”Image and Vision Comput- ing, vol. 32, no. 10, pp. 692–706, 2014

2014
[58]

Disfa: A spontaneous facial action intensity database,

S. M. Mavadati, M. H. Mahoor, K. Bartlett, P. Trinh, and J. F. Cohn, “Disfa: A spontaneous facial action intensity database,”IEEE Transactions on Affective Computing, vol. 4, no. 2, pp. 151–160, 2013

2013
[59]

Samm: A spontaneous micro-facial move- ment dataset,

A. K. Davison, C. Lansley, N. Costen, K. Tan, and M. H. Yap, “Samm: A spontaneous micro-facial move- ment dataset,”IEEE transactions on affective comput- ing, vol. 9, no. 1, pp. 116–129, 2016

2016
[60]

Learn from all: Erasing attention consistency for noisy label facial expression recognition,

Y. Zhang, C. Wang, X. Ling, and W. Deng, “Learn from all: Erasing attention consistency for noisy label facial expression recognition,” in European Conference on Computer Vision. Springer, 2022, pp. 418–434

2022
[61]

Transformer-augmented network with online label correction for facial expression recognition,

F. Ma, B. Sun, and S. Li, “Transformer-augmented network with online label correction for facial expression recognition,” IEEE Transactions on Affective Computing, vol. 15, no. 2, pp. 593–605, 2023

2023
[62]

Abaw: Valence-arousal estimation, expression recognition, action unit detection & multi-task learning challenges,

D. Kollias, “Abaw: Valence-arousal estimation, expression recognition, action unit detection & multi-task learning challenges,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 2328–2336

2022
[63]

Mer-clip: Au-guided vision-language alignment for micro-expression recognition,

S. Liu, X. Mao, S. Zhao, P. Li, T. Xu, and E. Chen, “Mer-clip: Au-guided vision-language alignment for micro-expression recognition,” IEEE Transactions on Affective Computing, 2025

2025
[64]

Deep structured learning for facial action unit intensity estimation,

R. Walecki, O. Rudovic, V. Pavlovic, B. Schuller, and M. Pantic, “Deep structured learning for facial action unit intensity estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5709–5718

2017
[65]

Facial expression recognition from near-infrared videos,

G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen, “Facial expression recognition from near-infrared videos,” Image and Vision Computing, vol. 29, no. 9, pp. 607–619, 2011

2011
[66]

CTIFERK: A thermal infrared facial expression recognition model with Kolmogorov–Arnold networks for smart classrooms,

H. Wang, L. Zhang, G. Yang, and J. Liu, “CTIFERK: A thermal infrared facial expression recognition model with Kolmogorov–Arnold networks for smart classrooms,” Symmetry, vol. 17, no. 6, p. 864, 2025

2025
[67]

Abaw: Valence-arousal estimation, expression recognition, action unit detection & emotional reaction intensity estimation challenges,

D. Kollias, P. Tzirakis, A. Baird, A. Cowen, and S. Zafeiriou, “Abaw: Valence-arousal estimation, expression recognition, action unit detection & emotional reaction intensity estimation challenges,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5889–5898

2023
[68]

Emotake: Exploring drivers’ emotion for takeover behavior prediction,

Y. Gu, Y. Weng, Y. Wang, M. Wang, G. Zhuang, J. Huang, X. Peng, L. Luo, and F. Ren, “Emotake: Exploring drivers’ emotion for takeover behavior prediction,” IEEE Transactions on Affective Computing, vol. 15, no. 4, pp. 2112–2127, 2024

2024
[69]

Residual multimodal transformer for expression-EEG fusion continuous emotion recognition,

X. Jin, J. Xiao, L. Jin, and X. Zhang, “Residual multimodal transformer for expression-EEG fusion continuous emotion recognition,” CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1290–1304, 2024

2024
[70]

Joint face detection and alignment using multitask cascaded convolutional networks,

K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016

2016
[71]

RetinaFace: Single-shot multi-level face localisation in the wild,

J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou, “RetinaFace: Single-shot multi-level face localisation in the wild,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5203–5212

2020
[72]

One millisecond face alignment with an ensemble of regression trees,

V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867–1874

2014
[73]

How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3d facial landmarks),

A. Bulat and G. Tzimiropoulos, “How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3d facial landmarks),” in IEEE International Conference on Computer Vision, 2017, pp. 1021–1030

2017
[74]

Contrast limited adaptive histogram equalization,

K. Zuiderveld, “Contrast limited adaptive histogram equalization,” Graphics Gems IV, pp. 474–485, 1994

1994
[75]

AutoAugment: Learning augmentation strategies from data,

E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, “AutoAugment: Learning augmentation strategies from data,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 113–123

2019
[76]

CutMix: Regularization strategy to train strong classifiers with localizable features,

S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. Yoo, “CutMix: Regularization strategy to train strong classifiers with localizable features,” in IEEE/CVF International Conference on Computer Vision, 2019, pp. 6023–6032

2019
[77]

Less is more: Micro-expression recognition from video using apex frame,

S.-T. Liong, J. See, K. Wong, and R. C.-W. Phan, “Less is more: Micro-expression recognition from video using apex frame,” Signal Processing: Image Communication, vol. 62, pp. 82–92, 2018

2018
[78]

Eulerian video magnification for revealing subtle changes in the world,

H.-Y. Wu, M. Rubinstein, E. Shih, J. Guttag, F. Durand, and W. Freeman, “Eulerian video magnification for revealing subtle changes in the world,” in ACM Transactions on Graphics, vol. 31, no. 4, 2012, pp. 1–8

2012
[79]

Joint 3D face reconstruction and dense alignment with position map regression network,

Y. Feng, F. Wu, X. Shao, Y. Wang, and X. Zhou, “Joint 3D face reconstruction and dense alignment with position map regression network,” in European Conference on Computer Vision, 2018, pp. 534–551

2018
[80]

Electroencephalography signal processing: A comprehensive review and analysis of methods and techniques,

A. Chaddad, Y. Wu, R. Kateb, and A. Bouridane, “Electroencephalography signal processing: A comprehensive review and analysis of methods and techniques,” Sensors, vol. 23, no. 14, p. 6434, 2023

2023

