A cross-modal network for facial expression recognition

Chao Li; Chunwei Tian; Jingyuan Xie; Qi Zhang; Shichao Zhang; Wangmeng Zuo

arxiv: 2605.04439 · v1 · submitted 2026-05-06 · 💻 cs.CV

Pith reviewed 2026-05-08 18:36 UTC · model grok-4.3

classification 💻 cs.CV

keywords facial expression recognition · cross-modal network · face symmetry · feature fusion · deep neural network · salient refinement · half-face alignment · expression classification

The pith

CMNet recognizes facial expressions by combining symmetric features from whole and half faces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents CMNet, a cross-modal network for facial expression recognition that uses face symmetry to learn from a whole face as well as its left and right halves. The goal is to extract complementary features that capture both biological and structural aspects of expressions. A refinement module selects salient information to avoid instability when fusing these features. A separate alignment step ensures that the left and right half-face features correspond properly. The authors report that this design allows CMNet to perform better than prior methods including SCN and LAENet-SA.

Core claim

CMNet learns expression information from a whole face and from its left and right halves, exploiting face symmetry to extract complementary facial features. To prevent negative effects when fusing biological and structural information, a salient facial information refinement module extracts salient expression information, which improves the stability of the resulting expression classifier. To reduce reliance on unilateral facial features, a half-face alignment optimization mechanism aligns the expression information learned from the left and right half faces. Experimental results show that CMNet outperforms SCN and LAENet-SA for facial expression recognition.

What carries the argument

Cross-modal network (CMNet) with salient facial information refinement module and half-face alignment optimization mechanism that processes whole-face and half-face inputs symmetrically to extract complementary features.
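
A minimal sketch of how the three inputs (whole face, left half, right half) might be prepared. It assumes an already aligned, roughly centred face image and a split at the vertical midline; the function name `split_half_faces` and the mirroring of the right half are illustrative assumptions, not the paper's exact division method.

```python
import numpy as np

def split_half_faces(face: np.ndarray):
    """Split an (H, W, C) face image at its vertical midline.

    Returns (whole, left, right_mirrored). The right half is flipped
    horizontally so both halves share the same orientation.
    Assumption: the face is centred, so the midline is at W // 2.
    """
    w = face.shape[1]
    mid = w // 2
    left = face[:, :mid]
    # Take the last `mid` columns so both halves have equal width when W is odd.
    right = face[:, w - mid:]
    right_mirrored = right[:, ::-1]
    return face, left, right_mirrored

# Toy 4x6 RGB "image" standing in for an aligned face crop.
face = np.arange(4 * 6 * 3, dtype=np.float32).reshape(4, 6, 3)
whole, left, right = split_half_faces(face)
```

For a perfectly symmetric input the two returned halves are identical, which is the property a symmetry-based alignment objective would push the learned features toward.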

Load-bearing premise

That fusing biological and structural information from whole and half faces via the salient facial information refinement module and half-face alignment optimization mechanism does not produce negative effects and instead improves stability and performance of the obtained facial expression classifier.

What would settle it

A direct comparison on standard facial expression benchmarks showing that CMNet does not exceed the accuracy of SCN or LAENet-SA would falsify the outperformance claim.
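
One way to make such a comparison concrete is a paired test on per-image correctness for two models evaluated on the same test set. The sketch below uses an exact two-sided McNemar test; the function name and the toy correctness flags are hypothetical, and nothing here reproduces results from the paper.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correctness flags.

    Returns (b, c, p_value), where b counts samples only model A gets
    right and c counts samples only model B gets right.
    """
    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)
    c = sum(1 for a, bb in zip(correct_a, correct_b) if not a and bb)
    n = b + c
    if n == 0:
        # The models never disagree, so there is no evidence of a difference.
        return b, c, 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical per-image correctness for two models on eight test images.
model_a = [1, 1, 1, 1, 0, 1, 1, 0]
model_b = [1, 0, 0, 1, 0, 0, 1, 1]
b, c, p_value = mcnemar_exact(model_a, model_b)
```

Only the discordant pairs carry information here, so a small accuracy gap on a small test set can easily fail to reach significance.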

Figures

Figures reproduced from arXiv: 2605.04439 by Chao Li, Chunwei Tian, Jingyuan Xie, Qi Zhang, Shichao Zhang, Wangmeng Zuo.

**Figure 1.** The architecture of the proposed CMNet for facial expression recognition.

**Figure 2.** That is, firstly, a division method of central point…

**Figure 3.** Part facial images with seven emotions from…

**Figure 4.** Part facial images with seven emotions from…

**Figure 5.** Part facial images with eight emotions from AffectNet dataset. For context-sensitive scenes, CAER-S dataset [37] and SFEW 2.0 dataset [41] are used to conduct comparative experiments to test robustness of different methods for emotion recognition in a dynamic and context-sensitive environment in this paper. Specifically, CAER-S dataset was created by selecting static images from video clips in the CAER dat…

**Figure 8.** The attention visualization result generated by Grad…

**Figure 9.** The accuracy on RAF-DB using different values of…

**Figure 10.** Confusion matrix of our CMNet for cross-database:…

Original abstract

Deep neural networks enriched with structural information have been widely employed for facial expression recognition tasks. However, these methods often depend on hierarchical information rather than face property to finish expression recognition. In this paper, we propose a cross-modal network with strong biological and structural information for facial expression recognition (CMNet). CMNet can respectively learn expression information via face symmetry on a whole face, left and right half faces to extract complementary facial features. To prevent negative effect of biological and structural information fusion, a salient facial information refinement module can obtain salient facial expression information to improve stability of an obtained facial expression classifier. To reduce reliance on unilateral facial features, a half-face alignment optimization mechanism is designed to align obtained expression information of learned left and right half faces. Our experimental results demonstrate that CMNet outperforms several novel methods, i.e., SCN and LAENet-SA for facial expression recognition. Codes can be obtained at https://github.com/hellloxiaotian/CMNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

CMNet adds a symmetry-based cross-modal design with refinement and alignment modules for facial expression recognition, but the abstract leaves the performance claims unverified.

CMNet processes whole faces along with left and right half faces separately to pull out symmetric expression cues, then adds a salient refinement module and a half-face alignment mechanism to combine the information without negative side effects. The specific combination of these pieces is new relative to the cited priors like SCN and LAENet-SA. The design shows some care in trying to use biological face properties rather than just deeper layers or standard attention. The code link is also a plus for anyone who wants to check the implementation.

The main weakness is that the abstract gives no datasets, no accuracy numbers, no ablation tables, and no training details. Without those, it is impossible to know whether the claimed gains over SCN and LAENet-SA come from the new modules or from other uncontrolled factors such as backbone choice or parameter count. The stress-test point about needing ablations to confirm the refinement and alignment steps actually prevent negative fusion is still open; end-to-end wins alone do not settle it.

This paper is for people already working on facial expression recognition who track incremental architecture ideas in that subfield. A reader looking for symmetry or cross-view fusion tricks might pick up a usable idea if the full experiments hold up. It has a clear enough proposal and a code release to deserve a serious referee rather than a desk reject, even if the review will likely ask for the missing ablations and numbers.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes CMNet, a cross-modal network for facial expression recognition that processes whole-face, left-half-face, and right-half-face inputs to exploit symmetry for complementary expression features. It introduces a salient facial information refinement module to extract salient information and avoid negative fusion effects, plus a half-face alignment optimization mechanism to align half-face features and reduce unilateral reliance. The central empirical claim is that CMNet outperforms SCN and LAENet-SA.

Significance. If substantiated by controlled experiments, the incorporation of explicit biological priors (symmetry) and structural fusion mechanisms could offer a practical route to more stable FER models. The public code release aids reproducibility.

major comments (3)

[Abstract] Abstract: the claim that CMNet 'outperforms several novel methods, i.e., SCN and LAENet-SA' is presented without any mention of datasets, training protocols, ablation studies, statistical tests, or error bars, so it is impossible to attribute gains to the proposed modules rather than uncontrolled factors.
[Method] Method section (salient facial information refinement module): the assertion that this module 'can obtain salient facial expression information to improve stability' and 'prevent negative effect of biological and structural information fusion' is load-bearing for the central claim, yet no ablation (full CMNet vs. variant lacking the module) or capacity-matched baseline is reported.
[Method] Method section (half-face alignment optimization mechanism): the claim that the mechanism 'align[s] obtained expression information of learned left and right half faces' and thereby reduces unilateral reliance lacks supporting controlled experiments that would demonstrate it mitigates negative fusion rather than simply adding parameters.

minor comments (1)

[Abstract] Abstract: the phrasing 'CMNet can respectively learn expression information via face symmetry on a whole face, left and right half faces' is unclear and should be reworded for precision.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We agree that the abstract and experimental validation can be strengthened for clarity and rigor. We will revise the manuscript accordingly by expanding the abstract with experimental details and adding targeted ablation studies for the proposed modules.

Referee: [Abstract] Abstract: the claim that CMNet 'outperforms several novel methods, i.e., SCN and LAENet-SA' is presented without any mention of datasets, training protocols, ablation studies, statistical tests, or error bars, so it is impossible to attribute gains to the proposed modules rather than uncontrolled factors.

Authors: We agree that the abstract should provide more context to support the performance claim. In the revised version, we will update the abstract to explicitly mention the datasets (RAF-DB and FER2013), training protocols (including data augmentation and optimization details), reference to ablation studies in Section 4, and note that results include error bars with statistical significance testing. These details are already present in the experimental section and will now be summarized in the abstract to better attribute improvements to the cross-modal design and modules.
revision: yes

Referee: [Method] Method section (salient facial information refinement module): the assertion that this module 'can obtain salient facial expression information to improve stability' and 'prevent negative effect of biological and structural information fusion' is load-bearing for the central claim, yet no ablation (full CMNet vs. variant lacking the module) or capacity-matched baseline is reported.

Authors: We acknowledge that a direct ablation isolating the salient facial information refinement module would provide stronger evidence. The current manuscript demonstrates overall superiority over SCN and LAENet-SA, but to directly address this point we will add a new ablation study in the revised experiments section: comparing full CMNet against a variant without the refinement module, plus a capacity-matched baseline (e.g., by adjusting channel dimensions to equalize parameters). This will quantify the module's contribution to stability and negative fusion prevention.
revision: yes

Referee: [Method] Method section (half-face alignment optimization mechanism): the claim that the mechanism 'align[s] obtained expression information of learned left and right half faces' and thereby reduces unilateral reliance lacks supporting controlled experiments that would demonstrate it mitigates negative fusion rather than simply adding parameters.

Authors: We agree that controlled experiments are necessary to isolate the effect of the half-face alignment optimization mechanism. While the overall results support reduced unilateral reliance through the cross-modal design, we will add an ablation in the revision: full CMNet versus a variant without the alignment mechanism, including metrics on feature alignment (e.g., cosine similarity between left/right features) and performance under asymmetric conditions. This will demonstrate that the mechanism mitigates negative fusion beyond mere parameter addition.
revision: yes
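
The left/right alignment metric mentioned in the last response could be computed along these lines. This is a sketch assuming the half-face features are available as two (N, D) arrays; `mean_cosine_similarity` is a hypothetical helper, not part of the released code.

```python
import numpy as np

def mean_cosine_similarity(x_left, x_right, eps=1e-8):
    """Average per-sample cosine similarity between two (N, D) feature sets."""
    x_left = np.asarray(x_left, dtype=np.float64)
    x_right = np.asarray(x_right, dtype=np.float64)
    dot = np.sum(x_left * x_right, axis=1)
    norms = np.linalg.norm(x_left, axis=1) * np.linalg.norm(x_right, axis=1)
    # eps guards against division by zero for all-zero feature vectors.
    return float(np.mean(dot / np.maximum(norms, eps)))

# Toy features: identical halves score 1, orthogonal halves score 0.
aligned = mean_cosine_similarity([[1.0, 2.0], [3.0, 4.0]],
                                 [[1.0, 2.0], [3.0, 4.0]])
unaligned = mean_cosine_similarity([[1.0, 0.0]], [[0.0, 1.0]])
```

Reporting this value with and without the alignment mechanism would show whether the mechanism changes feature agreement, independently of end-to-end accuracy.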

Circularity Check

0 steps flagged

No circularity: empirical architecture validated by external comparisons

The paper proposes CMNet, a cross-modal network using whole-face and half-face symmetry to extract complementary features, with two custom modules (salient facial information refinement and half-face alignment optimization) to mitigate fusion issues. All load-bearing claims are supported by end-to-end experimental accuracy gains against SCN and LAENet-SA on standard benchmarks. No equations, fitted parameters renamed as predictions, self-citations forming uniqueness theorems, or ansatzes smuggled via prior work appear in the derivation chain. The architecture choices are presented as design decisions justified by biological intuition and then tested empirically, with no reduction of outputs to inputs by construction. This is a standard empirical DL proposal whose validity hinges on reproducible experiments rather than internal self-reference.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

The central claim rests on standard deep-learning assumptions plus domain-specific priors about facial symmetry and the benefit of multi-view fusion; no machine-checked proofs or parameter-free derivations are present. The new modules are architectural inventions whose value is asserted via experiments.

free parameters (1)

Module design choices and hyperparameters
Standard deep network weights and architectural decisions (layer sizes, fusion weights, alignment parameters) are learned or chosen to fit the data.

axioms (2)

domain assumption: Facial expressions are reliably encoded in symmetric and half-face structural information.
Invoked when the network is designed to extract complementary features from whole, left, and right faces.

domain assumption: Fusing multi-view facial features improves classifier stability when properly refined.
Core premise behind the salient refinement and alignment modules.

invented entities (2)

Salient facial information refinement module · no independent evidence
purpose: To extract salient expression information and prevent negative effects from information fusion
New module introduced to improve stability of the classifier.

Half-face alignment optimization mechanism · no independent evidence
purpose: To align learned expression information from left and right half faces and reduce unilateral reliance
New mechanism proposed to balance half-face contributions.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Cost (Jcost = ½(x + x⁻¹) − 1) · Cost.washburn_uniqueness_aczel · relation: unclear
Paper passage: L_sl = (2/NC) Σ (x_l − x_r)^2 ... α is set to 0.9 ... balance two losses

Foundation.AlphaCoordinateFixation (RS chain has zero adjustable parameters) · alphaCoordinateFixationCert · relation: unclear
Paper passage: α is set to 0.9 in this paper ... initial learning rate of 0.01 ... batch size of 32
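
The quoted passage gives the alignment loss as L_sl = (2/NC) Σ (x_l − x_r)^2, with α = 0.9 balancing two losses. A rough sketch follows, assuming N is the batch size and C the feature dimension, and assuming the two losses are combined as α·L_cls + (1 − α)·L_sl. The passage is elided, so both the symbol meanings and the combination rule are guesses rather than the paper's stated formulation.

```python
import numpy as np

def half_face_alignment_loss(x_l, x_r):
    """L_sl = (2 / (N * C)) * sum((x_l - x_r)^2) for (N, C) feature arrays."""
    x_l = np.asarray(x_l, dtype=np.float64)
    x_r = np.asarray(x_r, dtype=np.float64)
    n, c = x_l.shape  # assumed: N = batch size, C = feature dimension
    return float(2.0 / (n * c) * np.sum((x_l - x_r) ** 2))

def combined_loss(cls_loss, align_loss, alpha=0.9):
    """Assumed convex combination; alpha = 0.9 is the value quoted above."""
    return alpha * cls_loss + (1.0 - alpha) * align_loss

# Toy left/right half-face features for a batch of two samples.
x_left = [[1.0, 2.0], [3.0, 4.0]]
x_right = [[1.0, 0.0], [3.0, 2.0]]
l_sl = half_face_alignment_loss(x_left, x_right)
total = combined_loss(1.0, l_sl)
```

Under this reading, α = 0.9 keeps the classification term dominant and treats alignment as a mild regularizer; the opposite weighting is equally consistent with the elided text.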

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 1 internal anchor

[1] A. R. Shatzman, “Expression systems: Editorial overview,” Curr. Opin. Biotechnol., vol. 4, no. 5, pp. 517–519, 1993.
[2] S. Zhao, H. Yao, Y. Gao, G. Ding, and T.-S. Chua, “Predicting personalized image emotion perceptions in social networks,” IEEE Trans. Affective Comput., vol. 9, no. 4, pp. 526–540, 2016.
[3] S. Zhao, X. Hong, J. Yang, Y. Zhao, and G. Ding, “Toward label-efficient emotion and sentiment analysis,” Proc. IEEE, vol. 111, no. 10, pp. 1159–1197, 2023.
[4] P. Ekman and W. V. Friesen, “Constants across cultures in the face and emotion,” J. Pers. Soc. Psychol., vol. 17, no. 2, p. 124, 1971.
[5] M.-H. Guo et al., “Attention mechanisms in computer vision: A survey,” Comput. Visual Media, vol. 8, no. 3, pp. 331–368, 2022.
[6] K. Wang, X. Peng, J. Yang, D. Meng, and Y. Qiao, “Region attention networks for pose and occlusion robust facial expression recognition,” IEEE Trans. Image Process., vol. 29, pp. 4057–4069, 2020.
[7] C. Wang, J. Xue, K. Lu, and Y. Yan, “Light attention embedding for facial expression recognition,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, pp. 1834–1847, 2021.
[8] A. H. Farzaneh and X. Qi, “Facial expression recognition in the wild via deep attentive center loss,” in Proc. IEEE Winter Conf. Comput. Vis. Appl. (WACV), Virtual, Jan. 2021, pp. 2402–2411.
[9] Z. Zhao, Q. Liu, and S. Wang, “Learning deep global multi-scale and local attention features for facial expression recognition in the wild,” IEEE Trans. Image Process., vol. 30, pp. 6544–6556, 2021.
[10] Y. Li, J. Zeng, S. Shan, and X. Chen, “Occlusion aware facial expression recognition using CNN with attention mechanism,” IEEE Trans. Image Process., vol. 28, no. 5, pp. 2439–2450, 2018.
[11] S. Zhao et al., “Affective image content analysis: Two decades review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 6729–6751, 2021.
[12] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, “Coding facial expressions with gabor wavelets,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), 1998, pp. 200–205.
[13] I. Buciu, I. Pitas, et al., “Ica and gabor representation for facial expression recognition,” in Proc. Int. Conf. Image Process. (ICIP), vol. 2, 2003, pp. II–855.
[14] S. F. Cotter, “Sparse representation for accurate classification of corrupted and occluded facial expressions,” in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2010, pp. 838–841.
[15] Y. Ouyang, N. Sang, and R. Huang, “Accurate and robust facial expressions recognition by fusing multiple sparse representation based classifiers,” Neurocomputing, vol. 149, pp. 71–78, 2015.
[16] W.-S. Chu, F. De la Torre, and J. F. Cohn, “Selective transfer machine for personalized facial expression analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 3, pp. 529–545, 2016.
[17] N. Zheng, X. Guo, L. Qi, and L. Guan, “Two-dimensional discriminant multi-manifolds locality preserving projection for facial expression recognition,” in Proc. Int. Symp. Circuits Syst. (ISCAS), 2015, pp. 2065–2068.
[18] A. Barman and P. Dutta, “Facial expression recognition using distance and shape signature features,” Pattern Recognit. Lett., vol. 145, pp. 254–261, 2021.
[19] M. Sajjad et al., “A comprehensive survey on deep facial expression recognition: Challenges, applications, and future guidelines,” Alexandria Eng. J., vol. 68, pp. 817–840, 2023.
[20] W. Xie, L. Shen, and J. Duan, “Adaptive weighting of handcrafted feature losses for facial expression recognition,” IEEE Trans. Cybern., vol. 51, no. 5, pp. 2787–2800, 2019.
[21] Z. Wu and J. Cui, “La-net: Landmark-aware learning for reliable facial expression recognition under label noise,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2023, pp. 20 698–20 707.
[22] C. Tian, J. Xie, L. Li, W. Zuo, Y. Zhang, and D. Zhang, “A perception cnn for facial expression recognition,” IEEE Trans. Image Process., vol. 34, pp. 8101–8113, 2025.
[23] R. Ni, B. Yang, X. Zhou, A. Cangelosi, and X. Liu, “Facial expression recognition through cross-modality attention fusion,” IEEE Trans. Cognit. Dev. Syst., vol. 15, no. 1, pp. 175–185, 2022.
[24] D. Ruan, Y. Yan, S. Lai, Z. Chai, C. Shen, and H. Wang, “Feature decomposition and reconstruction learning for effective facial expression recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 7660–7669.
[25] Y. Huang et al., “FERMixNet: An occlusion robust facial expression recognition model with facial mixing augmentation and mid-level representation learning,” IEEE Trans. Affective Comput., 2024.
[26] Y. Li et al., “Learning informative and discriminative features for facial expression recognition in the wild,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 5, pp. 3178–3189, 2021.
[27] J. Ye et al., “Cmdvit: A voluntary facial expression recognition model for complex mental disorders,” IEEE Trans. Image Process., 2025.
[28] W. Yu and H. Xu, “Co-attentive multi-task convolutional neural network for facial expression recognition,” Pattern Recognit., vol. 123, p. 108 401, 2022.
[29] Y. Gao et al., “JADFER: Exploring spatial-contextual interaction with joint attention dropping for facial expression recognition,” IEEE Trans. Affective Comput., 2024.
[30] X. Wang et al., “Mhan: Multi-head hybrid attention network for facial expression recognition,” Pattern Recognit., vol. 170, p. 112 015, 2026.
[31] D. Chen, G. Wen, H. Li, R. Chen, and C. Li, “Multi-relations aware network for in-the-wild facial expression recognition,” IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 8, pp. 3848–3859, 2023.
[32] Y. Xia, H. Yu, X. Wang, M. Jian, and F.-Y. Wang, “Relation-aware facial expression recognition,” IEEE Trans. Cognit. Dev. Syst., vol. 14, no. 3, pp. 1143–1154, 2021.
[33] H. Liu, H. Cai, Q. Lin, X. Li, and H. Xiao, “Adaptive multilayer perceptual attention network for facial expression recognition,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 9, pp. 6253–6266, 2022.
[34] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, Nevada, USA, Jun. 2016, pp. 770–778.
[35] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in 2018 Proc. Eur. Conf. Comput. Vis. (ECCV), Berlin, Heidelberg: Springer-Verlag, 2018, pp. 3–19.
[36] J. S. Bridle, “Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition,” in Neurocomputing, F. F. Soulié and J. Hérault, Eds., Berlin, Heidelberg, 1990, pp. 227–236, ISBN: 978-3-642-76153-9.
[37] J. Lee, S. Kim, S. Kim, J. Park, and K. Sohn, “Context-aware emotion recognition networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, 2019, pp. 10 143–10 152.
[38] I. J. Goodfellow et al., “Challenges in representation learning: A report on three machine learning contests,” Neural Networks, pp. 117–124, 2013.
[39] S. Li, W. Deng, and J. Du, “Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Honolulu, Hawaii, USA, Jun. 2017, pp. 2852–2861.
[40] A. Mollahosseini, B. Hasani, and M. H. Mahoor, “Affectnet: A database for facial expression, valence, and arousal computing in the wild,” IEEE Trans. Affective Comput., vol. 10, no. 1, pp. 18–31, 2017.
[41] A. Dhall, O. Ramana Murthy, R. Goecke, J. Joshi, and T. Gedeon, “Video and image based emotion recognition challenges in the wild: Emotiw 2015,” in Proc. ACM Int. Conf. Multimodal Interaction ACM ICMI, Seattle, Washington, USA, Nov. 2015, pp. 423–426.
[42] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon, “Acted facial expressions in the wild database,” ANU Tech. Rep. TR-CS-11, vol. 2, no. 1, 2011.
[43] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao, “Ms-celeb-1m: A dataset and benchmark for large-scale face recognition,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Amsterdam, the Netherlands, Oct. 2016, pp. 87–102.
[44] J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou, “Retinaface: Single-shot multi-level face localisation in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Virtual, Jun. 2020, pp. 5203–5212.
[45] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980, 2014.
[46] Z. Wen, W. Lin, T. Wang, and G. Xu, “Distract your attention: Multi-head cross attention network for facial expression recognition,” Biomimetics, vol. 8, no. 2, p. 199, 2023.
[47] H. Robbins and S. Monro, “A stochastic approximation method,” Ann. Math. Stat., pp. 400–407, 1951.
[48] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 2002.
[49] Y. Xu, X. Zhu, Z. Li, G. Liu, Y. Lu, and H. Liu, “Using the original and symmetrical face training samples to perform representation based two-step face recognition,” Pattern Recognit., vol. 46, no. 4, pp. 1151–1158, 2013.
[50] A. Chattopadhay, A. Sarkar, P. Howlader, and V. N. Balasubramanian, “Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks,” in Proc. IEEE Winter Conf. Comput. Vis. Appl. (WACV), Nevada, USA: IEEE, Mar. 2018, pp. 839–847.
[51]

Pose-adaptive hi- erarchical attention network for facial expression recog- nition,

Y . Liu, J. Peng, J. Zeng, and S. Shan, “Pose-adaptive hi- erarchical attention network for facial expression recog- nition,” arXiv:1905.10059, 2019

work page arXiv 1905
[52]

Robust lightweight facial expression recognition network with label distribution training,

Z. Zhao, Q. Liu, and F. Zhou, “Robust lightweight facial expression recognition network with label distribution training,” in AAAI Conf. Artif. Intell. , Issue: 4, vol. 35, Virtual, Feb. 2021, pp. 3510–3519

work page 2021
[53]

Facial expression recognition with inconsistently annotated datasets,

J. Zeng, S. Shan, and X. Chen, “Facial expression recognition with inconsistently annotated datasets,” in Proc. Eur . Conf. Comput. Vis. (ECCV) , Munich, Ger- many, Sep. 2018, pp. 222–237

work page 2018
[54] K. Wang, X. Peng, J. Yang, S. Lu, and Y. Qiao, “Suppressing uncertainties for large-scale facial expression recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Virtual, Jun. 2020, pp. 6897–6906
[55] C. Hewitt and H. Gunes, “CNN-based facial affect analysis on mobile devices,” arXiv:1807.08775, 2018
[56] C. Li, X. Li, X. Wang, D. Huang, Z. Liu, and L. Liao, “FG-AGR: Fine-grained associative graph representation for facial expression recognition in the wild,” IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 2, pp. 882–896, 2023
[57] H. Siqueira, S. Magg, and S. Wermter, “Efficient facial feature learning with wide ensemble-based convolutional neural networks,” in AAAI Conf. Artif. Intell., vol. 34, no. 4, New York, USA, Feb. 2020, pp. 5800–5809
[58] Z. Dong et al., “FE-SpikeFormer: A camera-based facial expression recognition method for hospital health monitoring,” IEEE J. Biomed. Health Inf., pp. 1–11, 2025
[59] H. Li, N. Wang, X. Yang, X. Wang, and X. Gao, “Unconstrained facial expression recognition with no-reference de-elements learning,” IEEE Trans. Affective Comput., vol. 15, no. 1, pp. 173–185, 2024
[60] W. Zhang, X. Ji, K. Chen, Y. Ding, and C. Fan, “Learning a facial expression embedding disentangled from identity,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Virtual, Jun. 2021, pp. 6759–6768
[61] H. Liu et al., “MMATrans: Muscle movement aware representation learning for facial expression recognition via transformers,” IEEE Trans. Ind. Inf., 2024
[62] Y. Zhang, C. Wang, X. Ling, and W. Deng, “Learn from all: Erasing attention consistency for noisy label facial expression recognition,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Tel Aviv, Israel, Oct. 2022, pp. 418–434
[63] D. Zeng, Z. Lin, X. Yan, Y. Liu, F. Wang, and B. Tang, “Face2Exp: Combating data biases for facial expression recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), New Orleans, Louisiana, USA, Jun. 2022, pp. 20 291–20 300
[64] H. Li, N. Wang, X. Ding, X. Yang, and X. Gao, “Adaptively learning facial expression representation via C-F labels and distillation,” IEEE Trans. Image Process., vol. 30, pp. 2016–2028, 2021
[65] Q. Yang, Y. He, H. Chen, Y. Wu, and Z. Rao, “A novel lightweight facial expression recognition network based on deep shallow network fusion and attention mechanism,” Algorithms, vol. 18, no. 8, 2025
[66] Q. Li, Z. Liu, Z. Zhang, Q. Wang, and M. Ma, “Decoding group emotional dynamics in a web-based collaborative environment: A novel framework utilizing multi-person facial expression recognition,” Int. J. Hum.-Comput. Interact., vol. 41, no. 5, pp. 3455–3473, 2025
[67] M. Najmabadi, M. Masoudifar, and A. Hajipour, “Weighted classification of deep and traditional histogram-based features with kernel representation for robust facial expression recognition,” Appl. Soft Comput., vol. 182, p. 113 630, 2025
[68] F. Ma, B. Sun, and S. Li, “Facial expression recognition with visual transformers and attentional selective fusion,” IEEE Trans. Affective Comput., vol. 14, no. 2, pp. 1236–1248, 2021
[69] M. Aouayeb, W. Hamidouche, C. Soladie, K. Kpalma, and R. Seguier, “Learning vision transformer with squeeze and excitation for facial expression recognition,” arXiv:2107.03107, 2021
[70] J. She, Y. Hu, H. Shi, J. Wang, Q. Shen, and T. Mei, “Dive into ambiguity: Latent distribution mining and pairwise uncertainty estimation for facial expression recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Virtual, Jun. 2021, pp. 6248–6257
[71] H. Qi, X. Zhang, Y. Shi, and X. Qi, “A novel attention residual network expression recognition method,” IEEE Access, vol. 12, pp. 24 609–24 620, 2024
[72] S. Wang, Y. Wu, Y. Chang, G. Li, and M. Mao, “Pose-aware facial expression recognition assisted by expression descriptions,” IEEE Trans. Affective Comput., vol. 15, no. 1, pp. 241–253, 2024
[73] W. Li, X. Dong, and Y. Wang, “Human emotion recognition with relational region-level analysis,” IEEE Trans. Affective Comput., vol. 14, no. 1, pp. 650–663, 2023
[74] S. Chen, J. Wang, Y. Chen, Z. Shi, X. Geng, and Y. Rui, “Label distribution learning on auxiliary label space graphs for facial expression recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Virtual, Jun. 2020, pp. 13 984–13 993
[75] Y. Li, G. Lu, J. Li, Z. Zhang, and D. Zhang, “Facial expression recognition in the wild using multi-level features and attention mechanisms,” IEEE Trans. Affective Comput., vol. 14, no. 1, pp. 451–462, 2020
[76] A. Howard et al., “Searching for MobileNetV3,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, Oct. 2019, pp. 1314–1324

Chunwei Tian (Senior Member, IEEE) received the Ph.D. degree from Harbin Institute of Technology, Harbin, China, in 2021. He is currently a Professor with the School of Computer Science and Technology, Harbin Instit...

