
arxiv: 2605.17483 · v1 · pith:XJKYEJIInew · submitted 2026-05-17 · 💻 cs.CV

On Applicability of Synthetic Datasets for Facial Expression Recognition

Pith reviewed 2026-05-20 14:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords facial expression recognition · synthetic datasets · pseudo-labeling · diffusion models · GAN editing · privacy preservation · class imbalance · cross-dataset evaluation

The pith

Synthetic datasets from pseudo-labeling and generative models can substitute or combine with real data for facial expression recognition

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Facial expression recognition struggles with skewed class distributions in public datasets and with privacy rules that block creation of large balanced collections. The paper tests three ways to build usable synthetic data instead: labeling big unlabeled face sets with a teacher model and confidence thresholds, generating new faces via diffusion models prompted with demographic attributes, and editing expressions on existing faces using GANs while preserving identity. Experiments train IR50 and POSTERv1 models on mixtures of real benchmarks such as AffectNet, RAF-DB, and FER2013 together with synthetic sources including DigiFace and FFHQ, then measure cross-dataset accuracy. Results indicate the synthetic sets can replace or augment real data and still deliver workable generalization. If correct, this removes a major barrier to building balanced, shareable training resources without direct collection or distribution of real faces.

Core claim

The paper evaluates three complementary strategies for constructing privacy-preserving FER datasets in the standard seven discrete facial expression classes setting: (i) pseudo-labeling large unlabeled face collections with a teacher model under a confidence-thresholding scheme, (ii) prompt-driven synthesis using diffusion models conditioned on demographic attributes, and (iii) task-aware GAN-based expression editing that modifies facial expression while preserving identity and realism. Using cross-dataset evaluations on AffectNet, RAF-DB, and FER2013 with synthetic sources such as DigiFace, DCFace, EmoNet-Face BIG, and FFHQ, the findings demonstrate how synthetic data can effectively substitute or be combined with real datasets.

What carries the argument

Three strategies for synthetic dataset construction: confidence-thresholded pseudo-labeling of unlabeled collections, demographic-prompted diffusion synthesis, and identity-preserving task-aware GAN expression editing
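
The first of these strategies, confidence-thresholded pseudo-labeling, can be illustrated with a minimal sketch. The paper's threshold value and teacher output format are not stated here, so the `threshold=0.9` default, the class ordering in `CLASSES`, and the function name `pseudo_label` are assumptions made for illustration. A sample is kept only when the teacher's top softmax probability reaches the threshold.

```python
import math

# Hypothetical class ordering; the paper's own ordering is not given here.
CLASSES = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_logits, threshold=0.9):
    """Return (index, class, confidence) for samples whose top probability >= threshold.

    teacher_logits: one list of 7 logits per unlabeled image.
    threshold: assumed value for illustration, not taken from the paper.
    """
    kept = []
    for idx, logits in enumerate(teacher_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            kept.append((idx, CLASSES[best], probs[best]))
    return kept


# Toy logits standing in for a teacher model's outputs.
logits = [
    [0.1, 0.0, 0.2, 6.0, 0.1, 0.3, 0.5],  # one dominant logit: kept
    [1.0, 0.9, 1.1, 1.0, 0.8, 1.2, 1.0],  # near-uniform: rejected
]
selected = pseudo_label(logits)
print(selected)
```

Raising the threshold trades dataset size for label purity, which is the trade-off the load-bearing premise below depends on.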

Load-bearing premise

The pseudo-labels from the teacher model and the images produced by diffusion and GAN synthesis retain enough label accuracy and visual realism to support improved or maintained generalization in cross-dataset evaluations.

What would settle it

Training models exclusively on the synthetic datasets and finding markedly lower accuracy on real held-out sets such as AffectNet or FER2013 compared with real-data baselines would disprove the substitution claim.

Figures

Figures reproduced from arXiv: 2605.17483 by Ali Azmoudeh, Erdi Sarıtaş, Hazım Kemal Ekenel, Ömer Yıldırım.

Figure 1: Abstract overview of our FER dataset curation pipelines, with the ...
Figure 2: Overview of our dataset curation pipelines.
Figure 3: Illustration of the GANmut polar coordinate system.
Figure 4: Generated face image samples from the constructed datasets. The ...
Figure 5: Larger representation of samples in curated datasets

Original abstract

Facial Expression Recognition faces two core challenges. The first is class imbalance in public datasets, which skews the learning process and weakens generalization. The second is related to privacy and data collection constraints, which limit the sharing of facial images and restrict the creation of large, balanced datasets. To address these issues, we examine three complementary strategies for constructing privacy-preserving FER datasets in the standard seven discrete facial expression classes setting. Our strategies are: (i) pseudo-labeling large unlabeled face collections with a teacher model under a confidence-thresholding scheme, (ii) prompt-driven synthesis using diffusion models conditioned on demographic attributes, and (iii) task-aware GAN-based expression editing that modifies facial expression while preserving identity and realism. For training and evaluation, we employed widely adopted datasets, including AffectNet, RAF-DB, and FER2013. We utilized the synthetic datasets DigiFace, DCFace, and EmoNet-Face BIG as unlabeled sources for pseudo-labeling. Additionally, we utilized the FFHQ dataset as the source for generative synthesis. The main experiments are conducted using a classic CNN backbone, IR50, and we also explore a more complex architecture, POSTERv1, to assess its feasibility and robustness. Using cross-dataset evaluations, we analyze the trade-offs each strategy presents in curated datasets. The findings demonstrate how synthetic data can effectively substitute or be combined with real datasets to mitigate imbalance and privacy limitations. Code and generated datasets: https://www.github.com/AliAZ98/SyntFER

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper examines three strategies for building privacy-preserving synthetic FER datasets in the seven-class setting: (i) confidence-thresholded pseudo-labeling of large unlabeled collections (DigiFace, DCFace, EmoNet-Face BIG) using a teacher model, (ii) prompt-driven diffusion synthesis on FFHQ conditioned on demographic attributes, and (iii) task-aware GAN expression editing that preserves identity. Models (IR50 and POSTERv1) are trained on real (AffectNet, RAF-DB, FER2013), synthetic, and combined data and evaluated in cross-dataset protocols. The central claim is that these synthetic sources can substitute for or augment real data to reduce class imbalance and privacy constraints.

Significance. If the empirical gains are robust, the work is significant because it supplies concrete, reproducible pipelines for generating balanced, privacy-safe FER training sets at scale. The multi-strategy design (pseudo-labeling + diffusion + GAN editing) and dual-backbone evaluation (lightweight CNN and transformer-style POSTERv1) allow direct comparison of trade-offs that are practically relevant to the community.

major comments (3)
  1. [Pseudo-labeling strategy] Pseudo-labeling description (prior to §4 experiments): no error-rate quantification, confusion-matrix analysis, or human validation of the teacher model's pseudo-labels on held-out ground-truth subsets is reported. Without this, it is impossible to separate genuine generalization gains from possible label noise or demographic balancing effects.
  2. [Cross-dataset evaluation] Cross-dataset evaluation protocol (§4): the manuscript provides no tables or figures that directly compare accuracy/F1 scores for real-only, synthetic-only, and mixed training regimes against standard baselines (e.g., training on AffectNet alone). The abstract and high-level description therefore leave the substitution claim without the quantitative support needed to assess its magnitude.
  3. [Generative synthesis methods] Expression fidelity in generative methods (diffusion and GAN sections): no automated or human study measures how often the generated/edited images retain the intended expression label at rates comparable to real data. This validation is load-bearing for the claim that the synthetic images support improved cross-dataset generalization.
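
The audit requested in the first major comment could look like the sketch below: compare the teacher's pseudo-labels against ground-truth labels on a held-out subset, then report a confusion matrix and the overall error rate. The label lists are toy data and the helper names (`confusion_matrix`, `error_rate`) are hypothetical; nothing here is taken from the paper or its repository.

```python
def confusion_matrix(true_labels, pseudo_labels, num_classes):
    """Rows index the ground-truth class, columns the teacher's pseudo-label."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pseudo_labels):
        matrix[t][p] += 1
    return matrix


def error_rate(matrix):
    """Fraction of samples that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total if total else 0.0


# Toy held-out subset with three classes for brevity (the paper uses seven).
truth = [0, 0, 1, 1, 2, 2, 2, 1]
teacher = [0, 1, 1, 1, 2, 0, 2, 1]

cm = confusion_matrix(truth, teacher, num_classes=3)
for row in cm:
    print(row)
print(f"pseudo-label error rate: {error_rate(cm):.3f}")
```

Off-diagonal cells show which expression pairs the teacher confuses, which the aggregate error rate alone cannot reveal.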
minor comments (2)
  1. [Experimental setup] The GitHub repository link is given but the manuscript does not specify exact train/validation splits, confidence thresholds, or prompt templates used for reproducibility.
  2. [Introduction] Notation for the seven expression classes is introduced without an explicit mapping table; readers must infer the standard ordering from context.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of validation and presentation that will improve the clarity and rigor of the manuscript. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses
  1. Referee: [Pseudo-labeling strategy] Pseudo-labeling description (prior to §4 experiments): no error-rate quantification, confusion-matrix analysis, or human validation of the teacher model's pseudo-labels on held-out ground-truth subsets is reported. Without this, it is impossible to separate genuine generalization gains from possible label noise or demographic balancing effects.

    Authors: We agree that a quantitative assessment of pseudo-label quality is necessary to strengthen the interpretation of results. In the revised manuscript we will add a dedicated subsection that reports error rates and a confusion matrix for the teacher model evaluated on a held-out ground-truth subset drawn from AffectNet. We will also discuss the effect of the chosen confidence threshold on label noise versus demographic balancing. These additions will allow readers to better separate the contributions of accurate labeling from other factors. revision: yes

  2. Referee: [Cross-dataset evaluation] Cross-dataset evaluation protocol (§4): the manuscript provides no tables or figures that directly compare accuracy/F1 scores for real-only, synthetic-only, and mixed training regimes against standard baselines (e.g., training on AffectNet alone). The abstract and high-level description therefore leave the substitution claim without the quantitative support needed to assess its magnitude.

    Authors: The current cross-dataset protocol already evaluates models trained on real, synthetic, and combined data, yet we acknowledge that the presentation could be more explicit. We will insert new summary tables that directly juxtapose accuracy and macro-F1 for (i) real-only baselines (AffectNet, RAF-DB, FER2013), (ii) each synthetic-only regime, and (iii) mixed real+synthetic training, all under the same cross-dataset splits. These tables will also include the standard single-dataset baselines the referee mentions, thereby providing the quantitative support needed to evaluate the substitution and augmentation claims. revision: yes

  3. Referee: [Generative synthesis methods] Expression fidelity in generative methods (diffusion and GAN sections): no automated or human study measures how often the generated/edited images retain the intended expression label at rates comparable to real data. This validation is load-bearing for the claim that the synthetic images support improved cross-dataset generalization.

    Authors: We recognize that direct evidence of expression fidelity is essential for the generative pipelines. In the revision we will report an automated fidelity study that applies a strong pre-trained FER classifier to the diffusion- and GAN-generated images and measures the fraction that match the intended label. We will additionally include a modest human validation study on randomly sampled images, reporting retention rates and comparing them to the same metric on real data. These results will be presented alongside the main experiments to support the generalization claims. revision: yes
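
The second response promises tables that report accuracy and macro-F1 side by side. The sketch below shows why both matter under class imbalance: macro-F1 averages per-class F1 with equal weight, so a model that ignores a rare class is penalized even when accuracy looks acceptable. The data and function names are illustrative assumptions, not the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1; a class with no true positives scores 0."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Imbalanced toy test set: 8 samples of class 0, 2 samples of class 1.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # a model that always predicts the majority class

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred, num_classes=2):.3f}")
```

On this toy set the majority-class predictor reaches 0.800 accuracy but only about 0.444 macro-F1, which is why comparing real-only, synthetic-only, and mixed regimes on accuracy alone could hide minority-class failures.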

Circularity Check

0 steps flagged

No circularity: empirical results rest on independent cross-dataset evaluations

Full rationale

The paper reports experimental outcomes from training IR50 and POSTERv1 models on combinations of real datasets (AffectNet, RAF-DB, FER2013) with pseudo-labeled collections (DigiFace, DCFace, EmoNet-Face BIG) and generated images (from FFHQ via diffusion/GAN). No equations, derivations, or fitted parameters are present that could reduce a claimed prediction to its own inputs by construction. Performance metrics arise from standard supervised training and cross-dataset testing rather than any self-referential normalization or re-labeling loop. Self-citations, if any, are not load-bearing for the core claims, which remain falsifiable against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on standard machine-learning assumptions about data quality and generalization rather than introducing new mathematical axioms or invented physical entities. The main unverified premise is the reliability of the teacher model's pseudo-labels.

axioms (1)
  • domain assumption: A teacher model can produce sufficiently accurate pseudo-labels for unlabeled face images when a confidence threshold is applied.
    Invoked in the first strategy for constructing privacy-preserving datasets from large unlabeled collections.



