SynPAIN: A Synthetic Dataset of Pain and Non-Pain Facial Expressions

Abhishek Moturu; Alex Mihailidis; Amirhossein Kazerouni; Babak Taati; Hailey Reimer; Muhammad Muzammil; Thomas Hadjistavropoulos; Yasamin Zarghami

arxiv: 2507.19673 · v3 · submitted 2025-07-25 · 💻 cs.CV



Pith reviewed 2026-05-19 01:44 UTC · model grok-4.3

classification 💻 cs.CV

keywords: synthetic dataset · pain detection · facial expressions · algorithmic bias · older adults · facial action units · data augmentation · demographic diversity


The pith

A synthetic dataset of pain expressions reveals demographic biases in automated pain detection and boosts model performance when used for augmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates SynPAIN, a large collection of 10,710 synthetic facial images balanced across five ethnic groups, two age brackets (including older adults), and two genders. It uses commercial image generators to produce pain expressions that score as clinically meaningful on standard facial action unit measures. Tests on existing pain detection models show clear performance gaps across age, gender, and ethnicity that prior, smaller datasets could not detect. Adding age-matched synthetic examples to real clinical training data raises average precision by 2.4 percentage points. The work addresses privacy constraints and the underrepresentation of older adults, particularly those with dementia who cannot self-report pain.
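The summary does not describe how "age-matched" augmentation is carried out. A minimal sketch, assuming it means keeping only the synthetic images whose age group matches the clinical target population and appending them to the real training split. The `Sample` record, its field names, and the age-group labels are hypothetical, not the SynPAIN release format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One training image (hypothetical record layout, not the SynPAIN format)."""
    path: str
    label: int        # 1 = pain, 0 = non-pain
    age_group: str    # hypothetical labels, e.g. "older" or "young"
    source: str       # "real" or "synthetic"


def age_matched_augment(real_train, synthetic_pool, target_age_group):
    """Return the real training samples plus only those synthetic samples
    whose age group equals target_age_group; all other synthetic samples
    are dropped."""
    matched = [s for s in synthetic_pool if s.age_group == target_age_group]
    return list(real_train) + matched


if __name__ == "__main__":
    real = [Sample("real_001.png", 1, "older", "real"),
            Sample("real_002.png", 0, "older", "real")]
    synthetic = [Sample("syn_001.png", 1, "older", "synthetic"),
                 Sample("syn_002.png", 1, "young", "synthetic"),
                 Sample("syn_003.png", 0, "older", "synthetic")]
    train = age_matched_augment(real, synthetic, "older")
    print(len(train))  # 4: two real samples plus two older synthetic samples
```

Evaluating only on held-out real clinical data keeps synthetic images out of the test set, so any change in average precision reflects the augmentation rather than easier test examples.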

Core claim

SynPAIN supplies 10,710 demographically balanced synthetic facial images that display pain and non-pain expressions generated by commercial AI tools. Facial action unit analysis confirms that the synthetic pain images produce higher pain scores than neutral or non-pain images. When applied to existing pain detection models, the dataset exposes large performance differences across demographic categories. Augmenting real clinical training sets with age-matched synthetic images improves average precision on held-out clinical data by 2.4 percentage points.
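Average precision summarizes the precision-recall curve in a single number, and "2.4 percentage points" is an absolute difference between two AP values, not a relative gain. A minimal sketch in plain Python, assuming the common non-interpolated definition (mean precision at the rank of each positive). Libraries such as scikit-learn treat tied scores differently, and the 0.50 baseline below is invented for illustration, not a result from the paper.

```python
def average_precision(labels, scores):
    """Non-interpolated AP: rank by descending score and average the precision
    observed at the rank of every positive (label 1). Ties keep input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def percentage_point_gain(ap_before, ap_after):
    """Absolute change in AP, expressed in percentage points."""
    return 100.0 * (ap_after - ap_before)


def relative_gain_percent(ap_before, ap_after):
    """Relative change in AP, as a percentage of the baseline AP."""
    return 100.0 * (ap_after - ap_before) / ap_before


if __name__ == "__main__":
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # (1/1 + 2/3) / 2
    # Hypothetical baseline of 0.50: a 2.4 pp gain is a 4.8 % relative gain.
    print(round(percentage_point_gain(0.50, 0.524), 6))  # 2.4
    print(round(relative_gain_percent(0.50, 0.524), 6))  # 4.8
```

The distinction matters when reading the headline number: the same 2.4 pp gain is a larger relative improvement on a weak baseline than on a strong one.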

What carries the argument

Demographically balanced synthetic facial images generated by commercial AI tools and validated for pain content through facial action unit scoring.
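The summary does not name which facial action unit tools were used. One widely used AU-based measure is the Prkachin and Solomon Pain Intensity (PSPI) score, PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43, where AU4 through AU10 are FACS intensities on a 0 to 5 scale and AU43 (eyes closed) is 0 or 1. A minimal sketch, assuming PSPI as the example score and AU intensities already produced by some upstream AU detector (not shown); whether the paper relies on PSPI specifically is an assumption.

```python
def pspi(au):
    """Prkachin and Solomon Pain Intensity (range 0 to 16).

    au maps FACS codes ("AU4", "AU6", ...) to intensities: AU4, AU6, AU7,
    AU9 and AU10 on 0-5, AU43 as 0 or 1. Missing units count as 0.
    """
    def get(code):
        return au.get(code, 0)

    return (get("AU4")
            + max(get("AU6"), get("AU7"))
            + max(get("AU9"), get("AU10"))
            + get("AU43"))


def mean_pspi(images):
    """Average PSPI over a list of per-image AU dictionaries."""
    return sum(pspi(au) for au in images) / len(images)


if __name__ == "__main__":
    pain = {"AU4": 2, "AU6": 3, "AU7": 1, "AU9": 0, "AU10": 2, "AU43": 1}
    neutral = {}
    print(pspi(pain), pspi(neutral))  # 8 0
```

Comparing `mean_pspi` across pain, non-pain, and neutral image sets mirrors the check the summary describes (pain images scoring higher); a significance test would sit on top of those group scores.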

If this is right

Existing pain detection models display measurable accuracy gaps across age, ethnicity, and gender once tested on balanced data.
Age-matched synthetic augmentation produces a 2.4 percentage point gain in average precision on real clinical pain data.
Synthetic data removes privacy barriers that previously limited studies of pain assessment in older adults with dementia.
Comprehensive bias audits become possible at larger scale without relying solely on scarce real patient recordings.
The same generation and validation approach supplies a reusable template for building demographically diverse medical expression datasets.
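The bias audit described above amounts to scoring one model separately on each demographic slice and comparing the results. A minimal sketch, assuming AP as the per-group metric and the largest minus smallest group AP as the disparity measure; the `(group, label, score)` record layout and group keys are hypothetical, and the summary does not say which disparity statistic the paper reports.

```python
from collections import defaultdict


def average_precision(labels, scores):
    """Non-interpolated AP: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def group_ap_gap(records):
    """records: iterable of (group, label, score) triples.

    Returns (per-group AP dict, max AP minus min AP across groups).
    A group key can be a tuple such as (age, gender, ethnicity) to audit
    intersectional slices.
    """
    labels, scores = defaultdict(list), defaultdict(list)
    for group, label, score in records:
        labels[group].append(label)
        scores[group].append(score)
    ap = {g: average_precision(labels[g], scores[g]) for g in labels}
    return ap, max(ap.values()) - min(ap.values())


if __name__ == "__main__":
    records = [("older", 1, 0.4), ("older", 0, 0.9),
               ("young", 1, 0.9), ("young", 0, 0.4)]
    per_group, gap = group_ap_gap(records)
    print(per_group, gap)  # {'older': 0.5, 'young': 1.0} 0.5
```

Per-group AP is only informative when every group contains enough pain examples; small groups give noisy estimates, which is one reason a larger balanced set can expose gaps that smaller datasets miss.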

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar synthetic pipelines could be applied to other facial cues such as fatigue or confusion in geriatric care.
Routine augmentation with age-balanced synthetic examples may become a standard step for fairness testing in clinical computer vision.
Models trained on this dataset could be checked for generalization to additional underrepresented groups not yet included.
The bias measurement framework offers a template that other medical imaging tasks could adapt to quantify demographic disparities.

Load-bearing premise

AI-generated faces accurately reproduce the facial movements that occur in real older adults experiencing clinical pain.

What would settle it

Direct comparison of facial action unit scores or pain detection accuracy between the synthetic images and videos of real older-adult patients with documented pain behaviors.
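One way to quantify such a comparison, and to supply the effect sizes the referee report requests, is a standardized mean difference between AU-based pain scores from synthetic images and from real patient frames. A minimal sketch of Cohen's d with a pooled standard deviation; the score lists are placeholders, not data from the paper, and matching mean scores alone would not show that the same action units are active.

```python
import math
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference (mean(a) - mean(b)) / pooled SD.

    Uses sample standard deviations; each group needs at least two values.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Placeholder pain scores (e.g. PSPI per image or per video frame).
    synthetic_pain = [1, 2, 3]
    real_pain = [2, 3, 4]
    print(cohens_d(synthetic_pain, real_pain))  # -1.0
```

A value near zero would suggest similar average pain intensity; a large magnitude would indicate that the generator overstates or understates pain relative to real older adults.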

Original abstract

Accurate pain assessment in patients with limited ability to communicate, such as older adults with severe dementia, represents a critical healthcare challenge. Robust automated systems of pain behavior detection may facilitate such assessments. Existing pain detection datasets, however, suffer from limited ethnic/racial diversity, privacy constraints, and underrepresentation of older adults, who are the primary target population for clinical deployment. We present SynPAIN, a large-scale synthetic dataset containing 10,710 facial expression images across five ethnicities/races, representing two age groups, and two genders. Using commercial generative AI tools, we created demographically balanced synthetic identities with clinically meaningful pain expressions. Our validation demonstrates that synthetic pain expressions exhibit expected pain patterns, scoring significantly higher than neutral and non-pain expressions using clinically validated pain assessment tools based on facial action unit analysis. We experimentally demonstrate SynPAIN's utility in identifying algorithmic bias in existing pain detection models. Through comprehensive bias evaluation, we reveal substantial performance disparities across demographic characteristics. These performance disparities were previously undetectable with smaller, less diverse datasets. Furthermore, we demonstrate that age-matched synthetic data augmentation improves pain detection performance on real clinical data, achieving a 2.4 percentage point improvement in average precision. SynPAIN addresses critical gaps in pain assessment research by providing the first publicly available, demographically diverse synthetic dataset specifically designed for older adult pain detection, while establishing a framework for measuring and mitigating algorithmic bias. The dataset, code, and trained models are available at https://mmzml.github.io/SynPAIN

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces SynPAIN, a synthetic dataset of 10,710 facial expression images spanning five ethnicities/races, two age groups, and two genders. Generated via commercial AI tools, the dataset targets gaps in existing pain-detection resources by providing demographically balanced synthetic identities with pain and non-pain expressions. Validation uses clinically validated pain assessment tools based on facial action unit analysis, with synthetic pain expressions scoring significantly higher than neutral and non-pain ones. The work claims to reveal previously undetectable algorithmic bias across demographics in existing models and reports a 2.4 percentage point gain in average precision on real clinical data via age-matched synthetic augmentation. The dataset, code, and models are released publicly.

Significance. If the validation and augmentation results prove robust, SynPAIN could meaningfully advance automated pain assessment for older adults and diverse populations by mitigating data scarcity, privacy constraints, and underrepresentation. The public release and bias-mitigation framework would support reproducibility and downstream fairness research in computer vision for healthcare.

major comments (2)

[Abstract] Abstract: The central utility claims—that SynPAIN identifies previously undetectable demographic performance disparities and yields a 2.4 pp average-precision improvement via age-matched augmentation—lack any description of the pain-detection models tested, the real clinical dataset(s) used for evaluation, the augmentation protocol, sample sizes, or statistical testing. These omissions are load-bearing for the headline results.
[Abstract] Abstract: Validation is described only as synthetic pain expressions 'scoring significantly higher' on clinically validated tools based on facial action unit analysis, without naming specific AUs, reporting effect sizes, or providing direct quantitative comparison to real older-adult pain recordings. This leaves the claim of clinical meaningfulness unsupported.

minor comments (1)

[Abstract] Abstract: The total image count (10,710) is given without breakdown by pain/non-pain, age, or ethnicity, which would clarify balance and support reproducibility claims.
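Simple arithmetic shows why the breakdown matters: five ethnicities/races, two age groups, and two genders give 5 × 2 × 2 = 20 demographic cells, and 10,710 / 20 = 535.5, so the images cannot be divided exactly evenly across those cells. A minimal sketch of the requested tabulation, assuming per-image metadata with hypothetical field names (`ethnicity`, `age_group`, `gender`, `expression`) and hypothetical placeholder values such as "E1"; the SynPAIN release may store this information differently.

```python
from collections import Counter
from itertools import product

# Hypothetical metadata field names, not taken from the SynPAIN release.
KEYS = ("ethnicity", "age_group", "gender", "expression")


def breakdown(metadata):
    """Count images per (ethnicity, age_group, gender, expression) cell."""
    return Counter(tuple(m[k] for k in KEYS) for m in metadata)


def is_balanced(counts, axes):
    """True if every combination of the listed axis values has the same count.

    axes holds one sequence of allowed values per key, in KEYS order; cells
    with no images count as 0, so missing combinations break balance.
    """
    return len({counts.get(cell, 0) for cell in product(*axes)}) == 1


if __name__ == "__main__":
    print(divmod(10_710, 5 * 2 * 2))  # (535, 10): no exact per-cell split
    meta = [  # hypothetical placeholder values
        {"ethnicity": "E1", "age_group": "older", "gender": "F", "expression": "pain"},
        {"ethnicity": "E1", "age_group": "older", "gender": "F", "expression": "pain"},
        {"ethnicity": "E1", "age_group": "older", "gender": "F", "expression": "neutral"},
    ]
    counts = breakdown(meta)
    axes = (["E1"], ["older"], ["F"], ["pain", "neutral"])
    print(is_balanced(counts, axes))  # False: 2 pain vs 1 neutral
```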

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback. We address each major comment below and will revise the abstract to improve clarity and support for the central claims.

Point-by-point responses

Referee: [Abstract] Abstract: The central utility claims—that SynPAIN identifies previously undetectable demographic performance disparities and yields a 2.4 pp average-precision improvement via age-matched augmentation—lack any description of the pain-detection models tested, the real clinical dataset(s) used for evaluation, the augmentation protocol, sample sizes, or statistical testing. These omissions are load-bearing for the headline results.

Authors: We agree that the abstract would benefit from greater specificity on these experimental elements to make the utility claims more self-contained. In the revised manuscript we will add concise descriptions of the pain-detection models evaluated, the real clinical dataset(s) used for testing, the age-matched augmentation protocol, relevant sample sizes, and the statistical tests applied. These details are already present in the methods and results sections and can be summarized without lengthening the abstract excessively. revision: yes
Referee: [Abstract] Abstract: Validation is described only as synthetic pain expressions 'scoring significantly higher' on clinically validated tools based on facial action unit analysis, without naming specific AUs, reporting effect sizes, or providing direct quantitative comparison to real older-adult pain recordings. This leaves the claim of clinical meaningfulness unsupported.

Authors: We accept that naming the specific action units, reporting effect sizes, and providing a brief quantitative comparison to real older-adult recordings would strengthen the validation claim in the abstract. We will revise the abstract to include the key AUs examined, the associated effect sizes, and a short statement on how the synthetic pain scores relate to those observed in real clinical recordings of older adults, drawing directly from the validation analysis already conducted in the paper. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical dataset creation and validation with no derivations or fitted predictions

Full rationale

The paper presents a synthetic dataset generated via commercial AI tools, validated through facial action unit scoring and used in empirical bias-detection and augmentation experiments. No mathematical derivations, equations, parameter fitting, or predictions that reduce to inputs by construction appear in the provided abstract. All central claims rest on reported experimental outcomes rather than self-definitional steps, self-citation chains, or renamed known results. The work is self-contained as an empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on generative AI producing expressions that match real clinical pain patterns and on facial action unit analysis being a sufficient validator for synthetic data quality.

axioms (1)

domain assumption: Commercial generative AI tools can produce realistic and clinically meaningful facial expressions of pain across demographics.
Directly invoked in dataset creation using these tools as described in the abstract.

pith-pipeline@v0.9.0 · 5813 in / 1368 out tokens · 64714 ms · 2026-05-19T01:44:03.634448+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

We present SynPAIN, a large-scale synthetic dataset containing 10,710 facial expression images... using clinically validated pain assessment tools based on facial action unit analysis.
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

We experimentally demonstrate SynPAIN's utility in identifying algorithmic bias... achieving a 2.4 percentage point improvement in average precision.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On Applicability of Synthetic Datasets for Facial Expression Recognition
cs.CV 2026-05 unverdicted novelty 5.0

Synthetic datasets created via diffusion models, GAN editing, and pseudo-labeling can substitute for or augment real data to improve facial expression recognition while respecting privacy constraints.