WELD: The First Naturalistic Long-Period Small-Team Workplace Emotion Dataset for Ubiquitous Affective Computing
Pith reviewed 2026-05-21 20:50 UTC · model grok-4.3
The pith
WELD is the first dataset to combine multi-year duration, naturalistic workplace setting, stable small team, and fully passive IRB-compliant sensing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WELD comprises 733,780 per-frame seven-class facial-expression probability vectors from 49 employees of a Chinese software company over 30.1 months, the first corpus to satisfy the four simultaneous criteria of long naturalistic duration, workplace context, stable small-team structure, and fully passive sensing that survives institutional review, thereby supporting both within-individual longitudinal and within-team relational analyses on the same subjects.
What carries the argument
The WELD corpus itself, a multi-year collection of passive facial-expression probability vectors gathered from a fixed workplace team.
If this is right
- Variance decomposition attributes 19.3 percent of daily-valence variance to between-person differences and 29.8 percent to month seasonality, which the paper presents as a quantitative ceiling for future predictive models.
- Hidden Markov modeling identifies six emotional regimes with negative-state dwell times of 16-18 days versus 3 days for other states.
- Leave-one-person-out turnover prediction reaches an AUC of 0.79 but a Cox concordance index of only 0.52, showing that AUC alone misleads without survival-aware baselines.
- An off-the-shelf FER model over-predicts angry expressions on neutral Asian faces (0.194 versus Western priors near 0.05), supplying material for fairness audits.
- The same subjects support both individual longitudinal tracking and team relational analyses without requiring active participation.
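The gap between AUC and concordance index in the turnover bullet can be reproduced on toy data. The sketch below is illustrative only: the six employees, their risk scores, and the helper names `auc` and `harrell_c` are invented for this example, and Harrell's C is computed directly on raw risk scores rather than from a fitted Cox model, so it is not the paper's evaluation pipeline.

```python
# Toy illustration (assumed data): a classifier can separate leavers from
# stayers well (high AUC) while ranking time-to-departure poorly (low C-index).

def auc(scores, labels):
    """Probability that a random leaver outscores a random stayer (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def harrell_c(scores, times, events):
    """Harrell's C: among comparable pairs, how often the earlier event has the higher risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(scores)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(scores)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical cohort: three leavers (event=1) and three stayers (event=0).
# Stayers have different observation lengths because they joined at different times.
times  = [5, 10, 20, 3, 8, 30]            # months observed
events = [1, 1, 1, 0, 0, 0]               # 1 = left the company
scores = [0.6, 0.7, 0.9, 0.2, 0.3, 0.65]  # predicted turnover risk

auc_value = auc(scores, events)
c_value = harrell_c(scores, times, events)
print(f"AUC = {auc_value:.3f}")      # AUC = 0.889
print(f"C-index = {c_value:.3f}")    # C-index = 0.429
```

In this toy cohort the model scores late leavers higher than early leavers, so the binary label is well separated while the ordering of departure times is worse than chance. That is one way, though not necessarily the paper's, for the two metrics to diverge.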
Where Pith is reading between the lines
- The observed asymmetry in negative-state dwell times could be tested against external work-event logs to check whether deadlines or conflicts extend those regimes.
- Replicating the passive protocol in non-software workplaces would reveal whether the reported seasonality and regime structure generalize beyond one industry.
- The four-tier access model may encourage other teams to begin similar long-term collections while still protecting individual identities.
- Combining the variance ceiling with the turnover result suggests that future models should jointly optimize for both mood prediction and survival outcomes rather than classification accuracy alone.
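The dwell-time asymmetry discussed above can be translated into self-transition probabilities, which is a convenient form for comparing against external event logs. The sketch assumes a first-order chain with daily time steps, so that dwell times are geometrically distributed. The time resolution of the paper's Hidden Markov model is not stated on this page, and the function names are hypothetical.

```python
# Assumption: discrete daily steps and geometric dwell times, so the
# mean dwell in a state with self-transition probability p is 1 / (1 - p).

def expected_dwell(p_stay: float) -> float:
    """Mean number of consecutive steps spent in a state."""
    if not 0.0 <= p_stay < 1.0:
        raise ValueError("p_stay must be in [0, 1)")
    return 1.0 / (1.0 - p_stay)

def implied_self_transition(mean_dwell: float) -> float:
    """Self-transition probability that yields the given mean dwell time."""
    if mean_dwell < 1.0:
        raise ValueError("mean dwell must be at least one step")
    return 1.0 - 1.0 / mean_dwell

# 17 d is the midpoint of the reported 16-18 d range; 3 d is the reported
# dwell time for the other regimes.
for label, days in [("negative regimes (midpoint)", 17.0), ("other regimes", 3.0)]:
    p = implied_self_transition(days)
    print(f"{label}: p_stay = {p:.3f}, dwell = {expected_dwell(p):.1f} d")
```

Under these assumptions the negative regimes correspond to a daily stay probability of about 0.941 and the other regimes to about 0.667.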
Load-bearing premise
No prior published dataset has simultaneously satisfied long duration, naturalistic workplace context, stable small-team structure, and fully passive IRB-compliant sensing.
What would settle it
Any published dataset that contains at least two years of passive facial data from the same fixed workplace team of roughly fifty people and carries ethics approval.
Original abstract
Affective computing has matured rapidly in laboratory settings, yet no prior dataset combines (i) months-to-years of duration, (ii) a naturalistic workplace context, (iii) a stable small-team social structure, and (iv) a fully passive sensing protocol that survives institutional review. We introduce WELD, the first dataset to satisfy all four. WELD comprises 733,780 per-frame seven-class facial-expression probability vectors from 49 employees of a Chinese software company over 30.1 months (Nov 2021 - May 2024) -- the longest naturalistic in-the-wild emotion corpus and the only multi-year corpus supporting both within-individual longitudinal and within-team relational analyses on the same subjects. Data are released under a four-tier access model with only aggregated probabilities publicly downloadable. We validate the corpus by replicating three established phenomena (+43.1% weekend valence boost; 13:00-trough diurnal cycle; Shanghai 2022 lockdown effect d=-0.40), and report four novel findings: (1) variance decomposition attributes 19.3% of daily-valence variance to between-person differences and 29.8% to month seasonality -- a quantitative ceiling for future predictive models; (2) Hidden Markov decomposition reveals six emotional regimes with asymmetric negative-state dwell times (16-18 d vs 3 d); (3) leave-one-person-out turnover prediction reaches AUC=0.79 yet a Cox concordance index of only 0.52, exposing a metric-trap when AUC is reported without survival-aware baselines; (4) the corpus reveals systematic over-prediction of "angry" by an off-the-shelf FER model on neutral Asian faces (0.194 vs ~0.05 Western priors), making WELD valuable for FER fairness audits. A complex-systems analysis of the corpus appears as a companion preprint (arXiv:2510.16046).
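To make the variance-decomposition finding concrete, the following sketch computes between-person and between-month shares of total variance by sums of squares. It is a minimal illustration on invented, balanced toy data. The abstract does not say which estimator the authors used (a mixed-effects model is equally plausible), and the function name `variance_shares` is hypothetical.

```python
from collections import defaultdict

def variance_shares(records):
    """records: iterable of (person, month, daily_valence) tuples.

    Returns the share of total sum of squares explained by person means,
    by month means, and the remainder. On unbalanced data the two factors
    are not orthogonal and the shares should be read with caution.
    """
    records = list(records)
    values = [v for _, _, v in records]
    grand = sum(values) / len(values)
    total_ss = sum((v - grand) ** 2 for v in values)

    def group_ss(key_index):
        groups = defaultdict(list)
        for rec in records:
            groups[rec[key_index]].append(rec[2])
        return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())

    person = group_ss(0) / total_ss
    month = group_ss(1) / total_ss
    return {"person": person, "month": month, "residual": 1.0 - person - month}

# Invented toy data: three people observed in two months.
toy = [
    ("A", "Jan", -2.5), ("A", "Jul", 0.5),
    ("B", "Jan", -2.5), ("B", "Jul", 2.5),
    ("C", "Jan", -1.0), ("C", "Jul", 3.0),
]
shares = variance_shares(toy)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

For the toy data the shares are 4/29, 24/29, and 1/29. The residual share is what neither person identity nor month explains, which is the sense in which such a decomposition bounds models that rely only on those two factors.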
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces WELD as the first dataset to combine (i) multi-month to multi-year duration, (ii) a naturalistic workplace setting, (iii) a stable small-team social structure, and (iv) fully passive IRB-compliant sensing. It reports 733,780 seven-class facial-expression probability vectors collected from 49 employees over 30.1 months, validates the corpus by replicating three established phenomena (weekend valence boost of +43.1%, 13:00 diurnal trough, Shanghai 2022 lockdown effect with d=-0.40), and presents four novel analyses: variance decomposition (19.3% between-person, 29.8% month seasonality), Hidden Markov Model emotional regimes with asymmetric dwell times, leave-one-person-out turnover prediction (AUC=0.79 but Cox concordance index 0.52), and systematic over-prediction of 'angry' by an off-the-shelf FER model on neutral Asian faces.
Significance. If the uniqueness claim is secured, the dataset would be a valuable contribution to affective computing by supporting both within-individual longitudinal and within-team relational analyses on the same subjects over an unprecedented naturalistic timeframe. Strengths include the scale and duration, the four-tier access model with public aggregated probabilities, explicit replication of external phenomena, and the companion complex-systems preprint (arXiv:2510.16046).
major comments (2)
- [Related Work / Dataset Comparison] the headline claim that no prior published corpus simultaneously satisfies the four listed criteria requires an explicit enumeration or comparison table that rules out every plausible candidate on all four axes at once. The abstract asserts uniqueness directly, but without a documented search protocol or exhaustive side-by-side evaluation the central positioning of the work remains unsecured.
- [Validation section] full details of the replication analyses (exact data exclusion rules, error bars or confidence intervals on the +43.1% weekend boost and d=-0.40 lockdown effect, and the statistical tests used) are not visible in the provided abstract and must be expanded to confirm that the three established phenomena are reproduced with the reported precision.
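The confidence intervals requested in the second major comment could be produced along the following lines. This is a hedged sketch, not the authors' procedure: the daily-valence samples are synthetic, the helper names are hypothetical, and the percentile bootstrap resamples days independently, which ignores the within-person clustering that a full analysis would need to respect.

```python
import random
from statistics import mean, variance

def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treated), len(control)
    pooled_var = ((n_t - 1) * variance(treated) + (n_c - 1) * variance(control)) / (n_t + n_c - 2)
    return (mean(treated) - mean(control)) / pooled_var ** 0.5

def bootstrap_ci(treated, control, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a two-sample statistic."""
    rng = random.Random(seed)
    draws = sorted(
        stat(rng.choices(treated, k=len(treated)), rng.choices(control, k=len(control)))
        for _ in range(n_boot)
    )
    lo_idx = round(alpha / 2 * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return draws[lo_idx], draws[hi_idx]

# Synthetic daily valence values (assumed, not WELD data).
toy_rng = random.Random(42)
baseline = [toy_rng.gauss(0.30, 0.10) for _ in range(60)]
lockdown = [toy_rng.gauss(0.26, 0.10) for _ in range(60)]

d_point = cohens_d(lockdown, baseline)
ci_low, ci_high = bootstrap_ci(lockdown, baseline, cohens_d)
print(f"d = {d_point:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

The same `bootstrap_ci` helper accepts any two-sample statistic, so a percent-difference function for the weekend boost could be passed in place of `cohens_d`.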
minor comments (2)
- [Abstract] Abstract: the total frame count (733,780) and the four-tier access model are mentioned but could be stated more concisely to improve readability for dataset-focused readers.
- [Notation] Notation: ensure consistent capitalization and abbreviation of 'FER' and 'HMM' on first and subsequent uses.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [Related Work / Dataset Comparison] the headline claim that no prior published corpus simultaneously satisfies the four listed criteria requires an explicit enumeration or comparison table that rules out every plausible candidate on all four axes at once. The abstract asserts uniqueness directly, but without a documented search protocol or exhaustive side-by-side evaluation the central positioning of the work remains unsecured.
  Authors: We agree that an explicit side-by-side comparison would make the uniqueness claim more transparent. Our literature review (via targeted searches in ACM, IEEE, and Google Scholar using terms for naturalistic, longitudinal, workplace, and passive facial emotion datasets) found no prior corpus meeting all four criteria at once. To address the concern, we will add a comparison table in the revised Related Work section listing major existing datasets and the specific criteria each fails to satisfy, along with a concise description of our search approach. This will secure the positioning without altering the core claim.
  Revision: yes
- Referee: [Validation section] full details of the replication analyses (exact data exclusion rules, error bars or confidence intervals on the +43.1% weekend boost and d=-0.40 lockdown effect, and the statistical tests used) are not visible in the provided abstract and must be expanded to confirm that the three established phenomena are reproduced with the reported precision.
  Authors: We appreciate the request for greater methodological transparency. While the full manuscript text contains descriptions of the replication procedures, we will expand the Validation section to include exact data exclusion criteria (e.g., minimum daily frames per participant), confidence intervals or standard errors for the +43.1% weekend boost and d=-0.40 lockdown effect, and the precise statistical tests (mixed-effects models and paired comparisons) used. These additions will confirm the reported precision and enhance reproducibility.
  Revision: yes
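An exclusion rule of the kind the rebuttal mentions (a minimum number of daily frames per participant) can be sketched as follows. Everything specific here is an assumption made for illustration: the class order, the valence weights that collapse a seven-class probability vector into one number, the threshold value, and the function names. None of it is taken from the WELD release.

```python
from collections import defaultdict

# Assumed class order and valence weights (hypothetical, for illustration only).
CLASSES = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")
WEIGHTS = {"angry": -1.0, "disgust": -1.0, "fear": -1.0, "happy": 1.0,
           "sad": -1.0, "surprise": 0.0, "neutral": 0.0}

def frame_valence(probs):
    """Collapse one seven-class probability vector into a scalar valence."""
    return sum(WEIGHTS[c] * p for c, p in zip(CLASSES, probs))

def daily_valence(frames, min_frames):
    """Average frame valence per (person, day), dropping sparse person-days."""
    buckets = defaultdict(list)
    for person, day, probs in frames:
        buckets[(person, day)].append(frame_valence(probs))
    return {key: sum(vals) / len(vals)
            for key, vals in buckets.items() if len(vals) >= min_frames}

# Toy frames: p01 has two frames on the day, p02 only one.
frames = [
    ("p01", "2022-04-01", (0.1, 0.0, 0.0, 0.5, 0.1, 0.0, 0.3)),
    ("p01", "2022-04-01", (0.2, 0.0, 0.0, 0.3, 0.0, 0.0, 0.5)),
    ("p02", "2022-04-01", (0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.1)),
]
daily = daily_valence(frames, min_frames=2)
print(daily)
```

With `min_frames=2` only the p01 person-day survives. Reporting how many person-days each threshold removes would give readers the exclusion detail the referee asks for.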
Circularity Check
No circularity: dataset introduction and external replications are self-contained
The paper presents WELD as a new corpus and supports its value through replication of three established external phenomena (weekend valence boost, diurnal cycle, lockdown effect) plus novel descriptive findings on variance decomposition and regime dwell times. No equations, fitted parameters renamed as predictions, or self-referential definitions appear in the abstract or described content. The 'first to satisfy all four criteria' statement is a literature-review claim about prior datasets rather than a derivation that reduces to the paper's own inputs by construction. The work is therefore self-contained against external benchmarks with no load-bearing self-citation chains or ansatz smuggling.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The facial expression recognition pipeline produces reliable seven-class probability vectors in a real workplace environment without active user participation.
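One way to probe this assumption is to check that each vector is a well-formed probability distribution and then compare the average "angry" mass on frames believed to be neutral against a reference prior. The sketch below assumes a class order, a tolerance, and toy frames. The 0.05 reference is the approximate Western prior quoted in the abstract, and the helper names are hypothetical.

```python
# Assumed class order; the WELD release may order its seven classes differently.
CLASSES = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")

def is_valid(probs, tol=1e-6):
    """True if probs has seven entries in [0, 1] that sum to 1 within tol."""
    return (len(probs) == len(CLASSES)
            and all(0.0 <= p <= 1.0 for p in probs)
            and abs(sum(probs) - 1.0) <= tol)

def mean_class_probability(frames, cls):
    """Mean probability assigned to cls over the valid frames only."""
    idx = CLASSES.index(cls)
    valid = [f for f in frames if is_valid(f)]
    return sum(f[idx] for f in valid) / len(valid)

# Toy frames assumed to show neutral faces; the last one is malformed.
neutral_frames = [
    (0.2, 0.0, 0.0, 0.1, 0.1, 0.0, 0.6),
    (0.1, 0.0, 0.0, 0.2, 0.0, 0.0, 0.7),
    (0.3, 0.0, 0.0, 0.0, 0.1, 0.0, 0.6),
    (0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0),  # sums to 1.5, so it is skipped
]
REFERENCE_PRIOR = 0.05  # approximate Western prior quoted in the abstract

angry_mean = mean_class_probability(neutral_frames, "angry")
print(f"mean angry = {angry_mean:.3f}, ratio to prior = {angry_mean / REFERENCE_PRIOR:.1f}x")
```

A check like this only flags a discrepancy. Deciding whether it reflects model bias or genuine expression differences requires ground-truth labels, which passive sensing does not provide.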
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: WELD comprises 733,780 per-frame seven-class facial-expression probability vectors... replicating three established phenomena (+43.1% weekend valence boost; 13:00-trough diurnal cycle)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.