
arXiv: 2510.15221 · v2 · pith:6LFFQB5Knew · submitted 2025-10-17 · 💻 cs.AI · cs.CY · cs.LG

WELD: The First Naturalistic Long-Period Small-Team Workplace Emotion Dataset for Ubiquitous Affective Computing

Pith reviewed 2026-05-21 20:50 UTC · model grok-4.3

classification 💻 cs.AI cs.CY cs.LG
keywords affective computing, emotion dataset, workplace, naturalistic sensing, longitudinal analysis, facial expressions, passive monitoring, team dynamics

The pith

WELD is the first dataset to combine multi-year duration, naturalistic workplace setting, stable small team, and fully passive IRB-compliant sensing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that prior emotion datasets all fell short on at least one of four requirements: long time span, real workplace environment, fixed small team, and passive collection that passes ethics review. WELD meets all four at once with 733,780 frames from 49 employees across 30 months, allowing both individual mood trajectories and team-level interactions to be studied in the same data. Validation reproduces known patterns such as weekend valence increases and lockdown effects, while new analyses quantify variance sources, identify emotional regimes with long negative dwell times, and flag biases in existing recognition models. A reader would care because this scale and setting finally let affective computing move from short lab snapshots to sustained workplace observation.

Core claim

WELD comprises 733,780 per-frame seven-class facial-expression probability vectors from 49 employees of a Chinese software company over 30.1 months, the first corpus to satisfy the four simultaneous criteria of long naturalistic duration, workplace context, stable small-team structure, and fully passive sensing that survives institutional review, thereby supporting both within-individual longitudinal and within-team relational analyses on the same subjects.

What carries the argument

The WELD corpus itself, a multi-year collection of passive facial-expression probability vectors gathered from a fixed workplace team.

If this is right

  • Variance decomposition shows 19.3 percent of daily-valence variance is between-person and 29.8 percent is seasonal, establishing a ceiling for predictive models.
  • Hidden Markov modeling identifies six emotional regimes with negative-state dwell times of 16-18 days versus 3 days for other states.
  • Leave-one-person-out turnover prediction reaches an AUC of 0.79 but a Cox concordance index of only 0.52, showing that AUC alone misleads without survival-aware baselines.
  • An off-the-shelf FER model over-predicts angry expressions on neutral Asian faces (0.194 versus Western priors near 0.05), supplying material for fairness audits.
  • The same subjects support both individual longitudinal tracking and team relational analyses without requiring active participation.
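
The between-person and seasonal shares in the first item can be illustrated with a simple sum-of-squares decomposition. The sketch below is a minimal illustration under stated assumptions, not the paper's estimator (the paper may use a different method, such as mixed-effects variance components). The function name `variance_share` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def variance_share(values, labels):
    """Share of total variance explained by group means (eta-squared).

    values: daily valence scores (hypothetical input format).
    labels: one grouping label per value, e.g. a person id or a month.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    grand_mean = fmean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("values have zero variance")

    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(
        len(members) * (fmean(members) - grand_mean) ** 2
        for members in groups.values()
    )
    return ss_between / ss_total


# Toy data (invented for illustration): two people, two months each.
valence = [0.0, 2.0, 4.0, 6.0]
person = ["p1", "p1", "p2", "p2"]
month = ["jan", "feb", "jan", "feb"]

person_share = variance_share(valence, person)
month_share = variance_share(valence, month)
```

On the toy data the person share is 0.8 and the month share is 0.2. Read this way, the 19.3 percent and 29.8 percent figures describe how much of the daily-valence variance each grouping factor accounts for on its own.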

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed asymmetry in negative-state dwell times could be tested against external work-event logs to check whether deadlines or conflicts extend those regimes.
  • Replicating the passive protocol in non-software workplaces would reveal whether the reported seasonality and regime structure generalize beyond one industry.
  • The four-tier access model may encourage other teams to begin similar long-term collections while still protecting individual identities.
  • Combining the variance ceiling with the turnover result suggests that future models should jointly optimize for both mood prediction and survival outcomes rather than classification accuracy alone.
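
The dwell-time asymmetry mentioned in the first item can be related to HMM self-transition probabilities. The sketch below assumes a standard first-order HMM with one step per day, where the time spent in a state is geometrically distributed; the paper's exact model specification is not given here, and both function names are hypothetical.

```python
def expected_dwell_days(self_transition_prob):
    """Mean dwell time, in steps, of a state with the given self-transition probability.

    Assumes geometric dwell times (first-order Markov chain, one step per day).
    """
    if not 0.0 <= self_transition_prob < 1.0:
        raise ValueError("self-transition probability must be in [0, 1)")
    return 1.0 / (1.0 - self_transition_prob)


def self_transition_for_dwell(mean_dwell_days):
    """Inverse mapping: self-transition probability implied by a mean dwell time."""
    if mean_dwell_days < 1.0:
        raise ValueError("mean dwell time must be at least one step")
    return 1.0 - 1.0 / mean_dwell_days


# Dwell times quoted in the summary: 16-18 days (negative states) vs 3 days (others).
p_negative_low = self_transition_for_dwell(16)   # 0.9375
p_negative_high = self_transition_for_dwell(18)  # about 0.944
p_other = self_transition_for_dwell(3)           # about 0.667
```

Under these assumptions, a negative regime would persist from one day to the next with probability of roughly 0.94, against roughly 0.67 for the other regimes.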

Load-bearing premise

No prior published dataset has simultaneously satisfied long duration, naturalistic workplace context, stable small-team structure, and fully passive IRB-compliant sensing.

What would settle it

Any published dataset that contains at least two years of passive facial data from the same fixed workplace team of roughly fifty people and carries ethics approval.

Figures

Figures reproduced from arXiv: 2510.15221 by Xiao Sun.

Figure 1: Dataset Overview. (A) Temporal coverage illustrating data collection density over 30.5 months, with key COVID-19 events highlighted. (B) Distribution ...
Figure 2: Extended Emotional Metrics. (A) Scatter plot of volatility versus predictability showing a negative correlation ( ...
Figure 3: Emotion Patterns and Technical Validation. (A) Valence distribution exhibiting the negative skew typical of workplace stress. (B) Diurnal rhythm with ...
Figure 4: Individual emotional trajectories for 12 representative participants over 30.5 months. Each line shows the daily average valence score (smoothed with ...
Original abstract

Affective computing has matured rapidly in laboratory settings, yet no prior dataset combines (i) months-to-years of duration, (ii) a naturalistic workplace context, (iii) a stable small-team social structure, and (iv) a fully passive sensing protocol that survives institutional review. We introduce WELD, the first dataset to satisfy all four. WELD comprises 733,780 per-frame seven-class facial-expression probability vectors from 49 employees of a Chinese software company over 30.1 months (Nov 2021 - May 2024) -- the longest naturalistic in-the-wild emotion corpus and the only multi-year corpus supporting both within-individual longitudinal and within-team relational analyses on the same subjects. Data are released under a four-tier access model with only aggregated probabilities publicly downloadable. We validate the corpus by replicating three established phenomena (+43.1% weekend valence boost; 13:00-trough diurnal cycle; Shanghai 2022 lockdown effect d=-0.40), and report four novel findings: (1) variance decomposition attributes 19.3% of daily-valence variance to between-person differences and 29.8% to month seasonality -- a quantitative ceiling for future predictive models; (2) Hidden Markov decomposition reveals six emotional regimes with asymmetric negative-state dwell times (16-18 d vs 3 d); (3) leave-one-person-out turnover prediction reaches AUC=0.79 yet a Cox concordance index of only 0.52, exposing a metric-trap when AUC is reported without survival-aware baselines; (4) the corpus reveals systematic over-prediction of "angry" by an off-the-shelf FER model on neutral Asian faces (0.194 vs ~0.05 Western priors), making WELD valuable for FER fairness audits. A complex-systems analysis of the corpus appears as a companion preprint (arXiv:2510.16046).
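
The gap between AUC and concordance index in finding (3) can be illustrated on a toy example. The sketch below is a hedged illustration, not the paper's pipeline: it computes a pairwise AUC and Harrell's concordance index directly from risk scores, whereas the paper reports a Cox concordance index. All function names, variable names, and values are hypothetical.

```python
def auc(scores, left):
    """Pairwise AUC: chance that a leaver outscores a stayer (ties count 0.5)."""
    pos = [s for s, y in zip(scores, left) if y]
    neg = [s for s, y in zip(scores, left) if not y]
    if not pos or not neg:
        raise ValueError("need at least one leaver and one stayer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def concordance_index(risk, months_observed, left):
    """Harrell's C: among comparable pairs, is the earlier leaver ranked riskier?

    A pair (i, j) is comparable when i left and did so before j's observed time.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(risk)):
        if not left[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(risk)):
            if months_observed[i] < months_observed[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (invented): two leavers, two stayers observed for 30 months.
risk = [0.7, 0.9, 0.1, 0.2]
months_observed = [5, 10, 30, 30]
left = [1, 1, 0, 0]

toy_auc = auc(risk, left)                                   # 1.0
toy_c = concordance_index(risk, months_observed, left)      # 0.8
```

Here the scores separate leavers from stayers perfectly, yet they rank the two leavers in the wrong temporal order, so the concordance index falls below the AUC. This is the kind of divergence that AUC alone would hide.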

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces WELD as the first dataset to combine (i) multi-month to multi-year duration, (ii) naturalistic workplace setting, (iii) stable small-team social structure, and (iv) fully passive IRB-compliant sensing. It reports 733,780 seven-class facial-expression probability vectors collected from 49 employees over 30.1 months, validates the corpus by replicating three established phenomena (weekend valence boost of +43.1%, 13:00 diurnal trough, Shanghai 2022 lockdown effect with d=-0.40), and presents four novel analyses: variance decomposition (19.3% between-person, 29.8% month seasonality), Hidden Markov Model emotional regimes with asymmetric dwell times, leave-one-person-out turnover prediction (AUC=0.79 but a Cox concordance index of only 0.52), and systematic over-prediction of 'angry' by an off-the-shelf FER model on neutral Asian faces.

Significance. If the uniqueness claim is secured, the dataset would be a valuable contribution to affective computing by supporting both within-individual longitudinal and within-team relational analyses on the same subjects over an unprecedented naturalistic timeframe. Strengths include the scale and duration, the four-tier access model with public aggregated probabilities, explicit replication of external phenomena, and the companion complex-systems preprint (arXiv:2510.16046).

major comments (2)
  1. [Related Work / Dataset Comparison] The headline claim that no prior published corpus simultaneously satisfies the four listed criteria requires an explicit enumeration or comparison table that rules out every plausible candidate on all four axes at once. The abstract asserts uniqueness directly, but without a documented search protocol or exhaustive side-by-side evaluation the central positioning of the work remains unsecured.
  2. [Validation section] Full details of the replication analyses (exact data exclusion rules, error bars or confidence intervals on the +43.1% weekend boost and d=-0.40 lockdown effect, and the statistical tests used) are not visible in the provided abstract and must be expanded to confirm that the three established phenomena are reproduced with the reported precision.
minor comments (2)
  1. [Abstract] The total frame count (733,780) and the four-tier access model are mentioned but could be stated more concisely to improve readability for dataset-focused readers.
  2. [Notation] Ensure consistent capitalization and abbreviation of 'FER' and 'HMM' on first and subsequent uses.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Related Work / Dataset Comparison] the headline claim that no prior published corpus simultaneously satisfies the four listed criteria requires an explicit enumeration or comparison table that rules out every plausible candidate on all four axes at once. The abstract asserts uniqueness directly, but without a documented search protocol or exhaustive side-by-side evaluation the central positioning of the work remains unsecured.

    Authors: We agree that an explicit side-by-side comparison would make the uniqueness claim more transparent. Our literature review (via targeted searches in ACM, IEEE, and Google Scholar using terms for naturalistic, longitudinal, workplace, and passive facial emotion datasets) found no prior corpus meeting all four criteria at once. To address the concern, we will add a comparison table in the revised Related Work section listing major existing datasets and the specific criteria each fails to satisfy, along with a concise description of our search approach. This will secure the positioning without altering the core claim.
    Revision: yes

  2. Referee: [Validation section] full details of the replication analyses (exact data exclusion rules, error bars or confidence intervals on the +43.1% weekend boost and d=-0.40 lockdown effect, and the statistical tests used) are not visible in the provided abstract and must be expanded to confirm that the three established phenomena are reproduced with the reported precision.

    Authors: We appreciate the request for greater methodological transparency. While the full manuscript text contains descriptions of the replication procedures, we will expand the Validation section to include exact data exclusion criteria (e.g., minimum daily frames per participant), confidence intervals or standard errors for the +43.1% weekend boost and d=-0.40 lockdown effect, and the precise statistical tests (mixed-effects models and paired comparisons) used. These additions will confirm the reported precision and enhance reproducibility.
    Revision: yes
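
The interval estimates requested for the weekend boost and the lockdown effect could be produced along the following lines. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses a percentile bootstrap over days and a pooled-SD Cohen's d, whereas the rebuttal names mixed-effects models and paired comparisons. It also assumes the boost is defined relative to the weekday mean, which the source does not confirm. All names and data are hypothetical.

```python
import random
import statistics


def relative_boost(weekday, weekend):
    """Percent change of the weekend mean relative to the weekday mean.

    Assumes a nonzero weekday mean; abs() keeps the sign meaningful
    if the baseline valence is negative.
    """
    base = statistics.fmean(weekday)
    return 100.0 * (statistics.fmean(weekend) - base) / abs(base)


def cohens_d(group_a, group_b):
    """Cohen's d (mean of a minus mean of b) with a pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd


def bootstrap_ci(group_a, group_b, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(group_a, group_b).

    Resamples observations independently within each group, which ignores
    repeated measures per person; a clustered resample would be stricter.
    """
    rng = random.Random(seed)
    draws = sorted(
        stat(rng.choices(group_a, k=len(group_a)), rng.choices(group_b, k=len(group_b)))
        for _ in range(n_boot)
    )
    lower = draws[int((alpha / 2) * n_boot)]
    upper = draws[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy daily valence values (invented for illustration).
weekday_valence = [0.20, 0.22, 0.18, 0.21, 0.19, 0.20]
weekend_valence = [0.28, 0.30, 0.27, 0.29]

boost = relative_boost(weekday_valence, weekend_valence)
boost_ci = bootstrap_ci(weekday_valence, weekend_valence, relative_boost)
```

The same `bootstrap_ci` call accepts `cohens_d` as the statistic, so one routine covers both the percentage boost and the standardized lockdown effect.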

Circularity Check

0 steps flagged

No circularity: dataset introduction and external replications are self-contained

Full rationale

The paper presents WELD as a new corpus and supports its value through replication of three established external phenomena (weekend valence boost, diurnal cycle, lockdown effect) plus novel descriptive findings on variance decomposition and regime dwell times. No equations, fitted parameters renamed as predictions, or self-referential definitions appear in the abstract or described content. The 'first to satisfy all four criteria' statement is a literature-review claim about prior datasets rather than a derivation that reduces to the paper's own inputs by construction. The work is therefore self-contained against external benchmarks with no load-bearing self-citation chains or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the deployed sensing protocol is fully passive and IRB-compliant with no active participant input required, plus the empirical claim that no earlier corpus meets all four criteria.

axioms (1)
  • domain assumption: The facial expression recognition pipeline produces reliable seven-class probability vectors in a real workplace environment without active user participation.
    Invoked to generate the 733,780 per-frame vectors described in the abstract.
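
A basic sanity check on this assumption could look like the sketch below, which validates per-frame vectors and averages the probability mass assigned to one class. The class names and their order are assumptions borrowed from common seven-class FER conventions; the source does not list them. The function names and frames are hypothetical.

```python
# Assumed class order; the source only states that there are seven classes.
EMOTIONS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")


def is_valid_probability_vector(vec, tol=1e-6):
    """True if vec has seven non-negative entries that sum to 1 within tol."""
    return (
        len(vec) == len(EMOTIONS)
        and all(p >= 0.0 for p in vec)
        and abs(sum(vec) - 1.0) <= tol
    )


def mean_class_probability(frames, emotion):
    """Average probability assigned to one class across all valid frames."""
    idx = EMOTIONS.index(emotion)
    valid = [f for f in frames if is_valid_probability_vector(f)]
    if not valid:
        raise ValueError("no valid probability vectors")
    return sum(f[idx] for f in valid) / len(valid)


# Toy frames (invented): mostly neutral faces with some mass on "angry".
frames = [
    [0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8],
    [0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.8],
    [0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0],  # invalid: sums to 1.5, skipped
]

mean_angry = mean_class_probability(frames, "angry")
```

Applied to frames known to show neutral faces, a mean "angry" probability well above an expected prior would reproduce the kind of over-prediction the paper reports (0.194 versus roughly 0.05).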



