
arxiv: 2605.19765 · v1 · pith:QQ46CAYRnew · submitted 2026-05-19 · 💻 cs.AI · cs.DB

GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction

Pith reviewed 2026-05-20 04:57 UTC · model grok-4.3

classification 💻 cs.AI cs.DB
keywords multimodal dataset · group affect · collaborative interaction · physiology · eye tracking · affective computing · social signals · four-person groups

The pith

GroupAffect-4 supplies a synchronized multimodal dataset from ten four-person groups to study affect at individual, interpersonal, and collective levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents GroupAffect-4, a new corpus that records multimodal signals from ten groups of four people each as they work on four different collaborative tasks. The signals include wrist physiology, eye tracking, audio from close-talk microphones, continuous affect ratings, personality scores, and task results, all synchronized to one clock. By collecting these in one place for co-located groups, the dataset aims to support research on how affect plays out not just within one person but between people and across the whole group. The authors report coverage of over 91 percent of expected physiology windows and 98 percent of expected eye-tracking windows, along with a manipulation check indicating that the negotiation block shifts affect as intended. The release includes benchmarks for testing models on within-person states, between-person traits, and group dynamics.
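The coverage figures can be read as the share of expected analysis windows that contain usable data. The following is a minimal sketch of that ratio, assuming fixed-length windows and a per-window validity flag. The function name, the flag representation, and the example numbers are hypothetical and are not taken from the paper or its scripts.

```python
def window_coverage(valid_flags):
    """Return the percentage of expected windows that contain usable data.

    valid_flags: one boolean per expected window (True = usable signal).
    """
    if not valid_flags:
        raise ValueError("no expected windows")
    return 100.0 * sum(valid_flags) / len(valid_flags)


# Hypothetical session: 200 expected windows, 14 lost to sensor dropout.
flags = [True] * 186 + [False] * 14
coverage = window_coverage(flags)
print(f"{coverage:.1f}% of expected windows covered")
```

How a window is judged usable (artifact thresholds, minimum sample counts) is a separate decision that this sketch leaves out.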

Core claim

The authors create and release GroupAffect-4, a multimodal dataset of 40 participants in 10 four-person groups completing four tasks: information pooling, negotiation, idea generation, and a public-goods game. Each participant wears a wrist physiology sensor, eye-tracking glasses, and a close-talk microphone, and all sensor data are time-aligned with continuous affect self-reports, post-task questionnaires, task outcomes, and Big-Five personality scores. The dataset covers over 91% of expected physiology windows and 98% of expected eye-tracking windows, and it defines fifteen benchmark targets across three analysis levels with leave-one-group-out feasibility baselines.

What carries the argument

The GroupAffect-4 corpus, a synchronized collection of physiology, eye movement, audio, and report data from collaborative group sessions that enables joint analysis of affect at individual, interpersonal, and group scales.
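To make the three scales concrete, the sketch below derives one individual-level, one interpersonal-level, and one group-level quantity from time-aligned signals of a single four-person group. It is only an illustration: the signal values, participant labels, and the choice of pairwise Pearson correlation as a synchrony measure are assumptions, not the features or methods used in the paper.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-window arousal proxy for the four members of one group,
# already aligned to a shared clock.
signals = {
    "P1": [0.2, 0.4, 0.6, 0.5, 0.7],
    "P2": [0.1, 0.3, 0.5, 0.6, 0.8],
    "P3": [0.9, 0.7, 0.6, 0.4, 0.2],
    "P4": [0.3, 0.3, 0.5, 0.4, 0.6],
}

# Individual level: one summary value per participant.
individual = {p: mean(s) for p, s in signals.items()}

# Interpersonal level: one value per pair (6 pairs in a four-person group).
interpersonal = {
    (a, b): pearson(signals[a], signals[b])
    for a, b in combinations(signals, 2)
}

# Group level: mean pairwise correlation as a crude synchrony index.
group_synchrony = mean(list(interpersonal.values()))
```

The point of the shared clock is visible here: the pairwise and group quantities are only meaningful if index i refers to the same moment for every participant.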

If this is right

  • Affective computing models can now be evaluated on aligned signals that link personal states to interpersonal and group patterns.
  • The defined leave-one-group-out baselines provide a starting point for standardized tests of group dynamics prediction.
  • High coverage of physiology and eye-tracking windows supports extraction of continuous features across entire sessions.
  • Public structure with quality reports and processing scripts enables direct replication and extension by other teams.
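The leave-one-group-out protocol mentioned above can be sketched as follows: each fold holds out all four members of one group and trains on the remaining nine groups. This is a generic stdlib illustration, not the authors' released code, and the group labels are hypothetical.

```python
def leave_one_group_out(group_ids):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    for held_out in sorted(set(group_ids)):
        train = [i for i, g in enumerate(group_ids) if g != held_out]
        test = [i for i, g in enumerate(group_ids) if g == held_out]
        yield held_out, train, test


# 40 participants in 10 groups of four, as in the corpus description.
# The label format is a hypothetical choice for this example.
group_ids = [f"grp-{g:02d}" for g in range(1, 11) for _ in range(4)]
folds = list(leave_one_group_out(group_ids))

for held_out, train, test in folds[:2]:
    print(held_out, len(train), "train /", len(test), "test")
```

Holding out whole groups keeps members who shared a session from appearing on both sides of a split, which would otherwise let session-specific context leak into the test score.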

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This kind of aligned multi-level recording could support tools that monitor real-time team emotional climate during meetings.
  • Future comparisons with remote or virtual groups could test whether co-location changes the strength of interpersonal affect links.
  • Combining these recordings with existing meeting datasets might allow larger-scale studies of how group size influences affective coordination.

Load-bearing premise

The four selected collaborative tasks and the chosen sensor suite of wrist physiology monitors, eye-tracking glasses, and close-talk microphones produce recordings that reflect natural affective processes at multiple levels without major intrusion or distortion.

What would settle it

Absence of expected affective differences in self-reports during the negotiation task or data coverage falling below levels needed for reliable multi-level modeling would show that the dataset does not support its intended analyses of coupled group affect.
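One way such a check could be run is a paired effect size on self-reports, comparing each participant's ratings in the negotiation task with their ratings in another task. The sketch below is an assumption about how a check of this kind might look, not the paper's stated test. The ratings are invented, and the 1–9 range simply follows the scale described for the affect probes.

```python
from statistics import mean, stdev


def paired_cohens_d(condition_a, condition_b):
    """Effect size d_z for paired samples: mean difference / SD of differences."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    return mean(diffs) / stdev(diffs)


# Hypothetical per-participant mean arousal ratings on a 1-9 scale.
arousal_negotiation = [6, 7, 5, 6, 8, 7, 6, 5]
arousal_other_task = [4, 5, 5, 4, 6, 5, 5, 4]

d = paired_cohens_d(arousal_negotiation, arousal_other_task)
print(f"paired d = {d:.2f}")
```

An effect size near zero on real data would correspond to the "absence of expected affective differences" scenario described above.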

Figures

Figures reproduced from arXiv: 2605.19765 by Alice Modica, Andrew Burke Dittberner, Anna Obara, Daniel Barratt, Daniel Overholt, Fabricio Batista Narcizo, Jesper Bunsow Boldt, Karim Haddad, Meisam Jamshidi Seikavandi, Paolo Burelli, Shan Ahmed Shaffi, Tanya Ignatenko, Ted Vucurevich.

Figure 1: GroupAffect-4 study design and release structure.
Figure 2: Per-benchmark ranked feature importance: top-15 features by mean normalised …
Figure 3: Valence, arousal, and dominance probe summaries by task. T2 negotiation produces the …
Figure 4: Participant demographics (age, sex, education) and group composition across 40 participants …
Figure 5: T1 (Hidden-Profile Decision) big-screen layout. The shared display shows the task brief, …
Figure 6: T2 (Mini-Negotiation) big-screen layout. The shared display shows the topic and format …
Figure 7: T3 (Idea Generation and Selection) big-screen layout. The shared display shows the …
Figure 8: T4 (Public-Goods Micro-Game) big-screen layout. The shared display reveals individual …
Figure 9: Participant tablet view of a VAD affect probe during Task 1. Valence, Arousal, and Dominance are shown simultaneously on a 1–9 Likert scale. Post-Block Questionnaires: After each task, participants completed a short individual questionnaire on their tablets ( …
Figure 10: Participant tablet view of the post-block questionnaire (T4 shown, 7 items shown). All …
Figure 11: Task Outcomes Dashboard (T1–T4). Panel 1: Hidden-profile decision outcome (T1). All 10 groups selected Candidate C. Panel 2: Mini-negotiation outcome (T2). Topics and formats are colour-differentiated. Most common: “AI for productivity” (5 groups). Panel 3: Idea Generation outcome (T3). Winning ideas are grouped by themes. Panel 4: Public-Goods Contribution (T4). Per-group mean contributions (0–10 scale) …
Figure 12: Time-to-decision durations (T1–T3). Boxplots show discussion durations from onset to the moderator “finish” prompt for T1–T3. The dashed horizontal line marks the nominal 8-minute guideline; groups typically overran this duration, especially in T2. T4 is excluded because its discussion phase did not terminate in a single group decision. Durations were derived directly from events_grp-XX.tsv files (one pe…
Figure 13: Task-level physiological feature effect sizes (Cohen’s …
Figure 14: Cross-modal Spearman correlation matrix (physiological and audio features vs. …
Figure 15: Modality ablation heatmap. Colour encodes performance relative to chance (white = …
Figure 16: Per-benchmark ranked feature importance: top-15 features by mean normalised …
Figure 17: Feasibility baselines for the 31-feature set (biomarker composites and annotation process …
Figure 18: Cross-benchmark feature importance heatmap. Rows are all 31 sensor/behavioural features …
Figure 19: LOGO-CV per-fold performance strip plot across all benchmarks. Each dot is one fold; …
Original abstract

Existing affective-computing, social-signal-processing, and meeting corpora capture important parts of human interaction, but they rarely support analysis of affect in co-located groups as a coupled individual, interpersonal, and group-level process. The required signals (per-participant physiology, eye movement, audio, self-report, task outcomes, and personality) are usually fragmented across separate dataset traditions. We introduce GroupAffect-4, a multimodal corpus of 40 participants in 10 four-person groups, each completing four ecologically varied collaborative tasks spanning information pooling, negotiation, idea generation, and a public-goods game. Each participant is instrumented with a wrist-worn physiology sensor, eye-tracking glasses, and a close-talk microphone; sessions include continuous affect self-reports, post-task questionnaires, task outcomes, and Big-Five personality scores, all time-aligned to a shared clock. The dataset covers over 91% of expected physiology windows and 98% of eye-tracking windows, with strong task validity confirmed by a clear affective manipulation check across the negotiation block. We define fifteen benchmarkable targets spanning three analysis levels -- within-person state, between-person traits, and group dynamics -- and report leave-one-group-out feasibility baselines establishing the dataset's evaluative scope. GroupAffect-4 is released with a BIDS-inspired structure, Croissant metadata, a datasheet, per-session quality reports, and open processing scripts. Code and processing scripts are available at https://github.com/meisamjam/GroupAffect-4; the dataset is publicly archived at https://zenodo.org/records/20037847.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces GroupAffect-4, a multimodal dataset of 40 participants in 10 four-person groups completing four collaborative tasks (information pooling, negotiation, idea generation, public-goods game). Participants are instrumented with wrist physiology sensors, eye-tracking glasses, and close-talk microphones; data include continuous affect self-reports, post-task questionnaires, task outcomes, and Big-Five personality scores, all time-aligned. The dataset reports >91% physiology and 98% eye-tracking coverage, a manipulation check for affective validity in negotiation, 15 benchmark targets across within-person, between-person, and group levels, and leave-one-group-out feasibility baselines. It is released with BIDS-inspired structure, Croissant metadata, datasheet, quality reports, and open processing scripts.

Significance. If the coverage, alignment, and validity claims hold, the dataset fills a notable gap by providing time-synchronized multimodal signals for studying affect as a coupled individual-interpersonal-group process in co-located settings. The open release with standardized metadata, per-session quality reports, and reproducible scripts strengthens its utility for the community. The leave-one-group-out baselines establish a concrete evaluative scope without introducing new fitted parameters.

major comments (2)
  1. Abstract: The central claim that the recordings capture ecologically valid affect at individual, interpersonal, and group levels rests on the untested premise that the chosen sensor suite (eye-tracking glasses + wrist physiology + close-talk mic) is minimally intrusive. No quantitative evidence (comfort ratings, behavioral reactivity metrics, or uninstrumented control comparisons) is reported despite mention of post-task questionnaires, leaving the ecological-validity foundation unsupported.
  2. Abstract and methods description: Participant selection criteria, exact baseline implementations for the 15 benchmark targets, and any post-collection data exclusions are not detailed. These omissions directly affect reproducibility of the reported 91% physiology and 98% eye-tracking coverage figures and the leave-one-group-out feasibility results.
minor comments (2)
  1. Abstract: The phrase 'strong task validity confirmed by a clear affective manipulation check' would benefit from a brief parenthetical note on the specific measure (e.g., self-report scale or statistical test) used in the negotiation block.
  2. Release description: The GitHub and Zenodo links are helpful; adding a short table summarizing per-task sensor coverage statistics would improve immediate usability for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation of GroupAffect-4's contribution and for the constructive comments on ecological validity and reproducibility. We address each major comment below and will incorporate the suggested clarifications in the revised manuscript.

Point-by-point responses
  1. Referee: Abstract: The central claim that the recordings capture ecologically valid affect at individual, interpersonal, and group levels rests on the untested premise that the chosen sensor suite (eye-tracking glasses + wrist physiology + close-talk mic) is minimally intrusive. No quantitative evidence (comfort ratings, behavioral reactivity metrics, or uninstrumented control comparisons) is reported despite mention of post-task questionnaires, leaving the ecological-validity foundation unsupported.

    Authors: We agree that explicit quantitative support for minimal intrusiveness would strengthen the ecological-validity claim. Although post-task questionnaires were administered and contain relevant items, comfort and reactivity metrics were not analyzed or reported in the submitted version. In the revision we will add a short subsection (or appendix table) presenting mean comfort ratings, any self-reported interference, and observed behavioral reactivity indicators drawn directly from those questionnaires. This addition will provide the requested quantitative grounding without requiring new data collection. revision: yes

  2. Referee: Abstract and methods description: Participant selection criteria, exact baseline implementations for the 15 benchmark targets, and any post-collection data exclusions are not detailed. These omissions directly affect reproducibility of the reported 91% physiology and 98% eye-tracking coverage figures and the leave-one-group-out feasibility results.

    Authors: We concur that these methodological details are necessary for full reproducibility. The revised manuscript will expand the Participants and Benchmark Targets subsections to specify: (i) inclusion/exclusion criteria and recruitment procedures, (ii) precise algorithmic descriptions and any hyper-parameters used for each of the 15 benchmark targets, and (iii) the exact post-collection exclusion rules together with the number of sessions or segments removed. These additions will allow readers to replicate the coverage statistics and leave-one-group-out baselines exactly. revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive dataset paper with no derivations or fitted predictions

Full rationale

This is a dataset introduction paper whose central claims consist of describing the collection protocol, reporting coverage statistics (91% physiology, 98% eye-tracking), confirming a manipulation check, and releasing benchmark targets with leave-one-group-out baselines. No equations, first-principles derivations, parameter fits, or predictions are presented that could reduce to their own inputs. The fifteen benchmarkable targets are defined explicitly rather than derived; the feasibility baselines are reported as evaluative scope rather than claimed as novel predictions. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the provided text. The paper is therefore self-contained as a descriptive corpus release.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the selected tasks and sensors yield representative group-affect data; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption: The four chosen tasks are ecologically valid representations of collaborative interaction.
    Invoked when the abstract states the tasks span information pooling, negotiation, idea generation, and a public-goods game.

pith-pipeline@v0.9.0 · 5882 in / 1416 out tokens · 65506 ms · 2026-05-20T04:57:52.297621+00:00 · methodology


Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages

  1. [1]

    Croissant: A metadata format for ML-ready datasets

    Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Luca Foschini, Joan Giner-Miguelez, Pieter Gijsbers, Sujata Goswami, Nitisha Jain, Michalis Karamousadakis, Michael Kuchnik, et al. Croissant: A metadata format for ML-ready datasets. InAdvances in Neural Information Processing Systems, 2024. Also available as arXiv:2403.19546

  2. [2]

    Sigal G. Barsade. The ripple effect: Emotional contagion and its influence on group behavior. Administrative Science Quarterly, 47(4):644–675, 2002

  3. [3]

    Indrani Bhattacharya, Daniel Foley, Nicholas Zhang, Tong Zhang, Christopher Mine, Qi Ji, and Richard J. Radke. UGI: An unobtrusive group interaction dataset. InProceedings of the 10th ACM Multimedia Systems Conference, 2019

  4. [4]

    G-REx: A real-world dataset of group emotion experiences based on physiological data

    Patricia Bota, Joana Brito, Ana Fred, Pablo Cesar, and Hugo Plácido Silva. G-REx: A real-world dataset of group emotion experiences based on physiological data. 2023

  5. [5]

    Bradley and Peter J

    Margaret M. Bradley and Peter J. Lang. Measuring emotion: The self-assessment manikin and the semantic differential.Journal of Behavior Therapy and Experimental Psychiatry, 25(1):49–59, 1994

  6. [6]

    Chang, Sungbok Lee, and Shrikanth S

    Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N. Chang, Sungbok Lee, and Shrikanth S. Narayanan. IEMOCAP: Interactive emotional dyadic motion capture database.Language Resources and Evaluation, 42(4):335– 359, 2008

  7. [7]

    Gebre, and Hayley Hung

    Laura Cabrera-Quiros, Andrew Demetriou, Egor Balog, Astrid van der Meijden, Esma Gedik, Binyam G. Gebre, and Hayley Hung. The MatchNMingle dataset: A novel multi-sensor resource for the analysis of social interactions and nonverbal communication in unstructured mingle and speed-dating scenarios.IEEE Transactions on Affective Computing, 12(1):148–164, 2021

  8. [8]

    The AMI meeting corpus: A pre-announcement

    Jean Carletta, Simone Ashby, Sebastien Bourban, Mike Flynn, Mael Guillemot, Thomas Hain, Jaroslav Kadlec, Vasilis Karaiskos, Wessel Kraaij, Melissa Kronenthal, Guillaume Lathoud, Mike Lincoln, Agnes Lisowska, Iain McCowan, Wilfried Post, Dennis Reidsma, and Pierre Wellner. The AMI meeting corpus: A pre-announcement. InProceedings of the Second Inter- nati...

  9. [9]

    Cawley and Nicola L

    Gavin C. Cawley and Nicola L. C. Talbot. On over-fitting in model selection and subsequent selection bias in performance evaluation.Journal of Machine Learning Research, 11:2079–2107, 2010

  10. [10]

    Sustaining cooperation in laboratory public goods experiments: A selective survey of the literature.Experimental Economics, 14(1):47–83, 2011

    Ananish Chaudhuri. Sustaining cooperation in laboratory public goods experiments: A selective survey of the literature.Experimental Economics, 14(1):47–83, 2011

  11. [11]

    The gamma corpus of danish polyadic conversations with gaze speech and motion data in quiet and noise.Scientific Data, 2026

    Mark Dourado, Henrik Gert Hassager, Jesper Udesen, and Stefania Serafin. The gamma corpus of danish polyadic conversations with gaze speech and motion data in quiet and noise.Scientific Data, 2026

  12. [12]

    Cooperation and punishment in public goods experiments

    Ernst Fehr and Simon Gächter. Cooperation and punishment in public goods experiments. American Economic Review, 90(4):980–994, 2000

  13. [13]

    Datasheets for datasets.Communications of the ACM, 64(12):86–92, 2021

    Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daume III, and Kate Crawford. Datasheets for datasets.Communications of the ACM, 64(12):86–92, 2021

  14. [14]

    Larentzakis, Nader N

    Kyriakos Georgiou, Andreas V . Larentzakis, Nader N. Khamis, Ghadah I. Alsuhaibani, Yasser A. Alaska, and Elias J. Giallafos. Can wearable devices accurately measure heart rate variability? a systematic review.Folia Medica, 60(1):7–20, 2018

  15. [15]

    Academic Press, 1981

    Charles Goodwin.Conversational Organization: Interaction Between Speakers and Hearers. Academic Press, 1981. 10

  16. [16]

    Gorgolewski, Tibor Auer, Vince D

    Krzysztof J. Gorgolewski, Tibor Auer, Vince D. Calhoun, R. Cameron Craddock, Samir Das, Eugene P. Duff, Guillaume Flandin, Satrajit S. Ghosh, Tristan Glatard, Yaroslav O. Halchenko, et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments.Scientific Data, 3:160044, 2016

  17. [17]

    The ICSI meeting corpus

    Adam Janin, Don Baron, Jane Edwards, Dan Ellis, David Gelbart, Nelson Morgan, Barbara Peskin, Thilo Pfau, Elizabeth Shriberg, Andreas Stolcke, and Chuck Wooters. The ICSI meeting corpus. InProceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003

  18. [18]

    O. P. John and S. Srivastava. The big five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin and O. P. John, editors,Handbook of personality: Theory and research, pages 102–138. Guilford Press, 2nd edition, 1999

  19. [19]

    DEAP: A database for emotion analysis using physiological signals.IEEE Transactions on Affective Computing, 3(1):18–31, 2012

    Sander Koelstra, Christian Muhl, Mohammad Soleymani, Jong-Seok Lee, Ashkan Yazdani, Touradj Ebrahimi, Thierry Pun, Anton Nijholt, and Ioannis Patras. DEAP: A database for emotion analysis using physiological signals.IEEE Transactions on Affective Computing, 3(1):18–31, 2012

  20. [20]

    Grivich, Fiorenzo Artoni, Tim Mullen, Arnaud Delorme, and Scott Makeig

    Christian Kothe, Seyed Yahya Shirazi, Tristan Stenner, David Medine, Chadwick Boulay, Matthew I. Grivich, Fiorenzo Artoni, Tim Mullen, Arnaud Delorme, and Scott Makeig. The lab streaming layer for synchronized multimodal recording.Imaging Neuroscience, 3:IMAG.a.136, 2025

  21. [21]

    Lazarus.Emotion and Adaptation

    Richard S. Lazarus.Emotion and Adaptation. Oxford University Press, 1991

  22. [22]

    Connie Yuan, and Poppy Lauretta McLeod

    Li Lu, Y . Connie Yuan, and Poppy Lauretta McLeod. Twenty-five years of hidden profiles in group decision making: A meta-analysis.Personality and Social Psychology Review, 16(1):54– 75, 2012

  23. [23]

    Marks, John E

    Michelle A. Marks, John E. Mathieu, and Stephen J. Zaccaro. A temporally based framework and taxonomy of team processes.Academy of Management Review, 26(3):356–376, 2001

  24. [24]

    Pupillometry: Psychology, physiology, and function.Journal of Cognition, 1(1):16, 2018

    Sebastiaan Mathot. Pupillometry: Psychology, physiology, and function.Journal of Cognition, 1(1):16, 2018

  25. [25]

    Russell.An Approach to Environmental Psychology

    Albert Mehrabian and James A. Russell.An Approach to Environmental Psychology. MIT Press, 1974

  26. [26]

    A multimodal experimental dataset on agile software development team interactions.Data in Brief, 61:111828, 2025

    Diego Miranda, Carlos Escobedo, Dayana Palma, Rene Noel, Adrián Fernández, Cristian Cechinel, Jaime Godoy, and Roberto Munoz. A multimodal experimental dataset on agile software development team interactions.Data in Brief, 61:111828, 2025

  27. [27]

    AMI- GOS: A dataset for affect, personality and mood research on individuals and groups.IEEE Transactions on Affective Computing, 12(2):479–493, 2021

    Juan Abdon Miranda-Correa, Mojtaba Khomami Abadi, Nicu Sebe, and Ioannis Patras. AMI- GOS: A dataset for affect, personality and mood research on individuals and groups.IEEE Transactions on Affective Computing, 12(2):479–493, 2021

  28. [28]

    Model cards for model reporting

    Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchin- son, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. InProceedings of the Conference on Fairness, Accountability, and Transparency, pages 220– 229, 2019

  29. [29]

    Croissant format specification, version 1.0

    MLCommons. Croissant format specification, version 1.0. https://docs.mlcommons.org/ croissant/docs/croissant-spec.html, 2024. Published 2024-03-01; accessed 2026-05- 03

  30. [30]

    Detecting low rapport during natural interactions in small groups from non-verbal behaviour

    Philipp Müller, Michael Xuelin Huang, and Andreas Bulling. Detecting low rapport during natural interactions in small groups from non-verbal behaviour. pages 153–164, 2018

  31. [31]

    Preserving privacy in speaker and speech characterisation.Computer Speech & Language, 58:441–480, 2019

    Andreas Nautsch, Andrés Jiménez, Alexander Treiber, Jan Kolberg, Catherine Jasserand, Els Kindt, Héctor Delgado, Massimiliano Todisco, Pierre Héroux, Nicholas Evans, et al. Preserving privacy in speaker and speech characterisation.Computer Speech & Language, 58:441–480, 2019. 11

  32. [32]

    NeurIPS 2026 evaluations & datasets hosting guidelines

    NeurIPS. NeurIPS 2026 evaluations & datasets hosting guidelines. https://neurips.cc/ Conferences/2026/EvaluationsDatasetsHosting, 2026. Accessed 2026-05-01

  33. [33]

    NeurIPS 2026 evaluations & datasets track — call for papers

    NeurIPS. NeurIPS 2026 evaluations & datasets track — call for papers. https://neurips. cc/Conferences/2026/CallForEvaluationsDatasets, 2026. Accessed 2026-05-01

  34. [34]

    Cristina Palmero, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, Albert Clapés, Alba Moseguí, Zejian Zhang, David Gallardo, Georgina Guilera, David Leiva, Hugo Jair Escalante, Isabelle Guyon, Xavier Baró, and Sergio Escalera. Context-aware personality inference in dyadic scenarios: Introducing the UDIV A dataset. InProceedings of the IEEE/CVF...

  35. [35]

    Khandoker, Leontios J

    Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim, Ahsan H. Khandoker, Leontios J. Hadjileontiadis, Alice Oh, Youn-Byoung Jeong, and Uichin Lee. K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations.Scientific Data, 7(1):293, 2020

  36. [36]

    Emotions are social.British Journal of Psychology, 87(4):663–683, 1996

    Brian Parkinson. Emotions are social.British Journal of Psychology, 87(4):663–683, 1996

  37. [37]

    Posada-Quintero and Ki H

    Hugo F. Posada-Quintero and Ki H. Chon. Innovations in electrodermal activity data collection and signal processing: A systematic review.Sensors, 20(2):479, 2020

  38. [38]

    Rietzschel, Bernard A

    Eric F. Rietzschel, Bernard A. Nijstad, and Wolfgang Stroebe. The selection of creative ideas after individual idea generation: Choosing between creativity and impact.British Journal of Psychology, 101(1):47–68, 2010

  39. [39]

    Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions

    Fabien Ringeval, Andreas Sonderegger, Jürgen Sauer, and Denis Lalanne. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. InProceedings of the IEEE International Conference on Automatic Face and Gesture Recognition Workshops, 2013

  40. [40]

    James A. Russell. A circumplex model of affect.Journal of Personality and Social Psychology, 39(6):1161–1178, 1980

  41. [41]

    Schegloff, and Gail Jefferson

    Harvey Sacks, Emanuel A. Schegloff, and Gail Jefferson. A simplest systematics for the organization of turn-taking for conversation.Language, 50(4):696–735, 1974

  42. [42]

    A multimodal corpus for the study of small group interactions

    Dairazalia Sanchez-Cortes, Oya Aran, Marianne Schmid Mast, and Daniel Gatica-Perez. A multimodal corpus for the study of small group interactions. InProceedings of the International Conference on Multimodal Interaction Workshops, 2011

  43. [43]

    Klaus R. Scherer. Appraisal considered as a process of multilevel sequential checking. In Klaus R. Scherer, Angela Schorr, and Tom Johnstone, editors,Appraisal Processes in Emotion: Theory, Methods, Research, pages 92–120. Oxford University Press, 2001

  44. [44]

    Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge.Speech Communication, 53(9–10):1062–1087, 2011

    Björn Schuller, Anton Batliner, Stefan Steidl, and Dino Seppi. Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge.Speech Communication, 53(9–10):1062–1087, 2011

  45. [45]

    Advancing face-to-face emotion communication: A multimodal dataset (affec)

    Meisam J. Seikavandi, Laurits Dixen, Jostein Fimland, Sree Keerthi Desu, Antonia-Bianca Zserai, Ye Sul Lee, Maria Barrett, and Paolo Burelli. Advancing face-to-face emotion commu- nication: A multimodal dataset (AFFEC).arXiv preprint arXiv:2504.18969, 2025

  46. [46]

    Seikavandi, Jostein Fimland, Fabricio Batista Narcizo, Maria Barrett, Ted Vucurevich, Jesper Bünsow Boldt, Andrew Burke Dittberner, and Paolo Burelli

    Meisam J. Seikavandi, Jostein Fimland, Fabricio Batista Narcizo, Maria Barrett, Ted Vucurevich, Jesper Bünsow Boldt, Andrew Burke Dittberner, and Paolo Burelli. Modelling the interplay of eye-tracking temporal dynamics and personality for emotion detection in face-to-face settings. arXiv preprint arXiv:2510.24720, 2025

  47. [47]

    Gaze reveals emotion perception: Insights from modelling naturalistic face viewing

    Meisam Jamshidi Seikavandi and Maria Jung Barrett. Gaze reveals emotion perception: Insights from modelling naturalistic face viewing. InProceedings of the 22nd IEEE International Conference on Machine Learning and Applications (ICMLA), pages 2022–2025. IEEE, 2023. 12

  48. [48]

    MuMTAffect: A multimodal multitask affective framework for personality and emotion recognition from physiological signals

    Meisam Jamshidi Seikavandi, Fabricio Batista Narcizo, Ted Vucurevich, Andrew Burke Dit- tberner, and Paolo Burelli. MuMTAffect: A multimodal multitask affective framework for personality and emotion recognition from physiological signals. InProceedings of the 3rd International Workshop on Multimodal and Responsible Affective Computing, pages 100–108, 2025

  49. [49]

    Fred Shaffer and J. P. Ginsberg. An overview of heart rate variability metrics and norms. Frontiers in Public Health, 5:258, 2017

  50. [50]

    Pooling of unshared information in group decision making: Biased information sampling during discussion.Journal of Personality and Social Psychology, 48(6):1467–1478, 1985

    Garold Stasser and William Titus. Pooling of unshared information in group decision making: Biased information sampling during discussion.Journal of Personality and Social Psychology, 48(6):1467–1478, 1985

  51. [51]

    Heart rate variability: Standards of measurement, physiological interpretation, and clinical use

    Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. Circulation, 93(5):1043–1065, 1996

  52. [52]

    Sara Taylor, Natasha Jaques, Weixuan Chen, Szymon Fedor, Akane Sano, and Rosalind W. Picard. Automatic identification of artifacts in electrodermal activity data. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 1934–1937, 2015

  53. [53]

    The VoicePrivacy 2020 challenge: Results and findings

    Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O’Brien, et al. The VoicePrivacy 2020 challenge: Results and findings. In Proceedings of Interspeech, pages 1399–1403, 2021

  54. [54]

    Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein, and Russ B. Altman. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520–525, 2001

  55. [55]

    How emotions regulate social life: The emotions as social information (EASI) model

    Gerben A. Van Kleef. How emotions regulate social life: The emotions as social information (EASI) model. Current Directions in Psychological Science, 18(3):184–188, 2009

  56. [56]

    The interpersonal effects of emotions in negotiations: A motivated information processing approach

    Gerben A. Van Kleef, Carsten K. W. De Dreu, and Antony S. R. Manstead. The interpersonal effects of emotions in negotiations: A motivated information processing approach. Journal of Personality and Social Psychology, 87(4):510–528, 2004

  57. [57]

    Bias in error estimation when using cross-validation for model selection

    Sudhir Varma and Richard Simon. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7:91, 2006

  58. [58]

    Eye gaze patterns in conversations: There is more to conversational agents than meets the eyes

    Roel Vertegaal, Robert Slagter, Gerrit van der Veer, and Anton Nijholt. Eye gaze patterns in conversations: There is more to conversational agents than meets the eyes. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 301–308, 2001

  59. [59]

    Social signal processing: Survey of an emerging domain

    Alessandro Vinciarelli, Maja Pantic, and Hervé Bourlard. Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12):1743–1759, 2009

  60. [60]

    The FAIR guiding principles for scientific data management and stewardship

    Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3:160018, 2016

  61. [61]

    A survey of affect recognition methods: Audio, visual, and spontaneous expressions

    Zhihong Zeng, Maja Pantic, Glenn I. Roisman, and Thomas S. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39–58, 2009

  62. [62]

    GAP corpus: Group affect and performance corpus

    Justine Zhang, Ravi Kumar, and Cristian Danescu-Niculescu-Mizil. GAP corpus: Group affect and performance corpus. https://convokit.cornell.edu/documentation/gap.html,

  63. [63]


    Dataset documentation. Appendix Table of Contents: A List of Acronyms; B Stimuli and Task Orchestration; C Extended Limitations and Caveats; D BFI-44 Scoring and Item List; E Audio T0 Baseline Reliability; F Synchronisation Pipeline Detail; G Preprocessing Steps; H Extended Dataset Characterization; I Extended Benchmarks: Sequential Conversation Tasks; J Benchmark Interpr...

  64. [64]

    No directional label bias, but within-person effect sizes are inflated

    Within-person z-score (affects B0–B3d). Normalisation statistics (median, median absolute deviation (MAD)) are computed over all four task rows per participant before the LOGO-CV split, so held-out participants contribute their own unsupervised T1–T4 distribution to test-time normalisation. No directional label bias, but within-person effect sizes are inf...

  65. [65]

    Cannot favour any particular label direction; impact on AUC is negligible, but represents a strict-protocol deviation

    Global feature selection. Missing-rate and correlation-based filtering is applied to the full dataset before splitting. Cannot favour any particular label direction; impact on AUC is negligible, but represents a strict-protocol deviation.

    Table 13: Retained feature set after global feature selection (35 features total; 31 used in all reported benchmarks...

  66. [66]


    Annotation process-metadata features. The four annotation features (answers_n, ann_total_events_n, ann_response_postblock_n, ann_event_span_s) are excluded from the 31-feature benchmark set. ann_event_span_s and answers_n vary systematically by task (T2 has more VAD probes and a longer annotation span), creating a direct shortcut for the B0 task-classif...
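
    The within-person z-score noted in item 64 (median and MAD taken over one participant's task rows) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the tuple layout, the unit-scale fallback for constant input, and the absence of a MAD scaling constant are all assumptions made for illustration. It shows why the step uses no labels: each participant is normalised only against their own feature values.

```python
from collections import defaultdict
from statistics import median


def robust_z(values):
    """Median/MAD z-score of one participant's values (hypothetical helper)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # Assumption: fall back to unit scale when all values are identical.
    scale = mad if mad > 0 else 1.0
    return [(v - med) / scale for v in values]


def normalise_within_person(rows):
    """rows: iterable of (participant_id, task_id, value) tuples (assumed layout)."""
    by_person = defaultdict(list)
    for pid, task, value in rows:
        by_person[pid].append((task, value))
    out = {}
    for pid, items in by_person.items():
        # Statistics come from this participant's rows only; no labels are used.
        z = robust_z([v for _, v in items])
        for (task, _), zi in zip(items, z):
            out[(pid, task)] = zi
    return out


# Toy values, not taken from the dataset.
rows = [
    ("P01", "T1", 1.0), ("P01", "T2", 2.0), ("P01", "T3", 3.0), ("P01", "T4", 4.0),
    ("P02", "T1", 10.0), ("P02", "T2", 10.0), ("P02", "T3", 10.0), ("P02", "T4", 10.0),
]
z = normalise_within_person(rows)
```

    Because the statistics depend only on a participant's own unlabelled rows, applying the same function to a held-out participant reproduces the situation the caveat describes: the test participant's T1–T4 distribution enters the normalisation, but no label information does.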