arxiv: 2505.19328 · v7 · submitted 2025-05-25 · 💻 cs.CV · cs.LG

BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change

Pith reviewed 2026-05-19 13:04 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords ambivalence · hesitancy · video dataset · digital health · behaviour change · multimodal recognition · machine learning · annotation

The pith

This paper introduces the BAH dataset of 1,427 annotated videos to train machine learning models that detect ambivalence and hesitancy during digital health interventions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the absence of training resources for automatic detection of ambivalence and hesitancy, the conflicting states that commonly block health behavior change. It presents the BAH dataset, built from 300 participants across Canada who were video-recorded while answering targeted questions. The collection supplies expert timestamps marking A/H occurrences, frame-level and video-level cue labels, transcripts, aligned faces, and participant details. Baseline experiments show that standard models achieve only limited performance, pointing to the need for better multimodal and time-aware techniques. Reliable automatic recognition matters because it could let online interventions respond in real time to user hesitation without constant human monitoring.

Core claim

The central claim is that the Behavioural Ambivalence/Hesitancy (BAH) dataset, comprising 1,427 videos totaling 10.60 hours from 300 participants responding to predefined elicitation questions, supplies the multimodal video material and expert annotations required to develop machine learning models for A/H recognition in digital behaviour change settings. The dataset supplies binary presence/absence labels, cue annotations at frame and video level, transcripts, cropped faces, and metadata. Reported baseline results on frame- and video-level tasks show modest performance and thereby indicate that existing approaches must be adapted for this subtle, cross-modal phenomenon.

What carries the argument

The BAH dataset itself, which records participants answering questions designed to provoke ambivalence or hesitancy and supplies expert-provided timestamps and labels for A/H presence and cues across video, audio, and text modalities.
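
To make the link between expert timestamps and the frame- and video-level labels concrete, here is a minimal Python sketch. It assumes segments arrive as (start, end) times in seconds and that a clip has a constant frame rate. The function names, the half-open interval convention, the "any positive frame" rule for the video label, and the toy values are all illustrative assumptions, not the dataset's documented format.

```python
from typing import List, Tuple

# (start_seconds, end_seconds) of one annotated A/H segment.
# Hypothetical format for illustration, not the dataset's documented schema.
Segment = Tuple[float, float]


def frame_labels(segments: List[Segment], num_frames: int, fps: float) -> List[int]:
    """Return one 0/1 label per frame: 1 if the frame time lies in any segment."""
    labels = []
    for i in range(num_frames):
        t = i / fps  # timestamp of frame i in seconds
        # Half-open interval [start, end) is an assumed convention.
        inside = any(start <= t < end for start, end in segments)
        labels.append(1 if inside else 0)
    return labels


def video_label(labels: List[int]) -> int:
    """Assumed video-level rule: 1 if A/H is present in at least one frame."""
    return 1 if any(labels) else 0


# Toy example: a 4-second clip at 5 fps with one A/H segment from 1.0 s to 2.0 s.
toy_frames = frame_labels([(1.0, 2.0)], num_frames=20, fps=5.0)
toy_video = video_label(toy_frames)
```

In this toy clip, frames 5 through 9 are marked positive and the video-level label is 1. A clip with no segments yields all-zero frame labels and a video-level label of 0.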

If this is right

  • Researchers can now train and benchmark multimodal models specifically for ambivalence and hesitancy recognition using a public resource.
  • Digital health platforms could incorporate real-time A/H detection to adjust messaging or support when users show hesitation.
  • Standardized evaluation of spatio-temporal and cross-modal architectures becomes possible for this class of subtle emotional states.
  • The binary A/H label scheme offers a practical starting point for deployment, although it does not distinguish between the two closely related constructs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Integration of BAH-trained models into smartphone apps might increase completion rates for behavior change programs by catching hesitation early.
  • The dataset's Canadian participant base leaves open the question of whether similar elicitation and annotation protocols would work across other cultural groups.
  • Future work could test whether adding physiological signals from wearables improves recognition accuracy beyond the current video-plus-transcript setup.

Load-bearing premise

The predefined questions produce genuine ambivalence or hesitancy that experts can annotate consistently and that the resulting labels will transfer to real digital interventions without major domain shift.

What would settle it

Demonstrating low inter-annotator agreement among experts on the A/H labels, or showing that models trained on BAH perform no better than chance when tested on videos collected from actual deployed digital behaviour change programs.
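
One way the agreement half of this test could be run is with Fleiss' kappa, which handles more than two raters. The sketch below is a plain-Python version of the standard formula. The rating table is made up for illustration (six videos, three raters, two categories) and does not come from the BAH annotations.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa, where counts[i][j] is the number of raters
    who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_cats = len(counts[0])

    # Share of all assignments that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement per item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        return float("nan")  # undefined when a single category is ever used
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical table: columns are [A/H absent, A/H present], three raters per video.
toy_counts = [[0, 3], [3, 0], [1, 2], [3, 0], [0, 3], [2, 1]]
toy_kappa = fleiss_kappa(toy_counts)  # 5/9, about 0.556
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance. Where a "low" threshold should sit is a judgment call that this sketch does not settle.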

Figures

Figures reproduced from arXiv: 2505.19328 by Alessandro Lameiras Koerich, Eric Granger, Manuela González-González, Marco Pedersoli, Masoumeh Sharafi, Muhammad Haseeb Aslam, Muhammad Osama Zeeshan, Simon L Bacon, Soufiane Belharbi.

Figure 1: Examples of body language cues used by annotators to identify the occurrence of A/H: “looking away,” and …
Figure 2: BAH dataset collection and annotation procedure. First, participants access our web platform. They undergo an initial test/calibration to ensure the quality of the data. An avatar guides them throughout the entire process. Seven questions are presented to each participant. They are recorded while answering them. Once the data is captured, it is transferred by the Administrator to our local server. It is th…
Figure 3: Multimodal model used to produce baseline perfor…
Figure 4: File structure of the shared BAH dataset. The file “BAH_dataset_documentation.pdf” contains the detailed documentation about all files/directories, including annotation structure …
Figure 5: Examples taken from the platform to present our …
Figure 6: Video duration (a), videos per participant (b), question distribution (c), and sex distribution (d) over BAH dataset.
Figure 7: Distribution of A/H segment duration in seconds (…
Figure 8: Participants’ age distribution in BAH dataset. (a) Participants’ age range distribution. (b) Distribution of Canadian provinces where participants live.
Figure 9: Participants’ age range (a) and the provinces where they live (b) over BAH dataset. Province names: Manitoba (MB), Alberta (AB), Nova Scotia (NS), Newfoundland and Labrador (NL), Saskatchewan (SK), New Brunswick (NB), Ontario (ON), Quebec (QC), British Columbia (BC).
Figure 10: Distribution of participants’ simplified ethnicity (…
Figure 11: A data card styled (nutrition label) for …
Figure 12: Participants’ age and age-range distribution across all splits (train, validation, and test) in BAH dataset.
Figure 13: Participants’ sex and simplified ethnicity distribution across all splits (train, validation, and test) in BAH dataset.
Figure 14: Participants’ province distribution across all splits (train, validation, and test) in BAH dataset. G Analysis of Annotators Cues. Figure 15a shows the distribution of modalities used by annotators to assess the presence of A/H. Overall, the four modalities contribute at almost the same rate to detecting A/H. Facial cues lead, followed by language, then body and audio. This suggests that the four modal…
Figure 15: Modalities distribution used by annotators in BAH dataset.
Figure 16: Per-modality cues distribution used by annotators in BAH dataset.
Figure 17: Word cloud of per-modality cues used by annotators in BAH dataset. (a) Cues inconsistencies distribution. (b) Cues inconsistencies co-occurrence distribution.
Figure 18: Cues inconsistencies and their co-occurrence distribution used by annotators in BAH dataset.
Figure 19: Distribution of cues with and without cross-modality inconsistencies over segments; in addition to question-based distribution used by annotators in BAH dataset.
Figure 20: Impact of context (window) length on the performance of frame-level classification when using visual modality …
Figure 21: Comparison between different age groups and sex on DA methods. … potential for further advancements in positive-class detection within this context. Even slight degradations in negative-class performance disproportionately impact the overall AP score. For example, Sub-based method emphasizes enhancing positive class identification, likely incurs a cost in precision or recall on the more frequent negative c…
Original abstract

Ambivalence and hesitancy (A/H), closely related constructs, are the primary reasons why individuals delay, avoid, or abandon health behaviour changes. They are subtle and conflicting emotions that set a person in a state between positive and negative orientations, or between acceptance and refusal to do something. They manifest as a discord in affect between multiple modalities or within a modality, such as facial and vocal expressions, and body language. Although experts can be trained to recognize A/H as done for in-person interactions, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital behaviour change interventions. However, no datasets currently exist for the design of machine learning models to recognize A/H. This paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset collected for multimodal recognition of A/H in videos. It contains 1,427 videos with a total duration of 10.60 hours, captured from 300 participants across Canada, answering predefined questions to elicit A/H. It is intended to mirror real-world digital behaviour change interventions delivered online. BAH is annotated by three experts to provide timestamps that indicate where A/H occurs, and frame- and video-level annotations with A/H cues. Video transcripts, cropped and aligned faces, and participant metadata are also provided. Since A and H manifest similarly in practice, we provide a binary annotation indicating the presence or absence of A/H. Additionally, this paper includes benchmarking results using baseline models on BAH for frame- and video-level recognition, and different learning setups. The limited performance highlights the need for adapted multimodal and spatio-temporal models for A/H recognition. The data and code are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset containing 1,427 videos (10.60 hours total) from 300 participants across Canada who responded to a fixed set of predefined questions intended to elicit ambivalence/hesitancy (A/H). The dataset supplies multimodal video data, three-expert annotations with timestamps and frame-/video-level binary A/H labels plus cues, transcripts, cropped faces, and participant metadata. Baseline benchmarking results for frame- and video-level recognition are reported under several learning setups, with limited performance noted, and the data and code are released publicly to support ML model development for digital behaviour-change interventions.

Significance. If the expert labels are shown to be reliable, the BAH dataset would constitute the first public resource for multimodal A/H recognition and could enable progress on personalised digital health interventions. The public release of data and code and the honest reporting of modest baseline performance are clear strengths that support reproducibility and further model development in affective computing and computer vision.

major comments (2)
  1. [Dataset Collection and Annotation] No inter-rater reliability statistics (e.g., Fleiss' kappa or percentage agreement) are reported for the three experts' binary A/H annotations and timestamps. Because the dataset's utility for training ML models rests directly on label quality and consistency, this omission is load-bearing and must be addressed before the resource can be confidently used for model development.
  2. [Data Collection Procedure] The manuscript provides no pilot validation, operational definition of A/H cues, or evidence that the fixed set of questions reliably induces genuine ambivalence/hesitancy states rather than other affective responses or demand characteristics. This directly affects the claim that the dataset mirrors real-world digital interventions and transfers without substantial domain shift.
minor comments (2)
  1. [Benchmarking] Benchmarking section: The architectures, hyperparameters, and training protocols of the baseline models should be described in greater detail (including exact loss functions and data splits) to allow independent reproduction of the reported frame- and video-level results.
  2. [Figures and Supplementary Material] Figure captions and supplementary material: Video examples and annotation visualisations would benefit from more explicit descriptions of the depicted A/H cues so that readers can interpret them without direct access to the released data.
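
On the request for exact data splits in minor comment 1: a common protocol for video datasets with several clips per person is to split by participant, so that no individual appears in more than one of train, validation, and test. Whether BAH's released splits follow this protocol is not stated in the material reviewed here. The Python sketch below only illustrates the idea, and every name and number in it is hypothetical.

```python
import random
from typing import Dict, List


def split_by_participant(
    video_to_participant: Dict[str, str],
    n_val: int,
    n_test: int,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign whole participants to splits, then place each video
    in the split of the participant who recorded it."""
    participants = sorted(set(video_to_participant.values()))
    if n_val + n_test >= len(participants):
        raise ValueError("need at least one participant left for training")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(participants)

    split_of = {}
    for idx, pid in enumerate(participants):
        if idx < n_val:
            split_of[pid] = "validation"
        elif idx < n_val + n_test:
            split_of[pid] = "test"
        else:
            split_of[pid] = "train"

    splits: Dict[str, List[str]] = {"train": [], "validation": [], "test": []}
    for video, pid in sorted(video_to_participant.items()):
        splits[split_of[pid]].append(video)
    return splits


# Toy data: 10 hypothetical participants with 3 videos each.
toy_map = {f"v{p}_{k}": f"p{p}" for p in range(10) for k in range(3)}
toy_splits = split_by_participant(toy_map, n_val=2, n_test=2, seed=0)
```

Publishing the participant-to-split assignment (or the seed and procedure that produced it) alongside the loss functions and hyperparameters would let others reproduce the reported frame- and video-level numbers.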

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and indicate the revisions we will make to the next version of the paper.

Point-by-point responses
  1. Referee: [Dataset Collection and Annotation] No inter-rater reliability statistics (e.g., Fleiss' kappa or percentage agreement) are reported for the three experts' binary A/H annotations and timestamps. Because the dataset's utility for training ML models rests directly on label quality and consistency, this omission is load-bearing and must be addressed before the resource can be confidently used for model development.

    Authors: We agree that inter-rater reliability statistics are essential to demonstrate label quality and consistency. We have computed Fleiss' kappa and percentage agreement for the binary A/H annotations as well as for the timestamped segments. These metrics will be reported in a new subsection of the annotation protocol in the revised manuscript, confirming substantial agreement among the three experts.

    revision: yes

  2. Referee: [Data Collection Procedure] The manuscript provides no pilot validation, operational definition of A/H cues, or evidence that the fixed set of questions reliably induces genuine ambivalence/hesitancy states rather than other affective responses or demand characteristics. This directly affects the claim that the dataset mirrors real-world digital interventions and transfers without substantial domain shift.

    Authors: We acknowledge that additional procedural details would strengthen the manuscript. In the revision we will expand the data collection section to include the operational definitions of A/H cues supplied to the annotators and the rationale for the question set, which was drawn from established behavioral-change instruments designed to surface mixed feelings. A formal pilot validation study with quantitative induction metrics was not performed prior to the main collection; we will therefore note this limitation and discuss potential domain-shift considerations when using the dataset for real-world interventions.

    revision: partial

Circularity Check

0 steps flagged

No circularity: dataset collection and benchmarking paper with no derivations or predictions

Full rationale

The paper is a data collection and benchmarking effort that introduces the BAH dataset from 300 participants answering predefined questions, with expert annotations for A/H presence, timestamps, and cues, plus baseline model results. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described content. The central claim (no prior datasets exist for A/H recognition, and BAH fills the gap) is a factual statement about data availability rather than a reduction to its own inputs. Annotations and benchmarks are presented as empirical outputs without circular self-definition or uniqueness theorems imported from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work relies on standard practices for video data collection, expert annotation, and baseline model evaluation without introducing new mathematical axioms, free parameters, or invented entities.

pith-pipeline@v0.9.0 · 5893 in / 1066 out tokens · 50703 ms · 2026-05-19T13:04:34.124440+00:00 · methodology


