
arxiv: 2602.22085 · v2 · pith:I3Y3MTV2new · submitted 2026-02-25 · 💻 cs.HC

SocialPulse: On-Device Detection of Social Interactions in Naturalistic Settings Using Smartwatch Multimodal Sensing

Pith reviewed 2026-05-22 11:13 UTC · model grok-4.3

classification 💻 cs.HC
keywords social interaction detection · smartwatch sensing · on-device machine learning · multimodal sensing · naturalistic settings · foreground speech detection · wearable computing

The pith

Smartwatch system detects social interactions in daily life; participants confirmed 77 percent of its detections in a 900-hour study.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an on-device smartwatch system to identify social interactions outside labs, including in-person, virtual, and hybrid types that previous methods often miss. It first trains a foreground speech detector on public data to reach 85.51 percent balanced accuracy, then runs a real-world test with 38 people wearing the watches for more than 900 hours. The system found 1,691 interactions, 77.28 percent of which participants confirmed in self-reports, while a simpler 15-second audio model hit 90.39 percent balanced accuracy. Readers would care because reliable everyday sensing could support mental health tracking or context-aware apps that respond to how people actually connect.

Core claim

SocialPulse is an on-watch multimodal system that detects diverse social interactions in naturalistic settings. In a deployment with 38 participants and over 900 hours of wear time it identified 1,691 interactions; 77.28 percent were confirmed by participant self-report, of which 81.45 percent were in-person, 15.7 percent virtual, and 1.85 percent hybrid. A separate 15-second window-level audio-only model achieved 90.39 percent balanced accuracy and 91.01 percent sensitivity on 33,698 labeled windows.
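The two window-level figures imply a third. Assuming the standard binary definition, balanced accuracy is the mean of sensitivity and specificity, so the specificity of the audio-only model can be backed out from the numbers quoted above. The sketch below does only that arithmetic; the function name is illustrative, and the implied value is derived from this summary's figures rather than a number the paper is quoted as reporting.

```python
def implied_specificity(balanced_accuracy: float, sensitivity: float) -> float:
    """Solve balanced_accuracy = (sensitivity + specificity) / 2 for specificity."""
    return 2.0 * balanced_accuracy - sensitivity


# Figures quoted above for the 15-second audio-only model.
specificity = implied_specificity(balanced_accuracy=0.9039, sensitivity=0.9101)
print(f"implied specificity: {specificity:.4f}")  # about 0.8977
```

If the paper uses a different averaging convention, this back-calculation would not apply.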

What carries the argument

A foreground speech detector trained on a public dataset, fused with other watch sensors for on-device interaction classification and duration estimation.

If this is right

  • The system runs entirely on the watch, enabling privacy-preserving, real-time feedback without sending raw audio to the cloud.
  • It captures interactions lasting from under one minute to over one hour, moving beyond fixed short windows used in earlier work.
  • The approach includes virtual and hybrid exchanges rather than restricting detection to in-person speech.
  • Results from 900 hours of naturalistic data provide a large labeled corpus for training improved models.
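One way to picture how fixed 15-second windows could yield interactions of variable length is to merge consecutive positive windows into episodes. The sketch below is an assumption-laden illustration, not the paper's pipeline: it assumes contiguous windows (the deployed system is duty-cycled), and `merge_windows`, `Episode`, and the `max_gap_windows` tolerance are hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from typing import List

WINDOW_SECONDS = 15  # window length quoted in the summary


@dataclass
class Episode:
    start_s: int  # start offset in seconds
    end_s: int    # end offset in seconds (exclusive)

    @property
    def duration_s(self) -> int:
        return self.end_s - self.start_s


def merge_windows(flags: List[bool], max_gap_windows: int = 2) -> List[Episode]:
    """Group positive windows into episodes, bridging short negative gaps.

    max_gap_windows is a hypothetical tolerance, not a value from the paper.
    """
    episodes: List[Episode] = []
    start = None     # index of the first positive window in the open episode
    last_pos = None  # index of the most recent positive window
    for i, flag in enumerate(flags):
        if not flag:
            continue
        if start is None:
            start = i
        elif i - last_pos - 1 > max_gap_windows:
            # Gap too long: close the open episode and start a new one.
            episodes.append(
                Episode(start * WINDOW_SECONDS, (last_pos + 1) * WINDOW_SECONDS)
            )
            start = i
        last_pos = i
    if start is not None:
        episodes.append(
            Episode(start * WINDOW_SECONDS, (last_pos + 1) * WINDOW_SECONDS)
        )
    return episodes


# Toy window-level predictions, invented for illustration.
flags = [True, True, False, True, False, False, False, False, True]
for ep in merge_windows(flags):
    print(ep.start_s, ep.end_s, ep.duration_s)
```

With a tolerance of two windows, the single negative window is bridged and the four-window gap splits the sequence into two episodes. A real system would need to tune that tolerance against labeled data.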

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Integration with existing health-tracking apps could let users see daily social patterns and receive prompts to maintain connections.
  • The same sensing pipeline might be tested on other wearables to expand coverage beyond watches.
  • If the model generalizes across cultures or age groups, it could support studies of social isolation in large populations.

Load-bearing premise

Participant self-reports supply reliable ground truth for the detected interactions without missing many brief or virtual exchanges due to recall bias.

What would settle it

A controlled follow-up that records continuous audio or video alongside the watch data and finds that many reported confirmations do not match actual interactions or that unreported interactions are common would undermine the validation.
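Such a follow-up would need a way to score watch detections against the independent record. A minimal sketch, assuming both sources are reduced to (start, end) intervals and that any temporal overlap counts as a match, is shown below. The matching rule, the function names, and the toy data are all assumptions made for illustration; the paper is not described here as using this procedure.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share any time."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(
    detected: List[Interval], reference: List[Interval]
) -> Tuple[float, float]:
    """Precision: share of detections overlapping any reference interaction.
    Recall: share of reference interactions overlapped by any detection."""
    hit_detections = sum(any(overlaps(d, r) for r in reference) for d in detected)
    hit_references = sum(any(overlaps(r, d) for d in detected) for r in reference)
    precision = hit_detections / len(detected) if detected else 0.0
    recall = hit_references / len(reference) if reference else 0.0
    return precision, recall


# Toy data, invented for illustration only.
detected = [(0, 120), (300, 360), (900, 960)]
reference = [(10, 100), (310, 400), (600, 630), (1200, 1260)]
print(precision_recall(detected, reference))
```

Self-report confirmation speaks only to the precision side of this pair. Recall requires a reference record of interactions the watch never flagged, which is what the proposed follow-up would supply.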

Figures

Figures reproduced from arXiv: 2602.22085 by Aayushi Sangani, Bethany A. Teachman, Kaitlyn Dorothy Petz, Laura E. Barnes, Mark Rucker, Md Sabbir Ahmed, Noah French, Tanvi Lakhtakia, Xinyu Chen.

Figure 1: Difference between foreground and background sound. (a) Visualization of embeddings for 50,000 audio frames (25,000 ...
Figure 2: The pipeline for on-watch foreground speech prediction.
Figure 3: On-watch SocialPulse pipeline for automatic social interaction detection.
Figure 4: SocialPulse feedback mechanisms for automatically detected social interactions: (a) real-time notification for marking ...
Figure 5: Two features of SocialPulse: (a) editing a detected interaction and (b) responding to notifications about missed ...
Figure 6: Sensing data quality of the on-device system in real-world deployment. (a) Sensor activation counts across participants ...
Figure 7: Difference (in minutes) between system notification time and participant EMA submission time. For readability, the ...
Figure 8: (a–b) Examples of participant-edited interaction start and end times, with the system’s automatically detected ...
Figure 9: Per-participant accuracy of the deployed auto-detection system. Values in parentheses indicate the total number of ...
Figure 10: Statistics on correctly auto-detected interactions in real-time by our on-watch system. NA: Not available. "?" denotes ...
Figure 11: (a) Screenshot of our reporting interface, when a participant marked an interaction as inaccurate. (b–c) Distributions ...
Figure 12: On-watch resource consumption of the system. (a) Processing time per probe, computed from 37,353 processing-time ...
Figure 13: Labeling was performed on the 15-second data segments collected in each duty cycle. While PPG data were collected ...
Figure 14: Architecture of our multi-modal model. ConvLayer: Convolution layer, ResBlock: Residual block, FC: fully connected.
Original abstract

Social interactions are fundamental to well-being, yet automatically detecting them in daily life, particularly using wearables, remains underexplored. Most existing systems are evaluated in controlled settings, focus primarily on in-person interactions, or rely on restrictive assumptions (e.g., requiring multiple speakers within fixed temporal windows), limiting generalizability to real-world use. We present an on-watch interaction detection system designed to capture diverse interactions in naturalistic settings. A core component is a foreground speech detector trained on a public dataset. Evaluated on over 100,000 labeled foreground speech and background sound instances, the detector achieves a balanced accuracy of 85.51%, outperforming prior work by 5.11%. We evaluated the system in a real-world deployment (N=38), with over 900 hours of total smartwatch wear time. The system detected 1,691 interactions, of which 77.28% were confirmed via participant self-report, with durations ranging from under one minute to over one hour. Among correct detections, 81.45% were in-person, 15.7% virtual, and 1.85% hybrid. We further developed a 15-second window-level audio-only model that enables faster interaction prediction, achieving a balanced accuracy of 90.39% and a sensitivity of 91.01% on 33,698 labeled windows. These results demonstrate the feasibility of real-world interaction sensing and open the door to adaptive, context-aware systems responding to users' dynamic social environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents SocialPulse, an on-device smartwatch system for detecting social interactions in naturalistic settings using multimodal sensing. It describes a foreground speech detector trained on public data achieving 85.51% balanced accuracy (outperforming prior work by 5.11%), a real-world deployment with N=38 participants and over 900 hours of wear time that detected 1,691 interactions (77.28% confirmed via self-report, with breakdowns of in-person/virtual/hybrid), and a 15-second window-level audio-only model achieving 90.39% balanced accuracy and 91.01% sensitivity on 33,698 labeled windows.

Significance. If the results hold, the work provides concrete evidence for the feasibility of real-world, on-device social interaction sensing with wearables, moving beyond controlled lab settings. Strengths include training on a public dataset, large-scale naturalistic deployment with performance numbers, and an efficient window-level model. This could support adaptive well-being applications, though the significance depends on the robustness of the validation approach.

major comments (2)
  1. [Real-world deployment (N=38, >900 hours)] Real-world deployment evaluation: The headline result of 1,691 detected interactions with 77.28% self-report confirmation (and the 90.39% balanced accuracy on 33,698 windows) depends on participant self-reports as ground truth. No independent verification (e.g., audio review, experience-sampling cross-check, or false-negative logging) is described to bound recall bias for brief (<1 min), virtual, or low-salience interactions. This assumption is load-bearing for interpreting the confirmation rate as a true positive rate rather than an upper bound and for the overall claim of real-world performance.
  2. [15-second window-level audio-only model] Window-level model evaluation: The 90.39% balanced accuracy and 91.01% sensitivity for the 15-second audio-only model on 33,698 labeled windows inherit the same self-report labeling process. Systematic noise from under-reporting would directly inflate these metrics, requiring additional analysis or mitigation to support the faster-prediction contribution.
minor comments (1)
  1. [Abstract] The abstract states the foreground speech detector outperforms prior work by 5.11%, but the specific baseline, prior method, and comparison details should be explicitly referenced or tabulated for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the validation approach in our naturalistic deployment study. We address each major point below with clarifications and indicate where revisions strengthen the manuscript without overstating the results.

Point-by-point responses
  1. Referee: Real-world deployment evaluation: The headline result of 1,691 detected interactions with 77.28% self-report confirmation (and the 90.39% balanced accuracy on 33,698 windows) depends on participant self-reports as ground truth. No independent verification (e.g., audio review, experience-sampling cross-check, or false-negative logging) is described to bound recall bias for brief (<1 min), virtual, or low-salience interactions. This assumption is load-bearing for interpreting the confirmation rate as a true positive rate rather than an upper bound and for the overall claim of real-world performance.

    Authors: We agree that self-reports constitute the primary validation and lack independent verification such as audio review, which would be impractical and privacy-invasive at this scale. Self-report confirmation is a standard method in ambulatory and experience-sampling research for capturing ecological validity in daily life. In the revised manuscript we have added an explicit limitations paragraph that (1) frames the 77.28% figure as a participant-confirmation rate rather than a verified true-positive rate, (2) discusses potential recall bias for brief or low-salience events, and (3) reports the distribution of detected interaction durations (median >5 min) to show that many events are salient enough for reliable reporting. We have also tempered the abstract and discussion to present the deployment results as feasibility evidence rather than definitive performance bounds.

    revision: yes

  2. Referee: Window-level model evaluation: The 90.39% balanced accuracy and 91.01% sensitivity for the 15-second audio-only model on 33,698 labeled windows inherit the same self-report labeling process. Systematic noise from under-reporting would directly inflate these metrics, requiring additional analysis or mitigation to support the faster-prediction contribution.

    Authors: The 33,698 windows were labeled from the same deployment detections that received participant confirmation, plus negative samples drawn from non-detection periods. We acknowledge that under-reporting could introduce label noise and potentially inflate reported metrics. In the revision we have included a sensitivity analysis that simulates conservative under-reporting rates (10–20%) and shows that balanced accuracy remains above 85% under these assumptions. We have also clarified in the methods how positive and negative windows were constructed and have adjusted the claims for the window-level model to emphasize its utility for faster on-device inference while noting the shared labeling source as a limitation.

    revision: yes

Circularity Check

0 steps flagged

No circularity: results from external public dataset training and independent self-report validation

Full rationale

The paper's core components—a foreground speech detector trained on a public dataset and evaluated on over 100,000 labeled instances (85.51% balanced accuracy), plus a real-world deployment detecting 1,691 interactions with 77.28% self-report confirmation and a window-level model at 90.39% on 33,698 labeled windows—rely on external training data and participant self-reports as ground truth. No equations, self-definitional loops, fitted inputs renamed as predictions, or self-citation chains appear in the provided text that reduce any claimed result to its own inputs by construction. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the foreground speech detector generalizing from public training data to real-world smartwatch audio and on self-reports serving as sufficient validation for interaction detections.

axioms (1)
  • domain assumption: Self-reported confirmation accurately reflects true social interactions without significant recall bias or under-reporting.
    Invoked to interpret the 77.28% confirmation rate as system performance in the real-world deployment.

pith-pipeline@v0.9.0 · 5843 in / 1266 out tokens · 41615 ms · 2026-05-22T11:13:33.355639+00:00 · methodology



Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 2 internal anchors

  1. [1]

    Teachman, and Laura E

    Md Sabbir Ahmed, Noah French, Mark Rucker, Zhiyuan Wang, Taylor Myers-Brower, Kaitlyn Petz, Mehdi Boukhechba, Bethany A. Teachman, and Laura E. Barnes. 2025. WatchAnxiety: A Transfer Learning Approach for State Anxiety Prediction from Smartwatch Data. 2025 IEEE 21st International Conference on Body Sensor Networks (BSN)(Nov. 2025), 1–4. https://doi.org/10...

  2. [2]

    Md Sabbir Ahmed, Arafat Rahman, Zhiyuan Wang, Mark Rucker, and Laura E. Barnes. 2024. A Resource Efficient System for On- Smartwatch Audio Processing.Proceedings of the 30th Annual International Conference on Mobile Computing and Networking(Dec. 2024), 1805–1807. https://doi.org/10.1145/3636534.3698866

  3. [3]

    Mohsin Y Ahmed, Sean Kenkeremath, and John Stankovic. 2015. SocialSense: A collaborative mobile platform for speaker and mood identification. InLecture Notes in Computer Science. Springer International Publishing, Cham, 68–83

  4. [4]

    Tuka AlHanai and Mohammad Ghassemi. 2017. Predicting Latent Narrative Mood Using Audio and Physiologic Data.Proceedings of the AAAI Conference on Artificial Intelligence31, 1 (Feb. 2017). https://doi.org/10.1609/aaai.v31i1.10625

  5. [5]

    Apple Inc. n.d.. Track your sleep with Apple Watch. Apple Support. https://support.apple.com/en-gb/guide/watch/apd830528336/ watchos

  6. [6]

    Rummana Bari, Roy J Adams, Md Mahbubur Rahman, Megan Battles Parsons, Eugene H Buder, and Santosh Kumar. 2018. rConverse: Moment by moment conversation detection using a mobile respiration sensor.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2018)

  7. [7]

    Rummana Bari, Md Mahbubur Rahman, Nazir Saleheen, Megan Battles Parsons, Eugene H Buder, and Santosh Kumar. 2020. Automated detection of stressful conversations using wearable physiological and inertial sensors.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.4, 4 (2020), 1–23

  8. [8]

    George Boateng, Prabhakaran Santhanam, Janina Lüscher, Urte Scholz, and Tobias Kowatsch. 2019. VADLite: An open-source lightweight system for real-time voice activity detection on smartwatches. InAdjunct Proc. of 2019 ACM UbiComp-ISWC

  9. [9]

    Tanzeem Choudhury and Alex Pentland. 2004. Characterizing social interactions using the sociometer. InProceedings of NAACOS. 1–4

  10. [10]

    Gus Cooney, Adam M Mastroianni, Nicole Abi-Esber, and Alison Wood Brooks. 2020. The many minds problem: disclosure in dyadic versus group conversation.Current Opinion in Psychology31 (Feb. 2020), 22–27. https://doi.org/10.1016/j.copsyc.2019.06.032

  11. [11]

    Miers, Alithe L

    Brechtje de Mooij, Minne Fekkes, Anne C. Miers, Alithe L. van den Akker, Ron H. J. Scholte, and Geertjan Overbeek. 2023. What Works in Preventing Emerging Social Anxiety: Exposure, Cognitive Restructuring, or a Combination?Journal of Child and Family Studies32, 2 (Jan. 2023), 498–515. https://doi.org/10.1007/s10826-023-02536-w

  12. [12]

    Android Developers. 2024. Batching. (2024). https://source.android.com/docs/core/interaction/sensors/batching Accessed: 2026-01-17

  13. [13]

    Android Developers. 2025. AlarmManager. (2025). https://developer.android.com/reference/android/app/AlarmManager Accessed: 2026-01-18

  14. [14]

    Android Developers. 2025. Monitoring Sensor Events. https://developer.android.com/develop/sensors-and-location/sensors/sensors_ overview#sensors-monitor

  15. [15]

    Android Developers. 2025. Sensor stack. https://source.android.com/docs/core/interaction/sensors/sensor-stack

  16. [16]

    Google Developers. 2025. Audio classification guide. https://ai.google.dev/edge/mediapipe/solutions/audio/audio_classifier Accessed: 2026-01-16

  17. [17]

    Andreas Ebbehoj, Mette Østergaard Thunbo, Ole Emil Andersen, Michala Vilstrup Glindtvad, and Adam Hulman. 2022. Transfer learning for non-image data in clinical research: A scoping review.PLOS Digital Health1, 2 (Feb. 2022), e0000014. https://doi.org/10.1371/journal. pdig.0000014

  18. [18]

    Denzil Ferreira and Raghu Mulukutla. 2020. AWARE Plugin: Conversations. https://github.com/denzilferreira/com.aware.plugin. studentlife.audio_final. (2020). Accessed: 2025-05-30

  19. [19]

    Fitbit. n.d.. Breathing rate. Fitbit.com. https://dev.fitbit.com/build/reference/web-api/breathing-rate/

  20. [20]

    Gemmeke, Daniel P

    Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. 2017. Audio Set: An ontology and human-labeled dataset for audio events.2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(March 2017). https://doi.org/10.1109/icassp.2017.7952261

  21. [21]

    Diane Carol Gooding and Madeline Pflum. 2022. The Transdiagnostic Nature of Social Anhedonia: Historical and Current Perspectives. Anhedonia: Preclinical, Translational, and Clinical Integration(2022), 381–395. https://doi.org/10.1007/7854_2021_301

  22. [22]

    Google. 2017. AudioSet. https://research.google.com/audioset/

  23. [23]

    2019.Hands-on machine learning with Scikit-Learn and TensorFlow concepts, tools, and techniques to build intelligent systems(2 ed.)

    Aurélien Géron. 2019.Hands-on machine learning with Scikit-Learn and TensorFlow concepts, tools, and techniques to build intelligent systems(2 ed.). O’Reilly Media, Inc

  24. [24]

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  25. [25]

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. (2017). arXiv:1704.04861 [cs.CV] https://arxiv.org/abs/1704.04861 , Vol. 1, No. 1, Article . Publication date: February 2018. 30•Md Sabbir Ahm...

  26. [26]

    Nicholas Hutchins, Andrew Allen, Michelle Curran, and Lee Kannis-Dymand. 2021. Social anxiety and online social interaction. Australian Psychologist56, 2 (March 2021), 142–153. https://doi.org/10.1080/00050067.2021.1890977

  27. [27]

    Richardson, Akram Alomainy, and Hamed Haddadi

    Katrin Hänsel, Kleomenis Katevas, Guido Orgs, Daniel C. Richardson, Akram Alomainy, and Hamed Haddadi. 2018. The potential of wearable technology for monitoring social interactions based on interpersonal synchrony.Proceedings of the 4th ACM Workshop on Wearable Systems and Applications(2018), 45–47. https://doi.org/10.1145/3211960.3211979

  28. [28]

    Kampmann, Paul M.G

    Isabel L. Kampmann, Paul M.G. Emmelkamp, Dwi Hartanto, Willem-Paul Brinkman, Bonne J.H. Zijlstra, and Nexhmedin Morina. 2016. Exposure to virtual social interactions in the treatment of social anxiety disorder: A randomized controlled trial.Behaviour Research and Therapy77 (Feb. 2016), 147–156. https://doi.org/10.1016/j.brat.2015.12.016

  29. [29]

    Kleomenis Katevas, Katrin Hänsel, Richard Clegg, Ilias Leontiadis, Hamed Haddadi, and Laurissa Tokarchuk. 2019. Finding Dory in the Crowd: Detecting Social Interactions using Multi-Modal Mobile Sensing.Proceedings of the 1st Workshop on Machine Learning on Edge in Sensor Systems(Nov. 2019), 37–42. https://doi.org/10.1145/3362743.3362959

  30. [30]

    King and Majid Sarrafzadeh

    Christine E. King and Majid Sarrafzadeh. 2017. A Survey of Smartwatches in Remote Health Monitoring.Journal of Healthcare Informatics Research2, 1–2 (Dec. 2017), 1–24. https://doi.org/10.1007/s41666-017-0012-7

  31. [31]

    Abowd, Nicholas D

    Hyeokhyen Kwon, Catherine Tong, Harish Haresamudram, Yan Gao, Gregory D. Abowd, Nicholas D. Lane, and Thomas Plötz. 2020. IMUTube.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies4, 3 (2020), 1–29. https://doi.org/10.1145/ 3411841

  32. [32]

    Nicholas Lane, Mashfiqui Mohammod, Mu Lin, Xiaochao Yang, Hong Lu, Shahid Ali, Afsaneh Doryab, Ethan Berke, Tanzeem Choudhury, and Andrew Campbell. 2011. BeWell: A Smartphone Application to Monitor, Model and Promote Wellbeing.Proceedings of the 5th International ICST Conference on Pervasive Computing Technologies for Healthcare(2011). https://doi.org/10....

  33. [33]

    Scikit learn developers. [n. d.]. NearestCentroid. https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestCentroid. html Accessed: 2026-01-20

  34. [34]

    Arjmandi, Derek Houston, and Laura Dilley

    Matthew Lehet, Meisam K. Arjmandi, Derek Houston, and Laura Dilley. 2020. Circumspection in using automated measures: Talker gender and addressee affect error rates for adult speech detection in the Language ENvironment Analysis (LENA) system.Behavior Research Methods53, 1 (2020), 113–138. https://doi.org/10.3758/s13428-020-01419-y

  35. [35]

    Leigh-Hunt, D

    N. Leigh-Hunt, D. Bagguley, K. Bash, V. Turner, S. Turnbull, N. Valtorta, and W. Caan. 2017. An overview of systematic reviews on the public health consequences of social isolation and loneliness.Public Health152 (Nov. 2017), 157–171. https://doi.org/10.1016/j.puhe. 2017.07.035

  36. [36]

    Levinson

    Stephen C. Levinson. 2016. Turn-taking in Human Communication – Origins and Implications for Language Processing.Trends in Cognitive Sciences20, 1 (Jan. 2016), 6–14. https://doi.org/10.1016/j.tics.2015.10.010

  37. [37]

    Levinson

    Stephen C. Levinson. 2025. The Interaction Engine. (May 2025). https://doi.org/10.1017/9781009570343

  38. [38]

    Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David Harwath, and Edison Thomaz. 2023. A Dataset for Foreground Speech Analysis With Smartwatches In Everyday Home Environments.2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)(2023), 1–5. https://doi.org/10.1109/icasspw59220.2023.10192949

  39. [39]

    Dawei Liang, Alice Zhang, and Edison Thomaz. 2023. Automated face-to-face conversation detection on a commodity smartwatch with acoustic sensing.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.7, 3 (2023), 1–29

  40. [40]

    Nathan Liang, Samantha J Grayson, Mia A Kussman, Judith N Mildner, and Diana I Tamir. 2024. In-person and virtual social interactions improve well-being during the COVID-19 pandemic.Comput. Hum. Behav. Rep.15, 100455 (2024), 100455

  41. [41]

    Xiangting Bernice Lin, Tih-Shih Lee, Yin Bun Cheung, Joanna Ling, Shi Hui Poon, Leslie Lim, Hai Hong Zhang, Zheng Yang Chin, Chuan Chu Wang, Ranga Krishnan, and Cuntai Guan. 2019. Exposure Therapy With Personalized Real-Time Arousal Detection and Feedback to Alleviate Social Anxiety Symptoms in an Analogue Adult Sample: Pilot Proof-of-Concept Randomized C...

  42. [42]

    Paulo N Lopes, Marc A Brackett, John B Nezlek, Astrid Schütz, Ina Sellin, and Peter Salovey. 2004. Emotional intelligence and social interaction.Personality and social psychology bulletin30, 8 (2004), 1018–1034

  43. [43]

    Hong Lu, A J Bernheim Brush, Bodhi Priyantha, Amy K Karlson, and Jie Liu. 2011. SpeakerSense: Energy Efficient Unobtrusive Speaker Identification on Mobile Phones. InLecture Notes in Computer Science

  44. [44]

    Richard E Lucas, Carol Wallsworth, Ivana Anusic, and M Brent Donnellan. 2021. A direct comparison of the day reconstruction method (DRM) and the experience sampling method (ESM).J. Pers. Soc. Psychol.120, 3 (2021), 816–835

  45. [45]

    Chengwen Luo and Mun Choon Chan. 2013. SocialWeaver: Collaborative Inference of Human Conversation Networks Using Smartphones. Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems(Nov. 2013), 1–14. https://doi.org/10.1145/2517351.2517353

  46. [46]

    Bruni, Brittenny De La Cruz, Jacqueline A

    Alessandra Macbeth, Michelle R. Bruni, Brittenny De La Cruz, Jacqueline A. Erens, Natsuki Atagi, Megan L. Robbins, Christine Chiarello, and Jessica L. Montag. 2022. Using the Electronically Activated Recorder (EAR) to Capture the Day-to-Day Linguistic Experiences of Young Adults.Collabra: Psychology8, 1 (2022). https://doi.org/10.1525/collabra.36310

  47. [47]

    Lau, Jan C

    Dominique Makowski, Tam Pham, Zen J. Lau, Jan C. Brammer, François Lespinasse, Hung Pham, Christopher Schölzel, and S. H. Annabel Chen. 2021. NeuroKit2: A Python toolbox for neurophysiological signal processing.Behavior Research Methods53, 4 (feb 2021), 1689–1696. , Vol. 1, No. 1, Article . Publication date: February 2018. SocialPulse: On-Device Detection...

  48. [48]

    Mastroianni, Daniel T

    Adam M. Mastroianni, Daniel T. Gilbert, Gus Cooney, and Timothy D. Wilson. 2021. Do conversations end when people want them to? Proceedings of the National Academy of Sciences118, 10 (March 2021). https://doi.org/10.1073/pnas.2011809118

  49. [49]

    Meera, Divya Swaminathan, Sri Ranjani Venkata Murali, Reny Raju, Malavi Srikar, Sahana Shyam Sundar, Senthil Amudhan, Alejandrina Cristia, Rahul Pawar, Achuth Rao, Prathyusha P

    Shoba S. Meera, Divya Swaminathan, Sri Ranjani Venkata Murali, Reny Raju, Malavi Srikar, Sahana Shyam Sundar, Senthil Amudhan, Alejandrina Cristia, Rahul Pawar, Achuth Rao, Prathyusha P. Vasuki, Shree Volme, and Ashok Mysore. 2025. Validation of the Language ENvironment Analysis (LENA) Automated Speech Processing Algorithm Labels for Adult and Child Segme...

  50. [50]

    Matthias R. Mehl. 2017. The Electronically Activated Recorder (EAR).Current Directions in Psychological Science26, 2 (April 2017), 184–190. https://doi.org/10.1177/0963721416680611

  51. [51]

    Marije Michel and Marco Cappellini. 2019. Alignment During Synchronous Video Versus Written Chat L2 Interactions: A Methodological Exploration.Annual Review of Applied Linguistics39 (March 2019), 189–216. https://doi.org/10.1017/s0267190519000072

  52. [52]

    Narayanan

    Amrutha Nadarajan, Krishna Somandepalli, and Shrikanth S. Narayanan. 2019. Speaker Agnostic Foreground Speech Detection from Audio Recordings in Workplace Settings from Wearable Recorders.ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(May 2019), 6765–6769. https://doi.org/10.1109/icassp.2019.8683244

  53. [53]

    Eisuke Ono, Takayuki Nozawa, Taiki Ogata, Masanari Motohashi, Naoki Higo, Tetsuro Kobayashi, Kunihiro Ishikawa, Koji Ara, Kazuo Yano, and Yoshihiro Miyake. 2011. Relationship between social interaction and mental health. In2011 IEEE/SICE International Symposium on System Integration (SII). IEEE, 246–249

  54. [54]

    Mark C Pachucki, Emily J Ozer, Alain Barrat, and Ciro Cattuto. 2015. Mental health and social networks in early adolescence: A dynamic study of objectively-measured social interaction behaviors.Social science & medicine125 (2015), 40–50

  55. [55]

    Ernest S Park and Verlin B Hinsz. 2015. Group interaction sustains positive moods and diminishes negative moods.Group dynamics: Theory, research, and practice19, 4 (2015), 290

  56. [56]

    Ji Soo Park, Sa-Yoon Park, Jae Won Moon, Kwangsoo Kim, and Dong In Suh. 2025. Artificial Intelligence Models for Pediatric Lung Sound Analysis: Systematic Review and Meta-Analysis.J Med Internet Res27 (18 Apr 2025), e66491. https://doi.org/10.2196/66491

  57. [57]

    Peperkoorn, D

    Leonard S. Peperkoorn, D. Vaughn Becker, Daniel Balliet, Simon Columbus, Catherine Molho, and Paul A. M. Van Lange. 2020. The prevalence of dyads in social life.PLOS ONE15, 12 (Dec. 2020), e0244188. https://doi.org/10.1371/journal.pone.0244188

  58. [58]

    Manoj Plakal and Dan Ellis. 2019. YAMNet. https://github.com/tensorflow/models/tree/master/research/audioset/yamnet

  59. [59]

    Aditya Ponnada, Caitlin Haynes, Dharam Maniar, Justin Manjourides, and Stephen Intille. 2017. Microinteraction Ecological Momentary Assessment Response Rates. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 3 (2017), 1–16. https://doi.org/10.1145/3130957

  60. [60]

    Mashfiqui Rabbi, Shahid Ali, Tanzeem Choudhury, and Ethan Berke. 2011. Passive and In-Situ assessment of mental and physical well-being using mobile sensors. Proceedings of the 13th international conference on Ubiquitous computing (2011), 385–394. https://doi.org/10.1145/2030112.2030164

  61. [61]

    Mashfiqui Rabbi, Min Hane Aung, Mi Zhang, and Tanzeem Choudhury. 2015. MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones. In Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing. 707–718

  62. [62]

    Md Mahbubur Rahman, Amin Ahsan Ali, Kurt Plarre, Mustafa al’Absi, Emre Ertin, and Santosh Kumar. 2011. mConverse: Inferring conversation episodes from respiratory measurements collected in the field. In Proceedings of the 2nd Conference on Wireless Health. 1–10

  63. [63]

    Madeleine Rassaby, Isabella G. Spaulding, and Charles T. Taylor. 2024. Fear of positive evaluation and social affiliation in social anxiety disorder and major depression. Journal of Anxiety Disorders 107 (2024), 102931. https://doi.org/10.1016/j.janxdis.2024.102931

  64. [64]

    Jeffrey A. Richards, Dongxin Xu, and Jill Gilkerson. 2010. Development and Performance of the LENA Automatic Autism Screen. Technical Report. LENA Foundation

  65. [65]

    Yannick Roos, Michael D Krämer, David Richter, Ramona Schoedel, and Cornelia Wrzus. 2023. Does your smartphone “know” your social life? A methodological comparison of day reconstruction, experience sampling, and mobile sensing. Adv. Methods Pract. Psychol. Sci. 6, 3 (2023)

  66. [66]

    Hope Schroeder, Deb Roy, and Jad Kabbara. 2024. Fora: A corpus and framework for the study of facilitated dialogue. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 13985–1400...

  67. [67]

    Andreas Schwerdtfeger and Peter Friedrich-Mai. 2009. Social interaction moderates the relationship between depressive mood and heart rate variability: evidence from an ambulatory monitoring study. Health Psychol. (2009)

  68. [68]

    Farhad Shahmohammadi, Anahita Hosseini, Christine E King, and Majid Sarrafzadeh. 2017. Smartwatch based activity recognition using active learning. In 2017 IEEE/ACM CHASE. IEEE, 321–329

  69. [69]

    Sara Shahrestani, Elizabeth M Stewart, Daniel S Quintana, Ian B Hickie, and Adam J Guastella. 2015. Heart rate variability during adolescent and adult social interactions: A meta-analysis. Biol. Psychol. 105 (2015), 43–50

  70. [70]

    Jessie Sun, Kelci Harris, and Simine Vazire. 2020. Is well-being associated with the quantity and quality of social interactions? Journal of personality and social psychology 119, 6 (2020), 1478

  71. [71]

    Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T. Campbell. 2014. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing(...

  72. [72]

    Wikipedia contributors. 2025. Samsung Galaxy Watch 5. https://en.wikipedia.org/wiki/Samsung_Galaxy_Watch_5. Accessed: 2026-01-21

  73. [73]

    Dongxin Xu, Umit Yapanel, Sharmistha Gray, Jill Gilkerson, Jeffrey Richards, and John Hansen. 2008. Signal processing for young child speech language development. In The 1st Workshop on Child, Computer, and Interaction. https://api.semanticscholar.org/CorpusID:21425835

  74. [74]

    Michael Yeomans, F. Katelynn Boland, Hanne K. Collins, Nicole Abi-Esber, and Alison Wood Brooks. 2023. A Practical Guide to Conversation Research: How to Study What People Say to Each Other. Advances in Methods and Practices in Psychological Science 6, 4 (Oct. 2023). https://doi.org/10.1177/25152459231183919

  75. [75]

    Alice Zhang, Callihan Bertley, Dawei Liang, and Edison Thomaz. 2025. Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing. (2025). https://doi.org/10.48550/ARXIV.2507.12002

  76. [76]

    Ruixue Zhaoyang, Martin J. Sliwinski, Lynn M. Martire, and Joshua M. Smyth. 2018. Age differences in adults’ daily social interactions: An ecological momentary assessment study. Psychology and Aging 33, 4 (2018), 607–618. https://doi.org/10.1037/pag0000242