TeamLLM: Exploring the Capabilities of LLMs for Multimodal Group Interaction Prediction

· 2026 · cs.HC · arXiv 2604.08771

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Predicting group behavior, how individuals coordinate, communicate, and interact during collaborative tasks, is essential for designing systems that can support team performance through real-time prediction and realistic simulation of collaborative scenarios. Large Language Models (LLMs) have shown promise for processing sensor data for human-activity recognition (HAR), yet their capabilities for team dynamics or group-level multimodal sensing remain unexplored. This paper investigates whether LLMs can predict group coordination patterns from multimodal sensor data in collaborative Mixed Reality (MR) environments. We encode hierarchical context -- individual behavioral profiles, group structural properties, and temporal activity context -- as natural language and evaluate three LLM adaptation paradigms (zero-shot, few-shot, and supervised fine-tuning) against statistical baselines. Our evaluation on 16 groups (64 participants, $\sim$25 hours of sensor data) reveals that LLMs achieve 3.2$\times$ improvement over LSTM baselines for linguistically-grounded behaviors, with fine-tuning reaching 96\% accuracy for conversation prediction while maintaining sub-35ms latency. Beyond performance gains, we characterize the boundaries of text-based LLMs for multimodal sensing conversation prediction succeeds because turn-taking maps to linguistic patterns, while shared or joint attention may require spatial and visual reasoning that text only LLMs cannot capture. We further identify simulation mode brittleness (83\% degradation from cascading context errors) and minimal few-shot sensitivity to example selection strategy. These findings establish guidelines when LLMs are appropriate for CPS/IoT sensing for team dynamics and inform the design of future multimodal foundation models.

representative citing papers

Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors

cs.AI · 2026-04-28 · unverdicted · novelty 7.0

LLMs exhibit authority inversion by prioritizing natural-language user claims over numerical sensor data in conflicts, diagnosed with new geometric metrics and mitigated via layer-level calibration.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors cs.AI · 2026-04-28 · unverdicted · none · ref 29 · internal anchor
LLMs exhibit authority inversion by prioritizing natural-language user claims over numerical sensor data in conflicts, diagnosed with new geometric metrics and mitigated via layer-level calibration.

TeamLLM: Exploring the Capabilities of LLMs for Multimodal Group Interaction Prediction

fields

years

verdicts

representative citing papers

citing papers explorer