
arxiv: 2604.00767 · v2 · pith:LU5FA7WYnew · submitted 2026-04-01 · 💻 cs.LG

ActivityNarrated: An Open-Ended Narrative Paradigm for Wearable Human Activity Understanding

classification 💻 cs.LG
keywords: activity, understanding, classification, downstream, human, open-ended, paradigm, wearable

Wearable human activity recognition (HAR) has made steady progress, yet much of this progress remains grounded in fixed-window, closed-set classification benchmarks. This formulation is poorly matched to everyday behavior, where activities are open-ended, unscripted, personalized, variable in duration, and often compositional. To address this mismatch, we introduce ActivityNarrated, an open-ended narrative paradigm for language-grounded wearable activity understanding. We formulate this setting as dense sensor signal captioning with a comprehensive benchmark protocol that measures temporal localization, caption quality, sensor-language alignment, conventional closed-set classification as a downstream diagnostic, and additional robustness measures. We further present ActNarrator, a three-stage architecture that discretizes continuous IMU signals into reusable motion tokens and uses an external frozen small language model to generate open-vocabulary activity captions. Experiments show that our method provides high-quality dense sensor captioning with superior adaptivity and robustness, enabling various downstream tasks by turning sensor-based human activity understanding into sensor-grounded text-level reasoning. This includes downstream classification, where ActNarrator outperforms state-of-the-art HAR models by 3.8-31.6% in Macro-F1. This paradigm also enables novel activity understanding capabilities, such as complex question answering over long time horizons.
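
For readers unfamiliar with the Macro-F1 metric cited above, the following is a minimal sketch of how it is conventionally computed: the F1 score is calculated separately for each class and then averaged without weighting, so rare activities count as much as common ones. The function name `macro_f1` and the toy labels are illustrative assumptions; the abstract does not describe the paper's evaluation code, and this is not taken from it.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred.

    Hypothetical helper for illustration only; not the paper's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Per-class counts of true positives, false positives, false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the true class was different
            fn[t] += 1  # true class t was missed

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy, made-up labels: the single "run" sample is misclassified as "walk".
y_true = ["walk", "walk", "walk", "walk", "run", "sit"]
y_pred = ["walk", "walk", "walk", "walk", "walk", "sit"]
score = macro_f1(y_true, y_pred)
```

In this toy case five of six predictions are correct (accuracy of about 0.83), but the missed "run" class contributes an F1 of zero, pulling Macro-F1 down to about 0.63. That sensitivity to minority classes is why the metric is common in HAR, where activity classes are often imbalanced.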

