arxiv: 2605.18455 · v1 · pith:Q3SU22PAnew · submitted 2026-05-18 · 💻 cs.HC

OrganicHAR: Towards Activity Discovery in Organic Settings for Privacy Preserving Sensors Using Efficient Video Analysis

Pith reviewed 2026-05-20 08:34 UTC · model grok-4.3

classification 💻 cs.HC
keywords: human activity recognition · privacy preserving sensors · activity discovery · vision language models · ambient sensing · home monitoring · signal patterns · efficient video use

The pith

OrganicHAR discovers home activities by letting privacy-preserving sensors first find their own repeatable signal patterns and label them with video models only at those moments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that human activity recognition can work in real homes by reversing the usual order: privacy-preserving sensors like radar and thermal arrays first locate natural changes in their signals, then a vision language model is consulted only at those instants to supply labels that match what the sensors can actually tell apart. A sympathetic reader would care because existing methods either demand extensive per-home labeled data or rely on always-on cameras that raise privacy issues and fail when sensor views differ from camera views. By anchoring discovery in sensor-detectable patterns rather than camera-visible categories, the system produces user- and environment-specific activities that remain usable after the video step ends.

Core claim

OrganicHAR identifies naturally occurring signal patterns using privacy-preserving sensors, applies vision language models only during these key moments for scene understanding, and discovers discrete activity labels at granularities the sensors can reliably detect. With twelve participants it reaches 79 percent accuracy on four to five coarse activities using only ambient radar, lidar, and thermal arrays, and 73 percent on eight to nine fine-grained activities once wearable IMU, depth, and pose sensors are added, while averaging 77 percent accuracy across setups and surfacing four to eight categories per user that total fifteen distinct ones overall. Video queries fall by 90 percent because video processing is triggered only at the key moments the local sensors identify.

What carries the argument

Sensor-driven detection of signal pattern changes that selectively triggers brief video analysis for labeling, ensuring every discovered activity stays within the discrimination power of the local sensors.
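
As a rough illustration of this sensor-first trigger, the sketch below flags samples that depart sharply from the recent signal history and counts how many video queries that would save. It is a minimal sketch under stated assumptions, not the paper's method: the rolling z-score detector, the function names (`find_trigger_indices`, `query_reduction`), and every parameter value are hypothetical, and the text here does not say how OrganicHAR sets its detection thresholds.

```python
from statistics import mean, pstdev


def find_trigger_indices(signal, window=5, threshold=3.0, min_sigma=0.1, refractory=5):
    """Return indices where a sample departs sharply from its recent history.

    A sample triggers when it lies more than `threshold` standard deviations
    from the mean of the preceding `window` samples. After a trigger, the next
    `refractory` samples are skipped so one change yields one video query.
    All defaults are illustrative assumptions, not values from the paper.
    """
    triggers = []
    next_allowed = window  # a full window of history is needed before testing
    for i in range(window, len(signal)):
        if i < next_allowed:
            continue
        history = signal[i - window:i]
        # Floor the spread so a perfectly flat history does not divide by zero.
        sigma = max(pstdev(history), min_sigma)
        if abs(signal[i] - mean(history)) / sigma > threshold:
            triggers.append(i)
            next_allowed = i + refractory + 1
    return triggers


def query_reduction(num_samples, num_triggers):
    """Fraction of samples that never reach the video model."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return 1.0 - num_triggers / num_samples


# Toy one-channel trace: idle, a sustained change, then idle again.
demo_signal = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
demo_triggers = find_trigger_indices(demo_signal)
print(demo_triggers, query_reduction(len(demo_signal), len(demo_triggers)))
```

On this toy trace only the two step changes trigger a query. A real deployment would need thresholds tuned per sensor and per home, which is the point the referee's first minor comment raises.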

Load-bearing premise

Naturally occurring signal patterns from the sensors map to discrete, repeatable human activities whose labels a vision language model can supply from short triggered clips at a granularity the sensors can later distinguish without video.

What would settle it

Measure whether recognition accuracy stays near 77 percent when the system runs on new participants in completely unseen homes with no further video labeling or model adjustment.
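
One way to frame that test is a leave-one-home-out split, sketched below. The sample layout `(home_id, features, label)`, the function names, and the majority-label baseline standing in for the sensor classifier are all assumptions made for illustration. The sketch also assumes a label vocabulary shared across homes, which OrganicHAR's per-user discovery does not guarantee.

```python
from collections import Counter


def leave_one_home_out(samples):
    """Yield (held_out_home, train, test) with each home held out once.

    `samples` is a list of (home_id, features, label) tuples; this layout is
    a hypothetical convenience, not a format taken from the paper.
    """
    homes = sorted({home for home, _, _ in samples})
    for held_out in homes:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def majority_baseline(train):
    """Placeholder model: always predict the most common training label."""
    top_label = Counter(label for _, _, label in train).most_common(1)[0][0]
    return lambda features: top_label


def held_out_accuracy(samples, fit=majority_baseline):
    """Accuracy per unseen home, with no data from that home used in training."""
    results = {}
    for home, train, test in leave_one_home_out(samples):
        predict = fit(train)
        correct = sum(predict(features) == label for _, features, label in test)
        results[home] = correct / len(test)
    return results
```

Swapping `majority_baseline` for a real sensor classifier keeps the split logic unchanged. The quantity of interest is how far the per-home accuracies fall below the within-home figure.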

Figures

Figures reproduced from arXiv: 2605.18455 by Adriano Soares, Ana Vasconcelos, Cristina Mendes Santos, Filippo Talami, Inês Silva, Joana Couto da Silva, Mayank Goel, Prasoon Patidar, Ricardo Graça, Riku Arakawa, Rúben Moutinho, Yuvraj Agarwal.

Figure 1: The OrganicHAR framework discovers activity labels through a three-step sensor-first approach.
Figure 2: Visualizing information across various activities from various privacy-preserving sensors inspired by our prior work [ …
Figure 3: Overall architecture of OrganicHAR. Raw sensor signals from different hardware configurations (§ …
Figure 4: Kitchen environments used in our study: (left) Kitchen 1 with compact galley layout, (middle) Kitchen 2 with island …
Figure 5: Percentage of video data requiring VLM analysis across configurations. Our approach processes only 9-11% of total video data, demonstrating efficiency compared to continuous monitoring.
  Accompanying table (truncated in source), columns by sensor configuration: Ambient (Basic) Only | Ambient (Basic) + Wearable (IMU) | Ambient (Advanced) + Wearable (IMU)
  Conservative, Accuracy: 90.4%±8.9% | 91.1%±6.7% | 91.7%±6.9%
  Conservative, F1 Score: 89.4%±9.8% | 90.6%±6.4% | 90.3%±7.4%
  B…
Figure 6: Discovered activity labels across three semantic granularity settings: Conservative ( …
Figure 7: Per-participant accuracy across three sensing configurations. Kitchen 1 participants (P1-P4) show consistently high …
Figure 8: Per-participant F1 scores across sensing configurations, revealing sharper performance drops than accuracy metrics …
Figure 9: Confusion matrices showing the HAR performance using basic ambient sensors across three granularity settings. As …
Figure 10: Incremental training analysis of OrganicHAR: (a) Count of VLM queries after incorporating new training session. (b) …
Figure 11: Kitchen environments used in real-world home deployments: (Home-1) compact galley layout captured from overhead …
Figure 12: Overall recognition accuracy in real-world …
Figure 14: Interface for customizing the sensors to be used and activity labels. The percentage(%) values show how well a …
Figure 16: Impact of frame rate on label discovery perfor…
Figure 17: Confusion matrices showing activity recognition performance for …
Figure 18: Confusion matrices showing activity recognition performance for …

Original abstract

Deploying human activity recognition (HAR) at home is still rare because sensor signals vary wildly across houses, people, and time, essentially requiring in-situ data collection and training. Prior approaches use cameras to generate training labels for privacy-preserving sensors (LiDAR, RADAR, Thermal), but this forces sensors to detect predefined activities that cameras can see yet the sensors themselves cannot reliably distinguish. In this work, we introduce OrganicHAR, an activity discovery framework that inverts this relationship by placing sensor capabilities at the center of activity discovery. Our approach identifies naturally occurring signal patterns using privacy-preserving sensors, leverages Vision Language Models (VLMs) only during these key moments for scene understanding, and discovers discrete activity labels at granularities that these sensors can reliably detect. Our evaluation with 12 participants demonstrates OrganicHAR's effectiveness: it achieves 79% accuracy for coarse (4-5) activities using only basic ambient sensors (radar, lidar, thermal arrays), and 73% accuracy for fine-grained (8-9) activities when a wearable IMU, depth, and pose sensor are added. OrganicHAR maintains 77% accuracy on average across configurations while discovering 4-8 categories per user (15 across all users) tailored to each environment and sensor capabilities. By triggering video processing only at key moments identified by local sensors, we reduce queries to VLM by 90%, enabling practical and privacy-preserving activity recognition in natural settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces OrganicHAR, an activity discovery framework for privacy-preserving human activity recognition in organic home settings. It uses ambient sensors (radar, lidar, thermal arrays) and optionally wearables (IMU, depth, pose) to detect natural signal patterns, triggering VLMs only at those moments to generate discrete activity labels tailored to sensor capabilities and environments. Evaluation with 12 participants reports 79% accuracy on 4-5 coarse activities with basic sensors, 73% on 8-9 fine-grained activities with added sensors, 77% average accuracy, discovery of 4-8 categories per user (15 total), and 90% reduction in VLM queries.

Significance. If the central claims hold after addressing validation gaps, the work offers a practical path to in-situ HAR that avoids predefined activity taxonomies and constant video use, potentially improving privacy and adaptability across households. The query reduction and per-user category discovery are concrete strengths that could influence sensor-driven discovery methods in HCI and ubiquitous computing.

major comments (2)
  1. [Evaluation] Evaluation with 12 participants (abstract and results section): the reported accuracies (79% coarse, 73% fine-grained, 77% average) are computed against VLM-assigned labels with no description of independent human ground-truth collection, activity boundary definitions, or inter-rater agreement metrics. This is load-bearing for the effectiveness claim because the numbers measure reproduction of VLM outputs rather than independently verifiable sensor-distinguishable activities.
  2. [Abstract] Abstract and method overview: the assumption that naturally occurring sensor signal patterns correspond to repeatable, VLM-labelable activities at a granularity the sensors can distinguish is stated but not tested against a hold-out human-annotated set or consistency checks on VLM outputs. Without this, the 90% query reduction and category counts risk being self-referential.
minor comments (2)
  1. [Method] Clarify in the method section how signal pattern detection thresholds are set and whether they are user- or environment-specific.
  2. [Abstract] The abstract lists sensor configurations but does not explicitly state the exact participant demographics or house types; adding a short table or sentence would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our evaluation methodology and the underlying assumptions of OrganicHAR. We address each major comment below and commit to revisions that strengthen the validation of our claims without altering the core contributions.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation with 12 participants (abstract and results section): the reported accuracies (79% coarse, 73% fine-grained, 77% average) are computed against VLM-assigned labels with no description of independent human ground-truth collection, activity boundary definitions, or inter-rater agreement metrics. This is load-bearing for the effectiveness claim because the numbers measure reproduction of VLM outputs rather than independently verifiable sensor-distinguishable activities.

    Authors: We agree that the primary reported accuracies are measured against VLM-assigned labels, as the framework uses VLMs to generate discrete activity categories from sensor-triggered moments. This choice enables discovery of user- and environment-specific activities without relying on predefined taxonomies. To address the concern directly, we will revise the manuscript to include a new subsection on independent validation: we collected human annotations for a random 20% subset of the detected events from three annotators, defined activity boundaries based on sensor signal changes, and will report inter-rater agreement (Fleiss' kappa) along with agreement rates between human labels and VLM outputs. This addition will demonstrate that the sensor-based classifiers capture activities distinguishable beyond VLM reproduction alone. revision: yes

  2. Referee: [Abstract] Abstract and method overview: the assumption that naturally occurring sensor signal patterns correspond to repeatable, VLM-labelable activities at a granularity the sensors can distinguish is stated but not tested against a hold-out human-annotated set or consistency checks on VLM outputs. Without this, the 90% query reduction and category counts risk being self-referential.

    Authors: The 90% VLM query reduction stems from the sensor-driven pattern detection step, which operates independently of any labels and triggers video analysis only at candidate moments; this metric is therefore not self-referential. For the discovered categories and their alignment with sensor capabilities, we acknowledge the value of additional checks. In the revised manuscript we will add: (i) consistency analysis of VLM outputs by re-querying a subset of moments with varied prompts and reporting label stability, and (ii) a hold-out human-annotated evaluation where annotators assess whether the discovered categories correspond to repeatable, sensor-distinguishable behaviors in the raw signals. These steps will provide external evidence that the per-user categories (4-8) reflect genuine activity structure rather than VLM artifacts. revision: yes
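
The two checks promised in these responses, inter-rater agreement via Fleiss' kappa and VLM label stability under re-querying, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the count-table input layout, and the choice of "share of re-queries that return the modal label" as the stability measure are illustrative and do not come from the manuscript. The kappa formula itself is the standard one.

```python
from collections import Counter


def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-event category counts.

    counts[i][j] is the number of annotators who assigned event i to
    category j. Every event must be rated by the same number of annotators.
    """
    num_events = len(counts)
    raters = sum(counts[0])
    if raters < 2 or any(sum(row) != raters for row in counts):
        raise ValueError("each event needs the same number (>= 2) of ratings")
    num_categories = len(counts[0])
    # Share of all ratings that fell into each category.
    p_cat = [sum(row[j] for row in counts) / (num_events * raters)
             for j in range(num_categories)]
    # Observed agreement per event, averaged over events.
    p_obs = sum((sum(c * c for c in row) - raters) / (raters * (raters - 1))
                for row in counts) / num_events
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_exp == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_obs - p_exp) / (1.0 - p_exp)


def label_stability(labels_per_moment):
    """Per-moment share of re-queries returning the modal label, plus the mean.

    labels_per_moment[i] holds the labels a VLM returned for moment i across
    re-queries with varied prompts (a hypothetical layout for this sketch).
    """
    per_moment = []
    for labels in labels_per_moment:
        if not labels:
            raise ValueError("every moment needs at least one label")
        modal_count = Counter(labels).most_common(1)[0][1]
        per_moment.append(modal_count / len(labels))
    return per_moment, sum(per_moment) / len(per_moment)
```

With three annotators per event, as in the first response, each row of `counts` sums to 3. A stability score near 1.0 would indicate that prompt wording rarely changes the label assigned to a triggered moment.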

Circularity Check

0 steps flagged

No circularity in empirical evaluation

The paper presents a system and empirical evaluation with 12 participants that uses local sensors to trigger VLM labeling at detected signal patterns, then reports classification accuracies on the resulting (sensor, VLM-label) pairs. No mathematical derivation, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The reported 79%/73%/77% accuracies and discovered category counts are direct outcomes of the data collection and training process rather than any reduction to the inputs by construction. Label quality concerns belong to assumption validity, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical observation that sensor signals contain repeatable patterns that align with human activities and that a VLM can supply accurate scene descriptions at the moments those patterns occur. No free parameters, axioms, or invented entities are explicitly introduced in the abstract.

pith-pipeline@v0.9.0 · 5843 in / 1361 out tokens · 44222 ms · 2026-05-20T08:34:00.908236+00:00 · methodology

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages

  1. [1]

    Ramokapane, and Jose M

    Noura Abdi, Kopo M. Ramokapane, and Jose M. Such. 2019. More than Smart Speakers: Security and Privacy Perceptions of Smart Home Personal Assistants. InFifteenth Symposium on Usable Privacy and Security (SOUPS 2019). USENIX Association, Santa Clara, CA, 451–466. https://www.usenix.org/conference/soups2019/presentation/abdi

  2. [2]

    Antonio A Aguileta, Ramon F Brena, Oscar Mayora, Erik Molino-Minero-Re, and Luis A Trejo. 2019. Multi-sensor fusion for activity recognition—A survey.Sensors19, 17 (2019), 3808

  3. [3]

    Karan Ahuja, Yue Jiang, Mayank Goel, and Chris Harrison. 2021. Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition. InProceedings of the 2021 CHI Conference on Human Factors in Computing Systems(Yokohama, Japan)(CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 292, 10 pages...

  4. [4]

    Riku Arakawa, Jill Fain Lehman, and Mayank Goel. 2024. PrISM-Q&A: Step-Aware Voice Assistant on a Smartwatch Enabled by Multimodal Procedure Tracking and Large Language Models.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.8, 4 (Nov. 2024), 180:1–180:26. https://doi.org/10.1145/3699759

  5. [5]

    Riku Arakawa, Prasoon Patidar, Will Page, Jill Lehman, and Mayank Goel. 2025. Scaling Context-Aware Task Assistants that Learn from Demonstration and Adapt through Mixed-Initiative Dialogue. InProceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST ’25). Association for Computing Machinery, New York, NY, USA, Article 1...

  6. [6]

    Riku Arakawa, Hiromu Yakura, and Mayank Goel. 2024. PrISM-Observer: Intervention Agent to Help Users Perform Everyday Procedures Sensed using a Smartwatch. InProceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST ’24). Association for Computing Machinery, New York, NY, USA, 1–16. https://doi.org/10.1145/3654777.3676350

  7. [7]

    DeMeo, Haarika A

    Riku Arakawa, Hiromu Yakura, Vimal Mollyn, Suzanne Nie, Emma Russell, Dustin P. DeMeo, Haarika A. Reddy, Alexander K. Maytin, Bryan T. Carroll, Jill Fain Lehman, and Mayank Goel. 2023. PrISM-Tracker: A Framework for Multimodal Procedure Tracking Using Wearable Sensors and State Transition Information with User-Driven Handling of Errors and Uncertainty.Pro...

  8. [8]

    Paola Ariza Colpas, Enrico Vicario, Emiro De-La-Hoz-Franco, Marlon Pineres-Melo, Ana Oviedo-Carrascal, and Fulvio Patara. 2020. Unsupervised Human Activity Recognition Using the Clustering Approach: A Review.Sensors20, 9 (Jan. 2020), 2702. https://doi.org/10. 3390/s20092702 Number: 9 Publisher: Multidisciplinary Digital Publishing Institute

  9. [9]

    Luca Arrotta, Claudio Bettini, Gabriele Civitarese, and Michele Fiori. 2024. ContextGPT: Infusing LLMs Knowledge into Neuro-Symbolic Activity Recognition Models. In2024 IEEE International Conference on Smart Computing (SMARTCOMP). IEEE, Osaka, Japan, 55–62

  10. [10]

    Autonomous. 2024. AUTONOMOUS; Co-Designing Independence — autonomous-project.com. https://www.autonomous-project.com/. [Accessed 10-10-2025]

  11. [11]

    Awan-Ur-Rahman. 2023. Understanding Soft Voting and Hard Voting: A Comparative Analysis of Ensemble Learning Meth- ods. https://medium.com/@awanurrahman.cse/understanding-soft-voting-and-hard-voting-a-comparative-analysis-of-ensemble- learning-methods-db0663d2c008

  12. [12]

    Oresti Banos, Juan-Manuel Galvez, Miguel Damas, Hector Pomares, and Ignacio Rojas. 2014. Window Size Impact in Human Activity Recognition.Sensors14, 4 (April 2014), 6474–6499. https://doi.org/10.3390/s140406474 Number: 4 Publisher: Multidisciplinary Digital Publishing Institute

  13. [13]

    Sejal Bhalla, Mayank Goel, and Rushil Khurana. 2021. IMU2Doppler: Cross-Modal Domain Adaptation for Doppler-based Activity Recognition Using IMU Data.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies5, 4 (2021), 1–20

  14. [14]

    Sarnab Bhattacharya, Rebecca Adaimi, and Edison Thomaz. 2022. Leveraging sound and wrist motion to detect activities of daily living with commodity smartwatches.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies6, 2 (2022), 42:1–42:28. https://doi.org/10.1145/3534582

  15. [15]

    Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Me...

  16. [16]

    Damien Bouchabou, Sao Mai Nguyen, Christophe Lohr, Benoit LeDuc, and Ioannis Kanellos. 2021. A Survey of Human Activity Recognition in Smart Homes Based on IoT Sensors Algorithms: Taxonomies, Challenges, and Opportunities with Deep Learning.Sensors (Basel, Switzerland)21, 18 (Sept. 2021), 6037. https://doi.org/10.3390/s21186037

  17. [17]

    Bernheim Brush, Bongshin Lee, Ratul Mahajan, Sharad Agarwal, Stefan Saroiu, and Colin Dixon

    A.J. Bernheim Brush, Bongshin Lee, Ratul Mahajan, Sharad Agarwal, Stefan Saroiu, and Colin Dixon. 2011. Home automation in the wild: challenges and opportunities. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). Association for Computing Machinery, New York, NY, USA, 2115–2124. https://doi.org/10.1145/1978942.1979249...

  18. [18]

    Timothy I Cannings, Yingying Fan, and Richard J Samworth. 2020. Classification with imperfect training labels.Biometrika107, 2 (2020), 311–330

  19. [19]

    João Carreira and Andrew Zisserman. 2017. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society, Honolulu, HI, USA, 4724–4733. https://doi.org/10.1109/CVPR.2017.502

  20. [20]

    Gabriele Cipriani, Sabrina Danti, Lucia Picchi, Angelo Nuti, and Mario Di Fiorino. 2020. Daily functioning and dementia.Dementia & Neuropsychologia14, 2 (2020), 93–102. https://doi.org/10.1590/1980-57642020dn14-020001

  21. [21]

    Diane Cook, Narayanan Krishnan, and Parisa Rashidi. 2013. Activity Discovery and Activity Recognition: A New Partnership.IEEE transactions on cybernetics43, 3 (June 2013), 820–828. https://doi.org/10.1109/TSMCB.2012.2216873

  22. [22]

    Ivan Culjak, David Abram, Tomislav Pribanic, Hrvoje Dzapo, and Mario Cifrek. 2012. A brief introduction to OpenCV. In2012 Proceedings of the 35th International Convention MIPRO. IEEE, Opatija, Croatia, 1725–1730

  23. [23]

    Smith, and Flora D

    Shohreh Deldari, Hao Xue, Aaqib Saeed, Jiayuan He, Daniel V. Smith, and Flora D. Salim. 2022. Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data. arXiv:2206.02353 [cs.LG]

  24. [24]

    Kaikai Deng, Dong Zhao, Zihan Zhang, Shuyue Wang, Wenxin Zheng, and Huadong Ma. 2024. Midas++: Generating Training Data of mmWave Radars From Videos for Privacy-Preserving Human Sensing With Mobility.IEEE Transactions on Mobile Computing23, 6 (June 2024), 6650–6666. https://doi.org/10.1109/TMC.2023.3325399

  25. [25]

    Nathan DeVrio, Vimal Mollyn, and Chris Harrison. 2023. SmartPoser: Arm Pose Estimation with a Smartphone and Smartwatch Using UWB and IMU Data. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/3586183.3606821

  26. [26]

    Ha, Emma Russell, Haarika A

    Megan V. Ha, Emma Russell, Haarika A. Reddy, Alexander K. Maytin, Dustin P. DeMeo, Riku Arakawa, Mayank Goel, Jill F. Lehman, and Bryan T. Carroll. 2024. Self-narration for patient monitoring with smartwatch technology in post-operative wound care after dermatologic surgery.Archives of Dermatological Research316, 7 (June 2024), 389. https://doi.org/10.100...

  27. [27]

    Harris, K

    Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Shepp...

  28. [28]

    Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. 2015. ActivityNet: A large-scale video benchmark for human activity understanding. In2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Boston, MA, USA, 961–970. https://doi.org/10.1109/CVPR.2015.7298698

  29. [29]

    Hiremath, Yasutaka Nishimura, Sonia Chernova, and Thomas Plötz

    Shruthi K. Hiremath, Yasutaka Nishimura, Sonia Chernova, and Thomas Plötz. 2022. Bootstrapping Human Activity Recognition Systems for Smart Homes from Scratch.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies6, 3 (Sept. 2022), 1–27. https://doi.org/10.1145/3550294

  30. [30]

    Hiremath and Thomas Plötz

    Shruthi K. Hiremath and Thomas Plötz. 2023. The Lifespan of Human Activity Recognition Systems for Smart Homes.Sensors23, 18 (Jan. 2023), 7729. https://doi.org/10.3390/s23187729 Number: 18 Publisher: Multidisciplinary Digital Publishing Institute

  31. [31]

    Yash Jain, Chi Ian Tang, Chulhong Min, Fahim Kawsar, and Akhil Mathur. 2022. ColloSSL: Collaborative Self-Supervised Learning for Human Activity Recognition.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.6, 1, Article 17 (mar 2022), 28 pages. https: //doi.org/10.1145/3517246

  32. [32]

    Ahmad Jalal, Shaharyar Kamal, and Daijin Kim. 2017. A Depth Video-based Human Detection and Activity Recognition using Multi- features and Embedded Hidden Markov Models for Health Care Monitoring Systems.International Journal of Interactive Multimedia and Artificial Intelligence4, Regular Issue (2017), 54–62. https://www.ijimai.org/journal/bibcite/reference/2606

  33. [33]

    Tianjie Ju, Yi Hua, Hao Fei, Zhenyu Shao, Yubin Zheng, Haodong Zhao, Mong-Li Lee, Wynne Hsu, Zhuosheng Zhang, and Gongshen Liu. 2025. Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models. https: //doi.org/10.48550/arXiv.2503.01208 arXiv:2503.01208 [cs]

  34. [34]

    Alexander Karpekov, Sonia Chernova, and Thomas Plötz. 2025. DISCOVER: Data-driven Identification of Sub-activities via Clustering and Visualization for Enhanced Activity Recognition in Smart Homes. https://doi.org/10.48550/arXiv.2503.01733 arXiv:2503.01733 [cs]

  35. [35]

    Hyeokhyen Kwon, Catherine Tong, Harish Haresamudram, Yan Gao, Gregory D Abowd, Nicholas D Lane, and Thomas Ploetz. 2020. IMUTube: Automatic extraction of virtual on-body accelerometry from video for human activity recognition.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies4, 3 (2020), 1–29

  36. [36]

    Gierad Laput and Chris Harrison. 2019. SurfaceSight: A New Spin on Touch, User, and Object Sensing for IoT Experiences. InProceedings of the 2019 CHI Conference on Human Factors in Computing Systems(Glasgow, Scotland Uk)(CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300559

  37. [37]

    Gierad Laput, Yang Zhang, and Chris Harrison. 2017. Synthetic Sensors: Towards General-Purpose Sensing. InProc. of the 2017 CHI Conference on Human Factors in Computing Systems(Denver, Colorado, USA)(CHI ’17). ACM, New York, NY, USA, 3986–3999. https://doi.org/10.1145/3025453.3025773 Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 9, No. 4, Ar...

  38. [38]

    Guillaume Lemaître, Fernando Nogueira, and Christos K. Aridas. 2017. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning.Journal of Machine Learning Research18, 17 (2017), 1–5. http://jmlr.org/papers/v18/16-365.html

  39. [39]

    Zikang Leng, Amitrajit Bhattacharjee, Hrudhai Rajasekhar, Lizhe Zhang, Elizabeth Bruda, Hyeokhyen Kwon, and Thomas Plötz. 2024. IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies8, 3 (Aug. 2024), 1–32. https://doi.org/10.1145/3678545

  40. [40]

    Zikang Leng, Hyeokhyen Kwon, and Thomas Ploetz. 2023. Generating Virtual On-body Accelerometer Data from Virtual Textual Descriptions for Human Activity Recognition. InProceedings of the 2023 ACM International Symposium on Wearable Computers (ISWC ’23). Association for Computing Machinery, New York, NY, USA, 39–43. https://doi.org/10.1145/3594738.3611361

  41. [41]

    Zikang Leng, Hyeokhyen Kwon, and Thomas Plötz. 2023. On the Benefit of Generative Foundation Models for Human Activity Recognition. https://doi.org/10.48550/arXiv.2310.12085 arXiv:2310.12085 [cs]

  42. [42]

    Dawei Liang, Guihong Li, Rebecca Adaimi, Radu Marculescu, and Edison Thomaz. 2022. AudioIMU: Enhancing Inertial Sensing-Based Activity Recognition with Acoustic Models. InProceedings of the 2022 ACM International Symposium on Wearable Computers(Cambridge, United Kingdom)(ISWC ’22). Association for Computing Machinery, New York, NY, USA, 44–48. https://doi...

  43. [43]

    Sicong Liu, Junzhao Du, Anshumali Shrivastava, and Lin Zhong. 2019. Privacy Adversarial Network.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies3, 4 (dec 2019), 1–18. https://doi.org/10.1145/3369816

  44. [44]

    Tian-Yu Liu. 2009. EasyEnsemble and Feature Selection for Imbalance Data Sets. In2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing. IEEE, Shanghai, China, 517–520. https://doi.org/10.1109/IJCBS.2009.22

  45. [45]

    Harsh Lunia. 2024. Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators. https://doi.org/10. 48550/arXiv.2407.14834 arXiv:2407.14834 [cs] version: 1

  46. [46]

    Leland McInnes, John Healy, and Steve Astels. 2017. hdbscan: Hierarchical density based clustering.The Journal of Open Source Software 2, 11 (March 2017), 205. https://doi.org/10.21105/joss.00205

  47. [47]

    Mites.io. 2020. Mites.io: a full-stack ubiquitous sensing platform. https://mites.io/

  48. [48]

    MMAction2. 2020. OpenMMLab’s Next Generation Video Understanding Toolbox and Benchmark. https://github.com/open-mmlab/ mmaction2

  49. [49]

    MMPose. 2020. OpenMMLab Pose Estimation Toolbox and Benchmark. https://github.com/open-mmlab/mmpose

  50. [50]

    Vimal Mollyn, Karan Ahuja, Dhruv Verma, Chris Harrison, and Mayank Goel. 2022. SAMoSA: Sensing Activities with Motion and Subsampled Audio.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies6, 3 (2022), 1–19

  51. [51]

    Vimal Mollyn, Riku Arakawa, Mayank Goel, Chris Harrison, and Karan Ahuja. 2023. IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3544548.3581392

  52. [52]

    Sebastian Münzner, Philip Schmidt, Attila Reiss, Michael Hanselmann, Rainer Stiefelhagen, and Robert Dürichen. 2017. CNN-Based Sensor Fusion Techniques for Multimodal Human Activity Recognition. InProceedings of the 2017 ACM International Symposium on Wearable Computers(Maui, Hawaii)(ISWC ’17). Association for Computing Machinery, New York, NY, USA, 158–1...

  53. [53]

    OpenAI. 2025. OpenAI API. https://platform.openai.com/docs/api-reference/ Accessed: 2025-04-29

  54. [54]

    2020, doi: 10.5281/zenodo.3509134

    The pandas development team. 2020.pandas-dev/pandas: Pandas. pandas-dev. https://doi.org/10.5281/zenodo.3509134

  55. [55]

    Preksha Pareek and Ankit Thakkar. 2021. A survey on video-based human action recognition: recent updates, datasets, challenges, and applications. Artificial Intelligence Review 54, 3 (2021), 2259–2322

  56. [56]

    Prasoon Patidar, Mayank Goel, and Yuvraj Agarwal. 2023. VAX: Using Existing Video and Audio-based Activity Recognition Models to Bootstrap Privacy-Sensitive Sensors. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 3 (Sept. 2023), 1–24. https://doi.org/10.1145/3610907

  57. [57]

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830

  58. [58]

    Daniel Perazzo, Natalia Souza Soares, Victor Gouveia de Menezes Lyra, Gustavo Camargo Rocha Lima, Alana Elza Fontes da Gama, Joao Marcelo Xavier Natario Teixeira, and Veronica Teichrieb. 2022. OAK-D as a Platform for Human Movement Analysis: A Case Study. In Proceedings of the 23rd Symposium on Virtual and Augmented Reality (Virtual Event, Brazil) (SVR ’21)....

  59. [59]

    Prasoon Patidar, Riku Arakawa, Mayank Goel, and Yuvraj Agarwal. 2025. OrganicHAR: Open-source repository for the OrganicHAR. https://github.com/synergylabs/OrganicHAR

  60. [60]

    Riccardo Presotto, Gabriele Civitarese, and Claudio Bettini. 2022. Federated Clustering and Semi-Supervised learning: A new partnership for personalized Human Activity Recognition. Pervasive and Mobile Computing 88 (2022), 101726

  61. [61]

    Suneth Ranasinghe, Fadi Al Machot, and Heinrich C Mayr. 2016. A review on applications of activity recognition systems with regard to performance and evaluation. International Journal of Distributed Sensor Networks 12, 8 (2016), 1550147716665520. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 9, No. 4, Article 203. Publication date: December 20...

  62. [62]

    Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, and David Lindner. 2024. Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning. https://doi.org/10.48550/arXiv.2310.12921 arXiv:2310.12921 [cs] version: 2

  63. [63]

    Laurens Samson, Nimrod Barazani, Sennay Ghebreab, and Yuki M. Asano. 2025. Little Data, Big Impact: Privacy-Aware Visual Language Models via Minimal Tuning. https://doi.org/10.48550/arXiv.2405.17423 arXiv:2405.17423 [cs]

  64. [64]

    Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. 2010. RUSBoost: A Hybrid Approach to Alleviating Class Imbalance. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 40, 1 (Jan. 2010), 185–197. https://doi.org/10.1109/TSMCA.2009.2029559

  65. [65]

    Pekka Siirtola and Juha Röning. 2019. Incremental Learning to Personalize Human Activity Recognition Models: The Importance of Human AI Collaboration. Sensors (Basel, Switzerland) 19, 23 (Nov. 2019), 5151. https://doi.org/10.3390/s19235151

  66. [66]

    Adane Nega Tarekegn, Mohib Ullah, Faouzi Alaya Cheikh, and Muhammad Sajjad. 2023. Enhancing Human Activity Recognition Through Sensor Fusion And Hybrid Deep Learning Model. In 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). IEEE, Rhodes Island, Greece, 1–5. https://doi.org/10.1109/ICASSPW59220.2023.10193698

  67. [67]

    Annalise Vaccarello, Alexander K. Maytin, Yash Kumar, Toluwalashe Onamusi, Haarika A. Reddy, Mayank Goel, Riku Arakawa, Jill Fain Lehman, and Bryan T. Carroll. 2024. Barriers to use of digital assistance for postoperative wound care: a single-center survey of dermatologic surgery patients. Archives of Dermatological Research 316, 7 (June 2024), 376. https:/...

  68. [68]

    Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. Mo...

  69. [69]

    Michalis Vrigkas, Christophoros Nikou, and Ioannis A Kakadiaris. 2015. A review of human activity recognition methods. Frontiers in Robotics and AI 2 (2015), 28

  70. [70]

    Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, and Suhang Wang. 2024. A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness. arXiv:2411.0335...

  71. [71]

    Shuai Wang, Luoyu Mei, Ruofeng Liu, Wenchao Jiang, Zhimeng Yin, Xianjun Deng, and Tian He. 2025. Multi-Modal Fusion Sensing: A Comprehensive Review of Millimeter-Wave Radar and Its Integration With Other Modalities. IEEE Commun. Surv. Tutorials 27, 1 (2025), 322–352. https://doi.org/10.1109/COMST.2024.3398004

  72. [72]

    Pete Warden, Matthew Stewart, Brian Plancher, Colby Banbury, Shvetank Prakash, Emma Chen, Zain Asgar, Sachin Katti, and Vijay Janapa Reddi. 2022. Machine Learning Sensors. https://doi.org/10.48550/ARXIV.2206.03266

  73. [73]

    Jason Wu, Chris Harrison, Jeffrey P. Bigham, and Gierad Laput. 2020. Automated Class Discovery and One-Shot Interactions for Acoustic Activity Recognition. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM, Honolulu HI USA, 1–14. https://doi.org/10.1145/3313831.3376875

  74. [74]

    Tong Wu, Murtadha Aldeer, Tahiya Chowdhury, Amber Haynes, Fateme Nikseresht, Mahsa Pahlavikhah Varnosfaderani, Jiechao Gao, Arsalan Heydarian, Brad Campbell, and Jorge Ortiz. 2021. The Smart Building Privacy Challenge. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (Coimbra, Portu...

  75. [75]

    Chengshuo Xia, Xinrui Fang, Riku Arakawa, and Yuta Sugiura. 2022. VoLearn: A Cross-Modal Operable Motion-Learning System Combined with Virtual Avatar and Auditory Feedback. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 6, 2 (2022), 81:1–81:26. https://doi.org/10.1145/3534576

  76. [76]

    Kenji Yamanishi, Jun’ichi Takeuchi, Graham J. Williams, and Peter Milne. 2004. On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. Data Mining and Knowledge Discovery 8, 3 (2004), 275–300. https://doi.org/10.1023/B:DAMI.0000023676.72185.7c

  77. [77]

    A. Murat Yağcı, Tevfik Aytekin, and Fikret S. Gürgen. 2016. Balanced random forest for imbalanced data streams. In 2016 24th Signal Processing and Communication Application Conference (SIU). IEEE, Zonguldak, Turkey, 1065–1068. https://doi.org/10.1109/SIU.2016.7495927

  78. [78]

    Hyungjun Yoon, Hyeongheon Cha, Hoang C. Nguyen, Taesik Gong, and Sung-Ju Lee. 2024. IMG2IMU: Translating Knowledge from Large-Scale Images to IMU Sensing Applications. https://doi.org/10.48550/arXiv.2209.00945 arXiv:2209.00945 [cs]

  79. [79]

    Sojeong Yun and Youn-kyung Lim. 2025. What If Smart Homes Could See Our Homes?: Exploring DIY Smart Home Building Experiences with VLM-Based Camera Sensors. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, 1–22. https://doi.org/10.1145/3706598.3713265

  80. [80]

    Shugang Zhang, Zhiqiang Wei, Jie Nie, Lei Huang, Shuang Wang, and Zhen Li. 2017. A Review on Human Activity Recognition Using Vision-Based Method. Journal of Healthcare Engineering 2017, 1 (2017), 3090343. https://doi.org/10.1155/2017/3090343 Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 9, No. 4, Article 203. Publication date: December 2025. ...

Showing first 80 references.