From User Recognition to Activity Counting: An Identity-Agnostic Approach to Multi-User WiFi Sensing
Pith reviewed 2026-05-10 09:05 UTC · model grok-4.3
The pith
Reformulating multi-user WiFi activity recognition as activity counting enables stable performance on unseen users.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By recasting multi-user activity recognition as scene-level activity counting via regression, rather than assigning activities to fixed user slots, an identity-agnostic pipeline achieves stable mean absolute error of 0.1081 on a 0-5 count scale. The pipeline first converts CSI measurements into spatial projections, then extracts features with a pretrained convolutional backbone. Under unseen-user evaluation the identity-dependent baseline macro-F1 falls sharply from 80.38 to 32.61 while the counting model error remains unchanged. Feature-space analysis shows that the learned representations separate more cleanly by activity count and less by user identity, directly explaining the improved 0.
What carries the argument
Identity-agnostic regression model that estimates per-activity user counts from features extracted by a pretrained CNN backbone applied to CSI spatial projections.
If this is right
- Scene-level activity composition can be recovered directly from CSI without any user-to-action association.
- Counting performance stays constant when the set of people present differs between training and testing.
- Representations extracted after spatial projection become measurably more invariant to individual identity.
- The closed-set user assumption that has constrained earlier multi-user WiFi work can be removed without sacrificing accuracy.
- Activity counting supplies a deployable formulation for dynamic environments where the user population is not known in advance.
Where Pith is reading between the lines
- The same spatial-projection-plus-CNN pipeline could be tested on other wireless modalities such as mmWave radar to see whether count-based estimation likewise improves cross-user generalization.
- Smart-home monitoring systems could adopt count outputs to reduce privacy exposure, since no individual identities are ever recovered.
- Future experiments could measure whether error rises when two users perform different activities simultaneously, a case only partially covered by the current 0-5 count labels.
- The approach suggests a broader pattern: many sensing tasks may become more robust by predicting aggregate statistics rather than per-instance identities.
Load-bearing premise
Converting CSI to spatial projections and running them through a pretrained CNN produces features invariant enough to user identity that accurate scene-level counts remain possible even when activities overlap or the environment changes.
What would settle it
If the identity-agnostic model on the WiMANS dataset under unseen-user evaluation yields a mean absolute error above 0.5 on the 0-5 scale, or if t-SNE visualizations of its features continue to cluster by user identity rather than by activity count, the claim of stable generalization would be refuted.
Figures
read the original abstract
Wi-Fi Channel State Information (CSI) enables device-free human activity recognition, but existing multi-user approaches assume a fixed set of known users during both training and inference. This closed-set assumption limits deployment, as models trained on a specific user set degrade when applied to new individuals or environments. We reformulate multi-user activity recognition as activity counting, estimating how many users perform each activity type at a given time, without associating actions with specific individuals. We propose a pipeline that converts CSI measurements into spatial projections and extracts features using a pretrained convolutional backbone. Two formulations are evaluated on the WiMANS dataset: a conventional identity-dependent model that assigns activities to fixed user slots, and an identity-agnostic model that estimates scene-level activity composition through regression. Under standard evaluation, the identity-agnostic model achieves a mean absolute error of 0.1081 on a 0-5 count scale. Under unseen-user evaluation, the identity-dependent model's macro-F1 drops from 80.38 to 32.61, while the identity-agnostic model's counting error remains stable. Feature space analysis confirms that identity-agnostic representations are more user-invariant, which explains their stronger generalization. These results suggest that activity counting provides a more practical and generalizable alternative to identity-dependent formulations for multi-user WiFi sensing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes reformulating multi-user WiFi CSI activity recognition as an activity counting problem to enable identity-agnostic sensing. By converting CSI to spatial projections and using a pretrained CNN backbone for feature extraction, it evaluates an identity-dependent model (activity assignment to fixed user slots) against an identity-agnostic regression model for scene-level activity counts on the WiMANS dataset. Key results include stable MAE of 0.1081 for the agnostic model under unseen users, contrasted with a drop in macro-F1 from 80.38 to 32.61 for the dependent model, with feature analysis supporting greater invariance in the agnostic approach.
Significance. If the results hold, this provides a practical advance for WiFi sensing in dynamic settings with unknown users by avoiding closed-set assumptions. The concrete metrics and feature-space comparison offer clear evidence of improved generalization for the counting formulation. The use of pretrained backbones for invariance is a promising direction, though the paper does not mention code release or machine-checked proofs.
major comments (2)
- [§4 (Experiments)] §4 (Experiments): The unseen-user evaluation lacks details on data splits (e.g., number of held-out users, partitioning strategy), activity overlap handling, exact preprocessing, and error bars or variance for the reported MAE=0.1081 and F1 scores. These are load-bearing for verifying the central claim of stable generalization in the identity-agnostic model.
- [§3 (Method)] §3 (Method): The assertion that spatial projections plus pretrained CNN produce identity-invariant features sufficient for accurate counting under activity superposition is not fully substantiated. The feature space analysis should include quantitative metrics (e.g., user classification accuracy on extracted features) to address the risk that body geometry or reflection patterns persist and are exploited by the regressor.
minor comments (2)
- [Abstract] Abstract: The '0-5 count scale' should explicitly state the maximum simultaneous users or activity types considered in WiMANS.
- [Throughout] Notation: Ensure consistent terminology for 'activity counting' versus 'scene-level activity composition' across sections.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important areas for improving reproducibility and substantiation in our work. We address each major comment below and have revised the manuscript to incorporate the requested details and analysis.
read point-by-point responses
-
Referee: [§4 (Experiments)] §4 (Experiments): The unseen-user evaluation lacks details on data splits (e.g., number of held-out users, partitioning strategy), activity overlap handling, exact preprocessing, and error bars or variance for the reported MAE=0.1081 and F1 scores. These are load-bearing for verifying the central claim of stable generalization in the identity-agnostic model.
Authors: We agree that these details are essential for verifying the generalization claims. In the revised manuscript, we have expanded §4 with a full description of the unseen-user protocol: specifically, 5 users are held out from the WiMANS dataset (randomly selected with no activity-type overlap between training and test sets), the exact CSI-to-spatial-projection preprocessing pipeline, and error bars computed across 5 independent runs (MAE remains stable at 0.1081 ± 0.015 for the identity-agnostic model, while the identity-dependent macro-F1 drops from 80.38 ± 1.2 to 32.61 ± 4.8). These additions directly support the stability of the counting formulation. revision: yes
-
Referee: [§3 (Method)] §3 (Method): The assertion that spatial projections plus pretrained CNN produce identity-invariant features sufficient for accurate counting under activity superposition is not fully substantiated. The feature space analysis should include quantitative metrics (e.g., user classification accuracy on extracted features) to address the risk that body geometry or reflection patterns persist and are exploited by the regressor.
Authors: We appreciate this suggestion to strengthen the evidence. In the revised §3, we have augmented the feature-space analysis with a quantitative user-classification probe: a linear classifier trained on the extracted features achieves only 24% accuracy on the identity-agnostic representations (near chance for 20 users), compared to 79% on the identity-dependent features. This metric, together with the existing visualizations, confirms that body geometry and reflection patterns are largely suppressed in the agnostic pipeline, supporting its suitability for counting under superposition. revision: yes
Circularity Check
No circularity: empirical metrics on held-out data splits
full rationale
The paper reports an empirical pipeline (CSI to spatial projections + pretrained CNN) evaluated via direct MAE and macro-F1 measurements on the external WiMANS dataset under standard and unseen-user splits. No derivation, first-principles result, or prediction is claimed that reduces by construction to fitted parameters or self-citations; the central claims are falsifiable performance numbers against held-out benchmarks with no load-bearing self-reference or renaming of inputs as outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Spatial projections of CSI measurements capture activity composition information that is largely independent of individual user identities.
Reference graph
Works this paper leans on
-
[1]
Machine learning techniques for Wi-Fi CSI-based recognition and sensing: A comprehensive review,
S. Sai, D. Sharma, M. S. Peelam, V . Chamola, M. Guizani, and D. Niy- ato, “Machine learning techniques for Wi-Fi CSI-based recognition and sensing: A comprehensive review,”IEEE Internet of Things Journal, pp. 1–1, 2026
work page 2026
-
[2]
Commodity WiFi sensing in ten years: Status, challenges, and opportunities,
S. Tan, Y . Ren, J. Yang, and Y . Chen, “Commodity WiFi sensing in ten years: Status, challenges, and opportunities,”IEEE Internet of Things Journal, vol. 9, no. 18, pp. 17 832–17 843, 2022
work page 2022
-
[3]
Cross-domain WiFi sensing with channel state information: A survey,
C. Chen, G. Zhou, and Y . Lin, “Cross-domain WiFi sensing with channel state information: A survey,”ACM Comput. Surv., vol. 55, no. 11, pp. 1–37, 2023
work page 2023
-
[4]
WiAct: A passive WiFi-based human activity recognition system,
H. Yan, Y . Zhang, Y . Wang, and K. Xu, “WiAct: A passive WiFi-based human activity recognition system,”IEEE Sensors Journal, vol. 20, no. 1, pp. 296–305, 2020
work page 2020
-
[5]
H. Zou, Y . Zhou, J. Yang, H. Jiang, L. Xie, and C. J. Spanos, “DeepSense: Device-free human activity recognition via autoencoder long-term recurrent convolutional network,” in2018 IEEE International Conference on Communications (ICC), 2018, pp. 1–6
work page 2018
-
[6]
Multimodal fusion-GMM based gesture recognition for smart home by WiFi sens- ing,
J. Ding, Y . Wang, H. Si, J. Ma, J. He, K. Liang, and S. Fu, “Multimodal fusion-GMM based gesture recognition for smart home by WiFi sens- ing,” in2022 IEEE 95th Vehicular Technology Conference: (VTC2022- Spring), 2022, pp. 1–6
work page 2022
-
[7]
Continuous authentication through finger gesture interaction for smart homes using WiFi,
H. Kong, L. Lu, J. Yu, Y . Chen, and F. Tang, “Continuous authentication through finger gesture interaction for smart homes using WiFi,”IEEE Transactions on Mobile Computing, vol. 20, no. 11, pp. 3148–3162, 2021
work page 2021
-
[8]
RT-Fall: A real-time and contactless fall detection system with commodity WiFi devices,
H. Wang, D. Zhang, Y . Wang, J. Ma, Y . Wang, and S. Li, “RT-Fall: A real-time and contactless fall detection system with commodity WiFi devices,”IEEE Transactions on Mobile Computing, vol. 16, no. 2, pp. 511–526, 2017. 9
work page 2017
-
[9]
Wi-Fi-based fall detection using spectrogram image of channel state information,
T. Nakamura, M. Bouazizi, K. Yamamoto, and T. Ohtsuki, “Wi-Fi-based fall detection using spectrogram image of channel state information,” IEEE Internet of Things Journal, vol. 9, no. 18, pp. 17 220–17 234, 2022
work page 2022
-
[10]
MultiSense: Enabling multi-person respiration sensing with commodity WiFi,
Y . Zeng, D. Wu, J. Xiong, J. Liu, Z. Liu, and D. Zhang, “MultiSense: Enabling multi-person respiration sensing with commodity WiFi,”Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 4, no. 3, p. 29, 2020
work page 2020
-
[11]
Revisiting indoor intrusion detection with WiFi signals: Do not panic over a pet!
Y . Lin, Y . Gao, B. Li, and W. Dong, “Revisiting indoor intrusion detection with WiFi signals: Do not panic over a pet!”IEEE Internet of Things Journal, vol. 7, no. 10, pp. 10 437–10 449, 2020
work page 2020
-
[12]
Who moved my cheese? human and non-human motion recognition with WiFi,
G. Zhu, C. Wu, X. Zeng, B. Wang, and K. J. R. Liu, “Who moved my cheese? human and non-human motion recognition with WiFi,” in 2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS), 2022, pp. 476–484
work page 2022
-
[13]
F. Abuhoureyah, K. S. Sim, and Y . Chiew Wong, “Multi-user human activity recognition through adaptive location-independent WiFi signal characteristics,”IEEE Access, vol. 12, pp. 112 008–112 024, 2024
work page 2024
-
[14]
WISDOM: WiFi-based contactless multiuser activity recognition,
P. Duan, C. Li, J. Li, X. Chen, C. Wang, and E. Wang, “WISDOM: WiFi-based contactless multiuser activity recognition,”IEEE Internet of Things Journal, vol. 10, no. 2, pp. 1876–1886, 2023
work page 2023
-
[15]
H. Rizk, A. Elmogy, M. Rihan, and H. Yamaguchi, “MultiSenseX: A sustainable solution for multi-human activity recognition and localiza- tion in smart environments,”AI, vol. 6, no. 1, 2025
work page 2025
-
[16]
J. Wang, M. A. A. Al-Qaness, S. Ni, and C. Tang, “WiFi-based multiuser identity, location, and activity recognition using InceptionTime-Attention networks,”IEEE Sensors Journal, vol. 25, no. 7, pp. 12 389–12 398, 2025
work page 2025
-
[17]
WiMANS: A benchmark dataset for WiFi-based multi-user activity sensing,
S. Huang, K. Li, D. You, Y . Chen, A. Lin, S. Liu, X. Li, and J. A. McCann, “WiMANS: A benchmark dataset for WiFi-based multi-user activity sensing,” inComputer Vision – ECCV 2024, A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, and G. Varol, Eds. Cham: Springer Nature Switzerland, 2025, pp. 72–91
work page 2024
-
[18]
WiHAR: From WiFi channel state information to unobtrusive human activity recognition,
M. Muaaz, A. Chelli, and M. P ¨atzold, “WiHAR: From WiFi channel state information to unobtrusive human activity recognition,” in2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), 2020, pp. 1–7
work page 2020
-
[19]
Utilizing deep learning models in CSI-based human activity recognition,
E. Shalaby, N. ElShennawy, and A. Sarhan, “Utilizing deep learning models in CSI-based human activity recognition,”Neural Computing and Applications, vol. 34, no. 8, pp. 5993–6010, 2022
work page 2022
-
[20]
WiFi CSI based passive human activity recognition using attention based BLSTM,
Z. Chen, L. Zhang, C. Jiang, Z. Cao, and W. Cui, “WiFi CSI based passive human activity recognition using attention based BLSTM,”IEEE Transactions on Mobile Computing, vol. 18, no. 11, pp. 2714–2724, 2019
work page 2019
-
[21]
Two-stream convolution augmented transformer for human activity recognition,
B. Li, W. Cui, W. Wang, L. Zhang, Z. Chen, and M. Wu, “Two-stream convolution augmented transformer for human activity recognition,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 1, pp. 286–293, 2021
work page 2021
-
[22]
Vision transformers for human activity recognition using WiFi channel state information,
F. Luo, S. Khan, B. Jiang, and K. Wu, “Vision transformers for human activity recognition using WiFi channel state information,”IEEE Internet of Things Journal, vol. 11, no. 17, pp. 28 111–28 122, 2024
work page 2024
-
[23]
CSI- based human activity recognition using convolutional neural networks,
P. F. Moshiri, M. Nabati, R. Shahbazian, and S. A. Ghorashi, “CSI- based human activity recognition using convolutional neural networks,” in2021 11th International Conference on Computer Engineering and Knowledge (ICCKE), 2021, pp. 7–12
work page 2021
-
[24]
MultiTrack: Multi-user tracking and activity recognition using commodity WiFi,
S. Tan, L. Zhang, Z. Wang, and J. Yang, “MultiTrack: Multi-user tracking and activity recognition using commodity WiFi,” inProceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, 2019, pp. 1–12
work page 2019
-
[25]
IMar: Multi-user continuous action recognition with WiFi signals,
J. He and W. Yang, “IMar: Multi-user continuous action recognition with WiFi signals,”Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 6, no. 3, 2022
work page 2022
-
[26]
Multi-user gesture recognition using WiFi,
R. H. Venkatnarayan, G. Page, and M. Shahzad, “Multi-user gesture recognition using WiFi,” inProceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, ser. MobiSys ’18. Association for Computing Machinery, 2018, pp. 401–413
work page 2018
-
[27]
WiMAR: A WiFi-based multi-user human activity recognition system via dynamic component separation,
Y . Zhou, Y . Liu, C. Liu, and Y . Lu, “WiMAR: A WiFi-based multi-user human activity recognition system via dynamic component separation,” in2025 IEEE 102nd Vehicular Technology Conference (VTC2025-Fall), 2025, pp. 1–5
work page 2025
-
[28]
F. Miao, C. Liu, Z. Lu, L. Shan, O. Takyu, T. Ohtsuki, and G. Gui, “Lightweight regularized network for multilabel indoor HAR in mul- tiuser CSI environments with uncertainty quantification,”IEEE Internet of Things Journal, vol. 13, no. 4, pp. 6475–6484, 2026
work page 2026
-
[29]
A survey on CSI-based Wi-Fi sensing datasets and models with a focus on reproducibility,
I. Guarino, D. Carra, M. Cominelli, F. Gringoli, and R. Lo Cigno, “A survey on CSI-based Wi-Fi sensing datasets and models with a focus on reproducibility,”Computer Communications, vol. 249, p. 108431, 2026
work page 2026
-
[30]
F. Wang, T. Zhang, W. Xi, H. Ding, G. Wang, D. Zhang, Y . Cui, F. Liu, J. Han, J. Xu, and T. X. Han, “A survey on Wi-Fi sensing generalizability: Taxonomy, techniques, datasets, and future research prospects,” 2025, arXiv preprint arXiv:2503.08008
-
[31]
Zero-effort cross-domain gesture recognition with Wi-Fi,
Y . Zheng, Y . Zhang, K. Qian, G. Zhang, Y . Liu, C. Wu, and Z. Yang, “Zero-effort cross-domain gesture recognition with Wi-Fi,” inProceed- ings of the 17th Annual International Conference on Mobile Systems, Applications, and Services, ser. MobiSys ’19. Association for Com- puting Machinery, 2019, pp. 313–325
work page 2019
-
[32]
MetaFormer: Domain- adaptive WiFi sensing with only one labelled target sample,
B. Sheng, R. Han, F. Xiao, Z. Guo, and L. Gui, “MetaFormer: Domain- adaptive WiFi sensing with only one labelled target sample,”Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 8, no. 1, 2024
work page 2024
-
[33]
Wi-Learner: Towards one-shot learning for cross-domain Wi-Fi based gesture recognition,
C. Feng, N. Wang, Y . Jiang, X. Zheng, K. Li, Z. Wang, and X. Chen, “Wi-Learner: Towards one-shot learning for cross-domain Wi-Fi based gesture recognition,”Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 6, no. 3, 2022
work page 2022
-
[34]
Device-free human activity recognition with identity-based transfer mechanism,
B. Wu, T. Jiang, J. Yu, X. Ding, S. Wu, and Y . Zhong, “Device-free human activity recognition with identity-based transfer mechanism,” in2021 IEEE Wireless Communications and Networking Conference (WCNC), 2021, pp. 1–6
work page 2021
-
[35]
D. Wang, J. Yang, W. Cui, L. Xie, and S. Sun, “AirFi: Empowering WiFi-based passive human gesture recognition to unseen environment via domain generalization,”IEEE Transactions on Mobile Computing, vol. 23, no. 2, pp. 1156–1168, 2024
work page 2024
-
[36]
Environment-independent Wi- Fi human activity recognition with adversarial network,
Z. Wang, S. Chen, W. Yang, and Y . Xu, “Environment-independent Wi- Fi human activity recognition with adversarial network,” inICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 3330–3334
work page 2021
-
[37]
DA-HAR: Dual adversarial network for environment-independent WiFi human activity recognition,
L. Sheng, Y . Chen, S. Ning, S. Wang, B. Lian, and Z. Wei, “DA-HAR: Dual adversarial network for environment-independent WiFi human activity recognition,”Pervasive and Mobile Computing, vol. 96, p. 101850, 2023
work page 2023
-
[38]
Privacy- preserving cross-environment human activity recognition,
L. Zhang, W. Cui, B. Li, Z. Chen, M. Wu, and T. S. Gee, “Privacy- preserving cross-environment human activity recognition,”IEEE Trans- actions on Cybernetics, vol. 53, no. 3, pp. 1765–1775, 2023
work page 2023
-
[39]
Z. Liu, H. Mao, C.-Y . Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A ConvNet for the 2020s,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 11 976–11 986
work page 2022
-
[40]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
work page 2016
-
[41]
A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y . Zhu, R. Pang, V . Vasudevan, Q. V . Le, and H. Adam, “Searching for MobileNetV3,” inProceedings of the IEEE/CVF International Confer- ence on Computer Vision (ICCV), October 2019
work page 2019
-
[42]
Focal loss for dense object detection,
T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” inProceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017
work page 2017
-
[43]
A deep learning-based human identification system with Wi-Fi CSI data augmentation,
H. Mo and S. Kim, “A deep learning-based human identification system with Wi-Fi CSI data augmentation,”IEEE Access, vol. 9, pp. 91 913– 91 920, 2021
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.