arxiv: 2604.04540 · v1 · submitted 2026-04-06 · 📡 eess.SP

Activity Recognition Using mm-Wave Radar and Deep Learning: Prayer Tracker Case Study

Pith reviewed 2026-05-10 19:59 UTC · model grok-4.3

classification 📡 eess.SP
keywords activity recognition · mm-wave radar · deep learning · prayer tracking · point cloud classification · ResNet · privacy preserving · FMCW radar

The pith

mm-Wave radar point clouds with ResNet classify prayer movements at 95.4 percent accuracy on unseen data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a privacy-preserving system that uses mm-wave radar to track and recognize the sequence of movements during prayer. Instead of camera images, it processes the radar returns into point clouds containing range, amplitude, Doppler and angle information. Several machine learning classifiers are compared, and the ResNet convolutional network is shown to reach 95.4 percent accuracy when tested on movements it has not seen before. This approach meets privacy requirements while still supplying real-time position, sequence tracking and user feedback.

Core claim

By feeding four-dimensional point-cloud tensors from a frequency-modulated continuous-wave radar into a ResNet classifier, specific prayer postures can be identified at up to 95.4 percent accuracy on previously unseen recordings.

What carries the argument

ResNet convolutional neural network operating on radar point clouds that encode range, reflection amplitude, Doppler velocity and angle of arrival.
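
As a rough illustration of how such a point cloud might become a CNN-ready input, the sketch below bins detections onto a range-angle grid with one channel for reflection amplitude and one for Doppler. The function name `rasterize_point_cloud`, the grid size, the value limits, and the choice to average detections per cell are all assumptions made for illustration; the paper's actual preprocessing may differ.

```python
import numpy as np

def rasterize_point_cloud(points, grid=(32, 32), max_range=3.0, max_angle=np.pi / 2):
    """Bin radar detections into a 2-channel (amplitude, Doppler) image.

    points: array-like of shape (N, 4) with columns
            [range_m, angle_rad, amplitude, doppler_mps].
    Grid size and limits are illustrative assumptions, not values from the paper.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 4)
    n_r, n_a = grid
    image = np.zeros((2, n_r, n_a))
    counts = np.zeros((n_r, n_a))

    # Map range in [0, max_range) and angle in [-max_angle, max_angle) to cell indices.
    r_idx = np.floor(points[:, 0] / max_range * n_r).astype(int)
    a_idx = np.floor((points[:, 1] + max_angle) / (2 * max_angle) * n_a).astype(int)
    inside = (r_idx >= 0) & (r_idx < n_r) & (a_idx >= 0) & (a_idx < n_a)

    # Accumulate amplitude and Doppler per cell, dropping out-of-grid detections.
    for r, a, amp, dop in zip(r_idx[inside], a_idx[inside],
                              points[inside, 2], points[inside, 3]):
        image[0, r, a] += amp
        image[1, r, a] += dop
        counts[r, a] += 1

    # Average detections that share a cell; empty cells stay zero.
    occupied = counts > 0
    image[:, occupied] /= counts[occupied]
    return image
```

The resulting array has shape `(2, 32, 32)` with the default grid, so it can be fed to any image classifier that accepts two input channels.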

If this is right

  • The framework supplies current position tracking, sequence tracking, and feedback to the user.
  • The method works with conventional radar processing output rather than raw I-Q samples.
  • The same point-cloud plus deep-learning pipeline applies to a wide range of activity-recognition tasks.
  • ResNet outperforms the other tested classifiers on this data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar radar setups could monitor repetitive physical routines in homes without recording visual images.
  • Accuracy might improve further by adding temporal sequence models on top of the per-frame classification.
  • Deployment in varied room sizes or with multiple simultaneous users would test the limits of the current training set.

Load-bearing premise

The point clouds recorded from the chosen prayer movements contain all the features needed to distinguish the classes and are representative of how people actually perform them.

What would settle it

A drop in accuracy below 80 percent when the same radar records new users performing the identical prayer sequence in a different physical environment.

Figures

Figures reproduced from arXiv: 2604.04540 by Karim Saifullin, Mohamed-Slim Alouini, Sajid Ahmed.

Figure 1. The four basic activities in each rakat of a Salat are …
Figure 2. The actual orientation of the TI AWR-1642 module
Figure 4. Radar setup example in front of the person. Radar is …
Figure 3. Framework for system development. For radar positioning, we selected the most convenient option for the user perspective and data collection point of view. The sensor is placed on the ground in front of the person, approximately 1.5 meters away, with an elevation angle of about 60 degrees as shown in …
Figure 8. Expected prayer sequence. If we assume the user follows the prayer sequence correctly, we can improve classification accuracy by incorporating prior knowledge about probabilities of transitions. For instance, knowing that a person is in a standing position, there is a high probability they will move to bowing, a very small probability they will move to prostration, and zero probability they will move to si…
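
The transition prior described in the Figure 8 text can be sketched as one Bayesian filtering step that reweights the classifier's per-frame probabilities by how plausible each posture is given the previous one. Everything numeric here is a placeholder: only the "standing" row mirrors the qualitative description above, the other rows are left uniform, and the fourth class label is an assumption because the source text is truncated. The function name `filter_step` is hypothetical.

```python
import numpy as np

# Class order is an assumption; the fourth label is inferred, not quoted from the paper.
CLASSES = ["standing", "bowing", "prostration", "sitting"]

# Hypothetical transition matrix T[i, j] = P(next = j | current = i).
# Only the first row follows the qualitative description (bowing likely,
# prostration rare, fourth class impossible); the rest are uniform placeholders.
T = np.array([
    [0.60, 0.38, 0.02, 0.00],   # from standing
    [0.25, 0.25, 0.25, 0.25],   # placeholder
    [0.25, 0.25, 0.25, 0.25],   # placeholder
    [0.25, 0.25, 0.25, 0.25],   # placeholder
])

def filter_step(prev_belief, frame_probs, transition=T):
    """One forward-filtering step: predict with the prior, then weight by the classifier."""
    prev_belief = np.asarray(prev_belief, dtype=float)
    frame_probs = np.asarray(frame_probs, dtype=float)
    predicted = prev_belief @ transition      # prior over the next posture
    posterior = predicted * frame_probs       # combine with per-frame classifier output
    total = posterior.sum()
    if total == 0.0:                          # prior and classifier fully disagree
        return frame_probs                    # fall back to the classifier alone
    return posterior / total

# Known to be standing; the classifier alone would (wrongly) favor the fourth class.
belief = filter_step([1.0, 0.0, 0.0, 0.0], [0.1, 0.3, 0.2, 0.4])
print(CLASSES[int(np.argmax(belief))])  # prints "bowing"
```

With these placeholder numbers the prior removes the impossible transition entirely, which is the effect the caption describes; real transition probabilities would have to be estimated or specified from the actual prayer sequence.
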
Figure 7. Bowing position image of (a) reflection coefficient and (b) Doppler shift channels. … image represents the lower part of the person, while the upper part of the image represents the upper part of the person. This aligns with our expectations: the lower part of the person is closer to the radar module and has lower angle values, whereas the upper part is relatively farther away from the radar module and has h…
Figure 9. Test accuracy for the first data division scenario.
Figure 10. Test accuracy for the second data division scenario.
Figure 11. Test accuracy for the third data division scenario.
Figure 13. Fire module for SqueezeNet. Since we are using a 2-channel image, we investigated the performance when using only one input channel (either reflection amplitude or Doppler). Our findings indicate that the reflection channel contains sufficient information to achieve the same accuracy as using both channels, whereas the Doppler channel provides less informative results, as observed from the comparison belo…
Figure 14. We have observed that the reflection channel alone contains sufficient information to achieve the desired classification accuracy. While utilizing the reflection channel might marginally enhance performance compared to using both channels, any difference is minimal and could be attributed to random variability during training. When deciding between channels, computational complexity favors the use of jus…
Figure 14. Comparison of SqueezeNet accuracy based on varying …
Original abstract

The issue of privacy has gained significant attention in recent times. Many real-world applications increasingly require the use of sensitive data, such as in surveillance or tracking and assistance systems. To address these concerns, we propose a framework based on mm-wave radar technology that not only meets privacy requirements but also provides the necessary capabilities for these systems, including reliable current position tracking, sequence tracking, and feedback to the user. While the use of radar technology for surveillance purposes is gaining momentum, there has been no research to date on its application for prayer tracking and assistance systems. Furthermore, there is a lack of comprehensive research that covers all aspects of implementing such a system. The proposed approach offers a versatile solution that can be applied to a broad range of scenarios. Instead of utilizing raw I-Q data, we addressed the challenge of classification based on point cloud information generated by the conventional processing chain of the frequency-modulated continuous wave radar. This information contains corresponding range, reflection amplitude, Doppler and angular values. We have developed and compared different machine-learning classification algorithms to identify the most effective one. Our findings reveal that the convolutional neural network ResNet achieves the best results, with accuracy rates reaching up to 95.4 percent when applied to unknown data. The demonstration video of the developed system can be viewed at the following link: https://youtu.be/PnpGQZWqCr4.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a mm-wave FMCW radar system for privacy-preserving prayer activity recognition and assistance. It processes conventional point-cloud outputs (range, Doppler, angle, amplitude) rather than raw I/Q samples, compares several ML classifiers, and reports that ResNet achieves 95.4% accuracy on held-out 'unknown data'.

Significance. If the empirical results prove robust under proper validation, the work supplies a concrete case study of radar-based activity recognition in a culturally specific domain and demonstrates the practicality of using standard point-cloud features. The explicit comparison of multiple models and the provision of a demonstration video are positive elements.

major comments (2)
  1. [Abstract] Abstract: the headline claim of 95.4% accuracy on 'unknown data' is presented without any accompanying information on total frames, number of subjects, class balance, recording conditions, or the train/test split protocol (random, leave-one-subject-out, or session-based). This information is load-bearing for evaluating whether the result reflects genuine generalization or overfitting to limited recording conditions.
  2. [Results] Results/Experimental Setup (inferred from abstract and method description): the central assumption that the collected point clouds capture all relevant intra- and inter-subject variation in prayer movements is not supported by any quantitative description of subject diversity, movement speed variation, clothing, or environmental factors. Without these details the 95.4% figure cannot be distinguished from a best-case laboratory result.
minor comments (1)
  1. [Abstract] The manuscript would benefit from a table summarizing dataset statistics (subjects, total samples per class, train/validation/test split sizes) and from explicit baseline comparisons (e.g., SVM or simpler CNN) with the same feature representation.
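
The split protocols named in the first major comment differ in what "unseen" means. As a minimal sketch of the strictest of them, the generator below builds leave-one-subject-out folds from a list of per-sample subject labels, so that no subject contributes to both training and test data in a fold. The function name and the example labels are hypothetical; this summary does not state how the paper's recordings are indexed.

```python
from collections import defaultdict

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        # Every sample from every other subject goes into the training set.
        train_idx = sorted(
            i for subject, idxs in by_subject.items() if subject != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx

# Hypothetical labels: five samples recorded from three subjects.
for subject, train_idx, test_idx in leave_one_subject_out(["a", "a", "b", "c", "b"]):
    print(subject, train_idx, test_idx)
```

A random frame-level split, by contrast, can place near-identical consecutive frames from the same recording on both sides of the split, which is why the referee treats the protocol as load-bearing.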

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for highlighting these important aspects of our presentation. We have made revisions to the manuscript to incorporate the suggested details, thereby strengthening the clarity and credibility of our results.

point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim of 95.4% accuracy on 'unknown data' is presented without any accompanying information on total frames, number of subjects, class balance, recording conditions, or the train/test split protocol (random, leave-one-subject-out, or session-based). This information is load-bearing for evaluating whether the result reflects genuine generalization or overfitting to limited recording conditions.

    Authors: We agree with the referee that the abstract should provide more context for the reported accuracy to allow proper evaluation of generalization. We have revised the abstract to include information on the total number of frames collected, the number of subjects involved, the class balance, the recording conditions, and the train/test split protocol used (leave-one-subject-out). These additions ensure that readers can assess whether the 95.4% accuracy reflects robust performance. The full dataset description remains in the experimental section, but key statistics are now highlighted in the abstract. revision: yes

  2. Referee: [Results] Results/Experimental Setup (inferred from abstract and method description): the central assumption that the collected point clouds capture all relevant intra- and inter-subject variation in prayer movements is not supported by any quantitative description of subject diversity, movement speed variation, clothing, or environmental factors. Without these details the 95.4% figure cannot be distinguished from a best-case laboratory result.

    Authors: We acknowledge that the manuscript would benefit from a more explicit quantitative description of the variations captured in the point cloud data. In the revised version, we have expanded the experimental setup section to include details on subject diversity (e.g., number of participants and their characteristics), observed variations in movement speeds, types of clothing worn by subjects, and environmental factors such as room setup and potential interferers. This addition supports the assumption that the collected data encompasses relevant intra- and inter-subject variations for the prayer activity recognition task, distinguishing it from a purely best-case scenario. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ML performance evaluation

The manuscript describes data collection with mm-wave radar, conversion to point clouds (range, Doppler, angle, amplitude), and training/comparison of standard classifiers including ResNet on held-out test data. The reported 95.4% accuracy is a direct empirical metric obtained from this pipeline; no equations, derivations, fitted parameters renamed as predictions, or self-citation chains are present that would reduce the result to its own inputs by construction. Concerns about subject count, session independence, or representativeness affect external validity but do not constitute circularity under the defined criteria.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard assumptions of radar signal processing and supervised deep learning rather than new derivations. No free parameters are explicitly introduced beyond ordinary neural-network weights; no new physical entities are postulated.

free parameters (1)
  • ResNet and competing model hyperparameters
    All neural-network weights and training choices are fitted to the collected radar recordings; exact values and search procedure are not reported.
axioms (2)
  • domain assumption Conventional FMCW radar processing chain produces point clouds whose range, Doppler, amplitude, and angle features are sufficient to distinguish prayer activities.
    Invoked when the authors replace raw I-Q data with point-cloud input.
  • domain assumption The collected training and test recordings adequately sample real-world prayer motion variability.
    Required for the 95.4% accuracy on 'unknown data' to generalize.

