Human Body Parts Tracking: Applications to Activity Recognition
Pith reviewed 2026-05-25 10:59 UTC · model grok-4.3
The pith
Torso blob tracking on foreground silhouettes anchors real-time tracking of head, arms and legs for activity recognition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The HBPT system obtains the torso location and size via blob tracking on the foreground silhouette in every frame, places the remaining body parts at fixed relative positions, and models each part with a 2D-Gaussian blob. It uses the resulting tracks to recognize activities such as approaching an object, carrying an object, and opening a box or suitcase, and is claimed to remain accurate under varying illumination and partial occlusions.
What carries the argument
Torso blob tracking on the refined foreground silhouette, which fixes the reference frame for placing and modeling all other body parts as 2D-Gaussian blobs.
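The placement step can be illustrated with a minimal sketch, assuming the torso is summarised as an axis-aligned box and each part is positioned by an offset expressed in torso widths and heights. The `Box` type, the `PART_LAYOUT` table, and every numeric offset are hypothetical and chosen only for illustration; the source does not state the actual offsets.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Box:
    cx: float  # centre x in pixels
    cy: float  # centre y in pixels (image y grows downward)
    w: float   # width in pixels
    h: float   # height in pixels


# Hypothetical layout: (dx, dy, scale_w, scale_h) in units of torso size.
# These numbers are illustrative assumptions, not values from the paper.
PART_LAYOUT = {
    "head": (0.0, -0.75, 0.5, 0.5),
    "left_arm": (-0.75, 0.0, 0.5, 1.0),
    "right_arm": (0.75, 0.0, 0.5, 1.0),
    "left_leg": (-0.25, 1.0, 0.5, 1.0),
    "right_leg": (0.25, 1.0, 0.5, 1.0),
}


def place_parts(torso: Box) -> Dict[str, Box]:
    """Derive every part box from the torso box using fixed relative offsets."""
    parts = {"torso": torso}
    for name, (dx, dy, sw, sh) in PART_LAYOUT.items():
        parts[name] = Box(
            cx=torso.cx + dx * torso.w,
            cy=torso.cy + dy * torso.h,
            w=sw * torso.w,
            h=sh * torso.h,
        )
    return parts
```

Because every part is a deterministic function of the torso box, any error in the torso estimate is carried over unchanged to all derived parts, which is why the torso step is described as carrying the argument.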
If this is right
- Body-part tracks produced by the system can be fed directly into activity classifiers for tasks such as carrying objects or opening containers.
- The same torso reference allows consistent part placement even when illumination varies between frames.
- Partial occlusions that leave the torso visible still permit recovery of the remaining part locations.
- The 2D-Gaussian representation supplies location, size, and pose information for each part, usable for real-time recognition.
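One common way to obtain such a blob is to take the first and second moments of the foreground pixels inside a part region. The sketch below assumes that approach; the function name `fit_gaussian_blob` and the choice to read orientation from the principal eigenvector are assumptions, since the source does not say how the blobs are fitted.

```python
import numpy as np


def fit_gaussian_blob(mask: np.ndarray):
    """Fit a 2D Gaussian to the nonzero pixels of a binary region mask.

    Returns (mean, cov, angle): mean is (x, y) in pixels, cov is the 2x2
    sample covariance, and angle is the major-axis direction in radians,
    measured in image coordinates and defined only up to a multiple of pi.
    """
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        raise ValueError("need at least two foreground pixels")
    pts = np.stack([xs, ys], axis=1).astype(float)
    mean = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column
    # of eigvecs is the direction of largest spread.
    _, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, -1]
    angle = float(np.arctan2(major[1], major[0]))
    return mean, cov, angle
```

Here the mean plays the role of location, the covariance the role of size, and the major-axis angle the role of pose.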
Where Pith is reading between the lines
- If torso detection remains the only robust cue, the method will degrade in crowded scenes where multiple overlapping silhouettes appear.
- Replacing the fixed relative offsets with learned kinematic constraints could extend the approach to more articulated motions without changing the core silhouette pipeline.
- The Gaussian blob output could serve as input features for downstream probabilistic trackers that handle full occlusions.
- Evaluating the system on standard public activity datasets would reveal whether the reported robustness generalizes beyond the sequences shown.
Load-bearing premise
The torso can be reliably located and sized by blob tracking on the foreground silhouette obtained by background subtraction.
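The premise can be made concrete with a minimal sketch, assuming grayscale frames, a static background image, a fixed difference threshold, and 4-connected components. The function names and the threshold value are hypothetical. The sketch returns the largest foreground blob only; how the torso is isolated from that blob is not specified in the source and is not attempted here.

```python
from collections import deque

import numpy as np


def foreground_mask(frame, background, thresh=30):
    """Threshold the absolute difference between a frame and the background."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > thresh


def largest_blob_bbox(mask):
    """Return (area, x0, y0, x1, y1) of the largest 4-connected blob, or None."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    best = None
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        # Breadth-first flood fill from this unvisited foreground pixel.
        seen[sy, sx] = True
        queue = deque([(sy, sx)])
        area, x0, y0, x1, y1 = 0, sx, sy, sx, sy
        while queue:
            y, x = queue.popleft()
            area += 1
            x0, x1 = min(x0, x), max(x1, x)
            y0, y1 = min(y0, y), max(y1, y)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if best is None or area > best[0]:
            best = (area, int(x0), int(y0), int(x1), int(y1))
    return best
```

A fixed threshold of this kind is sensitive to global lighting changes, which is one way the premise could fail under the illumination conditions the paper targets.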
What would settle it
A test video in which the blob tracker produces an incorrect torso location or size under partial occlusion or changed lighting, causing the derived positions of head, arms and legs to deviate enough that the intended activity labels are no longer recovered.
Original abstract
As cameras and computers became popular, the applications of computer vision techniques attracted attention enormously. One of the most important applications in the computer vision community is human activity recognition. In order to recognize human activities, we propose a human body parts tracking system that tracks human body parts such as head, torso, arms and legs in order to perform activity recognition tasks in real time. This thesis presents a real-time human body parts tracking system (i.e. HBPT) from video sequences. Our body parts model is mostly represented by body components such as legs, head, torso and arms. The body components are modeled using torso location and size which are obtained by a torso tracking method in each frame. In order to track the torso, we are using a blob tracking module to find the approximate location and size of the torso in each frame. By tracking the torso, we will be able to track other body parts based on their location with respect to the torso on the detected silhouette. In the proposed method for human body part tracking, we are also using a refining module to improve the detected silhouette by refining the foreground mask (i.e. obtained by background subtraction) in order to detect the body parts with respect to torso location and size. Having found the torso size and location, the region of each human body part on the silhouette will be modeled by a 2D-Gaussian blob in each frame in order to show its location, size and pose. The proposed approach described in this thesis tracks accurately the body parts in different illumination conditions and in the presence of partial occlusions. The proposed approach is applied to activity recognition tasks such as approaching an object, carrying an object and opening a box or suitcase.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a real-time human body parts tracking system (HBPT) for activity recognition. Body components (head, torso, arms, legs) are modeled relative to the torso, whose location and size are obtained via blob tracking on the foreground silhouette from background subtraction; a refining module improves the mask, parts are placed at fixed relative positions on the silhouette, and each is represented as a 2D Gaussian. The approach is claimed to track accurately under varying illumination and partial occlusions and is applied to tasks such as approaching an object, carrying an object, and opening a box.
Significance. If the tracking pipeline were shown to be robust, the method could contribute to real-time activity recognition pipelines that rely on explicit body-part localization. The procedural description offers no machine-checked proofs, reproducible code, parameter-free derivations, or falsifiable quantitative predictions, so significance cannot be evaluated from the supplied material.
major comments (3)
- [Abstract] The claim that 'the proposed approach described in this thesis tracks accurately the body parts in different illumination conditions and in the presence of partial occlusions' is unsupported; the manuscript supplies no quantitative tracking metrics (e.g., MOTA, precision-recall, pixel error), no datasets, no baseline comparisons, and no ablation of the refining module.
- [Method description (torso tracking module)] The torso-location step (blob tracking on the background-subtracted silhouette) is load-bearing for the entire pipeline and for the illumination/occlusion robustness claim, yet the description provides neither the exact blob-tracking algorithm nor any validation that this step remains reliable when background subtraction fails under the very illumination changes the paper asserts it handles.
- [Body-part placement step] Placement of remaining parts at 'fixed relative positions' with respect to the detected torso on the silhouette is presented without any mechanism for handling the partial occlusions that directly corrupt the silhouette used for both torso sizing and relative placement.
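Two of the metrics named in the first major comment can be sketched as follows, assuming per-frame part centres are available as (x, y) pairs and that detection-to-ground-truth matching has already been done elsewhere. The function names are hypothetical; MOTA is computed with its standard definition, one minus the ratio of summed errors to the number of ground-truth objects.

```python
import numpy as np


def mean_pixel_error(pred, gt):
    """Mean Euclidean distance in pixels between matched predicted and true centres."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from error counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

Reporting numbers of this kind per body part, with and without the refining module, would address the missing evaluation and ablation together.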
minor comments (2)
- [Gaussian modeling paragraph] Notation for the 2D-Gaussian blobs (means, covariances, how pose is encoded) is never defined.
- [Refining module] The refining module is invoked repeatedly but never specified (algorithm, parameters, or pseudocode).
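One conventional notation that would resolve the first minor comment is given below as an assumption, not as the authors' definition. For part $k$, $\boldsymbol{\mu}_k$ is the blob centre, $\Sigma_k$ is its covariance (encoding size), and $\theta_k$ is the orientation of the major axis (one possible encoding of pose).

```latex
p_k(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det\Sigma_k}}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^{\top}\Sigma_k^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)\right),
\qquad
\Sigma_k = \begin{pmatrix}\sigma_{xx} & \sigma_{xy}\\ \sigma_{xy} & \sigma_{yy}\end{pmatrix},
\qquad
\theta_k = \tfrac{1}{2}\,\operatorname{atan2}\!\left(2\sigma_{xy},\ \sigma_{xx}-\sigma_{yy}\right)
```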
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment point by point below, indicating planned revisions where appropriate. We acknowledge several limitations in the current manuscript regarding evaluation and methodological detail.
- Referee: [Abstract] the claim that 'the proposed approach described in this thesis tracks accurately the body parts in different illumination conditions and in the presence of partial occlusions' is unsupported; the manuscript supplies no quantitative tracking metrics (e.g., MOTA, precision-recall, pixel error), no datasets, no baseline comparisons, and no ablation of the refining module.
  Authors: We agree that the manuscript provides no quantitative metrics, datasets, baselines or ablations, and that the accuracy claim in the abstract is therefore unsupported by such evidence. The claim derives from qualitative visual inspection of tracking results on the demonstrated activity examples. We will revise the abstract to remove the unsupported quantitative claim and instead describe the approach as having been applied to activity recognition tasks involving varying illumination and partial occlusions, based on the presented examples.
  Revision: yes
- Referee: [Method description (torso tracking module)] The torso-location step (blob tracking on the background-subtracted silhouette) is load-bearing for the entire pipeline and for the illumination/occlusion robustness claim, yet the description provides neither the exact blob-tracking algorithm nor any validation that this step remains reliable when background subtraction fails under the very illumination changes the paper asserts it handles.
  Authors: The torso tracking relies on standard connected-component blob detection applied to the foreground mask after background subtraction. We acknowledge that the manuscript does not specify the exact algorithm (e.g., parameters, distance metrics or update rules) nor include targeted validation showing reliability when background subtraction degrades under illumination variation. This constitutes a genuine gap in the method description. We will expand the torso-tracking subsection with additional implementation details drawn from the original thesis work where possible.
  Revision: partial
- Referee: [Body-part placement step] Placement of remaining parts at 'fixed relative positions' with respect to the detected torso on the silhouette is presented without any mechanism for handling the partial occlusions that directly corrupt the silhouette used for both torso sizing and relative placement.
  Authors: Body-part regions are assigned at fixed offsets relative to the detected torso on the refined silhouette. The refining module improves the foreground mask, yet we agree there is no explicit mechanism (such as occlusion-aware adjustment or fallback estimation) to compensate when occlusions corrupt the silhouette used for sizing and placement. The robustness claim rests on observed behavior in the example sequences rather than a dedicated algorithmic safeguard. We will add a limitations paragraph clarifying this point and noting that severe occlusions may affect placement accuracy.
  Revision: partial
- Provision of quantitative tracking metrics, datasets, baseline comparisons or ablation studies, as none were performed in the original work.
Circularity Check
No circularity: purely procedural description with no equations or derivations
The manuscript describes a body-parts tracking pipeline (torso blob tracking on background-subtracted silhouette, relative part placement, 2D-Gaussian modeling, and a refining module) but contains no equations, no fitted parameters, no derivations, and no self-citations. The accuracy claim under illumination changes and occlusions is asserted as an empirical outcome of the described steps rather than shown to reduce to those steps by construction. Because there is no derivation chain at all, none of the enumerated circularity patterns apply.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Torso location and size obtained by blob tracking can serve as a reliable anchor for locating other body parts on the silhouette.
- domain assumption: Background subtraction yields a usable foreground mask under the target conditions.