BlazePose: On-device Real-time Body Pose tracking

Fan Zhang; Ivan Grishchenko; Karthik Raveendran; Matthias Grundmann; Tyler Zhu; Valentin Bazarevsky

arxiv: 2006.10204 · v1 · pith:5BKTZVFUnew · submitted 2020-06-17 · 💻 cs.CV

Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, Matthias Grundmann

classification 💻 cs.CV

keywords body, pose, network, real-time, tracking, blazepose, estimation, inference

We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language recognition. Our main contributions include a novel body pose tracking solution and a lightweight body pose estimation neural network that uses both heatmaps and regression to keypoint coordinates.
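
As a rough illustration of the quantities the abstract mentions, the sketch below decodes a single keypoint from a heatmap by taking its highest-scoring cell, and computes the per-frame time budget implied by a given frame rate. It is not the BlazePose implementation: the function names, the grid size, and the argmax decoding rule are assumptions made for illustration. Only the figures of 33 keypoints and 30 frames per second come from the abstract.

```python
from typing import List, Tuple

NUM_KEYPOINTS = 33  # number of body keypoints stated in the abstract


def decode_heatmap(heatmap: List[List[float]]) -> Tuple[float, float]:
    """Return the normalized (x, y) center of the highest-scoring cell.

    Hypothetical helper: a plain argmax decode, not the paper's method.
    """
    height = len(heatmap)
    width = len(heatmap[0])
    best_row, best_col = 0, 0
    best_score = heatmap[0][0]
    for row_index, row in enumerate(heatmap):
        for col_index, score in enumerate(row):
            if score > best_score:
                best_score = score
                best_row, best_col = row_index, col_index
    # Use the cell center so the result lies strictly inside (0, 1).
    return ((best_col + 0.5) / width, (best_row + 0.5) / height)


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # A 4x4 toy heatmap (assumed size) with its peak at row 1, column 2.
    toy_heatmap = [
        [0.0, 0.1, 0.0, 0.0],
        [0.0, 0.2, 0.9, 0.1],
        [0.0, 0.0, 0.3, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(decode_heatmap(toy_heatmap))
    print(frame_budget_ms(30.0))
```

A regression head would instead emit the (x, y) values directly, so no decoding step like `decode_heatmap` would be needed for its output. At 30 frames per second, the whole pipeline has roughly 33 ms per frame.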

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing
cs.CV 2026-05 unverdicted novelty 7.0

The paper presents AIGaitor, a privacy-preserving on-device monocular motion analysis system that performs end-to-end pose estimation and deep learning gait analysis on consumer smartphones.

AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing
cs.CV 2026-05 unverdicted novelty 7.0

AIGaitor is claimed to be the first end-to-end on-device monocular motion-capture and deep-learning gait analysis pipeline demonstrated on consumer smartphones.

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation
cs.CV 2026-05 conditional novelty 7.0

MSAVBench is the first comprehensive benchmark for multi-shot audio-video generation, spanning video, audio, shot, and reference dimensions with an adaptive evaluation framework that reaches 91.5% Spearman correlation...

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation
cs.CV 2026-05 unverdicted novelty 7.0

MSAVBench is the first comprehensive benchmark for multi-shot audio-video generation featuring four dimensions, challenging scenarios, and an adaptive hybrid evaluation framework that achieves 91.5% Spearman correlati...

BadmintonGRF: A Multimodal Dataset and Benchmark for Markerless Ground Reaction Force Estimation in Badminton
cs.CV 2026-05 unverdicted novelty 7.0

BadmintonGRF is a new public multimodal dataset and benchmark that pairs multi-view video with instrumented GRF for markerless load estimation in badminton.

DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild
cs.CV 2025-02 unverdicted novelty 7.0

DIPSER supplies multi-view RGB video and smartwatch data from natural in-person classes with attention and emotion labels from self-report plus four experts, including underrepresented ethnicities.

From Multimodal Signals to Adaptive XR Experiences for De-escalation Training
cs.HC 2026-04 unverdicted novelty 4.0

An early multimodal XR prototype fuses five signal streams with an interpretation layer to detect escalation cues and enable adaptive de-escalation training.

Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model
cs.CV 2026-04 unverdicted novelty 3.0

Facial emotion embeddings improve short-term pose forecasting accuracy for emotion-driven motions when fused via normalized gating in a lightweight LSTM world model, but not with simple multimodal fusion.