arxiv: 2006.10204 · v1 · pith:5BKTZVFUnew · submitted 2020-06-17 · 💻 cs.CV

BlazePose: On-device Real-time Body Pose tracking

classification 💻 cs.CV
keywords body · pose · network · real-time · tracking · blazepose · estimation · inference

We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language recognition. Our main contributions include a novel body pose tracking solution and a lightweight body pose estimation neural network that uses both heatmaps and regression to keypoint coordinates.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

    cs.CV 2026-05 unverdicted novelty 7.0

    The paper presents AIGaitor, a privacy-preserving on-device monocular motion analysis system that performs end-to-end pose estimation and deep learning gait analysis on consumer smartphones.

  2. AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

    cs.CV 2026-05 unverdicted novelty 7.0

    AIGaitor is claimed to be the first end-to-end on-device monocular motion-capture and deep-learning gait analysis pipeline demonstrated on consumer smartphones.

  3. MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

    cs.CV 2026-05 conditional novelty 7.0

    MSAVBench is the first comprehensive benchmark for multi-shot audio-video generation, spanning video, audio, shot, and reference dimensions with an adaptive evaluation framework that reaches 91.5% Spearman correlation...

  4. MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

    cs.CV 2026-05 unverdicted novelty 7.0

    MSAVBench is the first comprehensive benchmark for multi-shot audio-video generation featuring four dimensions, challenging scenarios, and an adaptive hybrid evaluation framework that achieves 91.5% Spearman correlati...

  5. BadmintonGRF: A Multimodal Dataset and Benchmark for Markerless Ground Reaction Force Estimation in Badminton

    cs.CV 2026-05 unverdicted novelty 7.0

    BadmintonGRF is a new public multimodal dataset and benchmark that pairs multi-view video with instrumented GRF for markerless load estimation in badminton.

  6. DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild

    cs.CV 2025-02 unverdicted novelty 7.0

    DIPSER supplies multi-view RGB video and smartwatch data from natural in-person classes with attention and emotion labels from self-report plus four experts, including underrepresented ethnicities.

  7. From Multimodal Signals to Adaptive XR Experiences for De-escalation Training

    cs.HC 2026-04 unverdicted novelty 4.0

    An early multimodal XR prototype fuses five signal streams with an interpretation layer to detect escalation cues and enable adaptive de-escalation training.

  8. Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model

    cs.CV 2026-04 unverdicted novelty 3.0

    Facial emotion embeddings improve short-term pose forecasting accuracy for emotion-driven motions when fused via normalized gating in a lightweight LSTM world model, but not with simple multimodal fusion.