pith. machine review for the scientific record. sign in

arxiv: 2604.17530 · v1 · submitted 2026-04-19 · 💻 cs.HC · cs.CV

Recognition: unknown

Real-Time Cellist Postural Evaluation With On-Device Computer Vision

Authors on Pith no claims yet

Pith reviewed 2026-05-10 05:34 UTC · model grok-4.3

classification 💻 cs.HC cs.CV
keywords cellist postureon-device computer visionreal-time feedbackmobile applicationheuristic evaluationAndroidinstrumental practiceposture monitoring
0
0 comments X

The pith

The Cello Evaluator app gives cellists real-time posture feedback using computer vision that runs on any current Android phone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the problem that beginning cellists receive posture instruction only once a week and often develop bad habits in between. It introduces Cello Evaluator, an Android application that performs posture analysis in real time through models optimized to run entirely on the phone. This removes the need for external cameras, sensors, or powerful computers. The authors validate the system with a heuristic review by cellist and UX experts, who judged the app user-friendly and practically helpful. If the approach holds, cellists gain continuous guidance during solo practice that was previously unavailable.

Core claim

We present Cello Evaluator, a real-time postural feedback system for practicing cellists. Through this optimization for on-device computer vision inference, we provide access to cellist postural evaluation to anyone with a current generation Android phone and thus reduces the postural feedback voids within individual practice.

What carries the argument

On-device computer vision models optimized for real-time Android inference that detect and score cellist-specific posture issues.

Load-bearing premise

The computer vision models running on ordinary Android phones can detect and rate cellist posture problems accurately enough to give useful real-time guidance.

What would settle it

A side-by-side test in which the app's posture ratings are compared with ratings from professional cellists watching the same video recordings of practice sessions.

Figures

Figures reproduced from arXiv: 2604.17530 by Ekaterina Tszyao, Felix Lu, Gurtej Bagga, Jackson P. Shields, Joshua Kamphuis, Kexin Sha, Kristen Yeon-Ji Yun, Luke Choi, Michael Zhang, Paige Lorenz, Paolo Wang, Raymond Otis Kwon, Shrinand Perumal, Sivamurugan Velmurugan, Trevor Ju, William P. Jiang, Yung-Hsiang Lu.

Figure 1
Figure 1. Figure 1: This is a screenshot of the Cello Evaluator being used during practice. The text at the top of the screen is an example of the instructions provided to the user for correcting poor posture. In addition, the bounding boxes of the relevant areas of the cello and the bow are depicted in blue, meaning that the posture in regards to the two objects are correct. The detected nodes of the user’s bow hand is also … view at source ↗
Figure 2
Figure 2. Figure 2: An example of correct cello posture with a correctly slightly pronated wrist, correct elbow height, and appropriate bow height (as it is touching the strings between the bottom end of the string board and the bridge of the cello) and angle (perpendicular to the cello strings). Additional details regarding how postural correctness is evaluated is discussed in sections 3.2 and 3.3 [PITH_FULL_IMAGE:figures/f… view at source ↗
Figure 3
Figure 3. Figure 3: (a) Example of an incorrect, supinated wrist posture, in which the forearm and hand are rotated such that the palm faces upward. (b) Example of correct wrist posture, with slight pronation, where the forearm and hand are rotated such that the palm faces downward. at 1100 parameters for wrist posture and roughly 440 parameters for elbow posture. Due to the complexity of the extracted nodes, we elected to us… view at source ↗
Figure 4
Figure 4. Figure 4: An example image of training data collection, where the cellist is purposely playing with an incorrect supinated wrist posture to allow for collection of supinated wrist node coordinates. distance from the elbow to the wrist, and the X, Y, and Z components of the normalized direction vectors from the shoulder to the elbow and from the wrist to the elbow. It is worth noting that the bow-side arm does not mo… view at source ↗
Figure 5
Figure 5. Figure 5: The bounding boxes of the cello bow and the cello string regions are used to identify the bow’s height and angle relative to the strings, which are then used for postural classification. needed to track bow and string positions precisely for accurate classification while maintaining low inference latency to achieve real-time performance. We evaluated several models, including Segment Anything Model 3 (SAM … view at source ↗
Figure 6
Figure 6. Figure 6: Two label techniques were used to label each image needed to train the various YOLO models that utilize different computer vision techniques. (a) A labeled image used for training YOLO for instance segmentation, object detection, and orienting bounding boxes. (b) A labeled image used for training YOLO for keypoint detections, where points marked with "x" are occluded points. accurate results due to insuffi… view at source ↗
Figure 7
Figure 7. Figure 7: These images showcase YOLO model predictions after training the model with our custom, manually labeled dataset. (a) YOLO instance segmentation detection. (b) oriented bounding box detection. (c) YOLO object detection. (d) YOLO keypoint detection [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The normalized confusion matrix above illustrates how well YOLO-OBB per￾formed at distinguishing the three classes: Bow, String, and background. The model performs excellently at identifying strings (99% correct) and reasonably well at detect￾ing background elements (91% correct). The model struggles to consistently identify the bow (78% correct) by misclassifying the bow as the background (22% incorrect).… view at source ↗
Figure 9
Figure 9. Figure 9: An example of hand and pose annotations with posture correction instruction at the top displayed by the app. Buttons on the right represent (from top to bottom): close camera, flip camera, open settings, and view session history [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: An example of the session summary with a detailed breakdown of (a) postural occurrence percentages and (b) representative screenshots for all categories under bow placement and cellist posture [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Mean severity ratings (0-4) for the 12 sub-heuristics, grouped into six higher￾level heuristic categories. Bar colors indicate severity tiers: strengths (0-1), moderate issues (1-3), and critical issues (3-4) [PITH_FULL_IMAGE:figures/full_fig_p015_11.png] view at source ↗
read the original abstract

Posture is a critical factor for beginning instrumental learners. Most students receive instruction only once a week, and during the intervals between lessons they have little or no feedback on their physical posture. As a result, posture often deteriorates, increasing the risk of musculoskeletal injury and inefficient technique. Recent advances in computer vision and machine learning make it possible to evaluate posture without the constant presence of a human expert. However, current solutions have been extremely limited in availability and convenience due to their reliance on computationally expensive hardware or multi-sensor setups. We present Cello Evaluator, a real-time postural feedback system for practicing cellists. Through this optimization for on-device computer vision inference, we provide access to cellist postural evaluation to anyone with a current generation Android phone and thus reduces the postural feedback voids within individual practice. To validate our mobile application, we conduct a heuristic evaluation consisting of cellist and UX experts. Overall feedback from the evaluation found the app to be user friendly and helpful.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents Cello Evaluator, a mobile Android application that performs real-time cellist postural evaluation using on-device computer vision. It argues that this addresses gaps in feedback during individual practice sessions, reducing injury risk and improving technique. Validation consists of a heuristic evaluation with cellist and UX experts, who rated the app as user-friendly and helpful overall.

Significance. If the underlying posture detection were shown to be accurate and low-latency, the work would offer a practical, accessible tool for musicians that leverages commodity hardware. The on-device focus is a positive engineering choice that could broaden access compared to cloud or multi-sensor systems. However, the absence of any technical performance data means the significance of the postural evaluation component cannot yet be assessed.

major comments (3)
  1. [Evaluation] Evaluation section (and abstract): The heuristic evaluation reports only qualitative expert opinions on usability and helpfulness. No quantitative metrics are supplied for the core claim of accurate postural evaluation, such as precision/recall for posture keypoints, agreement with expert-labeled ground truth on cello-specific issues (e.g., shoulder position, wrist angle), or false-positive rates for feedback triggers.
  2. [Implementation] Implementation / Methods: No description is given of the computer vision model (e.g., MediaPipe, OpenPose, or custom), any fine-tuning or dataset used for cellist postures, on-device optimization steps (quantization, model size), or inference pipeline. Without these, the feasibility of real-time on-device operation on standard Android hardware cannot be evaluated.
  3. [Results] Results / Claims: The abstract and introduction assert real-time performance and reduction of 'postural feedback voids,' yet no latency measurements (ms per frame), hardware specifications tested, or accuracy benchmarks appear. This leaves the central engineering claim unsupported.
minor comments (2)
  1. [Introduction] The abstract and introduction would benefit from a brief comparison table or citations to prior posture-detection systems in music education or general HCI to clarify novelty.
  2. [Figures] If figures of the UI or detected keypoints exist, ensure they include example outputs with overlaid feedback to illustrate the system's behavior.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which highlight important areas for strengthening the technical aspects of the manuscript. We address each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section (and abstract): The heuristic evaluation reports only qualitative expert opinions on usability and helpfulness. No quantitative metrics are supplied for the core claim of accurate postural evaluation, such as precision/recall for posture keypoints, agreement with expert-labeled ground truth on cello-specific issues (e.g., shoulder position, wrist angle), or false-positive rates for feedback triggers.

    Authors: We agree that quantitative accuracy metrics for the posture detection would strengthen the claims regarding effective postural evaluation. As this is an HCI-focused paper presenting an integrated mobile application rather than a novel computer vision algorithm, the evaluation centered on a heuristic assessment of usability and helpfulness with cellist and UX experts, following standard practices for prototype systems. The detection relies on off-the-shelf on-device models without cello-specific fine-tuning or ground-truth labeling in this work. In revision, we will expand the Evaluation and Discussion sections to explicitly note this limitation, qualify the claims about postural evaluation accuracy, and suggest directions for future quantitative validation studies. revision: partial

  2. Referee: [Implementation] Implementation / Methods: No description is given of the computer vision model (e.g., MediaPipe, OpenPose, or custom), any fine-tuning or dataset used for cellist postures, on-device optimization steps (quantization, model size), or inference pipeline. Without these, the feasibility of real-time on-device operation on standard Android hardware cannot be evaluated.

    Authors: We acknowledge the omission of implementation details and will revise the Methods section to include a complete description of the computer vision pipeline. This will cover the specific model used, any adaptations for cellist postures (including whether fine-tuning or a dedicated dataset was applied), on-device optimizations such as quantization and model size, and the end-to-end inference pipeline to allow assessment of real-time feasibility on standard Android hardware. revision: yes

  3. Referee: [Results] Results / Claims: The abstract and introduction assert real-time performance and reduction of 'postural feedback voids,' yet no latency measurements (ms per frame), hardware specifications tested, or accuracy benchmarks appear. This leaves the central engineering claim unsupported.

    Authors: We agree that explicit performance benchmarks are needed to support the real-time and accessibility claims. While the system was implemented and tested to run in real time on current Android devices, specific quantitative results were not reported in the initial submission. In the revision, we will add a Results subsection with latency measurements (e.g., ms per frame), the hardware specifications of devices tested, and any available accuracy-related observations to substantiate the engineering claims. revision: yes

Circularity Check

0 steps flagged

No derivations, predictions, or self-referential steps; straightforward engineering with external heuristic validation

full rationale

The paper presents an on-device CV mobile app for cellist posture feedback and validates it via a heuristic evaluation by cellist and UX experts. No equations, fitted parameters, predictions, or derivation chains appear. The validation is an independent external assessment rather than a self-referential fit or self-citation. This matches the default expectation of no significant circularity for applied engineering work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No theoretical components, free parameters, axioms, or invented entities are involved; the work is an applied system implementation relying on standard computer vision libraries and mobile hardware.

pith-pipeline@v0.9.0 · 5535 in / 1029 out tokens · 35604 ms · 2026-05-10T05:34:57.367761+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 18 canonical work pages · 5 internal anchors

  1. [1]

    Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P.N., Inkpen, K., Teevan, J., Kikin-Gil, R., Horvitz, E.: Guidelinesforhuman-aiinteraction.In:Proceedingsofthe2019CHIConferenceon Human Factors in Computing Systems. p. 1–13. CHI ’19, Association for Comput- ing Machinery, New York, NY, USA (2019).ht...

  2. [2]

    Bradski, G.: The opencv library. Dr. Dobb’s Journal of Software Tools (2000)

  3. [3]

    Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: Openpose: Realtime multi- person 2d pose estimation using part affinity fields (2019),https://arxiv.org/ abs/1812.08008

  4. [4]

    SAM 3: Segment Anything with Concepts

    Carion, N., Gustafson, L., Hu, Y.T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K.V., Khedr, H., Huang, A., Lei, J., Ma, T., Guo, B., Kalla, A., Marks, M., Greer, J., Wang, M., Sun, P., Rädle, R., Afouras, T., Mavroudi, E., Xu, K., Wu, T.H., Zhou, Y., Momeni, L., Hazra, R., Ding, S., Vaze, S., Porcher, F., Li, F., Li, S., Kamath, A., Cheng, H.K., ...

  5. [5]

    Journal of Sport and Health Research4, 23–34 (10 2011)

    Figueres, J., Perez-Soriano, P., Belloch, S., Figueres, E.: Injuries prevention in string players. Journal of Sport and Health Research4, 23–34 (10 2011)

  6. [6]

    JAAPA: official journal of the American Academy of Physician Assistants21(4) (2008).https: //doi.org/10.1097/01720610-200804000-00015

    Heinan, M.: A review of the unique injuries sustained by musicians. JAAPA: official journal of the American Academy of Physician Assistants21(4) (2008).https: //doi.org/10.1097/01720610-200804000-00015

  7. [7]

    Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., An- dreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications (2017),https://arxiv.org/abs/1704.04861

  8. [8]

    Computer Music Journal43(1), 59–78 (2020).https://doi

    Johnson, D., Damian, D., Tzanetakis, G.: Detecting hand posture in piano playing using depth data. Computer Music Journal43(1), 59–78 (2020).https://doi. org/10.1162/comj_a_00500

  9. [9]

    Khanam, R., Hussain, M.: Yolov11: An overview of the key architectural enhance- ments (2024),https://arxiv.org/abs/2410.17725 20 Wang et al

  10. [10]

    Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.L., Yong, M.G., Lee, J., Chang, W.T., Hua, W., Georg, M., Grund- mann, M.: Mediapipe: A framework for building perception pipelines (2019), https://arxiv.org/abs/1906.08172

  11. [11]

    com/articles/ten-usability-heuristics/(1994), accessed: 2026-01-20

    Nielsen, J.: 10 usability heuristics for user interface design.https://www.nngroup. com/articles/ten-usability-heuristics/(1994), accessed: 2026-01-20

  12. [12]

    Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: Proceedings of theSIGCHIConferenceonHumanFactorsinComputingSystems.p.249–256.CHI ’90, Association for Computing Machinery, New York, NY, USA (1990).https: //doi.org/10.1145/97243.97281,https://doi.org/10.1145/97243.97281

  13. [14]

    Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection (2016),https://arxiv.org/abs/1506.02640

  14. [15]

    Scientific Reports10, 13882 (08 2020)

    Rozé, J., Aramaki, M., Kronland-Martinet, R., Ystad, S.: Cellists’ sound quality is shaped by their primary postural behavior. Scientific Reports10, 13882 (08 2020). https://doi.org/10.1038/s41598-020-70705-8

  15. [16]

    Tan, M., Le, Q.V.: Efficientnet: Rethinking model scaling for convolutional neural networks (2020),https://arxiv.org/abs/1905.11946

  16. [17]

    Tian, Y., Ye, Q., Doermann, D.: Yolov12: Attention-centric real-time object detec- tors (2025),https://arxiv.org/abs/2502.12524

  17. [18]

    Empowering edge intelligence: A comprehensive survey on on-device ai models,

    Wang, X., Tang, Z., Guo, J., Meng, T., Wang, C., Wang, T., Jia, W.: Empowering edgeintelligence:Acomprehensivesurveyonon-deviceaimodels.ACMComputing Surveys57(9), 1–39 (Apr 2025).https://doi.org/10.1145/3724420,http://dx. doi.org/10.1145/3724420

  18. [19]

    Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2.https:// github.com/facebookresearch/detectron2(2019), computer software

  19. [20]

    Molecular & Cellular Biome- chanics22, 762 (01 2025).https://doi.org/10.62617/mcb762

    Yang, P.: Integrating intelligent algorithms in music education to analyze and im- prove posture and motion in instrumental training. Molecular & Cellular Biome- chanics22, 762 (01 2025).https://doi.org/10.62617/mcb762

  20. [21]

    Yaseen, M.: What is yolov8: An in-depth exploration of the internal features of the next-generation object detector (2024),https://arxiv.org/abs/2408.15857

  21. [22]

    Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.L., Grundmann, M.: Mediapipe hands: On-device real-time hand tracking (2020), https://arxiv.org/abs/2006.10214

  22. [23]

    Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., Chen, J.: Detrs beat yolos on real-time object detection (2024),https://arxiv.org/abs/2304.08069