arxiv: 2604.17530 · v1 · submitted 2026-04-19 · 💻 cs.HC · cs.CV

Recognition: unknown

Real-Time Cellist Postural Evaluation With On-Device Computer Vision

Paolo Wang , Michael Zhang , Shrinand Perumal , Ekaterina Tszyao , Luke Choi , Kexin Sha , Felix Lu , Paige Lorenz

show 9 more authors

Jackson P. Shields Sivamurugan Velmurugan Joshua Kamphuis William P. Jiang Gurtej Bagga Trevor Ju Raymond Otis Kwon Kristen Yeon-Ji Yun Yung-Hsiang Lu

Authors on Pith no claims yet

Pith reviewed 2026-05-10 05:34 UTC · model grok-4.3

classification 💻 cs.HC cs.CV

keywords cellist postureon-device computer visionreal-time feedbackmobile applicationheuristic evaluationAndroidinstrumental practiceposture monitoring

0 comments

The pith

The Cello Evaluator app gives cellists real-time posture feedback using computer vision that runs on any current Android phone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the problem that beginning cellists receive posture instruction only once a week and often develop bad habits in between. It introduces Cello Evaluator, an Android application that performs posture analysis in real time through models optimized to run entirely on the phone. This removes the need for external cameras, sensors, or powerful computers. The authors validate the system with a heuristic review by cellist and UX experts, who judged the app user-friendly and practically helpful. If the approach holds, cellists gain continuous guidance during solo practice that was previously unavailable.

Core claim

We present Cello Evaluator, a real-time postural feedback system for practicing cellists. Through this optimization for on-device computer vision inference, we provide access to cellist postural evaluation to anyone with a current generation Android phone and thus reduces the postural feedback voids within individual practice.

What carries the argument

On-device computer vision models optimized for real-time Android inference that detect and score cellist-specific posture issues.

Load-bearing premise

The computer vision models running on ordinary Android phones can detect and rate cellist posture problems accurately enough to give useful real-time guidance.

What would settle it

A side-by-side test in which the app's posture ratings are compared with ratings from professional cellists watching the same video recordings of practice sessions.

Figures

Figures reproduced from arXiv: 2604.17530 by Ekaterina Tszyao, Felix Lu, Gurtej Bagga, Jackson P. Shields, Joshua Kamphuis, Kexin Sha, Kristen Yeon-Ji Yun, Luke Choi, Michael Zhang, Paige Lorenz, Paolo Wang, Raymond Otis Kwon, Shrinand Perumal, Sivamurugan Velmurugan, Trevor Ju, William P. Jiang, Yung-Hsiang Lu.

**Figure 1.** Figure 1: This is a screenshot of the Cello Evaluator being used during practice. The text at the top of the screen is an example of the instructions provided to the user for correcting poor posture. In addition, the bounding boxes of the relevant areas of the cello and the bow are depicted in blue, meaning that the posture in regards to the two objects are correct. The detected nodes of the user’s bow hand is also … view at source ↗

**Figure 2.** Figure 2: An example of correct cello posture with a correctly slightly pronated wrist, correct elbow height, and appropriate bow height (as it is touching the strings between the bottom end of the string board and the bridge of the cello) and angle (perpendicular to the cello strings). Additional details regarding how postural correctness is evaluated is discussed in sections 3.2 and 3.3 [PITH_FULL_IMAGE:figures/f… view at source ↗

**Figure 3.** Figure 3: (a) Example of an incorrect, supinated wrist posture, in which the forearm and hand are rotated such that the palm faces upward. (b) Example of correct wrist posture, with slight pronation, where the forearm and hand are rotated such that the palm faces downward. at 1100 parameters for wrist posture and roughly 440 parameters for elbow posture. Due to the complexity of the extracted nodes, we elected to us… view at source ↗

**Figure 4.** Figure 4: An example image of training data collection, where the cellist is purposely playing with an incorrect supinated wrist posture to allow for collection of supinated wrist node coordinates. distance from the elbow to the wrist, and the X, Y, and Z components of the normalized direction vectors from the shoulder to the elbow and from the wrist to the elbow. It is worth noting that the bow-side arm does not mo… view at source ↗

**Figure 5.** Figure 5: The bounding boxes of the cello bow and the cello string regions are used to identify the bow’s height and angle relative to the strings, which are then used for postural classification. needed to track bow and string positions precisely for accurate classification while maintaining low inference latency to achieve real-time performance. We evaluated several models, including Segment Anything Model 3 (SAM … view at source ↗

**Figure 6.** Figure 6: Two label techniques were used to label each image needed to train the various YOLO models that utilize different computer vision techniques. (a) A labeled image used for training YOLO for instance segmentation, object detection, and orienting bounding boxes. (b) A labeled image used for training YOLO for keypoint detections, where points marked with "x" are occluded points. accurate results due to insuffi… view at source ↗

**Figure 7.** Figure 7: These images showcase YOLO model predictions after training the model with our custom, manually labeled dataset. (a) YOLO instance segmentation detection. (b) oriented bounding box detection. (c) YOLO object detection. (d) YOLO keypoint detection [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 8.** Figure 8: The normalized confusion matrix above illustrates how well YOLO-OBB performed at distinguishing the three classes: Bow, String, and background. The model performs excellently at identifying strings (99% correct) and reasonably well at detecting background elements (91% correct). The model struggles to consistently identify the bow (78% correct) by misclassifying the bow as the background (22% incorrect).… view at source ↗

**Figure 9.** Figure 9: An example of hand and pose annotations with posture correction instruction at the top displayed by the app. Buttons on the right represent (from top to bottom): close camera, flip camera, open settings, and view session history [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗

**Figure 10.** Figure 10: An example of the session summary with a detailed breakdown of (a) postural occurrence percentages and (b) representative screenshots for all categories under bow placement and cellist posture [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗

**Figure 11.** Figure 11: Mean severity ratings (0-4) for the 12 sub-heuristics, grouped into six higherlevel heuristic categories. Bar colors indicate severity tiers: strengths (0-1), moderate issues (1-3), and critical issues (3-4) [PITH_FULL_IMAGE:figures/full_fig_p015_11.png] view at source ↗

read the original abstract

Posture is a critical factor for beginning instrumental learners. Most students receive instruction only once a week, and during the intervals between lessons they have little or no feedback on their physical posture. As a result, posture often deteriorates, increasing the risk of musculoskeletal injury and inefficient technique. Recent advances in computer vision and machine learning make it possible to evaluate posture without the constant presence of a human expert. However, current solutions have been extremely limited in availability and convenience due to their reliance on computationally expensive hardware or multi-sensor setups. We present Cello Evaluator, a real-time postural feedback system for practicing cellists. Through this optimization for on-device computer vision inference, we provide access to cellist postural evaluation to anyone with a current generation Android phone and thus reduces the postural feedback voids within individual practice. To validate our mobile application, we conduct a heuristic evaluation consisting of cellist and UX experts. Overall feedback from the evaluation found the app to be user friendly and helpful.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper builds a practical Android app for cello posture feedback using on-device vision but validates only the UI with expert opinions, not the accuracy of the posture detection.

read the letter

This paper describes Cello Evaluator, a mobile app that runs pose estimation locally on Android phones to give cellists real-time feedback on their posture during practice. The core idea targets a clear practical problem: students often practice alone between weekly lessons and can develop bad habits that raise injury risk. Making the system work without cloud calls or special hardware is a reasonable engineering choice for accessibility. They report positive comments from cellist and UX experts who tried the app and found it user-friendly and helpful. That part lands as a modest but honest usability check. The work stays grounded in existing computer vision tools rather than claiming new models or algorithms, which keeps expectations in line with what is delivered. The citation pattern is light and appropriate for an application-focused piece. The soft spot is the evaluation. Heuristic feedback on the interface does not test whether the vision component correctly flags cello-specific posture problems in real time. There are no accuracy numbers, no ground-truth comparisons against expert labels on keypoints or issues, and no latency or resource measurements on the target phones. The stress-test note is accurate here: without those checks, the central claim about reliable postural evaluation remains unsupported. This is the kind of short engineering note that might fit a demo track or workshop in HCI or music technology, but it does not carry enough evidence for a full paper. Readers who want a working prototype description could get something out of it, but anyone expecting validated performance data would come away empty. I would not bring this to a reading group. I would not cite it. It does not deserve peer review in its current form.

Referee Report

3 major / 2 minor

Summary. The paper presents Cello Evaluator, a mobile Android application that performs real-time cellist postural evaluation using on-device computer vision. It argues that this addresses gaps in feedback during individual practice sessions, reducing injury risk and improving technique. Validation consists of a heuristic evaluation with cellist and UX experts, who rated the app as user-friendly and helpful overall.

Significance. If the underlying posture detection were shown to be accurate and low-latency, the work would offer a practical, accessible tool for musicians that leverages commodity hardware. The on-device focus is a positive engineering choice that could broaden access compared to cloud or multi-sensor systems. However, the absence of any technical performance data means the significance of the postural evaluation component cannot yet be assessed.

major comments (3)

[Evaluation] Evaluation section (and abstract): The heuristic evaluation reports only qualitative expert opinions on usability and helpfulness. No quantitative metrics are supplied for the core claim of accurate postural evaluation, such as precision/recall for posture keypoints, agreement with expert-labeled ground truth on cello-specific issues (e.g., shoulder position, wrist angle), or false-positive rates for feedback triggers.
[Implementation] Implementation / Methods: No description is given of the computer vision model (e.g., MediaPipe, OpenPose, or custom), any fine-tuning or dataset used for cellist postures, on-device optimization steps (quantization, model size), or inference pipeline. Without these, the feasibility of real-time on-device operation on standard Android hardware cannot be evaluated.
[Results] Results / Claims: The abstract and introduction assert real-time performance and reduction of 'postural feedback voids,' yet no latency measurements (ms per frame), hardware specifications tested, or accuracy benchmarks appear. This leaves the central engineering claim unsupported.

minor comments (2)

[Introduction] The abstract and introduction would benefit from a brief comparison table or citations to prior posture-detection systems in music education or general HCI to clarify novelty.
[Figures] If figures of the UI or detected keypoints exist, ensure they include example outputs with overlaid feedback to illustrate the system's behavior.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which highlight important areas for strengthening the technical aspects of the manuscript. We address each major comment below and outline the revisions we will make.

read point-by-point responses

Referee: [Evaluation] Evaluation section (and abstract): The heuristic evaluation reports only qualitative expert opinions on usability and helpfulness. No quantitative metrics are supplied for the core claim of accurate postural evaluation, such as precision/recall for posture keypoints, agreement with expert-labeled ground truth on cello-specific issues (e.g., shoulder position, wrist angle), or false-positive rates for feedback triggers.

Authors: We agree that quantitative accuracy metrics for the posture detection would strengthen the claims regarding effective postural evaluation. As this is an HCI-focused paper presenting an integrated mobile application rather than a novel computer vision algorithm, the evaluation centered on a heuristic assessment of usability and helpfulness with cellist and UX experts, following standard practices for prototype systems. The detection relies on off-the-shelf on-device models without cello-specific fine-tuning or ground-truth labeling in this work. In revision, we will expand the Evaluation and Discussion sections to explicitly note this limitation, qualify the claims about postural evaluation accuracy, and suggest directions for future quantitative validation studies. revision: partial
Referee: [Implementation] Implementation / Methods: No description is given of the computer vision model (e.g., MediaPipe, OpenPose, or custom), any fine-tuning or dataset used for cellist postures, on-device optimization steps (quantization, model size), or inference pipeline. Without these, the feasibility of real-time on-device operation on standard Android hardware cannot be evaluated.

Authors: We acknowledge the omission of implementation details and will revise the Methods section to include a complete description of the computer vision pipeline. This will cover the specific model used, any adaptations for cellist postures (including whether fine-tuning or a dedicated dataset was applied), on-device optimizations such as quantization and model size, and the end-to-end inference pipeline to allow assessment of real-time feasibility on standard Android hardware. revision: yes
Referee: [Results] Results / Claims: The abstract and introduction assert real-time performance and reduction of 'postural feedback voids,' yet no latency measurements (ms per frame), hardware specifications tested, or accuracy benchmarks appear. This leaves the central engineering claim unsupported.

Authors: We agree that explicit performance benchmarks are needed to support the real-time and accessibility claims. While the system was implemented and tested to run in real time on current Android devices, specific quantitative results were not reported in the initial submission. In the revision, we will add a Results subsection with latency measurements (e.g., ms per frame), the hardware specifications of devices tested, and any available accuracy-related observations to substantiate the engineering claims. revision: yes

Circularity Check

0 steps flagged

No derivations, predictions, or self-referential steps; straightforward engineering with external heuristic validation

full rationale

The paper presents an on-device CV mobile app for cellist posture feedback and validates it via a heuristic evaluation by cellist and UX experts. No equations, fitted parameters, predictions, or derivation chains appear. The validation is an independent external assessment rather than a self-referential fit or self-citation. This matches the default expectation of no significant circularity for applied engineering work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No theoretical components, free parameters, axioms, or invented entities are involved; the work is an applied system implementation relying on standard computer vision libraries and mobile hardware.

pith-pipeline@v0.9.0 · 5535 in / 1029 out tokens · 35604 ms · 2026-05-10T05:34:57.367761+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 18 canonical work pages · 5 internal anchors

[1]

Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P.N., Inkpen, K., Teevan, J., Kikin-Gil, R., Horvitz, E.: Guidelinesforhuman-aiinteraction.In:Proceedingsofthe2019CHIConferenceon Human Factors in Computing Systems. p. 1–13. CHI ’19, Association for Comput- ing Machinery, New York, NY, USA (2019).ht...

work page doi:10.1145/3290605 2019
[2]

Bradski, G.: The opencv library. Dr. Dobb’s Journal of Software Tools (2000)

2000
[3]

Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: Openpose: Realtime multi- person 2d pose estimation using part affinity fields (2019),https://arxiv.org/ abs/1812.08008

work page arXiv 2019
[4]

SAM 3: Segment Anything with Concepts

Carion, N., Gustafson, L., Hu, Y.T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K.V., Khedr, H., Huang, A., Lei, J., Ma, T., Guo, B., Kalla, A., Marks, M., Greer, J., Wang, M., Sun, P., Rädle, R., Afouras, T., Mavroudi, E., Xu, K., Wu, T.H., Zhou, Y., Momeni, L., Hazra, R., Ding, S., Vaze, S., Porcher, F., Li, F., Li, S., Kamath, A., Cheng, H.K., ...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

Journal of Sport and Health Research4, 23–34 (10 2011)

Figueres, J., Perez-Soriano, P., Belloch, S., Figueres, E.: Injuries prevention in string players. Journal of Sport and Health Research4, 23–34 (10 2011)

2011
[6]

JAAPA: official journal of the American Academy of Physician Assistants21(4) (2008).https: //doi.org/10.1097/01720610-200804000-00015

Heinan, M.: A review of the unique injuries sustained by musicians. JAAPA: official journal of the American Academy of Physician Assistants21(4) (2008).https: //doi.org/10.1097/01720610-200804000-00015

work page doi:10.1097/01720610-200804000-00015 2008
[7]

Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., An- dreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications (2017),https://arxiv.org/abs/1704.04861

work page internal anchor Pith review arXiv 2017
[8]

Computer Music Journal43(1), 59–78 (2020).https://doi

Johnson, D., Damian, D., Tzanetakis, G.: Detecting hand posture in piano playing using depth data. Computer Music Journal43(1), 59–78 (2020).https://doi. org/10.1162/comj_a_00500

work page doi:10.1162/comj_a_00500 2020
[9]

Khanam, R., Hussain, M.: Yolov11: An overview of the key architectural enhance- ments (2024),https://arxiv.org/abs/2410.17725 20 Wang et al

work page internal anchor Pith review arXiv 2024
[10]

Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.L., Yong, M.G., Lee, J., Chang, W.T., Hua, W., Georg, M., Grund- mann, M.: Mediapipe: A framework for building perception pipelines (2019), https://arxiv.org/abs/1906.08172

work page internal anchor Pith review arXiv 2019
[11]

com/articles/ten-usability-heuristics/(1994), accessed: 2026-01-20

Nielsen, J.: 10 usability heuristics for user interface design.https://www.nngroup. com/articles/ten-usability-heuristics/(1994), accessed: 2026-01-20

1994
[12]

Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: Proceedings of theSIGCHIConferenceonHumanFactorsinComputingSystems.p.249–256.CHI ’90, Association for Computing Machinery, New York, NY, USA (1990).https: //doi.org/10.1145/97243.97281,https://doi.org/10.1145/97243.97281

work page doi:10.1145/97243.97281 1990
[14]

Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection (2016),https://arxiv.org/abs/1506.02640

work page arXiv 2016
[15]

Scientific Reports10, 13882 (08 2020)

Rozé, J., Aramaki, M., Kronland-Martinet, R., Ystad, S.: Cellists’ sound quality is shaped by their primary postural behavior. Scientific Reports10, 13882 (08 2020). https://doi.org/10.1038/s41598-020-70705-8

work page doi:10.1038/s41598-020-70705-8 2020
[16]

Tan, M., Le, Q.V.: Efficientnet: Rethinking model scaling for convolutional neural networks (2020),https://arxiv.org/abs/1905.11946

work page arXiv 2020
[17]

Tian, Y., Ye, Q., Doermann, D.: Yolov12: Attention-centric real-time object detec- tors (2025),https://arxiv.org/abs/2502.12524

work page internal anchor Pith review arXiv 2025
[18]

Empowering edge intelligence: A comprehensive survey on on-device ai models,

Wang, X., Tang, Z., Guo, J., Meng, T., Wang, C., Wang, T., Jia, W.: Empowering edgeintelligence:Acomprehensivesurveyonon-deviceaimodels.ACMComputing Surveys57(9), 1–39 (Apr 2025).https://doi.org/10.1145/3724420,http://dx. doi.org/10.1145/3724420

work page doi:10.1145/3724420 2025
[19]

Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2.https:// github.com/facebookresearch/detectron2(2019), computer software

2019
[20]

Molecular & Cellular Biome- chanics22, 762 (01 2025).https://doi.org/10.62617/mcb762

Yang, P.: Integrating intelligent algorithms in music education to analyze and im- prove posture and motion in instrumental training. Molecular & Cellular Biome- chanics22, 762 (01 2025).https://doi.org/10.62617/mcb762

work page doi:10.62617/mcb762 2025
[21]

Yaseen, M.: What is yolov8: An in-depth exploration of the internal features of the next-generation object detector (2024),https://arxiv.org/abs/2408.15857

work page arXiv 2024
[22]

Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.L., Grundmann, M.: Mediapipe hands: On-device real-time hand tracking (2020), https://arxiv.org/abs/2006.10214

work page arXiv 2020
[23]

Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., Chen, J.: Detrs beat yolos on real-time object detection (2024),https://arxiv.org/abs/2304.08069

work page arXiv 2024