arxiv: 2602.22941 · v2 · pith:4SXV7ZSHnew · submitted 2026-02-26 · 💻 cs.CV

Velocity and stroke rate reconstruction of canoe sprint team boats based on panned and zoomed video recordings

Pith reviewed 2026-05-21 11:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords canoe sprint · velocity reconstruction · stroke rate · video analysis · homography · object detection · sports performance

The pith

Computer vision reconstructs canoe sprint velocity and stroke rate from panned and zoomed videos, with errors of about 1 percent against GPS ground truth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an automated video-based system to measure boat speed and paddling rate in canoe sprint races without attaching sensors or using GPS. It detects buoys and athletes in the footage, uses the known layout of racecourse markers to calculate positions via perspective transforms, and tracks the boat even when the camera pans and zooms. The approach also pulls stroke rate from athlete poses or bounding boxes. If successful, it would let coaches analyze pacing strategies from ordinary video recordings across all boat types and distances.
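The stroke rate step can be illustrated with a small sketch. It assumes a per-frame motion signal is already available (for example a normalized joint distance) and picks the dominant period by autocorrelation. The autocorrelation approach, the function name, and the rate bounds are illustrative assumptions; this page does not state how the paper converts the signal into a rate.

```python
import math


def stroke_rate_from_signal(signal, fps, min_rate=30.0, max_rate=200.0):
    """Estimate strokes per minute from a periodic 1-D motion signal.

    Hypothetical helper: chooses the lag with the highest (biased)
    autocorrelation inside a plausible stroke-period range. min_rate and
    max_rate are illustrative bounds in strokes per minute.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]
    min_lag = max(1, int(fps * 60.0 / max_rate))
    max_lag = min(n - 1, int(fps * 60.0 / min_rate))
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Dividing by n (not n - lag) favours the shortest repeating period.
        score = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * fps / best_lag


# Synthetic signal: one stroke every 25 frames at 50 frames per second.
demo = [math.sin(2 * math.pi * i / 25) for i in range(500)]
print(stroke_rate_from_signal(demo, fps=50))  # prints 120.0
```

Real signals are noisier than this synthetic sine, so a windowed estimate over a few seconds would likely be needed to obtain a rate profile over race distance.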

Core claim

The framework estimates boat positions by computing homographies from detected buoys in the known grid and calibrates the boat tip location using a learned athlete offset from a U-Net. It tracks multi-athlete boats with optical flow and extracts stroke rate from pose estimates or bounding box movements. When tested on elite competition videos, this yields a velocity MAPE of 1.1 percent and a stroke rate MAPE of 0.9 percent compared to GPS ground truth.

What carries the argument

Homography estimation from YOLOv8-detected buoys combined with U-net boat tip calibration for position reconstruction in panned and zoomed videos.
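To make the homography step concrete, here is a minimal sketch that fits an image-to-world homography from exactly four buoy correspondences and maps an image point onto the water plane. It is a textbook direct solve with the last matrix entry fixed to 1, not the authors' implementation; the function names, pixel coordinates, and the 9 m by 10 m grid cell are illustrative assumptions.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate buoy configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_buoys(image_pts, world_pts):
    """Image-to-world homography from four buoy correspondences (h33 = 1)."""
    a, b = [], []
    for (x, y), (wx, wy) in zip(image_pts, world_pts):
        a.append([x, y, 1, 0, 0, 0, -wx * x, -wx * y])
        b.append(wx)
        a.append([0, 0, 0, x, y, 1, -wy * x, -wy * y])
        b.append(wy)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def to_world(h, pt):
    """Apply the homography to an image point and dehomogenize."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Illustrative values: four buoys seen as a trapezoid in the image,
# forming a 9 m x 10 m rectangle on the water.
image_pts = [(100, 400), (500, 400), (420, 200), (180, 200)]
world_pts = [(0, 0), (9, 0), (9, 10), (0, 10)]
H = homography_from_buoys(image_pts, world_pts)
print(to_world(H, (300, 275)))  # approximately (4.5, 5.0), the cell centre
```

With more than four detected buoys, a least-squares or RANSAC fit would be the usual choice, since individual detections can be noisy.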

Load-bearing premise

The racecourse buoy positions must be known in advance and the buoys must be accurately detected in every video frame despite camera movements.

What would settle it

A direct comparison of the reconstructed velocity profiles against simultaneous GPS measurements on additional races where buoy detection occasionally fails would show whether errors exceed the reported velocity MAPE of 1.1 percent.
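Such a comparison comes down to computing the MAPE between video-based and GPS-based samples. A minimal sketch, assuming both profiles have already been resampled onto the same distance grid; the function name and the sample values are made up for illustration and are not data from the paper.

```python
def mape(estimates, reference):
    """Mean absolute percentage error as a fraction (0.011 means 1.1 %)."""
    if len(estimates) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) / abs(r)
               for e, r in zip(estimates, reference)) / len(reference)


# Illustrative velocities in m/s at three matching race positions.
gps_velocity = [5.0, 4.0, 5.0]
video_velocity = [5.05, 3.96, 5.0]
error = mape(video_velocity, gps_velocity)
print(error > 0.011)  # does this race exceed the reported velocity MAPE?
```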

Figures

Figures reproduced from arXiv: 2602.22941 by Daniel Matthes, Finn Gerdts, Julian Ziegler, Matthias Englert, Mirco Fuchs, Patrick Frenzel, Tina Koevari, Torsten Warnke.

Figure 1: Visualization of the reconstructed scene geometry for a K1 canoe race over the distance of 500 m according to [1]. Orange lines correspond to lane boundaries, blue ones to boundaries between equidistant segments, and yellowish dots where orange and blue lines cross are expected buoy locations alongside the track. See Sec. 3 for an outline of the scene reconstruction method. Numbers in red coloured annotation…
Figure 2: Shown is the scene geometry of a regatta course (left) and an approximately orthogonal camera view of the scene (right). For clarity, yellow buoys highlighted in the scene are those visible in the right image. Corresponding buoys in both views can be identified based on their lane boundaries B_i and track segments S_j. Correspondences are used to estimate a homography and propagated to adjacent frames. Adapt…
Figure 3: Visualization of all boat classes in our main dataset. C refers to canoe, where single-bladed paddles are used. K refers to kayak, where double-bladed paddles are mandated. The number indicates the number of athletes per boat.
Figure 4: Graphical illustration of the automatic boat tip offset calibration. (1) For every detected athlete bounding box (green), we assume the centre of the bottom edge to be the image position of that athlete (blue point). (2) Athlete positions are transformed to the 2D world space, and a standard offset is applied to the mean position of the athletes (orange). (3) This position is reprojected into the image, and de…
Figure 5: Principle of boat tip localization using a U-Net. The image annotation (green dot) of the boat tip (left) is encoded via a heatmap with a Gaussian kernel (middle). The U-Net is trained to predict this heatmap (right), from which the predicted position (red cross) is derived using argmax over the predicted heatmap.
Figure 6: Visualization of stroke estimation methods across one paddle stroke. Top: A motion signal is built by calculating the Euclidean distance between the detected shoulder joint (orange) and wrist joint (blue), normalized by bounding box width, in every frame. Bottom: The mean brightness of a patch 20 % of the width and height of the athlete bounding box (purple) in the bottom left corner is used as the motion sign…
Figure 7: Perspective variance of the athlete positioning model relative to race distance. While a minor systematic negative shift in mean error occurs as the camera panning angle increases, the mean absolute error remains below 0.4 m. Standard deviation is consistent. The model maintains sufficient accuracy for boat tip estimation. Best viewed on screen.
Figure 8: Agreement analysis between the proposed velocity estimates and GPS ground truth. Left: Bland-Altman plot illustrating the mean difference and limits of agreement (±1.97 SD), indicating minimal bias and only few outliers beyond the confidence bounds. Right: Relative velocity error over the full dataset, demonstrating low deviation from GPS measurements across the entire velocity range.
Figure 9: Results of GPS-based (blue) and video-based (orange) velocity profiles visualized over race distance. Left: 500m. Right: 200m. Both figures underpin the high agreement between video- and GPS-based results.
Figure 10: Agreement analysis between ViTPose stroke rate estimates and Gyrometric ground truth. Left: Bland-Altman plot illustrating the mean difference and limits of agreement (±1.97 SD), indicating minimal bias and only few outliers beyond the confidence bounds. The two obvious clusters are caused by different stroke rates of canoe and kayak athletes. Right: Relative stroke rate error over the full dataset, demo…
Figure 11: Stroke rate analysis was conducted over all boat classes, but data for multi-athlete boats consistently suffered from occlusions due to several poles that become visible while moving the camera to follow the boats. The occlusion occurred at around 120m (left) and 320m (right) race distance. These correspond to the higher standard deviations observed in our predictions at -380m and -180m, see …
Figure 12: Results of both pose-based (green) and bounding-box-based (orange) stroke rate estimation, visualized for two races. The left race shows both methods performing well, as they are both able to capture the form of the profile. On the right, both methods show weaknesses and do not correlate exactly with ground truth data. However, the bounding box method delivers much worse results.
Figure 13: Demonstration of the application of the methodological extensions to analyse team boats as presented in this paper to a K4 canoe race over 500m distance. See …
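The heatmap encoding and argmax decoding described for Figure 5 can be sketched in a few lines. This is only an illustration of the encode and decode steps: the U-Net itself is not modelled, the decoder is applied directly to the target heatmap, and the function names, grid size, and sigma are assumptions rather than values from the paper.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """Encode a keypoint annotation as a heatmap with a Gaussian kernel."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def argmax_2d(heatmap):
    """Decode a heatmap to the (x, y) position of its maximum value."""
    best_xy, best_val = (0, 0), heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best_xy, best_val = (x, y), value
    return best_xy


# Hypothetical boat tip annotation at pixel (20, 7) on a 32 x 24 grid.
target = gaussian_heatmap(32, 24, cx=20, cy=7)
print(argmax_2d(target))  # prints (20, 7)
```

In training, a network would regress `target` from the image; at inference, `argmax_2d` would be applied to the predicted heatmap instead.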
Original abstract

Pacing strategies, defined by velocity and stroke rate profiles, are essential for peak performance in canoe sprint. While GPS is the gold standard for analysis, its limited availability necessitates automated video-based solutions. This paper presents an extended framework for reconstructing performance metrics from panned and zoomed video recordings across all sprint disciplines (K1-K4, C1-C2) and distances (200m-500m). Our method utilizes YOLOv8 for buoy and athlete detection, leveraging the known buoy grid to estimate homographies. We generalized the estimation of the boat position by means of learning a boat-specific athlete offset using a U-net based boat tip calibration. Further, we implement a robust tracking scheme using optical flow to adapt to multi-athlete boat types. Finally, we introduce methods to extract stroke rate information from either pose estimations or the athlete bounding boxes themselves. Evaluation against GPS data from elite competitions yields a velocity MAPE of 0.011 [0.008 0.014] (Spearman rho=0.974) and a stroke rate MAPE of 0.009 [0.006 0.013] (Spearman rho = 0.975). The methods provide coaches with highly accurate, automated feedback with minimal manual initialization work required, and without requiring sensors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents an automated computer vision pipeline for reconstructing velocity and stroke rate in canoe sprint team boats (K1-K4, C1-C2) from panned and zoomed video recordings. The approach detects buoys and athletes with YOLOv8, computes homographies from a known buoy grid for position estimation, uses a U-Net to learn boat-specific athlete offsets for boat tip calibration, applies optical flow for robust tracking in multi-athlete scenarios, and extracts stroke rates from pose estimation or bounding boxes. End-to-end evaluation against GPS data from elite competitions reports velocity MAPE of 0.011 with Spearman rho=0.974 and stroke rate MAPE of 0.009 with rho=0.975.

Significance. If the results hold, this provides a practical sensor-free method for obtaining detailed performance metrics in canoe sprint, which is significant given GPS limitations in competitions. The generalization across boat types and distances, combined with high correlations to independent GPS measurements, supports utility for coaches. The end-to-end validation against real elite data and use of reproducible CV components (YOLOv8, U-Net, optical flow) are strengths that enhance the work's applicability in sports analytics.

major comments (1)
  1. [§3.2] §3.2 (Homography and Position Estimation): The velocity reconstruction depends on per-frame homographies computed from YOLOv8 buoy detections on the known grid. No quantitative metrics are reported for buoy detection precision/recall, homography reprojection error, or failure rates across zoom/pan conditions. This is load-bearing for the central claim because the reported velocity MAPE of 0.011 is end-to-end against GPS; without these intermediate statistics it remains unclear whether the low error persists when detections are imperfect (common when buoys occupy few pixels in zoomed footage), even with optical-flow tracking and U-Net offset.
minor comments (2)
  1. [Abstract] The abstract states generalization to all disciplines and distances but provides no breakdown of the number of boats, athletes, or video sequences used in the GPS evaluation; adding this would strengthen the cross-discipline claims.
  2. [§4] Consider including a supplementary table or figure showing per-boat-type MAPE and rho values (e.g., K1 vs. K4) to support the claim of applicability to team boats.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of our work and the constructive major comment. We address it point by point below and commit to revisions that strengthen the manuscript without altering its core claims.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (Homography and Position Estimation): The velocity reconstruction depends on per-frame homographies computed from YOLOv8 buoy detections on the known grid. No quantitative metrics are reported for buoy detection precision/recall, homography reprojection error, or failure rates across zoom/pan conditions. This is load-bearing for the central claim because the reported velocity MAPE of 0.011 is end-to-end against GPS; without these intermediate statistics it remains unclear whether the low error persists when detections are imperfect (common when buoys occupy few pixels in zoomed footage), even with optical-flow tracking and U-Net offset.

    Authors: We agree that reporting these intermediate metrics would improve transparency and allow readers to evaluate robustness under realistic detection challenges. Although our primary contribution and validation focus on end-to-end accuracy against GPS (the metric most relevant to coaches), we acknowledge the referee's point that dissecting the homography stage is valuable. In the revised manuscript we will add to §3.2: (i) precision and recall for YOLOv8 buoy detections on our annotated validation frames, (ii) mean and standard deviation of homography reprojection error both in image pixels and in world coordinates, and (iii) the fraction of frames in which homography estimation failed or fell back to optical-flow tracking, stratified by zoom level and pan speed. These statistics are computable from our existing dataset and will be presented in a new table. We expect the numbers to show that the combination of optical flow and U-Net offset calibration keeps velocity error low even when individual buoy detections are imperfect. revision: yes
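The reprojection error promised in item (ii) could be computed along the following lines. This is a sketch under stated assumptions: the homography is given as a 3x3 nested list mapping image pixels to world metres, and the function names and sample values are hypothetical rather than taken from the paper.

```python
import math


def project(h, pt):
    """Map an image point to world coordinates with a 3x3 homography."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def reprojection_errors(h, image_pts, world_pts):
    """Euclidean distance in world units between projected and known buoys."""
    errors = []
    for img, (wx, wy) in zip(image_pts, world_pts):
        px, py = project(h, img)
        errors.append(math.hypot(px - wx, py - wy))
    return errors


def mean_and_std(values):
    """Mean and population standard deviation of a non-empty list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


# Illustrative case: a pure scaling homography (0.5 m per pixel) and one
# buoy whose known position is 3 m away from its projection.
H = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
detected = [(20, 40), (60, 80)]
known = [(10, 20), (30, 43)]
print(mean_and_std(reprojection_errors(H, detected, known)))
```

The image-pixel variant follows the same pattern with the inverse homography applied to the known world positions.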

Circularity Check

0 steps flagged

No significant circularity; external GPS validation anchors results

Full rationale

The paper reconstructs velocity and stroke rate via YOLOv8 buoy/athlete detection, known-grid homographies, U-Net boat-tip offset calibration, optical-flow tracking, and pose/bounding-box stroke extraction. All performance claims (velocity MAPE 0.011, stroke-rate MAPE 0.009, Spearman correlations) are computed against independent GPS ground truth from elite competitions, not against the method's own fitted parameters or internal assumptions. No self-definitional equations, fitted-input-as-prediction steps, or load-bearing self-citations appear in the derivation chain. The external benchmark keeps the evaluation non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach builds on standard computer vision tools with learned components for domain-specific calibration; no new physical entities postulated.

free parameters (1)
  • boat-specific athlete offset
    Learned via U-net based calibration for boat tip position
axioms (2)
  • domain assumption Buoy grid positions are known and fixed
    Essential for estimating homographies from detected buoys to real-world coordinates
  • domain assumption YOLOv8 detections are sufficiently accurate for buoy and athlete localization
    Underpins the homography and position estimation




Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

  [1] D. Matthes, P. Frenzel, J. Ziegler, T. Warnke, T. Kövari, and M. Fuchs, “Reconstructing velocity profiles using scene geometry in panned and zoomed canoe sprint videos,” in 2025 IEEE International Workshop on Sport, Technology and Research (STAR), pp. 78–83, Oct. 2025.

  [2] R. Varghese and S. M., “YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness,” in 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), (Chennai, India), pp. 1–6, IEEE, Apr. 2024.

  [3] Y. Xu, J. Zhang, Q. Zhang, and D. Tao, “Vitpose: simple vision transformer baselines for human pose estimation,” in Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, (Red Hook, NY, USA), Curran Associates Inc., 2022.

  [4] K. Nishida, J. Fujiki, C. Tsuchiya, S. Tanaka, and T. Kurita, “Road Plane Detection using Differential Homography Estimated by Pair Feature Matching of Local Regions,” ACTA Press, Apr. 2011.

  [5] T. Taniai, Y. Matsushita, Y. Sato, and T. Naemura, “Continuous 3D Label Stereo Matching using Local Expansion Moves,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, pp. 2725–2739, Nov. 2018.

  [6] K. Yagi, K. Hasegawa, Y. Sugiura, and H. Saito, “Estimation of Runners’ Number of Steps, Stride Length and Speed Transition from Video of a 100-Meter Race,” in Proceedings of the 1st International Workshop on Multimedia Content Analysis in Sports, (Seoul, Republic of Korea), pp. 87–95, ACM, Oct. 2018.

  [7] M. Gutiérrez-Pérez and A. Agudo, “No bells just whistles: Sports field registration by leveraging geometric properties,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 3325–3334, June 2024.

  [8] P. J. Claasen and J. P. d. Villiers, “Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration,” Expert Systems with Applications, vol. 252, p. 124156, Oct. 2024.

  [9] J. Ziegler, D. Matthes, P. Frenzel, and M. Fuchs, “AuxFlow: Anchor-grounded homography estimation through flow-guided auxiliary points for Soccer field registration and player localization,” Computer Vision and Image Understanding, vol. 264, p. 104662, Feb. 2026.

  [10] M.-S. von Braun, P. Frenzel, C. Kading, and M. Fuchs, “Utilizing Mask R-CNN for Waterline Detection in Canoe Sprint Video Analysis,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3826–3835, June 2020.

  [11] V. Bonaiuto, G. Annino, P. Boatto, N. Lanotte, L. Caprioli, E. Padua, and C. Romagnoli, “System for Performance Assessment of K2 Crews in Flatwater Sprint Kayak,” in 2022 IEEE International Workshop on Sport, Technology and Research (STAR), pp. 56–60, July 2022.

  [12] A. Najlaoui, F. Campoli, L. Caprioli, S. Edriss, C. Frontuto, C. Romagnoli, G. Annino, V. Bonaiuto, and A. Zanela, “AI-Driven Paddle Motion Detection,” in 2024 IEEE International Workshop on Sport, Technology and Research (STAR), pp. 290–295, July 2024.

  [13] D. Pavllo, C. Feichtenhofer, D. Grangier, and M. Auli, “3D human pose estimation in video with temporal convolutions and semi-supervised training,” Mar. 2019. http://arxiv.org/abs/1811.11742

  [14] L. Sha, J. Hobbs, P. Felsen, X. Wei, P. Lucey, and S. Ganguly, “End-to-End Camera Calibration for Broadcast Videos,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13624–13633, June 2020.

  [15] C. S. Tay and P. W. Kong, “A Video-Based Method to Quantify Stroke Synchronisation in Crew Boat Sprint Kayaking,” Journal of Human Kinetics, vol. 65, pp. 45–56, Dec. 2018.

  [16] H. Estreich, N. Bullock, M. Osborne, E. Santos-Fernandez, and P. P.-Y. Wu, “An analysis of pacing profiles in sprint kayak racing using functional principal components and hidden Markov models,” PLOS ONE, vol. 20, p. e0326375, July 2025. Publisher: Public Library of Science.

  [17] S. Amat, S. Busquier, C. D. Gómez-Carmona, M. Gómez-López, and J. Pino-Ortega, “Algorithm-Based Real-Time Analysis of Training Phases in Competitive Canoeing: An Automated Approach for Performance Monitoring,” Algorithms, vol. 18, p. 242, May 2025. Publisher: Multidisciplinary Digital Publishing Institute.

  [18] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proc. of 7th int. joint conference on Artificial intelligence - Volume 2, IJCAI’81, pp. 674–679, Aug. 1981.