pith. machine review for the scientific record.

arxiv: 2604.17567 · v1 · submitted 2026-04-19 · 💻 cs.CV · eess.IV

Recognition: unknown

Multi-Camera Self-Calibration in Sports Motion Capture: Leveraging Human and Stick Poses

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:54 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords multi-camera calibration · self-calibration · sports motion capture · extrinsic calibration · human pose · stick pose · tool-free calibration · scale resolution

The pith

A tool-free method calibrates multi-camera setups in stick sports by jointly using human body poses and the known length of implements like bats or clubs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a self-calibration technique for multiple cameras that records sports motion without needing special calibration patterns or targets. It combines detections of an athlete's body keypoints, whose absolute size is unknown, with the fixed physical length of a rigid stick such as a golf club or hockey stick. A sympathetic reader would care because setting up accurate 3D tracking for coaching or analysis has traditionally required cumbersome equipment and manual labor. The approach runs a three-stage optimization that first refines camera positions and orientations, then reconstructs the motion paths, and finally locks the overall scale using the stick constraint. Tests on a new synthetic dataset across golf, baseball, and similar activities show lower errors than prior methods.

Core claim

Extrinsic calibration of multi-camera systems can be achieved accurately without dedicated tools by formulating a three-stage optimization pipeline that jointly exploits human body keypoints with unknown metric scale and a rigid stick-like implement of known length from synchronized videos, thereby refining camera extrinsics, reconstructing human and stick trajectories, and resolving global scale via the stick-length constraint.

What carries the argument

Three-stage optimization pipeline that refines camera extrinsics, reconstructs human and stick trajectories, and resolves global scale via the stick-length constraint.
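
To make the scale-resolution stage concrete, here is a minimal numpy sketch of the idea, an editorial illustration rather than the authors' implementation: it assumes the unscaled reconstruction is already available as per-frame stick endpoints, and it rescales camera translations and 3D points so that the mean reconstructed stick length matches the known metric length (0.86 m for the baseball bat in Figure 2). The function name and array layout are hypothetical.

    import numpy as np

    def resolve_scale(stick_top, stick_bottom, camera_t, points_3d, known_length=0.86):
        # stick_top, stick_bottom: (F, 3) reconstructed stick endpoints per frame (arbitrary units)
        # camera_t: (C, 3) camera translations from the unscaled calibration
        # points_3d: (N, 3) other reconstructed points (e.g., human keypoints)
        recon_len = np.linalg.norm(stick_top - stick_bottom, axis=1).mean()
        s = known_length / recon_len  # global metric scale factor
        # Rotations are scale-invariant; only translations and 3D points are rescaled.
        return s, s * camera_t, s * points_3d

    # Toy example: a stick reconstructed with length 2.0 in arbitrary units.
    top = np.random.randn(5, 3)
    bottom = top + np.array([0.0, 0.0, 2.0])
    s, t_scaled, _ = resolve_scale(top, bottom, np.zeros((3, 3)), np.zeros((0, 3)))
    print(round(s, 3))  # 0.43 = 0.86 m / 2.0 units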

If this is right

  • Accurate extrinsic calibration is obtained without any dedicated calibration tools or patterns.
  • The first benchmark dataset for this task supplies synthetic sequences across four sports categories with 3 to 10 cameras.
  • Low rotation and translation errors are achieved on the introduced dataset, outperforming prior approaches.
  • The pipeline supports varying numbers of cameras and multiple stick-based sports without retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same stick-length constraint could serve as a lightweight scale reference in other multi-view setups such as robotics or surveillance where a known rigid object is present.
  • If pose detection noise is the main remaining error source, replacing the current keypoint estimator with a more robust network would directly lower calibration residuals.
  • Extending the optimization to handle mild asynchrony between cameras would broaden the method to consumer-grade recording setups.
  • The synthetic dataset could be used to train end-to-end networks that predict extrinsics directly from raw video clips.

Load-bearing premise

The method assumes access to synchronized multi-camera videos containing both detectable human body keypoints and a rigid stick-like implement of known length.

What would settle it

Apply the method to real multi-camera footage of golf swings where camera positions are independently measured with a traditional checkerboard procedure, and check whether the reported rotation and translation errors remain below a few degrees and centimeters.
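
One way to score that check, sketched here under the assumption that the tool-free extrinsics have already been aligned to the checkerboard ground truth (e.g., with a similarity alignment in the spirit of Umeyama [47]): report per-camera geodesic rotation error in degrees and translation error in meters. The helper names are illustrative, not from the paper.

    import numpy as np

    def rotation_error_deg(R_est, R_gt):
        # Geodesic angle (degrees) between two rotation matrices.
        cos_theta = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
        return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

    def translation_error_m(t_est, t_gt):
        # Euclidean distance (meters) between camera translations.
        return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))

    # Toy check: a 1-degree rotation about z and a 2 cm translation offset.
    theta = np.radians(1.0)
    R_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    print(rotation_error_deg(R_z, np.eye(3)))                      # ~1.0
    print(translation_error_m([0.02, 0.0, 0.0], [0.0, 0.0, 0.0]))  # 0.02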

Figures

Figures reproduced from arXiv: 2604.17567 by Changsoo Jung, Fan Yang, Hon Yung Wong, Ryosuke Kawamura.

Figure 1: Illustration of our proposal. For sports involving stick-like implements (e.g., golf, baseball, hockey, and kendo), we perform a novel tool-free multi-camera extrinsic calibration by leveraging both the stick and the human poses.
Figure 2: Three-stage optimization. (1) An initial, unscaled 3D pose is reconstructed from multi-view 2D keypoints via Bundle Adjustment (BA). (2) The real-world scale is recovered using a known measurement, such as the length of the baseball bat (0.86 m). (3) A final, scale-aware BA is performed to refine the metric 3D reconstruction.
Figure 3: Our Sports-Stick-Syn dataset. Top: Statistics of our dataset, including sports categories, number of cameras, and noise levels. Bottom: An example visualization from the dataset.
Figure 4: Runtime distribution for our method. Each box plot shows the median (orange line), interquartile range, and outliers.
Figure 5: Per-camera error distribution. The human + stick approach yields lower errors and variance in both rotation and translation.
Figure 6: Per-noise-level error distribution. The human + stick approach yields lower errors and smaller variance in both rotation and translation.
Figure 7: Enabling precise 3D sports gesture analysis via tool-free self-calibration. Our method utilizes rigid stick constraints to self-calibrate multi-camera setups (Left & Center), enabling scale-aware 3D pose reconstruction. This supports robust downstream applications in sports analytics (Right), allowing for the accurate analysis of sport gestures and movement.
Original abstract

Multi-camera systems are widely employed in sports to capture the 3D motion of athletes and equipment, yet calibrating their extrinsic parameters remains costly and labor-intensive. We introduce an efficient, tool-free method for multi-camera extrinsic calibration tailored to sports involving stick-like implements (e.g., golf clubs, bats, hockey sticks). Our approach jointly exploits two complementary cues from synchronized multi-camera videos: (i) human body keypoints with unknown metric scale and (ii) a rigid stick-like implement of known length. We formulate a three-stage optimization pipeline that refines camera extrinsics, reconstructs human and stick trajectories, and resolves global scale via the stick-length constraint. Our method achieves accurate extrinsic calibration without dedicated calibration tools. To benchmark this task, we present the first dataset for multi-camera self-calibration in stick-based sports, consisting of synthetic sequences across four sports categories with 3 to 10 cameras. Comprehensive experiments demonstrate that our method delivers SOTA performance, achieving low rotation and translation errors. Our project page: https://fandulu.github.io/sport_stick_multi_cam_calib/.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a three-stage optimization pipeline for multi-camera extrinsic self-calibration in stick-based sports (e.g., golf, baseball). It jointly exploits scale-ambiguous human body keypoints and rigid sticks of known length from synchronized videos to refine camera extrinsics, reconstruct human/stick trajectories, and resolve global scale via the stick-length constraint. A new synthetic dataset with 3–10 cameras across four sports is presented, and experiments claim SOTA performance with low rotation and translation errors.

Significance. If the accuracy claims hold under realistic conditions, the method offers a practical, tool-free alternative to traditional calibration for sports motion capture, potentially reducing setup costs. The introduction of the first benchmark dataset for this specific task is a clear positive contribution that could enable future comparisons.

major comments (2)
  1. [§4 (Experiments)] All quantitative results (rotation/translation errors, SOTA comparisons) are reported exclusively on synthetic sequences with idealized, noise-free 2D detections. No real-world multi-camera sports footage, no ablation on keypoint detector noise (e.g., OpenPose/HRNet errors), and no occlusion/motion-blur tests are included. This is load-bearing for the central claim of 'accurate extrinsic calibration' because the joint optimization of extrinsics, human trajectories, and stick trajectories is sensitive to 2D errors that propagate into scale drift or local minima.
  2. [§3.2–3.3 (Optimization Pipeline)] The scale-resolution step applies the known stick length only after trajectory reconstruction; the manuscript provides no analysis of how modest 2D keypoint errors affect convergence or final extrinsic accuracy, nor any initialization sensitivity study. This directly affects whether the three-stage pipeline supports the stated low-error claims outside idealized conditions.
minor comments (2)
  1. [Abstract and §4] The abstract and results section use vague phrasing ('low rotation and translation errors') without immediately citing the exact numerical values or table rows for the best-performing configurations.
  2. [Figures in §4] Figure captions and axis labels in the qualitative results could more clearly distinguish camera counts and sports categories to improve readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below, clarifying the scope of our synthetic benchmark while committing to targeted revisions that strengthen the analysis of robustness.

Point-by-point responses
  1. Referee: [§4 (Experiments)] All quantitative results (rotation/translation errors, SOTA comparisons) are reported exclusively on synthetic sequences with idealized, noise-free 2D detections. No real-world multi-camera sports footage, no ablation on keypoint detector noise (e.g., OpenPose/HRNet errors), and no occlusion/motion-blur tests are included. This is load-bearing for the central claim of 'accurate extrinsic calibration' because the joint optimization of extrinsics, human trajectories, and stick trajectories is sensitive to 2D errors that propagate into scale drift or local minima.

    Authors: We agree that robustness to realistic 2D detection noise is essential for practical claims. The synthetic dataset was deliberately constructed with noise-free detections to isolate the calibration pipeline's behavior and to establish the first controlled benchmark for this task, enabling precise SOTA comparisons. In the revised manuscript we will add ablations that inject Gaussian noise calibrated to typical OpenPose/HRNet error distributions, as well as simulated occlusion and motion-blur patterns, and report the resulting extrinsic and scale errors. Real-world multi-view sports sequences with accurate ground-truth extrinsics remain difficult to acquire at scale; we will explicitly discuss this limitation and the synthetic results as an upper-bound reference. revision: partial

  2. Referee: [§3.2–3.3 (Optimization Pipeline)] The scale-resolution step applies the known stick length only after trajectory reconstruction; the manuscript provides no analysis of how modest 2D keypoint errors affect convergence or final extrinsic accuracy, nor any initialization sensitivity study. This directly affects whether the three-stage pipeline supports the stated low-error claims outside idealized conditions.

    Authors: We will add a dedicated sensitivity subsection (likely in §4) that quantifies the effect of increasing 2D keypoint noise on convergence rate, final rotation/translation errors, and scale accuracy. The study will also include an initialization sensitivity analysis by applying controlled perturbations to the initial camera poses and reporting success rates and accuracy statistics across multiple random seeds. These additions will directly address how the three-stage pipeline behaves beyond the noise-free setting. revision: yes
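
As a sketch of how the promised ablation could be wired up, offered purely as an editorial illustration rather than the authors' protocol: perturb the 2D detections with pixel-level Gaussian noise and the initial camera rotations with small random axis-angle offsets, then re-run the calibration per seed and record the resulting errors. The noise magnitudes below (2 px, up to 5 degrees) are placeholders, not values from the paper.

    import numpy as np

    def perturb_keypoints(kpts_2d, sigma_px, rng):
        # Add isotropic Gaussian pixel noise to an (..., 2) array of 2D detections.
        return kpts_2d + rng.normal(0.0, sigma_px, size=kpts_2d.shape)

    def perturb_rotation(R, max_deg, rng):
        # Compose R with a random axis-angle rotation of at most max_deg degrees (Rodrigues formula).
        axis = rng.normal(size=3)
        axis /= np.linalg.norm(axis)
        angle = np.radians(rng.uniform(0.0, max_deg))
        K = np.array([[0.0, -axis[2], axis[1]],
                      [axis[2], 0.0, -axis[0]],
                      [-axis[1], axis[0], 0.0]])
        dR = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
        return dR @ R

    rng = np.random.default_rng(0)
    noisy_kpts = perturb_keypoints(np.zeros((17, 2)), sigma_px=2.0, rng=rng)  # ~2 px detector noise
    R_init = perturb_rotation(np.eye(3), max_deg=5.0, rng=rng)                # up to 5° initial-pose error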

standing simulated objections not resolved
  • Quantitative evaluation on real-world multi-camera sports footage with precise ground-truth extrinsics, owing to the substantial practical difficulties in collecting and annotating such data.

Circularity Check

0 steps flagged

No circularity; scale constraint is an external independent input.

full rationale

The claimed three-stage pipeline refines extrinsics and trajectories while using the known physical length of the stick as an external constraint to resolve the metric scale that is otherwise ambiguous from human keypoints alone. This length is supplied as a fixed, independent measurement rather than being fitted or derived from the same data being calibrated. No equations or steps in the provided description reduce the output calibration to a tautology or to a self-citation chain; the optimization is a standard bundle-adjustment-style procedure with an added rigid-length prior. Synthetic-data experiments do not alter the logical independence of the derivation.
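
To show the shape of such an objective, the following schematic cost combines multi-view reprojection residuals with a rigid stick-length prior; the projection model, the weight lam, and the data layout are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def project(K, R, t, X):
        # Pinhole projection of 3D points X (N, 3) into pixels with intrinsics K and pose (R, t).
        Xc = (R @ X.T + t.reshape(3, 1)).T
        uv = (K @ (Xc / Xc[:, 2:3]).T).T
        return uv[:, :2]

    def total_cost(cams, X_human, X_stick, obs, known_length, lam=1.0):
        # cams: list of (K, R, t); X_human: (F, J, 3) keypoints; X_stick: (F, 2, 3) endpoints;
        # obs[i]: dict with 'human' (F, J, 2) and 'stick' (F, 2, 2) detections for camera i.
        cost = 0.0
        for i, (K, R, t) in enumerate(cams):
            for f in range(X_human.shape[0]):
                cost += np.sum((project(K, R, t, X_human[f]) - obs[i]['human'][f]) ** 2)
                cost += np.sum((project(K, R, t, X_stick[f]) - obs[i]['stick'][f]) ** 2)
        # Rigid-length prior: reconstructed stick length should match the known metric length.
        lengths = np.linalg.norm(X_stick[:, 0] - X_stick[:, 1], axis=1)
        cost += lam * np.sum((lengths - known_length) ** 2)
        return cost

Minimizing such a cost over extrinsics and trajectories is what removes the scale ambiguity that human keypoints alone leave open, which is why the stick length counts as an external input rather than a circular one.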

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard computer-vision assumptions about pose detection and rigid-body geometry rather than new invented entities. No free parameters are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Human body keypoints can be reliably detected from synchronized multi-view video with unknown metric scale.
    Invoked as cue (i) in the joint optimization.
  • domain assumption The stick-like implement is rigid and has a known physical length.
    Used as the scale-resolving constraint in the final stage.

pith-pipeline@v0.9.0 · 5500 in / 1419 out tokens · 50258 ms · 2026-05-10T05:54:43.694849+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

47 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    Multi-view hockey tracking with trajectory smoothing and camera selection,

    L. Wu, “Multi-view hockey tracking with trajectory smoothing and camera selection,” Ph.D. dissertation, University of British Columbia, 2008

  2. [2]

    Multi-camera video surveillance for real-time analysis and reconstruction of soccer games,

    J. Ren, M. Xu, J. Orwell, and G. A. Jones, “Multi-camera video surveillance for real-time analysis and reconstruction of soccer games,” Machine Vision and Applications, vol. 21, no. 6, pp. 855–863, 2010

  3. [3]

    Feature extraction and representation for distributed multi-view human action recognition,

J. Luo, W. Wang, and H. Qi, “Feature extraction and representation for distributed multi-view human action recognition,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 3, no. 2, pp. 145–154, 2013

  4. [4]

    Multi-camera multi-player tracking with deep player identification in sports video,

R. Zhang, L. Wu, Y. Yang, W. Wu, Y. Chen, and M. Xu, “Multi-camera multi-player tracking with deep player identification in sports video,” Pattern Recognition, vol. 102, p. 107260, 2020

  5. [5]

    Diffusion convolution neural network-based multiview gesture recognition for athletes in dynamic scenes,

Q. Wang and H. Li, “Diffusion convolution neural network-based multiview gesture recognition for athletes in dynamic scenes,” Journal of Circuits, Systems and Computers, vol. 33, no. 06, p. 2450114, 2024

  6. [6]

    Enhancing multi-camera gymnast tracking through domain knowledge integration,

F. Yang, S. Odashima, S. Masui, I. Kusajima, S. Yamao, and S. Jiang, “Enhancing multi-camera gymnast tracking through domain knowledge integration,” IEEE Transactions on Circuits and Systems for Video Technology, 2024

  7. [7]

    A conceptual framework and review of multi-method approaches for 3d markerless motion capture in sports and exercise,

H. Noorbhai, S. Moon, and T. Fukushima, “A conceptual framework and review of multi-method approaches for 3d markerless motion capture in sports and exercise,” Journal of Sports Sciences, vol. 43, no. 12, pp. 1167–1174, 2025

  8. [8]

    Biomechanical golf swing analysis using markerless three-dimensional skeletal tracking through truncation-robust heatmaps,

B. F. Taylor et al., “Biomechanical golf swing analysis using markerless three-dimensional skeletal tracking through truncation-robust heatmaps,” Ph.D. dissertation, Massachusetts Institute of Technology, 2025

  9. [9]

    Multi-camera calibration with pattern rigs, including for non-overlapping cameras: Calico,

A. Tabb, H. Medeiros, M. J. Feldmann, and T. T. Santos, “Multi-camera calibration with pattern rigs, including for non-overlapping cameras: Calico,” arXiv preprint arXiv:1903.06811, 2019

  10. [10]

    The double sphere camera model,

V. Usenko, N. Demmel, and D. Cremers, “The double sphere camera model,” in 2018 International Conference on 3D Vision (3DV). IEEE, 2018, pp. 552–560

  11. [11]

    A new calibration technique for multi-camera systems of limited overlapping field-of-views,

Z. Xing, J. Yu, and Y. Ma, “A new calibration technique for multi-camera systems of limited overlapping field-of-views,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 5892–5899

  12. [12]

    Extrinsic camera calibration from a moving person,

S.-E. Lee, K. Shibata, S. Nonaka, S. Nobuhara, and K. Nishino, “Extrinsic camera calibration from a moving person,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10344–10351, 2022

  13. [13]

Wide-baseline multi-camera calibration using person re-identification,

Y. Xu, Y.-J. Li, X. Weng, and K. Kitani, “Wide-baseline multi-camera calibration using person re-identification,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 13134–13143

  14. [14]

    Robust piecewise-planar 3d reconstruction and completion from large-scale unstructured point data,

A.-L. Chauve, P. Labatut, and J.-P. Pons, “Robust piecewise-planar 3d reconstruction and completion from large-scale unstructured point data,” in 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, 2010, pp. 1261–1268

  15. [15]

    Robust extrinsic calibration of multiple rgb-d cameras with body tracking and feature matching,

S.-h. Lee, J. Yoo, M. Park, J. Kim, and S. Kwon, “Robust extrinsic calibration of multiple rgb-d cameras with body tracking and feature matching,” Sensors, vol. 21, no. 3, p. 1013, 2021

  16. [16]

    Yowo: You only walk once to jointly map an indoor scene and register ceiling-mounted cameras,

F. Yang, S. Yamao, I. Kusajima, A. Moteki, S. Masui, and S. Jiang, “Yowo: You only walk once to jointly map an indoor scene and register ceiling-mounted cameras,” IEEE Transactions on Circuits and Systems for Video Technology, 2024

  17. [17]

    Calib3r: A 3d foundation model for multi-camera to robot calibration and 3d metric-scaled scene reconstruction,

D. Allegro, M. Terreran, and S. Ghidoni, “Calib3r: A 3d foundation model for multi-camera to robot calibration and 3d metric-scaled scene reconstruction,” arXiv preprint arXiv:2509.08813, 2025

  18. [18]

    MapAnything: Universal Feed-Forward Metric 3D Reconstruction

N. Keetha, N. Müller, J. Schönberger, L. Porzi, Y. Zhang, T. Fischer, A. Knapitsch, D. Zauss, E. Weber, N. Antunes et al., “Mapanything: Universal feed-forward metric 3d reconstruction,” arXiv preprint arXiv:2509.13414, 2025

  19. [19]

    ewand: An extrinsic calibration framework for wide baseline frame-based and event-based camera systems,

T. Gossard, A. Ziegler, L. Kolmar, J. Tebbe, and A. Zell, “ewand: An extrinsic calibration framework for wide baseline frame-based and event-based camera systems,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 14534–14540

  20. [20]

Charuco Board-Based Omni-directional Camera Calibration Method,

D.-Y. Kim, J.-H. Kim, and K.-T. Kim, “Charuco Board-Based Omni-directional Camera Calibration Method,” Sensors, vol. 18, no. 12, p. 421, 2018

  21. [21]

    Caliscope: Gui based multicamera calibration and motion tracking,

D. Prible, “Caliscope: Gui based multicamera calibration and motion tracking,” Journal of Open Source Software, vol. 9, no. 102, p. 7155, 2024

  22. [22]

    Skeleton-based continuous extrinsic calibration of multiple rgb-d kinect cameras,

    K. Desai, B. Prabhakaran, and S. Raghuraman, “Skeleton-based continuous extrinsic calibration of multiple rgb-d kinect cameras,” in Proceedings of the 9th ACM multimedia systems conference, 2018, pp. 250–257

  23. [23]

    Vggt: Visual geometry grounded transformer,

J. Wang, M. Chen, N. Karaev, A. Vedaldi, C. Rupprecht, and D. Novotny, “Vggt: Visual geometry grounded transformer,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5294–5306

  24. [24]

    Reconstructing people, places, and cameras,

L. Müller, H. Choi, A. Zhang, B. Yi, J. Malik, and A. Kanazawa, “Reconstructing people, places, and cameras,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 21948–21958

  25. [25]

    Kineo: Calibration-free metric motion capture from sparse rgb cameras,

C. Javerliat, P. Raimbaud, and G. Lavoué, “Kineo: Calibration-free metric motion capture from sparse rgb cameras,” arXiv preprint arXiv:2510.24464, 2025

  26. [26]

    Fast automatic camera network calibration through human mesh recovery,

N. Garau, F. G. De Natale, and N. Conci, “Fast automatic camera network calibration through human mesh recovery,” Journal of Real-Time Image Processing, vol. 17, no. 6, pp. 1757–1768, 2020

  27. [27]

    Spatiotemporal multi-camera calibration using freely moving people,

S.-E. Lee, K. Nishino, and S. Nobuhara, “Spatiotemporal multi-camera calibration using freely moving people,” IEEE Robotics and Automation Letters, 2025

  28. [28]

    Global structure-from-motion revisited,

L. Pan, D. Baráth, M. Pollefeys, and J. L. Schönberger, “Global structure-from-motion revisited,” in European Conference on Computer Vision. Springer, 2024, pp. 58–77

  29. [29]

    Online marker-free extrinsic camera calibration using person keypoint detections,

B. Pätzold, S. Bultmann, and S. Behnke, “Online marker-free extrinsic camera calibration using person keypoint detections,” in DAGM German Conference on Pattern Recognition. Springer, 2022, pp. 300–316

  30. [30]

    A device for capturing inward-looking spherical light fields,

Q. Bolsée, W. Darwish, D. Bonatto, G. Lafruit, and A. Munteanu, “A device for capturing inward-looking spherical light fields,” in 2020 International Conference on 3D Immersion (IC3D). IEEE, 2020, pp. 1–5

  31. [31]

    A survey on video action recognition in sports: Datasets, methods and applications,

F. Wu, Q. Wang, J. Bian, N. Ding, F. Lu, J. Cheng, D. Dou, and H. Xiong, “A survey on video action recognition in sports: Datasets, methods and applications,” IEEE Transactions on Multimedia, vol. 25, pp. 7943–7966, 2022

  32. [32]

    Robust self-supervised extrinsic self-calibration,

T. Kanai, I. Vasiljevic, V. Guizilini, A. Gaidon, and R. Ambrus, “Robust self-supervised extrinsic self-calibration,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 1932–1939

  33. [33]

    Pricosa: High-precision 3d camera calibration with non-overlapping field of views,

O. Kedilioglu, T. T. Nova, M. Landesberger, L. Wang, M. Hofmann, J. Franke, and S. Reitelshöfer, “Pricosa: High-precision 3d camera calibration with non-overlapping field of views,” in Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Volume 2). SciTePress, 2025, pp. 801–809

  34. [34]

    A comparison between 2d plate calibration and wand calibration for 3d kinematic systems,

T. Pribanić, S. Peharec, and V. Medved, “A comparison between 2d plate calibration and wand calibration for 3d kinematic systems,” Kinesiology, vol. 41, no. 2, pp. 147–155, 2009

  35. [35]

    Mc-calib: A generic and robust calibration toolbox for multi-camera systems,

F. Rameau, J. Park, O. Bailo, and I. S. Kweon, “Mc-calib: A generic and robust calibration toolbox for multi-camera systems,” Computer Vision and Image Understanding, vol. 217, p. 103353, 2022

  36. [36]

    Multi-camera extrinsic calibration for real-time tracking in large outdoor environments,

P. Tripicchio, S. D’Avella, G. Camacho-Gonzalez, L. Landolfi, G. Baris, C. A. Avizzano, and A. Filippeschi, “Multi-camera extrinsic calibration for real-time tracking in large outdoor environments,” Journal of Sensor and Actuator Networks, vol. 11, no. 3, p. 40, 2022

  37. [37]

    Multi-camera calibration using far-range dual-led wand and near-range chessboard fused in bundle adjustment,

P. Jatesiktat, G. M. Lim, and W. T. Ang, “Multi-camera calibration using far-range dual-led wand and near-range chessboard fused in bundle adjustment,” Sensors, vol. 24, no. 23, p. 7416, 2024

  38. [38]

    Enhanced three-axis frame and wand-based multi-camera calibration method using adaptive iteratively reweighted least squares and comprehensive error integration,

O. Yuhai, Y. Cho, A. Choi, and J. H. Mun, “Enhanced three-axis frame and wand-based multi-camera calibration method using adaptive iteratively reweighted least squares and comprehensive error integration,” in Photonics, vol. 11, no. 9. MDPI, 2024, p. 867

  39. [39]

    Multicamera rig calibration by double-sided thick checkerboard,

M. Marcon, A. Sarti, and S. Tubaro, “Multicamera rig calibration by double-sided thick checkerboard,” IET Computer Vision, vol. 11, no. 6, pp. 448–454, 2017

  40. [40]

    Caltag: High precision fiducial markers for camera calibration

B. Atcheson, F. Heide, and W. Heidrich, “Caltag: High precision fiducial markers for camera calibration,” in VMV, vol. 10, 2010, pp. 41–48

  41. [41]

    Calibrating multiple cameras with non-overlapping views using coded checkerboard targets,

T. Strauß, J. Ziegler, and J. Beck, “Calibrating multiple cameras with non-overlapping views using coded checkerboard targets,” in 17th international IEEE conference on intelligent transportation systems (ITSC). IEEE, 2014, pp. 2623–2628

  42. [42]

    Sports camera calibration via synthetic data,

J. Chen and J. J. Little, “Sports camera calibration via synthetic data,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2019, pp. 0–0

  43. [43]

    Tvcalib: Camera calibration for sports field registration in soccer,

J. Theiner and R. Ewerth, “Tvcalib: Camera calibration for sports field registration in soccer,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2023, pp. 1166–1175

  44. [44]

    Real-time camera pose estimation for sports fields,

L. Citraro, P. Márquez-Neila, S. Savare, V. Jayaram, C. Dubout, F. Renaut, A. Hasfura, H. Ben Shitrit, and P. Fua, “Real-time camera pose estimation for sports fields,” Machine Vision and Applications, vol. 31, no. 3, p. 16, 2020

  45. [45]

    Openmmlab pose estimation toolbox and benchmark,

    M. Contributors, “Openmmlab pose estimation toolbox and benchmark,” https://github.com/open-mmlab/mmpose, 2020

  46. [46]

    Ultralytics yolov11,

    G. Jocher and J. Qiu, “Ultralytics yolov11,” 2024. [Online]. Available: https://github.com/ultralytics/ultralytics

  47. [47]

    Least-squares estimation of transformation parameters between two point patterns,

S. Umeyama, “Least-squares estimation of transformation parameters between two point patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 376–380, 1991