pith. machine review for the scientific record. sign in

arxiv: 2605.01234 · v1 · submitted 2026-05-02 · 💻 cs.CV

Recognition: unknown

TT4D: A Pipeline and Dataset for Table Tennis 4D Reconstruction From Monocular Videos

Authors on Pith no claims yet

Pith reviewed 2026-05-09 15:11 UTC · model grok-4.3

classification 💻 cs.CV
keywords table tennis4D reconstructionmonocular videoball trajectorysports dataset3D spin estimationshot segmentationbroadcast video
0
0 comments X

The pith

Lifting 2D ball tracks to 3D before segmentation enables reliable table tennis reconstruction from monocular videos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a pipeline and accompanying dataset that produces over 140 hours of high-fidelity 4D table tennis reconstructions, including 3D ball positions, spin, player meshes, and camera calibrations, all derived from ordinary broadcast videos. The central move is to lift the entire unsegmented 2D ball track to 3D first via a learned network that also infers spin, after which time segmentation into shots becomes reliable. A reader would care because this supplies precise, large-scale data for virtual replays, player performance analysis, and training robotic systems on real competitive play, where earlier approaches could not handle occlusions or changing viewpoints in general footage.

Core claim

The paper claims that inverting the usual order—lifting the full unsegmented 2D ball detections to 3D trajectories and spin estimates first, then performing segmentation—yields accurate 3D ball paths and enables reconstruction of complete table tennis gameplay from monocular broadcast videos, including cases of high occlusion, and produces a multimodal dataset with time segmentation, 3D human meshes, and calibrations that supports downstream tasks such as racket impact estimation and generative modeling of rallies.

What carries the argument

The learned lifting network that converts the entire unsegmented 2D ball track into 3D positions and spin estimates, which then supports reliable time segmentation and full 4D reconstruction.

If this is right

  • Racket pose and velocity at impact can be estimated directly from the reconstructed 3D trajectories.
  • Generative models of competitive rallies can be trained on the high-fidelity 4D data.
  • Virtual replays and detailed player analysis become feasible using existing broadcast footage.
  • Gameplay reconstruction succeeds on general-view videos where 2D-based segmentation previously failed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The lift-first order may extend to other fast ball sports with frequent occlusions, such as tennis or volleyball.
  • Inferred spin values could support quantitative studies of technique differences across players.
  • Large 4D sports datasets of this kind could accelerate simulation-based training for robotic athletes.

Load-bearing premise

The learned lifting network can produce accurate 3D ball positions and spin from noisy 2D detections even under high occlusion and varied camera viewpoints.

What would settle it

A multi-view capture of a real match with known 3D ball ground truth where the single-video pipeline outputs positions that deviate substantially or produces incorrect shot segmentations during occluded periods.

Figures

Figures reproduced from arXiv: 2605.01234 by Daniel Kienzle, Dvij Kalaria, Nima Rahmanian, Rainer Lienhart, Shankar Sastry, Thomas Gossard.

Figure 1
Figure 1. Figure 1: “Lift-First Pipeline" to generate TT4D, a massive multimodal 140+ hour dataset that recovers camera parameters, 3D view at source ↗
Figure 2
Figure 2. Figure 2: A visual outline of our proposed Lift-First Pipeline. Instead of depending on challenging and noisy temporal view at source ↗
Figure 3
Figure 3. Figure 3: Distributions of the two quality metrics used in our view at source ↗
Figure 4
Figure 4. Figure 4: Ball position densities for the TT4D dataset. The view at source ↗
Figure 5
Figure 5. Figure 5: Visualization of ball spin strength per spin category. view at source ↗
Figure 6
Figure 6. Figure 6: Reconstructed 3D trajectory. Even when the 2D view at source ↗
Figure 8
Figure 8. Figure 8: (Top) Physical plausibility: The distribution of max view at source ↗
read the original abstract

We present TT4D, a large-scale, high-fidelity table tennis dataset. It provides $140+$ hours of reconstructed singles and doubles gameplay from monocular broadcast videos, featuring multimodal annotations like high-quality camera calibrations, precise 3D ball positions, ball spin, time segmentation, and 3D human meshes over time. This rich data provides a new foundation for virtual replay, in-depth player analysis, and robot learning. The dataset's combination of scale and precision is achieved through a novel reconstruction pipeline. Prior methods first partition a game sequence into individual shot segments based on the 2D ball track, and only then attempt reconstruction. However, 2D-based time segmentation collapses under occlusion and varied camera viewpoints, preventing reliable reconstruction. We invert this paradigm by first lifting the entire unsegmented 2D ball track to 3D through a learned lifting network. This 3D trajectory then allows us to reliably perform time segmentation. The learned lifting network also infers the ball's spin, handles unreliable ball detections, and successfully reconstructs the ball trajectory in cases of high occlusion. This lift-first design is necessary, as our pipeline is the only method capable of reconstructing table tennis gameplay from general-view broadcast monocular videos. We demonstrate the dataset's fidelity through two downstream tasks: estimating the racket's pose \& velocity at impact, and training a generative model of competitive rallies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents TT4D, a large-scale dataset of over 140 hours of 4D table tennis reconstructions from monocular broadcast videos, with multimodal annotations including camera calibrations, 3D ball positions, ball spin, time segmentation, and 3D human meshes. It introduces a lift-first pipeline that uses a learned network to lift unsegmented 2D ball tracks to 3D before performing time segmentation, spin estimation, and handling occlusions, claiming this is the only approach that works for general-view broadcast videos. The dataset is validated indirectly through two downstream tasks: racket pose and velocity estimation at impact, and training a generative model of competitive rallies.

Significance. If the reconstruction fidelity holds, the work would deliver a substantial new resource for computer vision in sports, supporting virtual replay, player analysis, and robot learning applications. The scale and precision of the annotations represent a clear advance over existing table tennis datasets, and the empirical, data-driven pipeline avoids parameter-fitting circularity.

major comments (2)
  1. [Abstract and Evaluation] Abstract and Evaluation section: The central claim that the learned lifting network reliably produces accurate 3D ball positions and spin from noisy 2D detections under occlusion and viewpoint variation lacks any quantitative support such as 3D position RMSE, spin error, or ablation on occlusion levels. This directly undermines the assertion that the lift-first design is necessary and that the pipeline is the only viable method for general broadcast videos.
  2. [Methods] Methods section: No details are provided on the lifting network architecture, training procedure, loss functions, or dataset splits used to learn 3D lifting and spin inference, making it impossible to assess robustness to the weakest assumption of reliable performance on held-out broadcast sequences.
minor comments (2)
  1. [Abstract] The abstract claims '140+ hours' but does not report the exact number of videos, games, or total frames, which would better contextualize the dataset scale.
  2. [Figures] Figure captions and text could more explicitly distinguish between qualitative visualizations of the pipeline and any indirect evidence from downstream tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important areas for strengthening the presentation of our lift-first pipeline and its validation. We address each major comment below and commit to revisions that improve clarity without altering the core contributions.

read point-by-point responses
  1. Referee: [Abstract and Evaluation] Abstract and Evaluation section: The central claim that the learned lifting network reliably produces accurate 3D ball positions and spin from noisy 2D detections under occlusion and viewpoint variation lacks any quantitative support such as 3D position RMSE, spin error, or ablation on occlusion levels. This directly undermines the assertion that the lift-first design is necessary and that the pipeline is the only viable method for general broadcast videos.

    Authors: We agree that direct quantitative metrics would provide stronger support for the claims regarding the lifting network. The current manuscript demonstrates fidelity indirectly via two downstream tasks (racket pose/velocity estimation at impact and training a generative model of rallies), which rely on the accuracy of the 3D reconstructions. However, to directly address this point, we will add quantitative evaluations in the revised manuscript, including 3D position RMSE, spin estimation errors, and ablations on occlusion levels using available ground-truth subsets from controlled multi-view captures. These additions will be placed in the Evaluation section and referenced in the abstract to better justify the lift-first design. revision: yes

  2. Referee: [Methods] Methods section: No details are provided on the lifting network architecture, training procedure, loss functions, or dataset splits used to learn 3D lifting and spin inference, making it impossible to assess robustness to the weakest assumption of reliable performance on held-out broadcast sequences.

    Authors: We acknowledge this omission and will expand the Methods section in the revision to include complete details on the lifting network architecture, training procedure, loss functions for 3D lifting and spin inference, and the dataset splits used. This will enable assessment of robustness on held-out sequences and improve reproducibility. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical pipeline without derivations or self-referential fits

full rationale

The paper describes a data-driven pipeline that trains a lifting network on 2D-to-3D ball data and then uses the resulting 3D trajectories for downstream segmentation and reconstruction. No equations, uniqueness theorems, or first-principles derivations are presented that reduce to fitted parameters or self-citations by construction. The claim that 2D segmentation fails under occlusion is an external empirical observation, not a self-defined loop, and the lift-first ordering is justified by that observation rather than by re-using the network's own outputs as its inputs. The work remains self-contained as an empirical dataset and pipeline contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The abstract provides insufficient technical detail to enumerate free parameters or specific axioms; the approach implicitly relies on standard computer vision assumptions for 3D lifting from 2D tracks.

axioms (1)
  • standard math Standard pinhole camera model and basic ball physics hold for lifting 2D detections to 3D trajectories.
    Implicit in any monocular 3D ball reconstruction method.

pith-pipeline@v0.9.0 · 5573 in / 1315 out tokens · 28485 ms · 2026-05-09T15:11:37.934537+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

65 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    Joel A E Andersson, Joris Gillis, Greg Horn, James B Rawlings, and Moritz Diehl

  2. [2]

    CasADi – A software framework for nonlinear optimization and optimal control.Mathematical Programming Computation11, 1 (2019), 1–36

  3. [3]

    Joao Pedro Araujo, Yanjie Ze, Pei Xu, Jiajun Wu, and C Karen Liu. 2026. Retar- geting matters: General motion retargeting for humanoid motion tracking. In IEEE International Conference on Robotics and Automation (ICRA)

  4. [4]

    Jiang Bian, Xuhong Li, Tao Wang, Qingzhong Wang, Jun Huang, Chen Liu, Jun Zhao, Feixiang Lu, Dejing Dou, and Haoyi Xiong. 2024. P2ANet: a large-scale benchmark for dense action detection from table tennis match broadcasting videos.ACM Transactions on Multimedia Computing, Communications and Appli- cations20, 4 (2024), 1–23

  5. [5]

    Yu-Jou Chen and Yu-Shuen Wang. 2024. TrackNetV3: Enhancing ShuttleCock Tracking with Augmentations and Trajectory Rectification. InProceedings of the 5th ACM International Conference on Multimedia in Asia(Tainan, Taiwan) (MMAsia ’23). Article 1, 7 pages

  6. [6]

    Cheng Cui, Ting Sun, Manhui Lin, Tingquan Gao, Yubo Zhang, Jiaxuan Liu, Xueqing Wang, Zelun Zhang, Changda Zhou, Hongen Liu, Yue Zhang, Wenyu Lv, Kui Huang, Yichao Zhang, Jing Zhang, Jun Zhang, Yi Liu, Dianhai Yu, and Yanjun Ma. 2025. PaddleOCR 3.0 Technical Report. arXiv:2507.05595 [cs.CV]

  7. [7]

    D’Ambrosio, Saminda Abeyruwan, Laura Graesser, et al

    David B. D’Ambrosio, Saminda Abeyruwan, Laura Graesser, et al. 2025. Achieving human level competitive robot table tennis. InIEEE International Conference on Robotics and Automation (ICRA). 74–82

  8. [8]

    Alexander Dittrich, Jan Schneider, Simon Guist, Nico Gürtler, Heiko Ott, Thomas Steinbrenner, Bernhard Schölkopf, and Dieter Büchler. 2023. AIMY: An Open- source Table Tennis Ball Launcher for Versatile and High-fidelity Trajectory Generation.IEEE International Conference on Robotics and Automation (ICRA)

  9. [9]

    Moritz Einfalt, Katja Ludwig, and Rainer Lienhart. 2023. Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV)

  10. [10]

    Morten Holck Ertner, Sofus Schou Konglevoll, Magnus Ibh, and Stella Graßhof

  11. [11]

    InProceedings of the 7th ACM International Workshop on Multimedia Content Analysis in Sports

    SynthNet: Leveraging Synthetic Data for 3D Trajectory Estimation from Monocular Video. InProceedings of the 7th ACM International Workshop on Multimedia Content Analysis in Sports. 51–58

  12. [12]

    Shankar Sastry

    Daniel Etaat, Dvij Kalaria, Nima Rahmanian, and S. Shankar Sastry. 2025. LATTE- MV: Learning to Anticipate Table Tennis Hits from Monocular Videos. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  13. [13]

    Yuta Fujihara, Tomoyasu Shimada, Xiangbo Kong, Ami Tanaka, Hiroki Nishikawa, and Hiroyuki Tomiyama. 2025. Stroke Classification in Table Tennis as a Multi-Label Classification Task with Two Labels Per Stroke.Sensors25, 3 (2025)

  14. [14]

    Yapeng Gao, Jonas Tebbe, Julian Krismer, and Andreas Zell. 2019. Markerless Racket Pose Detection and Stroke Classification Based on Stereo Vision for Table Tennis Robots. In2019 Third IEEE International Conference on Robotic Computing (IRC). 189–196

  15. [15]

    Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. 2023. Humans in 4D: Reconstructing and Tracking Humans with Transformers. InIEEE/CVF International Conference on Computer Vision (ICCV)

  16. [16]

    Thomas Gossard, Julian Krismer, Andreas Ziegler, Jonas Tebbe, and Andreas Zell. 2024. Table tennis ball spin estimation with an event camera. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 3347–3356

  17. [17]

    Thomas Gossard, Filip Radovic, Andreas Ziegler, and Andrea Zell. 2026. Blurball: Joint ball and motion blur estimation for table tennis ball tracking. InInternational Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

  18. [18]

    Thomas Gossard, Jonas Tebbe, Andreas Ziegler, and Andreas Zell. 2023. Spindoe: A ball spin estimation method for table tennis robot. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 5744–5750

  19. [19]

    Thomas Gossard, Andreas Ziegler, and Andreas Zell. 2025. TT3D: Table Tennis 3D Reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

  20. [20]

    Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7132–7141

  21. [21]

    Huang, I

    Y. Huang, I. Liao, C. Chen, T. İk, and W. Peng. 2019. TrackNet: A Deep Learning Network for Tracking High-speed and Tiny Objects in Sports Applications. In 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (A VSS). 1–8

  22. [22]

    Daniel Kienzle, Marco Kantonis, Robin Schön, and Rainer Lienhart. 2024. Seg- former++: Efficient token-merging strategies for high-resolution semantic seg- mentation. InIEEE International Conference on Multimedia Information Processing and Retrieval (MIPR)

  23. [23]

    Daniel Kienzle, Katja Ludwig, Julian Lorenz, Shin’ichi Satoh, and Rainer Lienhart

  24. [24]

    InIEEE/CVF Winter Conference on Applications of Computer Vision (W ACV)

    Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation. InIEEE/CVF Winter Conference on Applications of Computer Vision (W ACV)

  25. [25]

    Daniel Kienzle, Robin Schön, Rainer Lienhart, and Shin’ichi Satoh. 2025. Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer. InIEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

  26. [26]

    Kingma and Jimmy Ba

    Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Opti- mization. InInternational Conference on Learning Representations ICLR, Yoshua Bengio and Yann LeCun (Eds.)

  27. [27]

    Komorowski, G

    J. Komorowski, G. Kurzejamski, and G. Sarwas. 2019. DeepBall: Deep Neural- Network Ball Detector. InProceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019) - Volume 5: VISAPP. 297–304

  28. [28]

    Kaustubh Milind Kulkarni and Sucheth Shenoy. 2021. Table tennis stroke recog- nition using two-dimensional human pose estimation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 4576–4584

  29. [29]

    Jiefeng Li, Jinkun Cao, Haotian Zhang, Davis Rempe, Jan Kautz, Umar Iqbal, and Ye Yuan. 2025. GENMO: Generative Models for Human Motion Synthesis.arXiv preprint arXiv:2505.01425(2025)

  30. [30]

    Qiayuan Liao, Takara E Truong, Xiaoyu Huang, Yuman Gao, Guy Tevet, Koushil Sreenath, and C Karen Liu. 2025. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion.arXiv preprint arXiv:2508.08241(2025)

  31. [31]

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. 2023. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations

  32. [32]

    Chunfang Liu, Yoshikazu Hayakawa, and Akira Nakashima. 2012. Racket control and its experiments for robot playing table tennis. In2012 IEEE International Conference on Robotics and Biomimetics (ROBIO). 241–246

  33. [33]

    Liu and J

    P. Liu and J. Wang. 2022. MonoTrack: Shuttle Trajectory Reconstruction from Monocular Badminton Video. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 3512–3521

  34. [34]

    Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A Skinned Multi-Person Linear Model.ACM Trans. Graphics (Proc. SIGGRAPH Asia)34, 6 (Oct. 2015), 248:1–248:16

  35. [35]

    Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. International Conference on Learning Representations(2019)

  36. [36]

    Katharina Muelling, Abdeslam Boularias, Betty Mohler, Bernhard Schölkopf, and Jan Peters. 2014. Learning strategies in table tennis using inverse reinforcement learning.Biol. Cybern.108, 5 (Oct. 2014), 603–619

  37. [37]

    Takuya Nakabayashi, Kyota Higa, Masahiro Yamaguchi, Ryo Fujiwara, and Hideo Saito. 2024. Event-based ball spin estimation in sports. InProceedings of the 7th ACM International Workshop on Multimedia Content Analysis in Sports. 3367– 3375

  38. [38]

    Akira Nakashima, Yuki Ogawa, Yosuke Kobayashi, and Yoshikazu Hayakawa

  39. [39]

    InProceedings of the 2010 American Control Conference

    Modeling of rebound phenomenon of a rigid ball with friction and elastic effects. InProceedings of the 2010 American Control Conference. 1410–1415

  40. [40]

    Puntawat Ponglertnapakorn and Supasorn Suwajanakorn. 2025. Where is the ball: 3d ball trajectory estimation from 2d monocular tracking. InProceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops. 6122–6131

  41. [41]

    Arjun Raj, Lei Wang, and Tom Gedeon. 2025. TrackNetV4: Enhancing Fast Sports Object Tracking with Motion Attention Maps. InICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  42. [42]

    Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. 2024. RoFormer: Enhanced transformer with Rotary Position Embedding. Neurocomputing568 (2024), 127063

  43. [43]

    Zhi Su, Bike Zhang, Nima Rahmanian, Yuman Gao, Qiayuan Liao, Caitlin Regan, Koushil Sreenath, and S Shankar Sastry. 2025. Hitter: A humanoid table tennis robot via hierarchical planning and learning.arXiv preprint arXiv:2508.21043 (2025)

  44. [44]

    N. Sun, Y. Lin, S. Chuang, T. Hsu, D. Yu, H. Chung, and T. İk. 2020. TrackNetV2: Efficient Shuttlecock Tracking Network. In2020 International Conference on Pervasive Artificial Intelligence (ICPAI). 86–91

  45. [45]

    Mingxing Tan and Quoc V Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. InProceedings of the 36th International Conference on Machine Learning (ICML), Vol. 97. 6105–6114

  46. [46]

    Shuhei Tarashima, Muhammad Abdul Haq, Yushan Wang, and Norio Tagawa

  47. [47]

    In 34th British Machine Vision Conference 2023, BMVC 2023

    Widely applicable strong baseline for sports ball detection and tracking. In 34th British Machine Vision Conference 2023, BMVC 2023

  48. [48]

    Antti Tarvainen and Harri Valpola. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. InProceedings of the 31st International Conference on Neural Information Processing Systems. 1195–1204

  49. [49]

    Jonas Tebbe, Lukas Klamt, Yapeng Gao, and Andreas Zell. 2020. Spin Detection in Robotic Table Tennis. In2020 IEEE International Conference on Robotics and Automation (ICRA). 9694–9700

  50. [50]

    Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In2012 IEEE/RSJ International Conference on Intelligent Rahmanian, Kienzle, Gossard et al. Robots and Systems. 5026–5033

  51. [51]

    Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. 2024. Improving and gener- alizing flow-based generative models with minibatch optimal transport.Trans- actions on Machine Learning Research(2024)

  52. [52]

    Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, and Yoshua Bengio. 2024. Simulation-Free Schrödinger Bridges via Score and Flow Matching.International Conference on Artificial Intelligence and Statistics(2024)

  53. [53]

    Rejin Varghese and Sambath M. 2024. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS). 1–6

  54. [54]

    Roman Voeikov, Nikolay Falaleev, and Ruslan Baikulov. 2020. TTNet: Real-time temporal and spatial video analysis of table tennis. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. 884–885

  55. [55]

    Andreas Wächter and Lorenz T. Biegler. 2006. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming106 (2006), 25–57

  56. [56]

    Qizhi Wang and Luyan Shi. 2013. Pose estimation based on PnP algorithm for the racket of table tennis robot. In2013 25th Chinese Control and Decision Conference (CCDC). 2642–2647

  57. [57]

    Zhikun Wang, Abdeslam Boularias, Katharina Mülling, Bernhard Schölkopf, and Jan Peters. 2017. Anticipatory action selection for human–robot table tennis. Artificial Intelligence247 (2017), 399–414. Special Issue on AI and Robotics

  58. [58]

    Zhou Wang, Alan Bovik, Hamid Sheikh, and Eero Simoncelli. 2004. Image Quality Assessment: From Error Visibility to Structural Similarity.Image Processing, IEEE Transactions on13 (05 2004), 600–612

  59. [59]

    Wang, M Deisenroth, H

    Z. Wang, M Deisenroth, H. Ben Amor, D. Vogt, B. Schoelkopf, and J. Peters

  60. [60]

    In Proceedings of Robotics: Science and Systems (R:SS)

    Probabilistic Modeling of Human Movements for Intention Inference. In Proceedings of Robotics: Science and Systems (R:SS)

  61. [61]

    Brent Yi, Chung Min Kim, Justin Kerr, Gina Wu, Rebecca Feng, Anthony Zhang, Jonas Kulhanek, Hongsuk Choi, Yi Ma, Matthew Tancik, et al . 2025. Viser: Imperative, web-based 3d visualization in python.arXiv preprint arXiv:2507.22885 (2025)

  62. [62]

    Van Zandycke and C

    G. Van Zandycke and C. De Vleeschouwer. 2019. Real-Time CNN-based Seg- mentation Architecture for Ball Detection in a Single View Setup. InProceedings Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports (MMSports ’19). 51–58

  63. [63]

    Zhikai Zhang, Haofei Lu, Yunrui Lian, Ziqing Chen, Yun Liu, Chenghuai Lin, Han Xue, Zicheng Zeng, Zekun Qi, Shaolin Zheng, et al. 2026. Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data.arXiv preprint arXiv:2603.12686(2026)

  64. [64]

    Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. 2018. Unet++: A Nested U-Net Architecture for Medical Image Segmenta- tion. InDeep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. 3–11. TT4D: A Pipeline and Dataset for Table Tennis 4D Reconstruction From Monocular Videos TT4D: A Pipe...

  65. [65]

    near-zero

    Note that this stage may fail and produce clips that do not exhibit any gameplay. This is not a problem, however, since these clips are removed in the filtering stage. Duplicated Frame RemovalWhile processing online table-tennis footage, we observed that certain frames were duplicated within the video stream. This phenomenon typically arises when the fram...