Pan-tilt-zoom SLAM for Sports Videos

James J. Little; Jianhui Chen; Jikai Lu

arxiv: 1907.08816 · v1 · pith:EVSGKYUKnew · submitted 2019-07-20 · 💻 cs.CV

Jikai Lu, Jianhui Chen, James J. Little

Pith reviewed 2026-05-24 18:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords: pan-tilt-zoom camera · SLAM · sports video · camera pose estimation · ray landmarks · moving object detection · online mapping

The pith

An online SLAM system uses rays as landmarks to track PTZ cameras in dynamic sports videos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an online SLAM method built for pan-tilt-zoom cameras filming fast-paced sports such as basketball and soccer. It replaces point landmarks with rays to address the lack of depth data that comes from pure camera rotation and adds player detection to reduce interference from large moving foreground regions. The approach also includes a novel camera model for tracking and an online pan-tilt forest for building the map. Experiments on synthetic and real datasets are presented to show improved camera pose estimates compared with earlier techniques. A sympathetic reader would care because reliable real-time camera tracking in these settings supports automated analysis and production of live sports footage.

Core claim

The authors claim that treating rays as landmarks inside a pure-rotation camera model, together with an online pan-tilt forest and explicit moving-object detection, produces more accurate online pose estimates for PTZ cameras in sports videos than previous methods.

What carries the argument

Rays as landmarks inside a pure-rotation camera model that supplies direction without depth for mapping.
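
To make the idea concrete, here is a minimal sketch of how a ray landmark could be projected into, and recovered from, the image of a rotation-only camera. The axis convention (x right, y down, z forward), the rotation order, and the function names `ray_direction`, `project_ray`, and `pixel_to_ray` are assumptions chosen for illustration; the paper's own camera model is not reproduced here and may differ.

```python
import math

def ray_direction(pan_deg, tilt_deg):
    """Unit direction of a (pan, tilt) ray. Axes: x right, y down, z forward."""
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    return (math.cos(tilt) * math.sin(pan),
            math.sin(tilt),
            math.cos(tilt) * math.cos(pan))

def project_ray(ray_pan, ray_tilt, cam_pan, cam_tilt, focal, u0, v0):
    """Project a ray landmark into the image of a rotation-only camera.

    Returns (u, v) in pixels, or None if the ray points behind the camera.
    """
    x, y, z = ray_direction(ray_pan, ray_tilt)
    p, t = math.radians(cam_pan), math.radians(cam_tilt)
    # Undo the camera pan (rotation about the y axis).
    x1 = math.cos(p) * x - math.sin(p) * z
    z1 = math.sin(p) * x + math.cos(p) * z
    # Undo the camera tilt (rotation about the x axis).
    y2 = math.cos(t) * y - math.sin(t) * z1
    z2 = math.sin(t) * y + math.cos(t) * z1
    if z2 <= 0.0:
        return None
    return (u0 + focal * x1 / z2, v0 + focal * y2 / z2)

def pixel_to_ray(u, v, cam_pan, cam_tilt, focal, u0, v0):
    """Back-project a pixel to a (pan, tilt) ray in degrees; no depth is needed."""
    p, t = math.radians(cam_pan), math.radians(cam_tilt)
    x, y, z = (u - u0) / focal, (v - v0) / focal, 1.0
    # Apply the camera tilt, then the camera pan.
    y1 = math.cos(t) * y + math.sin(t) * z
    z1 = -math.sin(t) * y + math.cos(t) * z
    x2 = math.cos(p) * x + math.sin(p) * z1
    z2 = -math.sin(p) * x + math.cos(p) * z1
    return (math.degrees(math.atan2(x2, z2)),
            math.degrees(math.atan2(y1, math.hypot(x2, z2))))
```

In this sketch a landmark is fully described by two angles, so it can be created from a single observation and re-observed at any zoom level without triangulation.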

If this is right

The system runs online and therefore supports real-time applications during live sports broadcasts.
Player detection reduces the disruptive effect of large foreground regions on pose estimation.
Ray landmarks enable mapping even when the camera undergoes only rotation.
An online pan-tilt forest maintains the map structure as the camera moves.
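
The player-detection point above can be illustrated with a small sketch that discards keypoints falling on detected players before they reach tracking or mapping. It assumes the detector returns axis-aligned boxes `(x_min, y_min, x_max, y_max)`; the function name `filter_keypoints` and the `margin` parameter are hypothetical, and the paper may use a different detector output.

```python
def filter_keypoints(keypoints, player_boxes, margin=0.0):
    """Keep only keypoints that lie outside every player box.

    keypoints:    iterable of (x, y) pixel coordinates
    player_boxes: iterable of (x_min, y_min, x_max, y_max) boxes (assumed format)
    margin:       extra pixels added around each box (illustrative parameter)
    """
    kept = []
    for x, y in keypoints:
        on_player = any(
            x0 - margin <= x <= x1 + margin and y0 - margin <= y <= y1 + margin
            for x0, y0, x1, y1 in player_boxes
        )
        if not on_player:
            kept.append((x, y))
    return kept
```

For example, `filter_keypoints([(10, 10), (50, 50)], [(40, 40, 60, 60)])` keeps only `(10, 10)`, since the second keypoint lies inside the box.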

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ray-based representation could be tested on other pure-rotation camera scenarios outside sports.
Coupling the pose estimates with separate player tracking pipelines might improve overall scene reconstruction.
The method opens a route for handling depth-less mapping in additional SLAM variants that encounter rapid rotations.

Load-bearing premise

That rays as landmarks overcome the missing depth information in pure-rotation cameras and that moving-object detection sufficiently mitigates foreground interference.
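
The first half of this premise can be written out as a short derivation. The notation is standard pinhole notation chosen for illustration, not the paper's own symbols, and it assumes the projection center coincides with the rotation center.

```latex
% Assumed notation (not taken from the paper): K is the intrinsic matrix,
% R the camera rotation, d a unit direction, \lambda > 0 the unknown depth.
\[
  \mathbf{X} = \lambda\,\mathbf{d}, \qquad
  \tilde{\mathbf{x}} \sim K R\,\mathbf{X}
  = \lambda\, K R\,\mathbf{d} \sim K R\,\mathbf{d}
\]
% The depth \lambda only rescales the homogeneous image point, so it cancels.
% A landmark can therefore be stored as a direction with two angles:
\[
  \mathbf{d}(\theta, \phi) =
  \begin{pmatrix}
    \cos\phi \sin\theta \\ \sin\phi \\ \cos\phi \cos\theta
  \end{pmatrix}
\]
```

Under these assumptions the image of a landmark depends only on its direction, which is why a ray is enough and why depth cannot be recovered from rotation alone.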

What would settle it

A test sequence of PTZ sports video with known ground-truth camera poses in which the estimated poses deviate beyond the error levels reported for competing methods.
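
One way such a test could be scored is with reprojection error summarized as mean, median, and maximum, the same statistics that appear in the synthetic results table extracted with Figure 4. The sketch below assumes matched lists of predicted and ground-truth pixel positions; the function names are hypothetical and the example values are made up purely to show the arithmetic.

```python
import math
import statistics

def reprojection_errors(predicted, observed):
    """Euclidean pixel distance between matched predicted and observed points."""
    return [math.hypot(px - ox, py - oy)
            for (px, py), (ox, oy) in zip(predicted, observed)]

def summarize(errors):
    """Mean, median and max of a non-empty list of errors (in pixels)."""
    return {
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "max": max(errors),
    }

# Illustrative values only, not results from the paper.
predicted = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
observed = [(0.0, 0.0), (0.0, 0.0), (1.0, 2.0)]
summary = summarize(reprojection_errors(predicted, observed))
```

Reporting the maximum alongside the mean matters here because a single tracking failure can be hidden by an otherwise low average.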

Figures

Figures reproduced from arXiv: 1907.08816 by James J. Little, Jianhui Chen, Jikai Lu.

**Figure 2.** The coordinate system and ray landmarks. The camera pose is represented by pan, …

**Figure 3.** Player detection. Left: keypoints without player detection; right: keypoints with …

**Figure 4.** Synthetic image examples.

Reprojection error (pix.) per sequence, EKF-H vs. EKF-PTZ (ours):

| Seq. ID | Velocity | EKF-H Mean | EKF-H Median | EKF-H Max | EKF-PTZ (ours) Mean | EKF-PTZ (ours) Median | EKF-PTZ (ours) Max |
|---|---|---|---|---|---|---|---|
| 1 | 0.02 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 |
| 2 | 0.83 | 0.4 | 0.4 | 0.7 | 0.3 | 0.1 | 1.1 |
| 3 | 0.70 | 1.0 | 0.3 | 16.0 | 0.3 | 0.3 | 0.5 |
| 4 | 0.08 | 2.1 | 2.2 | 3.9 | 0.7 | 0.7 | 1.3 |

**Figure 5.** Estimated camera trajectories of our method. (a) basketball; (b) soccer.

**Figure 6.** Qualitative comparison with EKF-H. The left figure shows pan angle errors of …

Original abstract

We present an online SLAM system specifically designed to track pan-tilt-zoom (PTZ) cameras in highly dynamic sports such as basketball and soccer games. In these games, PTZ cameras rotate very fast and players cover large image areas. To overcome these challenges, we propose to use a novel camera model for tracking and to use rays as landmarks in mapping. Rays overcome the missing depth in pure-rotation cameras. We also develop an online pan-tilt forest for mapping and introduce moving objects (players) detection to mitigate negative impacts from foreground objects. We test our method on both synthetic and real datasets. The experimental results show the superior performance of our method over previous methods for online PTZ camera pose estimation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adapts SLAM to fast PTZ cameras in sports via ray landmarks and player detection, but the superiority claim sits on unablated components with no numbers shown.

The core idea here is using rays as landmarks to sidestep the depth problem in pure-rotation PTZ motion, plus an online pan-tilt forest and foreground player detection to keep the map clean in crowded game footage. That combination is the actual new piece; standard point-based SLAM does not directly apply when the camera only rotates and the scene is full of moving players. The work targets a narrow but practical setting in sports video analysis, and the abstract frames the method around those constraints rather than claiming broad SLAM advances. Credit is due for identifying the right failure modes in existing PTZ trackers and proposing targeted fixes.

The soft spot is the evidence. The abstract states superior performance on synthetic and real data yet supplies no error numbers, no baseline tables, and no component ablations. Without those, it is impossible to tell whether the ray model or the player detection actually produces the reported gains or whether other implementation choices explain the difference. The stress-test concern lands: the load-bearing claims about rays overcoming depth loss and detection mitigating foreground interference are not isolated. A reader cannot yet judge whether the central argument holds.

This is the kind of applied paper that could interest people building sports analytics pipelines or PTZ calibration tools. It is narrow enough that most SLAM groups would skip it, but the specific integration is worth a look if the experiments can be strengthened. I would send it to review rather than desk-reject; the problem is real and the proposed mechanisms are coherent, but the current write-up needs quantitative support and ablations before it can be evaluated properly.

Referee Report

2 major / 1 minor

Summary. The paper presents an online SLAM system for tracking fast-moving PTZ cameras in dynamic sports videos (e.g., basketball, soccer). It proposes a novel camera model, ray-based landmarks to address missing depth in pure-rotation scenarios, an online pan-tilt forest for mapping, and moving-object detection to reduce foreground interference. Experiments on synthetic and real datasets are claimed to demonstrate superior performance over prior methods for online PTZ pose estimation.

Significance. If the performance claims hold with proper validation, the work addresses a practical gap in sports video analysis and broadcasting, where PTZ cameras operate under extreme rotation speeds and heavy foreground occlusion. The ray-landmark idea is a direct response to the pure-rotation depth problem and could be reusable; the online pan-tilt forest is a domain-specific mapping contribution.

major comments (2)

[Experiments] Experiments section: the central claim of 'superior performance' over previous PTZ methods is not supported by any reported quantitative metrics, error tables, baseline comparisons, or statistical tests in the manuscript description; without these, the assertion that ray landmarks and moving-object detection drive the gains cannot be evaluated.
[Method] Method (ray landmarks and moving-object detection): no ablation studies or component-wise error breakdowns are described that isolate whether rays overcome depth loss or whether foreground detection mitigates interference; full-system comparisons alone leave open the possibility that other factors (camera model or dataset) explain results.

minor comments (1)

[Abstract] Abstract and introduction repeat the performance claim without previewing any numerical results or dataset sizes, which weakens the reader's ability to assess scope.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the paper to strengthen the experimental presentation.

Referee: [Experiments] Experiments section: the central claim of 'superior performance' over previous PTZ methods is not supported by any reported quantitative metrics, error tables, baseline comparisons, or statistical tests in the manuscript description; without these, the assertion that ray landmarks and moving-object detection drive the gains cannot be evaluated.

Authors: We agree that the current presentation of results does not sufficiently document the quantitative evidence. The revised manuscript will expand the experiments section to include explicit error tables, direct numerical comparisons against prior PTZ methods, and statistical tests on both the synthetic and real datasets. revision: yes
Referee: [Method] Method (ray landmarks and moving-object detection): no ablation studies or component-wise error breakdowns are described that isolate whether rays overcome depth loss or whether foreground detection mitigates interference; full-system comparisons alone leave open the possibility that other factors (camera model or dataset) explain results.

Authors: We acknowledge that component-wise analysis would clarify the individual contributions. The revision will add ablation studies that report error breakdowns when ray landmarks and moving-object detection are enabled or disabled independently. revision: yes
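
The ablation promised in this response amounts to evaluating every on/off combination of the two components. A minimal sketch of that grid follows; the component names and the `evaluate` callback are hypothetical placeholders, since the source does not describe an evaluation harness.

```python
from itertools import product

def ablation_configs(components):
    """Yield every on/off combination of the named components."""
    for flags in product([True, False], repeat=len(components)):
        yield dict(zip(components, flags))

def run_ablation(components, evaluate):
    """Pair each configuration with the score returned by `evaluate`.

    `evaluate` is a caller-supplied function (hypothetical here) that runs
    the system under one configuration and returns an error measure.
    """
    return [(cfg, evaluate(cfg)) for cfg in ablation_configs(components)]

# Two components give four runs: both on, each one alone, and both off.
configs = list(ablation_configs(["ray_landmarks", "player_detection"]))
```

Comparing the two single-component runs against the both-off baseline is what would attribute any gain to rays or to detection individually.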

Circularity Check

0 steps flagged

No circularity; claims rest on external experimental comparison without self-referential reduction

full rationale

The provided abstract and context describe a PTZ SLAM system using a novel camera model, ray landmarks to address pure rotation, pan-tilt forest, and moving-object detection. No equations, parameter fits, or derivations are shown that reduce a claimed result to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems. The central claim is empirical superiority on synthetic and real datasets, which is externally falsifiable and does not collapse into a renaming, ansatz smuggling, or fitted-input prediction. This matches the default expectation of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; the central technical premise is the effectiveness of ray landmarks for depth-less pure-rotation cases, treated as a domain assumption.

axioms (1)

domain assumption: Rays overcome the missing depth in pure-rotation cameras.
Explicitly proposed in the abstract as the solution to depth absence.

pith-pipeline@v0.9.0 · 5643 in / 1028 out tokens · 22209 ms · 2026-05-24T18:53:09.996400+00:00 · methodology


[1] [1]

CodeSLAM - Learning a compact, optimisable representation for dense vi- sual SLAM

Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, and Andrew J Davison. CodeSLAM - Learning a compact, optimisable representation for dense vi- sual SLAM. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

work page 2018

[2] [2]

Simultaneous lo- calization and mapping: A survey of current trends in autonomous driving

Guillaume Bresson, Zayed Alsayed, Li Yu, and Sébastien Glaser. Simultaneous lo- calization and mapping: A survey of current trends in autonomous driving. IEEE Transactions on Intelligent V ehicles, 20:1–1, 2017

work page 2017

[3] [3]

Automatic panoramic image stitching using in- variant features

Matthew Brown and David G Lowe. Automatic panoramic image stitching using in- variant features. International Journal of Computer Vision (IJCV) , 74(1):59–73, 2007

work page 2007

[4] [4]

Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation

Tommaso Cavallari, Luca Bertinetto, Jishnu Mukhoti, Philip Torr, and Stuart Golodetz. Let’s take this online: Adapting scene coordinate regression network predictions for online RGB-D camera relocalisation. arXiv preprint arXiv:1906.08744, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906

[5] [5]

Mimicking human camera operators

Jianhui Chen and Peter Carr. Mimicking human camera operators. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2015

work page 2015

[6] [6]

Sports camera calibration via synthetic data

Jianhui Chen and James J Little. Sports camera calibration via synthetic data. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019

work page 2019

[7] [7]

A two-point method for PTZ camera calibration in sports

Jianhui Chen, Fangrui Zhu, and James J Little. A two-point method for PTZ camera calibration in sports. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2018

work page 2018

[8] [8]

ARTHuS: Adaptive real-time human segmentation in sports through online distillation

Anthony Cioppa, Adrien Deliege, Maxime Istasse, Christophe De Vleeschouwer, and Marc Van Droogenbroeck. ARTHuS: Adaptive real-time human segmentation in sports through online distillation. InIEEE Conference on Computer Vision and Pattern Recog- nition Workshops (CVPRW), 2019

work page 2019

[9] [9]

Drift-free real- time sequential mosaicing

Javier Civera, Andrew J Davison, Juan A Magallón, and JMM Montiel. Drift-free real- time sequential mosaicing. International Journal of Computer Vision (IJCV) , 81(2): 128–137, 2009

work page 2009

[10] [10]

Visual-inertial direct SLAM

Alejo Concha, Giuseppe Loianno, Vijay Kumar, and Javier Civera. Visual-inertial direct SLAM. In IEEE International Conference on Robotics and Automation (ICRA) , 2016

work page 2016

[11] [11]

MonoSLAM: Real-time single camera SLAM

Andrew J Davison, Ian D Reid, Nicholas D Molton, and Olivier Stasse. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), (6):1052–1067, 2007

work page 2007

[12] [12]

Exploiting distinctive visual landmark maps in pan–tilt–zoom camera networks

Alberto Del Bimbo, Fabrizio Dini, Giuseppe Lisanti, and Federico Pernici. Exploiting distinctive visual landmark maps in pan–tilt–zoom camera networks. Computer Vision and Image Understanding (CVIU), 114(6):611–623, 2010

work page 2010

[13] [13]

Direct sparse odometry

Jakob Engel, Vladlen Koltun, and Daniel Cremers. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) , 40(3):611–625, 2018. 12 LU, CHEN AND LITTLE: PAN-TILT-ZOOM SLAM FOR SPORTS VIDEOS

work page 2018

[14] [14]

Random sample consensus: a paradigm for model ﬁtting with applications to image analysis and automated cartography.Commu- nications of the ACM, 24(6):381–395, 1981

Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model ﬁtting with applications to image analysis and automated cartography.Commu- nications of the ACM, 24(6):381–395, 1981

work page 1981

[15] [15]

Live tracking and mapping from both general and rotation-only camera mo- tion

Steffen Gauglitz, Chris Sweeney, Jonathan Ventura, Matthew Turk, and Tobias Höllerer. Live tracking and mapping from both general and rotation-only camera mo- tion. In IEEE International Symposium on Mixed and Augmented Reality (ISMAR) , 2012

work page 2012

[16] [16]

Using line and ellipse features for rectiﬁcation of broadcast hockey video

Ankur Gupta, James J Little, and Robert J Woodham. Using line and ellipse features for rectiﬁcation of broadcast hockey video. InCanadian Conference on Computer and Robot Vision (CRV), 2011

work page 2011

[17] [17]

Robust incremental rectiﬁcation of sports video sequences

Jean-Bernard Hayet, Justus Piater, and Jacques Verly. Robust incremental rectiﬁcation of sports video sequences. In British Machine Vision Conference (BMVC), 2004

work page 2004

[18] [18]

Mask R-CNN

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In IEEE International Conference on Computer Vision (ICCV) , 2017

work page 2017

[19] [19]

3D-TV production from conventional cameras for sports broadcast.IEEE Transactions on Broadcasting, 57(2):462–476, 2011

Adrian Hilton, Jean-Yves Guillemaut, Joe Kilner, Oliver Grau, and Graham Thomas. 3D-TV production from conventional cameras for sports broadcast.IEEE Transactions on Broadcasting, 57(2):462–476, 2011

work page 2011

[20] [20]

Sports ﬁeld localization via deep structured models

Namdar Homayounfar, Sanja Fidler, and Raquel Urtasun. Sports ﬁeld localization via deep structured models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

work page 2017

[21] [21]

Panoptic segmentation

Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In IEEE Conference on Computer Vision and Pattern Recogni- tion (CVPR), 2019

work page 2019

[22] [22]

LATCH: learned arrangements of three patch codes

Gil Levi and Tal Hassner. LATCH: learned arrangements of three patch codes. InIEEE Winter Conference on Applications of Computer Vision (WACV), 2016

work page 2016

[23] [23]

Continuous localization and mapping of a pan-tilt-zoom camera for wide area tracking

Giuseppe Lisanti, Iacopo Masi, Federico Pernici, and Alberto Del Bimbo. Continuous localization and mapping of a pan-tilt-zoom camera for wide area tracking. Machine Vision and Applications (MVA), 27(7):1071–1085, 2016

work page 2016

[24] [24]

Real-time spherical mosaicing using whole image alignment

Steven Lovegrove and Andrew J Davison. Real-time spherical mosaicing using whole image alignment. In European Conference on Computer Vision (ECCV), 2010

work page 2010

[25] [25]

Light cascaded convolutional neural networks for accurate player detection

Keyu Lu, Jianhui Chen, James J Little, and He Hangen. Light cascaded convolutional neural networks for accurate player detection. In British Machine Vision Conference (BMVC), 2017

work page 2017

[26] [26]

Backtracking regression forests for accurate camera relocalization

Lili Meng, Jianhui Chen, Frederick Tung, James J Little, Julien Valentin, and Clarence Silva. Backtracking regression forests for accurate camera relocalization. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017

work page 2017

[27] [27]

ORB-SLAM: a versatile and accurate monocular SLAM system

Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.

work page 2015

[28] [28]

Keep your eye on the puck: Automatic hockey videography

Hemanth Pidaparthy and James Elder. Keep your eye on the puck: Automatic hockey videography. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2019

work page 2019

[29] [29]

Homography-based planar mapping and tracking for mobile phones

Christian Pirchheim and Gerhard Reitmayr. Homography-based planar mapping and tracking for mobile phones. In IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2011

work page 2011

[30] [30]

Handling pure camera rotation in keyframe-based SLAM

Christian Pirchheim, Dieter Schmalstieg, and Gerhard Reitmayr. Handling pure camera rotation in keyframe-based SLAM. In IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2013

work page 2013

[31] [31]

Unsupervised calibration of camera networks and virtual PTZ cameras

Horst Possegger, Matthias Rüther, Sabine Sternig, Thomas Mauthner, Manfred Klopschitz, Peter M Roth, and Horst Bischof. Unsupervised calibration of camera networks and virtual PTZ cameras. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2012

work page 2012

[32] [32]

Robust multi-view camera calibration for wide-baseline camera networks

Jens Puwein, Remo Ziegler, Julia Vogel, and Marc Pollefeys. Robust multi-view camera calibration for wide-baseline camera networks. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2011

work page 2011

[33] [33]

Soccer on your tabletop

Konstantinos Rematas, Ira Kemelmacher-Shlizerman, Brian Curless, and Steve Seitz. Soccer on your tabletop. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

work page 2018

[34] [34]

Faster R-CNN: Towards real-time object detection with region proposal networks

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS), 2015

work page 2015

[35] [35]

On-line random forests

Amir Saffari, Christian Leistner, Jakob Santner, Martin Godec, and Horst Bischof. On-line random forests. In IEEE International Conference on Computer Vision (ICCV) Workshops, 2009

work page 2009

[36] [36]

Scene coordinate regression forests for camera relocalization in RGB-D images

Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Scene coordinate regression forests for camera relocalization in RGB-D images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

work page 2013

[37] [37]

Pan–tilt–zoom camera calibration and high-resolution mosaic generation

Sudipta N Sinha and Marc Pollefeys. Pan–tilt–zoom camera calibration and high-resolution mosaic generation. Computer Vision and Image Understanding (CVIU), 103(3):170–183, 2006

work page 2006

[38] [38]

Improving RGB-D SLAM in dynamic environments: A motion removal approach

Yuxiang Sun, Ming Liu, and Max Q-H Meng. Improving RGB-D SLAM in dynamic environments: A motion removal approach. Robotics and Autonomous Systems (RAS), 89:110–122, 2017

work page 2017

[39] [39]

CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction

Keisuke Tateno, Federico Tombari, Iro Laina, and Nassir Navab. CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

work page 2017

[40] [40]

Real-time camera tracking using sports pitch markings

Graham Thomas. Real-time camera tracking using sports pitch markings. Journal of Real-Time Image Processing, 2(2-3):117–132, 2007.

work page 2007

[41] [41]

Computer vision for sports: Current applications and research topics

Graham Thomas, Rikke Gade, Thomas B Moeslund, Peter Carr, and Adrian Hilton. Computer vision for sports: Current applications and research topics. Computer Vision and Image Understanding (CVIU), 159:3–18, 2017

work page 2017

[42] [42]

Bundle adjustment – a modern synthesis

Bill Triggs, Philip F McLauchlan, Richard I Hartley, and Andrew W Fitzgibbon. Bundle adjustment – a modern synthesis. In International Workshop on Vision Algorithms, 1999

work page 1999

[43] [43]

Simultaneous localization and mapping with detection and tracking of moving objects

Chieh-Chih Wang and Chuck Thorpe. Simultaneous localization and mapping with detection and tracking of moving objects. In IEEE International Conference on Robotics and Automation (ICRA), 2002

work page 2002

[44] [44]

Pop-up SLAM: Semantic monocular plane SLAM for low-texture environments

Shichao Yang, Yu Song, Michael Kaess, and Sebastian Scherer. Pop-up SLAM: Semantic monocular plane SLAM for low-texture environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016

work page 2016

[45] [45]

Keyframe-based monocular SLAM: design, survey, and future directions

Georges Younes, Daniel Asmar, Elie Shammas, and John Zelek. Keyframe-based monocular SLAM: design, survey, and future directions. Robotics and Autonomous Systems, 98:67–88, 2017

work page 2017

[46] [46]

SceneCode: Monocular dense semantic reconstruction using learned encoded scene representations

Shuaifeng Zhi, Michael Bloesch, Stefan Leutenegger, and Andrew J Davison. SceneCode: Monocular dense semantic reconstruction using learned encoded scene representations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

work page 2019

[47] [47]

Detect-SLAM: Making object detection and SLAM mutually beneficial

Fangwei Zhong, Sheng Wang, Ziqi Zhang, and Yizhou Wang. Detect-SLAM: Making object detection and SLAM mutually beneficial. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2018

work page 2018