Markerless Augmented Advertising for Sports Videos

Cambron Carter; Divyaa Ravichandran; Emmanuel Antonio Cuevas; Hallee E. Wong; Iris Fu; Iuliana Tabian; Osman Akar

arxiv: 1907.09394 · v1 · pith:TDKJDLYOnew · submitted 2019-07-22 · 💻 cs.CV

Hallee E. Wong, Osman Akar, Emmanuel Antonio Cuevas, Iuliana Tabian, Divyaa Ravichandran, Iris Fu, Cambron Carter

Pith reviewed 2026-05-24 18:10 UTC · model grok-4.3

classification 💻 cs.CV

keywords: markerless augmented reality · video augmentation · sports videos · homography tracking · 3D scene representation · advertisement placement · augmented advertising

The pith

An automated pipeline overlays advertisements in sports videos by building 3D scene models and applying homography tracking without markers or camera parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that markerless augmented advertising can be performed automatically in sports videos by identifying suitable textures, constructing a 3D representation of the scene, placing the ad within that model, projecting it back to each frame, and then tracking it across the clip. This process is designed to produce natural and perspective-correct results even under smooth camera motion or at shot boundaries. If the approach holds, ads could appear as part of the original footage rather than requiring separate commercial interruptions. A reader would care because the method removes the need for manual artist intervention or detailed camera calibration data during post-production.

Core claim

The paper claims that an automated video augmentation pipeline identifies textures of interest, builds a 3D representation of the scene, places the advertisement in 3D, projects it back onto the image plane, and uses homography-based shape-preserving tracking to achieve seamless and perspective-correct integration for the duration of a video clip, handling smooth camera motion and shot boundaries without camera intrinsics or markers.

What carries the argument

homography-based shape-preserving tracking applied after 3D advertisement placement and projection
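
To make the tracking step concrete, here is a minimal sketch of what applying a frame-to-frame homography to the advertisement's corner points looks like. It is an illustration only, not the authors' implementation: the function name `apply_homography`, the corner coordinates, and the matrix `H_pan` are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


def apply_homography(H: Matrix3, points: List[Point]) -> List[Point]:
    """Map 2D points through a 3x3 homography using homogeneous coordinates."""
    mapped = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under this homography")
        # Divide by w to return from homogeneous to pixel coordinates.
        mapped.append((u / w, v / w))
    return mapped


# Hypothetical ad quadrilateral in the current frame (pixel coordinates).
ad_corners = [(100.0, 100.0), (300.0, 100.0), (300.0, 200.0), (100.0, 200.0)]

# Hypothetical frame-to-frame homography: a pure pan of (5, -2) pixels.
H_pan = [[1.0, 0.0, 5.0],
         [0.0, 1.0, -2.0],
         [0.0, 0.0, 1.0]]

tracked_corners = apply_homography(H_pan, ad_corners)
```

Because all four corners pass through the same 3x3 matrix, straight edges stay straight, which is the sense in which a homography keeps the quadrilateral consistent under perspective.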

If this is right

The advertisement remains aligned and natural-looking throughout the clip.
No skilled artist or advanced post-production editing tools are required.
Placement succeeds without knowledge of camera intrinsic parameters.
The system supports continuous viewing without separate commercial breaks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline structure could be tested on non-sports video with comparable camera motion patterns.
Integration costs for advertising in broadcast content might decrease if tracking proves reliable.
Extensions could explore handling of lighting changes or partial occlusions not addressed in the current clips.

Load-bearing premise

Homography tracking can maintain perspective-correct placement across video clips with smooth camera motion and shot boundaries even without camera intrinsics or markers.
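
The premise rests on being able to recover a homography from image evidence alone. As a hedged sketch, a homography can be solved exactly from four point correspondences on a plane by fixing its bottom-right entry to 1 and solving the resulting 8x8 linear system. The paper does not say this is its estimator; the function names and sample points below are assumptions for illustration, and a practical tracker would typically use many more correspondences with a robust fit.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (singular system)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Four-point homography with the (3,3) entry fixed to 1."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, from u = (h1.p)/(h3.p), v = (h2.p)/(h3.p).
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def map_point(H: List[List[float]], p: Point) -> Point:
    """Apply H to a single point (used here to check the estimate)."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical correspondences: the same surface corners seen in two frames.
src_quad = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
dst_quad = [(10.0, 10.0), (90.0, 20.0), (95.0, 70.0), (5.0, 60.0)]
H = estimate_homography(src_quad, dst_quad)
```

The fixed-entry normalization breaks down when the true (3,3) entry is zero, and the whole model only applies when the tracked points lie on a common plane or the camera purely rotates.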

What would settle it

A sports video sequence with abrupt camera movement or multiple shot changes in which the overlaid advertisement distorts or drifts from its intended surface position.
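
One way to turn this test into a number is to compare the tracked quadrilateral against reference corner positions frame by frame. The sketch below assumes such reference corners exist (for example from hand annotation, which the paper does not describe); the function names and the pixel tolerance are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Quad = Sequence[Point]


def mean_corner_error(tracked: Quad, reference: Quad) -> float:
    """Average Euclidean distance in pixels between matching corners."""
    if len(tracked) != len(reference) or not tracked:
        raise ValueError("quads must be non-empty and the same length")
    total = sum(math.hypot(tx - rx, ty - ry)
                for (tx, ty), (rx, ry) in zip(tracked, reference))
    return total / len(tracked)


def first_drift_frame(tracked_quads: List[Quad],
                      reference_quads: List[Quad],
                      tol_px: float) -> Optional[int]:
    """Index of the first frame whose mean corner error exceeds tol_px, else None."""
    for index, (tracked, reference) in enumerate(zip(tracked_quads, reference_quads)):
        if mean_corner_error(tracked, reference) > tol_px:
            return index
    return None


# Hypothetical data: the tracked quad slides by (3, 4) pixels in the second frame.
reference = [(100.0, 100.0), (300.0, 100.0), (300.0, 200.0), (100.0, 200.0)]
shifted = [(x + 3.0, y + 4.0) for x, y in reference]
drift_at = first_drift_frame([reference, shifted], [reference, reference], tol_px=2.0)
```

A per-frame curve of `mean_corner_error` across a clip with abrupt motion or several cuts would show directly whether the overlay distorts or drifts.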

Figures

Figures reproduced from arXiv: 1907.09394 by Cambron Carter, Divyaa Ravichandran, Emmanuel Antonio Cuevas, Hallee E. Wong, Iris Fu, Iuliana Tabian, Osman Akar.

**Figure 2.** The input image of a baseball game is segmented by PSPNet’s ADE20K

**Figure 3.** Segmented images and the SQS associated with the quality of the seg…

**Figure 4.** An example of a crowd image and an inverse depth map visualization of…

**Figure 5.** Perspective correct asset placement with “unnatural” and “natural” ori…

**Figure 6.** Illustration of asset placement procedure using Fig. …

**Figure 7.** Features (red points) to be tracked are identified within a 50 px radius (yellow circles) of the corners (large green points) of the quadrilateral

**Figure 8.** Example of the pipeline’s intermediate outputs running on a single image.
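
The Figure 7 caption describes keeping only features that fall within a 50 px radius of the quadrilateral's corners. A minimal sketch of that filter follows; the function name and the sample coordinates are hypothetical, and the feature detector itself is left out.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def features_near_corners(features: Sequence[Point],
                          corners: Sequence[Point],
                          radius: float = 50.0) -> List[Point]:
    """Keep features lying within `radius` pixels of at least one corner."""
    kept = []
    for fx, fy in features:
        if any(math.hypot(fx - cx, fy - cy) <= radius for cx, cy in corners):
            kept.append((fx, fy))
    return kept


# Hypothetical quadrilateral corners and candidate feature locations (pixels).
corners = [(100.0, 100.0), (400.0, 100.0), (400.0, 300.0), (100.0, 300.0)]
candidates = [(110.0, 120.0), (250.0, 200.0), (430.0, 340.0), (460.0, 300.0)]
selected = features_near_corners(candidates, corners)
```

Restricting features to the corner neighbourhoods ties the estimated motion to the surface that carries the advertisement rather than to moving players elsewhere in the frame; that reading is an interpretation of the caption, not a statement from the paper.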

Original abstract

Markerless augmented reality can be a challenging computer vision task, especially in live broadcast settings and in the absence of information related to the video capture such as the intrinsic camera parameters. This typically requires the assistance of a skilled artist, along with the use of advanced video editing tools in a post-production environment. We present an automated video augmentation pipeline that identifies textures of interest and overlays an advertisement onto these regions. We constrain the advertisement to be placed in a way that is aesthetic and natural. The aim is to augment the scene such that there is no longer a need for commercial breaks. In order to achieve seamless integration of the advertisement with the original video we build a 3D representation of the scene, place the advertisement in 3D, and then project it back onto the image plane. After successful placement in a single frame, we use homography-based, shape-preserving tracking such that the advertisement appears perspective correct for the duration of a video clip. The tracker is designed to handle smooth camera motion and shot boundaries.
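
The "place in 3D, then project back" step in the abstract can be pictured with a pinhole camera model. The sketch below is an assumption-laden illustration rather than the paper's method: the abstract stresses that intrinsics are unavailable, so the focal length and principal point here are guessed placeholder values, and the 3D corner coordinates are made up.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def project_points(points_3d: Sequence[Point3],
                   focal_px: float,
                   cx: float,
                   cy: float) -> List[Point2]:
    """Pinhole projection of camera-frame 3D points onto the image plane."""
    projected = []
    for X, Y, Z in points_3d:
        if Z <= 0:
            raise ValueError("point is behind the camera")
        projected.append((focal_px * X / Z + cx, focal_px * Y / Z + cy))
    return projected


# Hypothetical ad rectangle lying on a ground plane, receding from depth 4 to depth 8.
ad_quad_3d = [(-1.0, 0.5, 4.0), (1.0, 0.5, 4.0), (1.0, 0.5, 8.0), (-1.0, 0.5, 8.0)]

# Guessed intrinsics for a 1280x720 frame (assumed values, not from the paper).
ad_quad_2d = project_points(ad_quad_3d, focal_px=1000.0, cx=640.0, cy=360.0)

near_width = ad_quad_2d[1][0] - ad_quad_2d[0][0]
far_width = ad_quad_2d[2][0] - ad_quad_2d[3][0]
```

In this example the far edge projects to half the width of the near edge, which is the foreshortening that makes a flat overlay read as part of the scene.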

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a high-level system sketch for placing ads in sports videos via 3D reconstruction and homography tracking, but it supplies no results or metrics, so the claims cannot be checked.

The paper outlines an automated pipeline that finds textures in sports footage, reconstructs the scene in 3D, inserts an ad in that 3D space, projects it back to the image, and then tracks it with homography to keep it stable across frames and shot changes. Nothing here is a new algorithm; it is a straightforward combination of existing techniques applied to the advertising use case. The practical goal of reducing commercial breaks by embedding ads naturally is clear enough on paper.

Referee Report

2 major / 0 minor

Summary. The paper describes an automated pipeline for markerless augmented advertising in sports videos. It identifies textures of interest, builds a 3D representation of the scene, places the advertisement in 3D, projects it back onto the image plane, and uses homography-based, shape-preserving tracking to maintain perspective-correct placement across frames while handling smooth camera motion and shot boundaries, without requiring camera intrinsics or markers. The goal is seamless integration to eliminate the need for commercial breaks.

Significance. If the described pipeline achieves reliable aesthetic placement and seamless tracking, it could have practical impact on live sports broadcasting by enabling non-intrusive ad augmentation. The approach targets a real-world challenge in markerless AR under unconstrained capture conditions. However, the complete absence of quantitative results, error metrics, or validation experiments makes it impossible to assess whether the claims hold or how the system performs relative to existing methods.

major comments (2)

[Abstract] The pipeline is described at a high level, but the manuscript provides no quantitative results, error metrics, validation experiments, or implementation details to support the claims of aesthetic/natural placement or seamless integration.

[Abstract] The central assumption that homography-based tracking maintains perspective-correct placement across clips despite smooth camera motion and shot boundaries (without camera intrinsics) is stated without evidence, discussion of failure modes (e.g., non-planar surfaces or depth variation), or any supporting experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive feedback on our manuscript describing the markerless augmented advertising pipeline. We address each major comment below.

Referee: [Abstract] The pipeline is described at a high level, but the manuscript provides no quantitative results, error metrics, validation experiments, or implementation details to support the claims of aesthetic/natural placement or seamless integration.

Authors: We acknowledge that the manuscript presents the pipeline at a conceptual level. The work emphasizes the overall architecture for texture identification, 3D placement, and homography tracking in unconstrained sports video without requiring camera intrinsics or markers. To address this, the revised manuscript will incorporate additional implementation details and qualitative results from example sequences demonstrating aesthetic placement and tracking across frames. Quantitative error metrics are not included in the original submission as the focus is on system design rather than comparative benchmarking; we will discuss potential evaluation strategies in the revision.

Revision: partial

Referee: [Abstract] The central assumption that homography-based tracking maintains perspective-correct placement across clips despite smooth camera motion and shot boundaries (without camera intrinsics) is stated without evidence, discussion of failure modes (e.g., non-planar surfaces or depth variation), or any supporting experiments.

Authors: The homography tracking is applied under the assumption that the region of interest (e.g., sports field) can be treated as approximately planar, which holds for many broadcast sports scenarios. We will revise the manuscript to include an explicit discussion of this assumption, potential failure cases such as non-planar surfaces or large depth variations, and the method for detecting and handling shot boundaries during tracking. This will provide a more balanced analysis of the approach's scope and limitations.

Revision: yes
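
The rebuttal promises a description of shot-boundary detection without saying how it works. As a hedged illustration of one common approach (not necessarily the detector the authors use), consecutive frames can be compared by the L1 distance between their normalized grayscale histograms, flagging a cut when the distance exceeds a threshold. The function names, bin count, and threshold below are all assumptions.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of 0..255 intensities


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a grayscale frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("frame has no pixels")
    return [c / total for c in counts]


def is_shot_boundary(prev_frame: Frame,
                     next_frame: Frame,
                     threshold: float = 0.5,
                     bins: int = 16) -> bool:
    """Flag a cut when the L1 histogram distance (range 0 to 2) exceeds threshold."""
    h_prev = gray_histogram(prev_frame, bins)
    h_next = gray_histogram(next_frame, bins)
    distance = sum(abs(a - b) for a, b in zip(h_prev, h_next))
    return distance > threshold


# Tiny synthetic frames: a small brightness change versus a hard cut.
dark = [[10] * 4 for _ in range(4)]
slightly_brighter = [[12] * 4 for _ in range(4)]
bright = [[240] * 4 for _ in range(4)]

same_shot = is_shot_boundary(dark, slightly_brighter)
cut = is_shot_boundary(dark, bright)
```

One plausible response to a detected cut is to stop propagating the homography and rerun placement on the new shot, since feature correspondences across a cut are meaningless; whether the paper's tracker does this is not stated in the abstract.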

Circularity Check

0 steps flagged

No significant circularity

The paper presents a high-level system description of an automated video augmentation pipeline for markerless AR advertising in sports videos. It covers texture identification, 3D scene building, ad placement in 3D, projection to image plane, and homography-based tracking, but contains no equations, derivations, fitted parameters, predictions, or first-principles results. No self-citations, uniqueness theorems, or ansatzes are invoked in any load-bearing mathematical sense. The work is a descriptive pipeline architecture with no derivation chain that could reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review based on abstract only; no explicit free parameters, invented entities, or ad-hoc axioms are stated. The approach relies on standard domain assumptions in computer vision about scene geometry and tracking.

axioms (1)

Domain assumption: Homography-based tracking suffices to handle smooth camera motion and shot boundaries while preserving shape and perspective.
Invoked in the abstract as the method for maintaining placement across frames.
