
arXiv: 1906.09909 · v1 · pith:WA6WWBMDnew · submitted 2019-06-24 · 💻 cs.CV · cs.AI · cs.LG

Deep Exemplar-based Video Colorization

Pith reviewed 2026-05-25 17:41 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords exemplar-based video colorization · recurrent network · temporal consistency · semantic correspondence · color propagation · deep learning · computer vision

The pith

A recurrent network unifies semantic matching and color propagation to colorize video sequences from one reference image while maintaining temporal consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the first end-to-end network for exemplar-based video colorization. It introduces a recurrent framework that combines the tasks of finding semantic correspondences between the reference and each frame with propagating colors across the sequence. Video frames are processed in order using prior colorization results, and a temporal consistency loss is applied during training to enforce coherence. This setup is intended to reduce error accumulation that occurs when correspondence and propagation are handled in separate stages.
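The sequential scheme described above can be sketched as a plain loop. This is a minimal illustration of the data flow only, not the paper's network: `colorize_frame` and `toy_model` are hypothetical stand-ins for the learned model, and the 0.5 blending weights are arbitrary assumptions.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[float]  # toy "frame": a flat list of color values


def colorize_video(
    gray_frames: Sequence[Frame],
    reference: Frame,
    colorize_frame: Callable[[Frame, Frame, Optional[Frame]], Frame],
) -> List[Frame]:
    """Colorize frames in order, feeding each output back as history."""
    outputs: List[Frame] = []
    previous: Optional[Frame] = None  # the first frame has no history
    for gray in gray_frames:
        # The reference is passed at every step, not only for the first frame.
        current = colorize_frame(gray, reference, previous)
        outputs.append(current)
        previous = current
    return outputs


def toy_model(gray: Frame, reference: Frame, previous: Optional[Frame]) -> Frame:
    """Hypothetical placeholder: ignores `gray` and blends reference with history."""
    if previous is None:
        return list(reference)
    return [0.5 * r + 0.5 * p for r, p in zip(reference, previous)]


video = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
result = colorize_video(video, reference=[0.2, 0.8], colorize_frame=toy_model)
```

Because the reference enters every step, an error in one output is pulled back toward the reference at the next step instead of being copied forward unchanged, which is the intuition behind the reduced error accumulation.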

Core claim

By training a recurrent network end-to-end that unifies semantic correspondence and color propagation steps, with both steps guided by the reference image and reinforced by a temporal consistency loss, realistic videos can be produced that remain faithful to the reference style and exhibit good temporal stability.

What carries the argument

The recurrent framework that unifies semantic correspondence and color propagation, allowing the reference image to guide colorization of every frame based on colorization history.
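One common way to realize reference-guided correspondence is soft attention over feature similarity, where each frame location receives a similarity-weighted average of the reference colors. The sketch below assumes that formulation; this page does not give the paper's exact equations, and the function names, the cosine similarity, and the `temperature` value are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def warp_reference_colors(
    frame_feats: Sequence[Sequence[float]],
    ref_feats: Sequence[Sequence[float]],
    ref_colors: Sequence[Sequence[float]],
    temperature: float = 0.01,  # assumed value; lower means sharper matching
) -> List[List[float]]:
    """For each frame location, return a softmax-weighted mix of reference colors."""
    n_channels = len(ref_colors[0])
    warped = []
    for feat in frame_feats:
        scores = [cosine(feat, r) / temperature for r in ref_feats]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        warped.append(
            [sum(w * c[k] for w, c in zip(weights, ref_colors)) / total
             for k in range(n_channels)]
        )
    return warped


# Two frame locations whose features match the reference in swapped order.
frame_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_feats = [[0.0, 1.0], [1.0, 0.0]]
ref_colors = [[0.9, 0.1], [0.1, 0.9]]
warped = warp_reference_colors(frame_feats, ref_feats, ref_colors)
```

With a low temperature the result is close to nearest-neighbor matching; a higher temperature blends colors from several reference locations.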

If this is right

  • Each frame receives guidance from the reference through the unified correspondence and propagation steps.
  • Sequential processing based on colorization history reduces accumulated propagation errors.
  • The temporal consistency loss enforces coherency across the entire sequence.
  • The resulting videos are claimed to be superior to prior methods in both quantitative metrics and visual quality.
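A temporal consistency loss of the kind listed above is often written as a masked distance between the current output and the previous output warped into the current frame. The sketch below assumes that common form; the paper's own equation is not reproduced on this page. The warping step is taken as given (`warped_prev` is a hypothetical precomputed input), and `mask` marks pixels that are not occluded.

```python
from typing import Sequence


def temporal_consistency_loss(
    current: Sequence[float],
    warped_prev: Sequence[float],
    mask: Sequence[float],
) -> float:
    """Mean L1 distance over pixels where the mask is non-zero."""
    numerator = sum(m * abs(c - p) for c, p, m in zip(current, warped_prev, mask))
    denominator = sum(mask)
    return numerator / denominator if denominator > 0 else 0.0


# The third pixel is occluded (mask 0), so its large difference is ignored.
loss = temporal_consistency_loss(
    current=[0.5, 0.6, 0.9],
    warped_prev=[0.5, 0.4, 0.1],
    mask=[1.0, 1.0, 0.0],
)
```

Masking matters because newly revealed or occluded regions have no valid counterpart in the previous frame; penalizing them would push the model toward copying stale colors.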

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same recurrent unification might apply to other reference-guided video tasks such as style transfer or segmentation.
  • Efficiency improvements would be needed before the method could run on long sequences in real time.
  • Performance on videos with rapid motion or lighting changes would need separate verification beyond the reported experiments.

Load-bearing premise

Training the recurrent network end-to-end with the temporal consistency loss will produce realistic videos with good temporal stability without introducing new artifacts or drifting from the reference style across long sequences.

What would settle it

A test on a long video sequence showing either accumulated color drift away from the reference or visible flickering despite the temporal loss would falsify the central claim.
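Such a test could be quantified with two simple statistics, sketched here under stated assumptions: a drift curve (how far each frame's mean color moves from the first frame) and a flicker score (mean frame-to-frame change). Both metrics and their names are illustrative, not taken from the paper, and a real evaluation would compensate for motion before comparing frames.

```python
from typing import List, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def drift_curve(frames: Sequence[Sequence[float]]) -> List[float]:
    """Absolute change in mean color of each frame relative to the first frame."""
    base = mean(frames[0])
    return [abs(mean(frame) - base) for frame in frames]


def flicker_score(frames: Sequence[Sequence[float]]) -> float:
    """Mean absolute per-pixel difference between consecutive frames."""
    diffs = [
        mean([abs(a - b) for a, b in zip(f0, f1)])
        for f0, f1 in zip(frames, frames[1:])
    ]
    return mean(diffs) if diffs else 0.0


# Toy sequence whose color fades steadily: low flicker, but growing drift.
frames = [[0.8, 0.8], [0.7, 0.7], [0.6, 0.6]]
drift = drift_curve(frames)
flicker = flicker_score(frames)
```

The toy sequence shows why both numbers are needed: a slow fade produces small frame-to-frame differences, so a flicker measure alone would not flag it, while the drift curve keeps growing.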

Figures

Figures reproduced from arXiv: 1906.09909 by Amine Bermak, Bo Zhang, Dong Chen, Jing Liao, Lu Yuan, Mingming He, Pedro V. Sander.

Figure 1: The framework of our video colorization network. The …
Figure 2: The detailed diagram of the proposed network. The correspondence subnet finds the correspondence of source image …
Figure 3: Augmented training images from ImageNet dataset.
Figure 4: First row: nearest neighbor matching. Second row: …
Figure 5: Ablation study for different loss functions. Please refer to the supplementary material for the quantitative comparisons.
Figure 6: Comparison with image colorization with state-of-the-art methods.
Figure 7: Quantitative comparison with video color propagation.
Figure 8: User study results
Figure 9: Comparison with video color propagation. With a given color frame as start, colors are propagated to the succeeding video …
Figure 10: Comparison with automatic video colorization.
Figure 11: Multi-modal colorization according to the user reference.
Figure 12: Multi-modal colorization according to the user reference.
Figure 13: Multi-modal colorization according to the user reference.
Figure 14: Multi-modal colorization according to the user reference.
Figure 15: Colorization on legacy videos.
Appendix E. User study. We conduct two user studies: one to measure the video colorization quality and another for video propagation. For the first study, we first compare our video colorization with three methods of per-frame automatic video colorization: Larsson et al. [16], Zhang et al. [17] and Iizuka et al. [15]. We use 19 videos randomly selected from the video test dat…
Figure 16: Limitation: our method cannot assure long-term temporal consistency. The color of the train gradually changes (from red to …
Original abstract

This paper presents the first end-to-end network for exemplar-based video colorization. The main challenge is to achieve temporal consistency while remaining faithful to the reference style. To address this issue, we introduce a recurrent framework that unifies the semantic correspondence and color propagation steps. Both steps allow a provided reference image to guide the colorization of every frame, thus reducing accumulated propagation errors. Video frames are colorized in sequence based on the colorization history, and its coherency is further enforced by the temporal consistency loss. All of these components, learned end-to-end, help produce realistic videos with good temporal stability. Experiments show our result is superior to the state-of-the-art methods both quantitatively and qualitatively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. This paper claims to present the first end-to-end recurrent network for exemplar-based video colorization. The recurrent framework unifies semantic correspondence and color propagation so a single reference image guides colorization of every frame, reducing accumulated propagation errors. Video frames are processed sequentially using colorization history, with a temporal consistency loss enforcing coherency; all components are learned end-to-end to produce realistic videos with good temporal stability. Experiments are said to demonstrate quantitative and qualitative superiority over state-of-the-art methods.

Significance. If the experimental claims hold with proper validation, the unified recurrent approach would represent a meaningful advance in exemplar-based video colorization by addressing error accumulation and temporal instability in a single learned model, potentially outperforming prior separate-stage pipelines.

major comments (2)
  1. [Abstract] The central claim of quantitative and qualitative superiority (and reduced propagation errors via the recurrent unification) is asserted without error bars, dataset details, ablation results, or long-sequence experiments, so the experimental summary cannot be verified and the claim that end-to-end training prevents reference drift remains untested.
  2. [Abstract] No loss equation, recurrence depth analysis, or ablation isolating the unification of semantic correspondence and color propagation is supplied, leaving the assumption that the recurrent state maintains reference fidelity without introducing new artifacts or style drift across long sequences unsupported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each point below and indicate where revisions to the abstract or supporting text are feasible.

Point-by-point responses
  1. Referee: [Abstract] The central claim of quantitative and qualitative superiority (and reduced propagation errors via the recurrent unification) is asserted without error bars, dataset details, ablation results, or long-sequence experiments, so the experimental summary cannot be verified and the claim that end-to-end training prevents reference drift remains untested.

    Authors: The abstract is a concise summary; full dataset descriptions appear in Section 4.1, quantitative/qualitative results and comparisons in Section 4.2, and ablations in Section 5. Error bars were not reported in the submitted version. The recurrent unification is motivated in Section 3 as a means to reduce propagation drift, with temporal stability shown on the evaluated sequences. We will revise the abstract to qualify the superiority claim and add a forward reference to the experimental sections.
    Revision: partial

  2. Referee: [Abstract] No loss equation, recurrence depth analysis, or ablation isolating the unification of semantic correspondence and color propagation is supplied, leaving the assumption that the recurrent state maintains reference fidelity without introducing new artifacts or style drift across long sequences unsupported.

    Authors: The temporal consistency loss is formalized in the method section (with the relevant equation). The recurrent state and unification of correspondence and propagation are detailed in Section 3, and component ablations appear in Section 5. A dedicated recurrence-depth study and explicit long-sequence drift measurements are not present; we can add a brief reference to the loss equation in the abstract and note the design rationale for fidelity preservation.
    Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical ML method with external benchmarks

Full rationale

The paper proposes a recurrent end-to-end neural network architecture for exemplar-based video colorization, trained with a temporal consistency loss and evaluated on external datasets against prior methods. It presents no derivation chain, equations, or first-principles results that reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. Its claims rest on learned behavior and on quantitative and qualitative experiments that are checked against independent benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on standard supervised learning assumptions (gradient-based optimization of a neural network) and the untested premise that the proposed architecture and loss suffice for temporal stability.

axioms (1)
  • Domain assumption: Gradient-based optimization can jointly learn semantic correspondence, color propagation, and temporal consistency from paired training data.
    Implicit in any end-to-end deep learning claim; location: entire abstract description of training.

pith-pipeline@v0.9.0 · 5655 in / 1170 out tokens · 22855 ms · 2026-05-25T17:41:49.006445+00:00 · methodology


Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 17 internal anchors

  1. [1]

    Colorization using optimization

    A. Levin, D. Lischinski, and Y. Weiss, “Colorization using optimization,” in ACM Transactions on Graphics (TOG), vol. 23, pp. 689–694, ACM, 2004.

  2. [2]

    Fast image and video colorization using chrominance blending

    L. Yatziv and G. Sapiro, “Fast image and video colorization using chrominance blending,” 2004.

  3. [3]

    An adaptive edge detection based colorization algorithm and its applications

    Y.-C. Huang, Y.-S. Tung, J.-C. Chen, S.-W. Wang, and J.-L. Wu, “An adaptive edge detection based colorization algorithm and its applications,” in Proceedings of the 13th annual ACM international conference on Multimedia, pp. 351–354, ACM, 2005.

  4. [4]

    Manga colorization

    Y. Qu, T.-T. Wong, and P.-A. Heng, “Manga colorization,” in ACM Transactions on Graphics (TOG), vol. 25, pp. 1214–1220, ACM, 2006.

  5. [5]

    Natural image colorization

    Q. Luan, F. Wen, D. Cohen-Or, L. Liang, Y.-Q. Xu, and H.-Y. Shum, “Natural image colorization,” in Proceedings of the 18th Eurographics conference on Rendering Techniques, pp. 309–320, Eurographics Association, 2007.

  6. [6]

    Transferring color to greyscale images

    T. Welsh, M. Ashikhmin, and K. Mueller, “Transferring color to greyscale images,” in ACM Transactions on Graphics (TOG), vol. 21, pp. 277–280, ACM, 2002.

  7. [7]

    Variational exemplar-based image colorization

    A. Bugeau, V.-T. Ta, and N. Papadakis, “Variational exemplar-based image colorization,” IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 298–307, 2014.

  8. [8]

    Intrinsic colorization

    X. Liu, L. Wan, Y. Qu, T.-T. Wong, S. Lin, C.-S. Leung, and P.-A. Heng, “Intrinsic colorization,” in ACM Transactions on Graphics (TOG), vol. 27, p. 152, ACM, 2008.

  9. [9]

    Semantic colorization with internet images

    A. Y.-S. Chia, S. Zhuo, R. K. Gupta, Y.-W. Tai, S.-Y. Cho, P. Tan, and S. Lin, “Semantic colorization with internet images,” in ACM Transactions on Graphics (TOG), vol. 30, p. 156, ACM, 2011.

  10. [10]

    Image colorization using similar images

    R. K. Gupta, A. Y.-S. Chia, D. Rajan, E. S. Ng, and H. Zhiyong, “Image colorization using similar images,” in Proceedings of the 20th ACM international conference on Multimedia, pp. 369–378, ACM, 2012.

  11. [11]

    Automatic image colorization via multimodal predictions

    G. Charpiat, M. Hofmann, and B. Schölkopf, “Automatic image colorization via multimodal predictions,” in European conference on computer vision, pp. 126–139, Springer, 2008.

  12. [12]

    Colorization by example

    R. Ironi, D. Cohen-Or, and D. Lischinski, “Colorization by example,” in Rendering Techniques, pp. 201–210, Citeseer,

  13. [13]

    Local color transfer via probabilistic segmentation by expectation-maximization

    Y.-W. Tai, J.-Y. Jia, and C.-K. Tang, “Local color transfer via probabilistic segmentation by expectation-maximization,” in IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2005.

  14. [14]

    Deep colorization

    Z. Cheng, Q. Yang, and B. Sheng, “Deep colorization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 415–423, 2015.

  15. [15]

    Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification

    S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification,” ACM Transactions on Graphics (TOG), vol. 35, no. 4, p. 110, 2016.

  16. [16]

    Learning representations for automatic colorization

    G. Larsson, M. Maire, and G. Shakhnarovich, “Learning representations for automatic colorization,” in European Conference on Computer Vision, pp. 577–593, Springer, 2016.

  17. [17]

    Colorful image colorization

    R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in European Conference on Computer Vision, pp. 649–666, Springer, 2016.

  18. [18]

    Learning large-scale automatic image colorization

    A. Deshpande, J. Rock, and D. Forsyth, “Learning large-scale automatic image colorization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 567–575, 2015.

  19. [19]

    Pixel-level Semantics Guided Image Colorization

    J. Zhao, L. Liu, C. G. Snoek, J. Han, and L. Shao, “Pixel-level semantics guided image colorization,” arXiv preprint arXiv:1808.01597, 2018.

  20. [20]

    Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2

    F. Baldassarre, D. G. Morín, and L. Rodés-Guirao, “Deep koalarization: Image colorization using CNNs and Inception-ResNet-v2,” arXiv preprint arXiv:1712.03400, 2017.

  21. [21]

    Blind video temporal consistency

    N. Bonneel, J. Tompkin, K. Sunkavalli, D. Sun, S. Paris, and H. Pfister, “Blind video temporal consistency,” ACM Transactions on Graphics (TOG), vol. 34, no. 6, p. 196, 2015.

  22. [22]

    Learning Blind Video Temporal Consistency

    W.-S. Lai, J.-B. Huang, O. Wang, E. Shechtman, E. Yumer, and M.-H. Yang, “Learning blind video temporal consistency,” arXiv preprint arXiv:1808.00449, 2018.

  23. [23]

    Video colorization using parallel optimization in feature space

    B. Sheng, H. Sun, M. Magnor, and P. Li, “Video colorization using parallel optimization in feature space,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 3, pp. 407–417, 2014.

  24. [24]

    Key-frame based spatiotemporal scribble propagation

    P. Doğan, T. O. Aydın, N. Stefanoski, and A. Smolic, “Key-frame based spatiotemporal scribble propagation,” in Proceedings of the Eurographics Workshop on Intelligent Cinematography and Editing, pp. 13–20, Eurographics Association, 2015.

  25. [25]

    Spatiotemporal colorization of video using 3d steerable pyramids

    S. Paul, S. Bhattacharya, and S. Gupta, “Spatiotemporal colorization of video using 3d steerable pyramids,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 8, pp. 1605–1619, 2017.

  26. [26]

    Video propagation networks

    V. Jampani, R. Gadde, and P. V. Gehler, “Video propagation networks,” in Proc. CVPR, vol. 6, p. 7, 2017.

  27. [27]

    Tracking emerges by colorizing videos

    C. Vondrick, A. Shrivastava, A. Fathi, S. Guadarrama, and K. Murphy, “Tracking emerges by colorizing videos,” in Proc. ECCV, 2018.

  28. [28]

    Switchable Temporal Propagation Network

    S. Liu, G. Zhong, S. De Mello, J. Gu, V. Jampani, M.-H. Yang, and J. Kautz, “Switchable temporal propagation network,” arXiv preprint arXiv:1804.08758, 2018.

  29. [29]

    Deep Video Color Propagation

    S. Meyer, V. Cornillère, A. Djelouah, C. Schroers, and M. Gross, “Deep video color propagation,” arXiv preprint arXiv:1808.03232, 2018.

  30. [30]

    Deep exemplar-based colorization

    M. He, D. Chen, J. Liao, P. V. Sander, and L. Yuan, “Deep exemplar-based colorization,” ACM Transactions on Graphics (TOG), vol. 37, no. 4, p. 47, 2018.

  31. [31]

    Visual Attribute Transfer through Deep Image Analogy

    J. Liao, Y. Yao, L. Yuan, G. Hua, and S. B. Kang, “Visual attribute transfer through deep image analogy,” arXiv preprint arXiv:1705.01088, 2017.

  32. [32]

    Real-Time User-Guided Image Colorization with Learned Deep Priors

    R. Zhang, J.-Y. Zhu, P. Isola, X. Geng, A. S. Lin, T. Yu, and A. A. Efros, “Real-time user-guided image colorization with learned deep priors,” arXiv preprint arXiv:1705.02999,

  33. [33]

    Progressive Color Transfer with Dense Semantic Correspondences

    M. He, J. Liao, L. Yuan, and P. V. Sander, “Neural color transfer between images,” arXiv preprint arXiv:1710.00756,

  34. [34]

    Image-to-image translation with conditional adversarial networks

    P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” arXiv preprint, 2017.

  35. [35]

    Learning diverse image colorization

    A. Deshpande, J. Lu, M.-C. Yeh, M. J. Chong, and D. A. Forsyth, “Learning diverse image colorization,” in CVPR, pp. 2877–2885, 2017.

  36. [36]

    Structural Consistency and Controllability for Diverse Colorization

    S. Messaoud, D. Forsyth, and A. G. Schwing, “Structural consistency and controllability for diverse colorization,” arXiv preprint arXiv:1809.02129, 2018.

  37. [37]

    PixColor: Pixel Recursive Colorization

    S. Guadarrama, R. Dahl, D. Bieber, M. Norouzi, J. Shlens, and K. Murphy, “Pixcolor: Pixel recursive colorization,” arXiv preprint arXiv:1705.07208, 2017.

  38. [38]

    Probabilistic Image Colorization

    A. Royer, A. Kolesnikov, and C. H. Lampert, “Probabilistic image colorization,” arXiv preprint arXiv:1705.04258, 2017.

  39. [39]

    Colorization of grayscale images and videos using a semiautomatic approach

    V. G. Jacob and S. Gupta, “Colorization of grayscale images and videos using a semiautomatic approach,” in Image Processing (ICIP), 2009 16th IEEE International Conference on, pp. 1653–1656, IEEE, 2009.

  40. [40]

    Approximate nearest neighbor fields in video

    N. Ben-Zrihem and L. Zelnik-Manor, “Approximate nearest neighbor fields in video,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5233–5242, 2015.

  41. [41]

    Robust and automatic video colorization via multiframe reordering refinement

    S. Xia, J. Liu, Y. Fang, W. Yang, and Z. Guo, “Robust and automatic video colorization via multiframe reordering refinement,” in Image Processing (ICIP), 2016 IEEE International Conference on, pp. 4017–4021, IEEE, 2016.

  42. [42]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

  43. [43]

    Non-local Neural Networks

    X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” arXiv preprint arXiv:1711.07971, vol. 10,

  44. [44]

    Perceptual losses for real-time style transfer and super-resolution

    J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, pp. 694–711, Springer,

  45. [45]

    The Contextual Loss for Image Transformation with Non-Aligned Data

    R. Mechrez, I. Talmi, and L. Zelnik-Manor, “The contextual loss for image transformation with non-aligned data,” arXiv preprint arXiv:1803.02077, 2018.

  46. [46]

    Edge-preserving decompositions for multi-scale tone and detail manipulation

    Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving decompositions for multi-scale tone and detail manipulation,” in ACM Transactions on Graphics (TOG), vol. 27, p. 67, ACM, 2008.

  47. [47]

    The relativistic discriminator: a key element missing from standard GAN

    A. Jolicoeur-Martineau, “The relativistic discriminator: a key element missing from standard GAN,” arXiv preprint arXiv:1807.00734, 2018.

  48. [48]

    Coherent online video style transfer

    D. Chen, J. Liao, L. Yuan, N. Yu, and G. Hua, “Coherent online video style transfer,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1105–1114,

  49. [49]

    Self-Attention Generative Adversarial Networks

    H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” arXiv preprint arXiv:1805.08318, 2018.

  50. [50]

    Spectral Normalization for Generative Adversarial Networks

    T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957, 2018.

  51. [51]

    “Videvo.” https://www.videvo.net/.

  52. [52]

    Actions in context

    M. Marszałek, I. Laptev, and C. Schmid, “Actions in context,” in IEEE Conference on Computer Vision & Pattern Recognition, 2009.

  53. [53]

    Flownet 2.0: Evolution of optical flow estimation with deep networks

    E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in IEEE conference on computer vision and pattern recognition (CVPR), vol. 2, p. 6, 2017.

  54. [54]

    Artistic style transfer for videos

    M. Ruder, A. Dosovitskiy, and T. Brox, “Artistic style transfer for videos,” in German Conference on Pattern Recognition, pp. 26–36, Springer, 2016.

  55. [55]

    GANs trained by a two time-scale update rule converge to a local Nash equilibrium

    M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Advances in Neural Information Processing Systems, pp. 6626–6637, 2017.

  56. [56]

    Measuring colorfulness in natural images

    D. Hasler and S. E. Suesstrunk, “Measuring colorfulness in natural images,” in Human vision and electronic imaging VIII, vol. 5007, pp. 87–96, International Society for Optics and Photonics, 2003.

  57. [57]

    A benchmark dataset and evaluation methodology for video object segmentation

    F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung, “A benchmark dataset and evaluation methodology for video object segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 724–732, 2016.