Fast Universal Style Transfer for Artistic and Photorealistic Rendering

Haoyi Xiong; Jie An; Jiebo Luo; Jinwen Ma; Jun Huan

arXiv: 1907.03118 · v1 · pith:USB5HDUFnew · submitted 2019-07-06 · cs.CV · cs.GR · eess.IV

Jie An, Haoyi Xiong, Jiebo Luo, Jun Huan, Jinwen Ma

Pith reviewed 2026-05-25 01:36 UTC · model grok-4.3

classification: cs.CV · cs.GR · eess.IV

keywords: universal style transfer · artistic stylization · photorealistic stylization · auto-encoder · neural style transfer · fast inference · image rendering · deep features

The pith

ArtNet and PhotoNet perform universal style transfer in one end-to-end pass with fewer artifacts than prior multi-round methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two new network architectures, ArtNet for artistic style transfer and PhotoNet for photorealistic style transfer. These sit on top of an existing auto-encoder and modify deep features to transfer style from a reference image to a content image. The design replaces the multiple reconstruction rounds used by earlier methods with a single inference pass. Experiments show the resulting images have reduced artifacts and distortions for art styles while preserving sharp details in photorealistic cases, and run substantially faster.

Core claim

The authors claim that novel network architectures named ArtNet and PhotoNet placed on an existing auto-encoder enable universal style transfer for arbitrary artistic or photorealistic reference images in a single end-to-end inference pass, producing fewer artifacts and distortions than state-of-the-art methods while delivering 3X to 100X speed-ups.

What carries the argument

ArtNet and PhotoNet, novel network architectures embedded into the auto-encoder reconstruction procedure that modify deep features for style transfer without multiple reconstruction rounds.

If this is right

ArtNet generates artistic stylizations with fewer artifacts and distortions than existing algorithms.
PhotoNet produces sharp photorealistic images that faithfully preserve rich details of the input content.
Both networks achieve 3X to 100X speed-up over state-of-the-art algorithms.
The single-pass approach supports efficient handling of large content images.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single-pass design could support real-time style transfer on resource-limited hardware.
The architectures might extend to video stylization with better temporal stability than multi-pass approaches.
Similar one-pass modules could be tested on other image-to-image tasks that currently rely on iterative reconstruction.

Load-bearing premise

Novel architectures placed on top of an existing auto-encoder can deliver improved stylization quality and single-pass speed without requiring the multiple reconstruction rounds used by prior methods.

What would settle it

A side-by-side evaluation on a standard artistic or photorealistic style transfer benchmark in which ArtNet or PhotoNet produces more artifacts or distortions than the best prior method, or fails to show the claimed speed-up on large images.

Figures

Figures reproduced from arXiv: 1907.03118 by Haoyi Xiong, Jie An, Jiebo Luo, Jinwen Ma, Jun Huan.

**Figure 1.** Visual comparison of photorealistic and artistic style transfer. Content images are (a) and (d); reference style images are shown in the bottom-right corners of (a) and (d). For photorealistic stylization, PhotoWCT [21] consumes significant computing time while producing an overly smooth image, shown in (b). Our proposed PhotoNet generates the image shown in (c), with rich details, using only 1/50th of the compu…

**Figure 2.** Comparison of architectures. The multi-level stylization scheme first trains an auto-encoder (AE), shown in (a), with an image reconstruction loss, then runs multiple rounds of the AE with the WCT [20] transform for style transfer (shown in (b)). Our proposed ArtNet (c) introduces deep feature aggregation and multi-stage stylization on the decoder to better stylize images, while PhotoNet (d) utilizes additional norm…

**Figure 3.** Results of the contrast experiments against baseline artistic style transfer methods. (a) Content (b) Style (c) PhotoWCT [21] (d) PhotoWCT+Smooth [21] (e) PhotoNet(WCT)

**Figure 4.** Results of the contrast experiments against the baseline photorealistic style transfer method. …of-the-art approaches and further provide a comprehensive empirical analysis to substantiate our observations. 4.1. Results on artistic style transfer. In order to demonstrate the effectiveness of the proposed ArtNet, we conduct contrast experiments on AdaIN [12] and WCT [20], where we replace the AE part of these two …

**Figure 5.** Artistic stylization results. (a) Content (b) Style (c) IDT [26] (d) Luan et al. [25] (e) PhotoWCT [21] (f) PhotoNet(WCT)

**Figure 6.** Photorealistic stylization results. Note that the method of Luan et al. [25] requires additional segmentation masks for stylization while the other compared algorithms do not. …ror leads to better stylization performance. 4.4.1 Quantitative evaluation. In this study, artistic style transfer methods are evaluated on a dataset consisting of 12 content images and 16 style images, where each content image is transf…

**Figure 7.** Ablation study of ArtNet and PhotoNet. (a) AdaIN [12] (b) WCT [20] (c) ArtNet (d) PhotoWCT [21] (e) PhotoNet (f) Ground Truth

**Figure 8.** Image reconstruction results. The proposed PhotoNet and ArtNet outperform WCT [20], AdaIN [12] and PhotoWCT [21] in preserving details for image reconstruction (i.e., hair, eyelashes, and the local structure/textures in both artistic and photorealistic settings).

**Figure 9.** Artistic style transfer results by ArtNet(AdaIN) with control factor β ranging from 0.2 to 1.0. (a) Content (b) β = 0.2 (c) β = 0.4 (d) β = 0.6 (e) β = 0.8 (f) β = 1.0 (g) Style

**Figure 10.** Artistic style transfer results by ArtNet(WCT) with control factor β ranging from 0.2 to 1.0.

**Figure 11.** Photorealistic style transfer results by PhotoNet(WCT) with control factor β ranging from 0.2 to 1.0.

**Figure 12.** Artistic style transfer comparison between ArtNet(AdaIN) and AdaIN [12].

**Figure 13.** Artistic style transfer comparison between ArtNet(AdaIN) and AdaIN [12].

**Figure 14.** Artistic style transfer comparison between ArtNet(AdaIN) and AdaIN [12].

**Figure 15.** Artistic style transfer comparison between ArtNet(WCT) and WCT [20].

**Figure 16.** Artistic style transfer comparison between ArtNet(WCT) and WCT [20].

**Figure 17.** Artistic style transfer comparison between ArtNet(WCT) and WCT [20].

**Figure 18.** Photorealistic style transfer comparison between PhotoNet(WCT) and PhotoWCT [21]. (a) Content (b) Style (c) PhotoWCT [21] (d) PhotoNet(WCT)

**Figure 19.** Photorealistic style transfer comparison between PhotoNet(WCT) and PhotoWCT [21].

**Figure 20.** Photorealistic style transfer comparison between PhotoNet(WCT) and PhotoWCT [21].
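
Figures 9 to 11 sweep a control factor β from 0.2 to 1.0. The captions do not say how β enters the networks. A common convention in feature-transform style transfer, assumed here rather than taken from the paper, is a linear blend between the content feature and the stylized feature. The sketch below illustrates that convention on plain lists of floats; `blend_features` is a hypothetical helper name.

```python
from typing import List


def blend_features(content: List[float], stylized: List[float], beta: float) -> List[float]:
    """Linearly interpolate between content and stylized features.

    Assumption (not confirmed by the source): beta = 0 returns the content
    feature unchanged and beta = 1 returns the fully stylized feature.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(content) != len(stylized):
        raise ValueError("feature vectors must have the same length")
    return [beta * s + (1.0 - beta) * c for c, s in zip(content, stylized)]


# Toy feature vectors standing in for real deep features.
content = [0.0, 1.0, 2.0]
stylized = [4.0, 5.0, 6.0]
for beta in (0.2, 0.4, 0.6, 0.8, 1.0):  # the values swept in Figures 9 to 11
    print(beta, blend_features(content, stylized, beta))
```

Under this assumption, larger β moves the output monotonically toward the stylized feature, which matches the progression the captions describe.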

Original abstract

Universal style transfer is an image editing task that renders an input content image using the visual style of arbitrary reference images, including both artistic and photorealistic stylization. Given a pair of images as the source of content and the reference of style, existing solutions usually first train an auto-encoder (AE) to reconstruct the image using deep features and then embed pre-defined style transfer modules into the AE reconstruction procedure to transfer the style of the reconstructed image through modifying the deep features. While existing methods typically need multiple rounds of time-consuming AE reconstruction for better stylization, our work intends to design novel neural network architectures on top of AE for fast style transfer with fewer artifacts and distortions all in one pass of end-to-end inference. To this end, we propose two network architectures named ArtNet and PhotoNet to improve artistic and photo-realistic stylization, respectively. Extensive experiments demonstrate that ArtNet generates images with fewer artifacts and distortions against the state-of-the-art artistic transfer algorithms, while PhotoNet improves the photorealistic stylization results by creating sharp images faithfully preserving rich details of the input content. Moreover, ArtNet and PhotoNet can achieve 3X to 100X speed-up over the state-of-the-art algorithms, which is a major advantage for large content images.
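
The abstract describes style transfer modules that modify deep features inside the auto-encoder. One such module named in the figure captions is AdaIN [12], which rescales each content channel to the mean and standard deviation of the matching style channel. The sketch below shows that operation in plain Python. It is an illustration of AdaIN only, not of ArtNet or PhotoNet; the list-of-channels representation, the `adain` name, and the `eps` value are assumptions made for the example.

```python
from statistics import fmean, pstdev
from typing import List

Channel = List[float]  # one feature channel, flattened over spatial positions


def adain(content: List[Channel], style: List[Channel], eps: float = 1e-5) -> List[Channel]:
    """Adaptive instance normalization on per-channel feature lists.

    Each content channel is shifted to zero mean, divided by its standard
    deviation, then rescaled to the style channel's standard deviation and
    shifted to the style channel's mean.
    """
    if len(content) != len(style):
        raise ValueError("content and style need the same number of channels")
    out = []
    for c_ch, s_ch in zip(content, style):
        c_mean, c_std = fmean(c_ch), pstdev(c_ch)
        s_mean, s_std = fmean(s_ch), pstdev(s_ch)
        # eps (an assumed value) guards against division by zero
        # when a content channel is constant.
        out.append([s_std * (x - c_mean) / (c_std + eps) + s_mean for x in c_ch])
    return out


content = [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]
style = [[10.0, 20.0, 30.0], [5.0, 5.0, 8.0]]
for channel in adain(content, style):
    print(round(fmean(channel), 3), round(pstdev(channel), 3))
```

The printed statistics of each output channel should sit close to those of the corresponding style channel, which is the whole effect of the transform: spatial structure comes from the content, first and second moments from the style.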

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ArtNet and PhotoNet wrap an auto-encoder with new one-pass modules to cut repeated reconstructions in style transfer, delivering claimed speed and quality gains.

The core move here is to take the existing auto-encoder plus style-module pipeline and replace the multi-round reconstruction loop with two task-specific networks, ArtNet for artistic transfer and PhotoNet for photorealistic cases. Both are designed to run end-to-end in one forward pass while trying to reduce artifacts or preserve detail better than the baselines they compare against. The reported 3X–100X speed-up is the part that would actually matter for anyone processing large images in practice.

That design choice directly attacks the runtime bottleneck described in the prior work they cite, and the split into separate artistic and photo nets makes sense given how different the failure modes are in each domain. The paper stays empirical throughout, with no new theory or closed-form derivations, so its value rests on whether the experiments hold up. The abstract asserts fewer distortions, sharper outputs, and faithful content preservation, but the strength of those claims depends on the concrete baselines, metrics, and visual comparisons shown later. If the full results include standard quantitative tables and ablation checks, the contribution is a useful engineering step; if they stay mostly qualitative, the speed numbers become the main takeaway. No internal contradictions appear in the stated approach, and the architectures are presented as extensions rather than reinventions of the auto-encoder foundation.

This paper is aimed at practitioners who already work with style transfer pipelines and want faster inference without retraining the whole stack from scratch. A reader who needs to ship a stylization tool or benchmark runtime on high-resolution inputs could extract the module designs and test them directly. It is worth sending to peer review because the problem is real, the proposed fix is concrete, and the claims are falsifiable with standard CV evaluation protocols.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ArtNet and PhotoNet, two neural architectures placed on top of an existing auto-encoder, to perform universal style transfer for both artistic and photorealistic rendering. The central claim is that these designs enable single-pass end-to-end inference, yielding fewer artifacts and distortions than prior artistic methods, sharper detail-preserving results for photorealistic stylization, and 3X–100X speed-ups over the state of the art.

Significance. If the experimental support is robust, the contribution would be practically significant: it directly targets the multi-pass reconstruction bottleneck of prior auto-encoder-based style transfer, potentially enabling real-time or large-image applications that current methods cannot handle efficiently.

major comments (2)

[Experiments] Experiments section: the abstract states that 'extensive experiments demonstrate' fewer artifacts, better detail preservation, and 3X–100X speed-ups, yet the provided text contains no tables of quantitative metrics (PSNR, SSIM, user-study scores, or runtime benchmarks), no listed baselines, and no error analysis; these data are load-bearing for the superiority claims and must be supplied with explicit comparisons.
[§3] §3 (network architectures): the description of how ArtNet and PhotoNet modules are inserted into the auto-encoder is presented at a high level without equations or pseudocode showing the precise feature-modification operations; this makes it impossible to verify that the single-pass design actually avoids the multiple-reconstruction rounds criticized in the introduction.
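
The first major comment names PSNR among the missing metrics. For reference, PSNR between a reference image and a test image is 10·log10(peak² / MSE). The sketch below computes it on flat lists of pixel values. The `psnr` helper and the default peak of 255 (8-bit images) are assumptions for illustration, and the example does not reproduce any evaluation protocol from the paper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [100.0, 100.0, 100.0, 100.0]
reconstruction = [110.0, 90.0, 100.0, 100.0]
print(f"PSNR: {psnr(reference, reconstruction):.2f} dB")
```

Higher values mean the test image is closer to the reference, so PSNR suits the image reconstruction comparison (Figure 8) better than stylization itself, where no ground-truth output exists.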

minor comments (2)

[Abstract] Abstract: the speed-up range '3X to 100X' is stated without reference to image resolution or hardware; a parenthetical note on the measurement conditions would improve clarity.
[Related Work] Related-work section: several recent single-pass style-transfer methods are mentioned only generically; explicit citation of the most directly comparable works (with year and venue) is needed for proper positioning.
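
The first minor comment asks for the conditions behind the 3X to 100X figure. A speed-up ratio only means something when both methods are timed on the same input and hardware, and when the resolution is reported next to the number. The sketch below shows one way to do that with the standard library. `median_runtime`, `speedup`, the stand-in workloads, and the resolution are all hypothetical; a real benchmark would call the two stylization pipelines instead.

```python
import statistics
import time
from typing import Callable, Tuple


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock seconds over `repeats` calls of fn."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate runtime must be positive")
    return baseline_seconds / candidate_seconds


def slow_pipeline() -> int:
    # Stand-in for a multi-round baseline (not a real stylization call).
    return sum(i * i for i in range(200_000))


def fast_pipeline() -> int:
    # Stand-in for a single-pass method (not a real stylization call).
    return sum(i * i for i in range(2_000))


resolution: Tuple[int, int] = (1024, 768)  # hypothetical; report it with the result
ratio = speedup(median_runtime(slow_pipeline), median_runtime(fast_pipeline))
print(f"speed-up at {resolution[0]}x{resolution[1]}: {ratio:.1f}x")
```

Using the median rather than the mean keeps a single slow run (for example, a cold cache) from dominating the reported ratio.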

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript to incorporate the requested clarifications and additions.

Referee: [Experiments] Experiments section: the abstract states that 'extensive experiments demonstrate' fewer artifacts, better detail preservation, and 3X–100X speed-ups, yet the provided text contains no tables of quantitative metrics (PSNR, SSIM, user-study scores, or runtime benchmarks), no listed baselines, and no error analysis; these data are load-bearing for the superiority claims and must be supplied with explicit comparisons.

Authors: We agree that the current manuscript relies on qualitative visual comparisons without accompanying quantitative tables. In the revision we will add explicit tables reporting PSNR, SSIM, user-study scores, runtime benchmarks (with listed baselines), and error analysis to substantiate the claims of fewer artifacts, better detail preservation, and speed-ups. revision: yes

Referee: [§3] §3 (network architectures): the description of how ArtNet and PhotoNet modules are inserted into the auto-encoder is presented at a high level without equations or pseudocode showing the precise feature-modification operations; this makes it impossible to verify that the single-pass design actually avoids the multiple-reconstruction rounds criticized in the introduction.

Authors: We acknowledge that §3 currently provides a high-level overview. The revised manuscript will include the missing equations and pseudocode that define the exact feature-modification operations performed by ArtNet and PhotoNet, thereby making explicit how the single-pass end-to-end inference avoids the iterative reconstruction steps used by prior methods. revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical proposal of ArtNet and PhotoNet architectures placed atop an existing auto-encoder to enable single-pass stylization, supported by experimental comparisons rather than any derivation chain, equations, or fitted predictions. No self-definitional relations, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, smuggled ansatzes, or renamings of known results appear. The central claims rest on reported speed and quality improvements from the new modules, which are independent of the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no access to full methods, equations, or experimental details, so free parameters, axioms, and invented entities cannot be enumerated from the text.

pith-pipeline@v0.9.0 · 5767 in / 1023 out tokens · 24562 ms · 2026-05-25T01:36:37.957409+00:00 · methodology

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 7 internal anchors

[1] D. Chen, L. Yuan, J. Liao, N. Yu, and G. Hua. StyleBank: an explicit representation for neural image style transfer. In CVPR, 2017.
[2] T. Q. Chen and M. Schmidt. Fast patch-based style transfer of arbitrary style. arXiv preprint arXiv:1612.04337, 2016.
[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: a large-scale hierarchical image database. In CVPR, 2009.
[4] V. Dumoulin, J. Shlens, and M. Kudlur. A learned representation for artistic style. In ICLR, 2017.
[5] O. Frigo, N. Sabater, J. Delon, and P. Hellier. Split and match: example-based adaptive patch sampling for unsupervised style transfer. In CVPR, 2016.
[6] L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
[7] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In CVPR, 2016.
[8] S. Gu, C. Chen, J. Liao, and L. Yuan. Arbitrary style transfer with deep feature reshuffle. In CVPR, 2018.
[9] A. Hertzmann. Painterly rendering with curved brush strokes of multiple sizes. In SIGGRAPH, 1998.
[10] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In SIGGRAPH, 2001.
[11] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS, 2017.
[12] X. Huang and S. J. Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In ICCV, 2017.
[13] X. Huang, M.-Y. Liu, S. Belongie, and J. Kautz. Multimodal unsupervised image-to-image translation. In ECCV, 2018.
[14] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
[15] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
[16] D. P. Kingma and J. Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980,
[17] C. Li and M. Wand. Combining Markov random fields and convolutional neural networks for image synthesis. In CVPR, 2016.
[18] S. Li, X. Xu, L. Nie, and T.-S. Chua. Laplacian-steered neural style transfer. In ACM MM, 2017.
[19] Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M.-H. Yang. Diversified texture synthesis with feed-forward networks. In CVPR, 2017.
[20] Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M.-H. Yang. Universal style transfer via feature transforms. In NIPS, 2017.
[21] Y. Li, M.-Y. Liu, X. Li, M.-H. Yang, and J. Kautz. A closed-form solution to photorealistic image stylization. In ECCV, 2018.
[22] J. Liao, Y. Yao, L. Yuan, G. Hua, and S. B. Kang. Visual attribute transfer through deep image analogy. arXiv preprint arXiv:1705.01088, 2017.
[23] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. In NIPS, 2017.
[24] M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In NIPS, 2016.
[25] F. Luan, S. Paris, E. Shechtman, and K. Bala. Deep photo style transfer. In CVPR, 2017.
[26] F. Pitie, A. C. Kokaram, and R. Dahyot. N-dimensional probability density function transfer and its application to color transfer. In ICCV, 2005.
[27] E. Risser, P. Wilmot, and C. Barnes. Stable and controllable neural texture synthesis and style transfer using histogram losses. arXiv preprint arXiv:1701.08893, 2017.
[28] O. Ronneberger, P. Fischer, and T. Brox. U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-assisted Intervention,
[29] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.
[30] L. Sheng, Z. Lin, J. Shao, and X. Wang. Avatar-Net: multi-scale zero-shot style transfer by feature decoration. In CVPR, 2018.
[31] Y. Shih, S. Paris, C. Barnes, W. T. Freeman, and F. Durand. Style transfer for headshot portraits. ACM Transactions on Graphics, 33(4):148, 2014.
[32] Y. Shih, S. Paris, F. Durand, and W. T. Freeman. Data-driven hallucination of different times of day from a single outdoor photo. ACM Transactions on Graphics, 32(6):200, 2013.
[33] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. In CVPR, 2017.
[34] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[35] Y. Taigman, A. Polyak, and L. Wolf. Unsupervised cross-domain image generation. In ICLR, 2017.
[36] D. Ulyanov, V. Lebedev, A. Vedaldi, and V. S. Lempitsky. Texture networks: feed-forward synthesis of textures and stylized images. In ICML, 2016.
[37] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
[38] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky. Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis. In CVPR, 2017.
[39] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro. High-resolution image synthesis and semantic manipulation with conditional GANs. In CVPR, 2018.
[40] X. Wang, G. Oxholm, D. Zhang, and Y.-F. Wang. Multimodal transfer: a hierarchical deep convolutional neural network for fast artistic style transfer. In CVPR,
[41] H. Winnemöller, S. C. Olsen, and B. Gooch. Real-time video abstraction. ACM Transactions on Graphics, 25(3):1221–1226, 2006.
[42] F. Yu, D. Wang, E. Shelhamer, and T. Darrell. Deep layer aggregation. In CVPR, 2018.
[43] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In CVPR, 2017.
[44] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.

Supplementary Material. A. Network Training Setting. We train the ArtNet and PhotoNet with the reconstruction and perceptual loss functions, $L = \alpha \cdot L_{\mathrm{recon}} + (1 - \alpha) \cdot L_{\mathrm{percep}}$ (5), where $\alpha$ is used to balance two los...
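
The supplementary excerpt above gives the training objective as a weighted sum of a reconstruction loss and a perceptual loss, with weight α on the first term and 1 − α on the second. The sketch below shows only that weighting step. The excerpt is cut off before it states a value for α, so the values used here are placeholders; `combined_loss` is a hypothetical name, and restricting α to [0, 1] is an assumption suggested by the convex form of the equation.

```python
def combined_loss(recon_loss: float, percep_loss: float, alpha: float) -> float:
    """Weighted training loss: alpha * L_recon + (1 - alpha) * L_percep."""
    if not 0.0 <= alpha <= 1.0:
        # Assumed range; the source excerpt does not state it.
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * recon_loss + (1.0 - alpha) * percep_loss


# Placeholder loss values and weights, not taken from the paper.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, combined_loss(recon_loss=2.0, percep_loss=4.0, alpha=alpha))
```

At α = 1 only the reconstruction term contributes, and at α = 0 only the perceptual term does, so α trades pixel-level fidelity against feature-level similarity.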
