arxiv: 1907.03398 · v1 · pith:VYBEPWCRnew · submitted 2019-07-08 · 💻 cs.CV

Facial Makeup Transfer Combining Illumination Transfer

Pith reviewed 2026-05-25 01:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords facial makeup transfer · illumination transfer · facial landmarks · layer decomposition · image processing · virtual makeup · real-time application

The pith

Facial makeup transfer handles dark and white styles by adding illumination transfer to landmark-based layer processing of a single reference image.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a real-time Windows application that transfers makeup effects from one reference image to a user's input photo. Both images are split via facial landmarks into structure, color, and detail layers that receive separate processing, after which illumination is transferred from reference to input. The method is presented as solving three practical problems: effective handling of black, dark, and white makeup; completion in seconds without deep-learning frameworks; and successful use of references that include air-bangs.

Core claim

Dividing input and reference images into facial structure, color, and detail layers with facial feature points, applying distinct algorithms to each layer, and adding illumination transfer produces realistic makeup output that works for dark and white styles, finishes quickly, and succeeds even when the reference has air-bangs.

What carries the argument

Landmark-driven decomposition into facial structure, color, and detail layers combined with separate illumination transfer.
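
The decomposition named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the image has already been converted to a lightness channel plus chroma channels (for example CIELAB), treats a smoothed lightness channel as the structure layer, the residual as the detail layer, and the chroma channels as the color layer. The box blur is a stand-in for whatever smoother the paper actually uses, and the names `box_blur`, `decompose`, `recombine`, and `blend_color` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # row-major grid of channel values


def box_blur(channel: Image, radius: int = 1) -> Image:
    """Mean filter that averages only the in-bounds neighbours of each pixel.

    Assumption: a plain box blur replaces the paper's (unspecified) smoother.
    """
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += channel[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def decompose(lightness: Image, radius: int = 1) -> Tuple[Image, Image]:
    """Split lightness into a smooth structure layer and a residual detail layer."""
    structure = box_blur(lightness, radius)
    detail = [
        [l - s for l, s in zip(l_row, s_row)]
        for l_row, s_row in zip(lightness, structure)
    ]
    return structure, detail


def recombine(structure: Image, detail: Image) -> Image:
    """Add the detail layer back onto the structure layer."""
    return [
        [s + d for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(structure, detail)
    ]


def blend_color(input_chroma: Image, ref_chroma: Image, gamma: float) -> Image:
    """Alpha-blend one chroma channel; gamma=1.0 takes the reference colour fully.

    Assumption: linear blending is an illustrative choice for the colour layer.
    """
    return [
        [(1.0 - gamma) * i + gamma * r for i, r in zip(i_row, r_row)]
        for i_row, r_row in zip(input_chroma, ref_chroma)
    ]
```

Because the detail layer is defined as a residual, `recombine(*decompose(x))` reproduces `x` up to floating-point rounding, so any visible change in the output comes from what is done to the layers in between.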

If this is right

  • Black, dark, and white makeup styles become transferable without special handling.
  • The entire process runs in seconds on ordinary hardware without training deep models.
  • Reference photos containing air-bangs can be used directly and produce correct results.
  • A single reference image suffices for real-time virtual makeup preview on a Windows platform.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same layer decomposition could support partial makeup edits, such as changing only eye color while keeping skin tone.
  • Extending the pipeline to live video would enable real-time makeup try-on in video calls or mobile cameras.
  • Because illumination is handled separately, the approach might combine with other lighting models for outdoor or studio conditions.

Load-bearing premise

That landmark-based division into independent layers permits separate processing whose results recombine naturally once illumination is matched.
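
To make "illumination is matched" concrete, the sketch below shifts and scales the input's structure layer so that its mean and spread equal those of the reference. This global statistics matching is a deliberately simple stand-in: the summary does not specify the paper's illumination transfer algorithm, and the function name `match_illumination` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List

Image = List[List[float]]  # row-major grid of lightness values


def match_illumination(input_structure: Image, reference_structure: Image) -> Image:
    """Match the mean and standard deviation of the input layer to the reference.

    Assumption: global mean/std matching replaces the paper's actual
    illumination transfer, which is not described in this summary.
    """
    flat_in = [v for row in input_structure for v in row]
    flat_ref = [v for row in reference_structure for v in row]
    mu_in, mu_ref = mean(flat_in), mean(flat_ref)
    sd_in, sd_ref = pstdev(flat_in), pstdev(flat_ref)
    # A perfectly flat input has no contrast to rescale, so only shift it.
    scale = sd_ref / sd_in if sd_in > 0 else 1.0
    return [[(v - mu_in) * scale + mu_ref for v in row] for row in input_structure]
```

A global match like this cannot move a highlight from one cheek to the other. It only illustrates why matched lighting statistics make separately processed layers easier to recombine without an overall brightness jump.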

What would settle it

A test set of input-reference pairs where the output shows visible seams or color mismatches at layer boundaries even after illumination transfer is applied.
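
One way such a test could be scored is sketched below: it averages the absolute intensity jump between neighbouring pixels that lie on opposite sides of a region mask. The mask, the choice of 4-neighbour pairs, and the name `seam_score` are all assumptions made for illustration; the paper does not define a seam metric.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]  # 1 inside the processed region, 0 outside


def seam_score(channel: Image, mask: Mask) -> float:
    """Mean absolute difference across pixel pairs that straddle the mask edge.

    Only right and down neighbours are visited, so each pair is counted once.
    Returns 0.0 when the mask has no boundary inside the image.
    """
    h, w = len(channel), len(channel[0])
    diffs = []
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and mask[y][x] != mask[ny][nx]:
                    diffs.append(abs(channel[y][x] - channel[ny][nx]))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

A high score alone does not prove a seam, since a real facial edge such as a lip contour also produces large jumps. Comparing the score of the output against the score of the unprocessed input at the same boundary would be the more informative check.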

Figures

Figures reproduced from arXiv: 1907.03398 by Ning Ning, Rui Han, Xiaodong Li, Xiaokun Zhang, Xin Jin.

Figure 1: The makeup transfer effect of our method.
Figure 2: The pipeline of facial makeup transfer. Our method is divided into five steps: whitening and smoothing, facial …
Figure 3: Facial feature points landmarked with Active Shape Model (ASM).
Figure 4: Facial components defined by facial parsing of Liu et al. [6], including hair, eyebrows, eyes, nose, lips, mouth, …
Figure 5: Facial alignment by facial warping.
Figure 6: To make makeup transfer further achieve better result, we whiten and smooth facial skin component. Then facial …
Figure 7: Comparison results between us and Guo et al. [2], Neural Style makeup transfer examples [13], and Liu et al. …
Figure 8: Comparison results between us and Chang et al. [5].
Figure 9: Examples of our makeup transfer result with air-bangs.
Original abstract

To meet the women appearance needs, we present a novel virtual experience approach of facial makeup transfer, developed into windows platform application software. The makeup effects could present on the user's input image in real time, with an only single reference image. The input image and reference image are divided into three layers by facial feature points landmarked: facial structure layer, facial color layer, and facial detail layer. Except for the above layers are processed by different algorithms to generate output image, we also add illumination transfer, so that the illumination effect of the reference image is automatically transferred to the input image. Our approach has the following three advantages: (1) Black or dark and white facial makeup could be effectively transferred by introducing illumination transfer; (2) Efficiently transfer facial makeup within seconds compared to those methods based on deep learning frameworks; (3) Reference images with the air-bangs could transfer makeup perfectly.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a method for real-time facial makeup transfer on a Windows platform using a single reference image. Input and reference images are divided via facial landmarks into three layers (structure, color, detail), each processed by separate algorithms, with an added illumination transfer step to apply the reference lighting to the input. The abstract asserts three advantages: effective transfer of black/dark and white makeup via illumination transfer, completion within seconds (unlike deep-learning methods), and perfect transfer for references with air-bangs.

Significance. If validated, the approach could offer a practical, non-deep-learning alternative for efficient real-time makeup transfer applications, especially handling dark makeup and air-bang references. The absence of any results, metrics, or comparisons leaves the claimed advantages and the layer-separation assumption untested, limiting assessment of significance.

major comments (2)
  1. [Abstract] Abstract: The three listed advantages are asserted without any quantitative results, error metrics, baseline comparisons, qualitative examples, or validation data, which is load-bearing because the central claims cannot be evaluated or reproduced from the provided description alone.
  2. [Abstract] Abstract (method description): The core assumption that landmark-based division yields three cleanly separable layers permitting independent algorithmic processing whose recombination with illumination transfer produces artifact-free realistic output (including for black/dark/white makeup and air-bang cases) receives no supporting analysis of layer stability under pose/expression changes, illumination bleeding, or seam/color-shift avoidance in recombination.
minor comments (2)
  1. [Abstract] Abstract: Minor grammatical issues, e.g., 'To meet the women appearance needs' should read 'To meet women's appearance needs'.
  2. [Abstract] Abstract: The procedural description omits concrete details on the algorithms applied to each layer and the exact implementation of illumination transfer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments. We respond point-by-point to the major comments below.

  1. Referee: [Abstract] Abstract: The three listed advantages are asserted without any quantitative results, error metrics, baseline comparisons, qualitative examples, or validation data, which is load-bearing because the central claims cannot be evaluated or reproduced from the provided description alone.

    Authors: The manuscript presents a non-learning pipeline whose claimed advantages follow directly from its design: illumination transfer is introduced specifically to handle dark makeup, the layer-wise processing avoids deep-learning runtimes, and landmark-based separation is intended to accommodate air-bang references. We agree that the abstract states these advantages without supporting measurements or examples. In revision we will add a results section containing qualitative transfer examples on dark/white makeup and air-bang references together with wall-clock timings on the target Windows platform. revision: yes

  2. Referee: [Abstract] Abstract (method description): The core assumption that landmark-based division yields three cleanly separable layers permitting independent algorithmic processing whose recombination with illumination transfer produces artifact-free realistic output (including for black/dark/white makeup and air-bang cases) receives no supporting analysis of layer stability under pose/expression changes, illumination bleeding, or seam/color-shift avoidance in recombination.

    Authors: Landmark-based partitioning into structure, color and detail layers is a standard, deterministic step whose stability is governed by the accuracy of the landmark detector under the frontal or near-frontal poses targeted by the real-time application. Each layer is processed by an independent algorithm chosen to minimize cross-layer interference, and the illumination transfer step is applied globally after recombination to reduce lighting seams. The manuscript does not contain an explicit ablation of extreme pose variation or quantitative seam analysis because the contribution centers on the practical pipeline rather than a robustness study. We will add a short paragraph in the method section discussing these design assumptions and their scope. revision: partial

Circularity Check

0 steps flagged

No circularity: purely procedural description with no equations, derivations, or self-referential predictions

The paper presents a landmark-based image processing pipeline described entirely in prose without any equations, fitted parameters, uniqueness theorems, or mathematical derivations. Division into structure/color/detail layers, separate algorithmic processing, and addition of illumination transfer are stated as design choices rather than derived results. No self-citations appear in the provided text, and no quantity is claimed as a 'prediction' that reduces to an input fit by construction. The central claims are therefore not load-bearing on any circular step; they rest on the unproven effectiveness of the described procedure, which is an empirical rather than definitional issue.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unverified domain assumption that landmark-driven layer separation plus unspecified per-layer algorithms plus illumination transfer will produce coherent, natural results for the targeted makeup styles.

axioms (1)
  • domain assumption: Facial feature points can accurately divide both input and reference images into independent structure, color, and detail layers whose separate processing yields natural combined output when illumination is also transferred.
    Invoked at the start of the pipeline description to justify the three-layer decomposition.

pith-pipeline@v0.9.0 · 5683 in / 1179 out tokens · 32171 ms · 2026-05-25T01:30:31.467478+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 3 internal anchors

  1. [1] Wai-Shun Tong, Chi-Keung Tang, Michael S. Brown, and Ying-Qing Xu. Reference-based cosmetic transfer. In Proceedings of the Pacific Conference on Computer Graphics and Applications, Pacific Graphics 2007, Maui, Hawaii, USA, October 29 - November 2, 2007, pages 211–218, 2007.
  2. [2] Dong Guo and Terence Sim. Digital facial makeup by reference. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pages 73–79, 2009.
  3. [3] Chen Li, Kun Zhou, and Stephen Lin. Simulating makeup through physics-based manipulation of intrinsic image layers. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 4621–4629, 2015.
  4. [4] Si Liu, Xinyu Ou, Ruihe Qian, Wei Wang, and Xiaochun Cao. Makeup like a superstar: Deep localized makeup transfer network. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, pages 2568–2575. AAAI Press, 2016.
  5. [5] Huiwen Chang, Jingwan Lu, Fisher Yu, and Adam Finkelstein. PairedCycleGAN: Asymmetric style transfer for applying and removing makeup. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 40–48, 2018.
  6. [6] Sifei Liu, Jimei Yang, Chang Huang, and Ming-Hsuan Yang. Multi-objective convolutional learning for facial labeling. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3451–3459, 2015.
  7. [7] Rastislav Lukac, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos. Color image processing. Computer Vision and Image Understanding, 107(1-2):1–2, 2007.
  8. [8] Alan Woodland and Frédéric Labrosse. On the Separation of Luminance from Colour in Images. In Mike Chantler, editor, Vision, Video, and Graphics (2005). The Eurographics Association, 2005.
  9. [9] Elmar Eisemann and Frédo Durand. Flash photography enhancement via intrinsic relighting. ACM Trans. Graph., 23(3):673–678, 2004.
  10. [10] Xiaopeng Zhang, Terence Sim, and Xiaoping Miao. Enhancing photographs with near infra-red images. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 24-26 June 2008, Anchorage, Alaska, USA, 2008.
  11. [11] Zeev Farbman, Raanan Fattal, Dani Lischinski, and Richard Szeliski. Edge-preserving decompositions for multi-scale tone and detail manipulation. ACM Trans. Graph., 27(3):67:1–67:10, 2008.
  12. [12] Carlo Tomasi and Roberto Manduchi. Bilateral filtering for gray and color images. In ICCV, pages 839–846, 1998.
  13. [13] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015.
  14. [14] Stephen Milborrow and Fred Nicolls. Locating facial features with an extended active structure model. In Computer Vision - ECCV 2008, 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part IV, pages 504–513, 2008.
  15. [15] Xiaowu Chen, Mengmeng Chen, Xin Jin, and Qinping Zhao. Facial illumination transfer through edge-preserving filters. In The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011, pages 281–287, 2011.
  16. [16] Xiaowu Chen, Hongyu Wu, Xin Jin, and Qinping Zhao. Facial illumination manipulation using a single reference image by adaptive layer decomposition. IEEE Trans. Image Processing, 22(11):4249–4259, 2013.
  17. [17] Xin Jin, Mingtian Zhao, Xiaowu Chen, Qinping Zhao, and Song Chun Zhu. Learning artistic lighting template from portrait photographs. In Computer Vision - ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part IV, pages 101–114, 2010.
  18. [18] Xiaowu Chen, Xin Jin, Qinping Zhao, and Hongyu Wu. Artistic illumination transfer for portraits. Comput. Graph. Forum, 31(4):1425–1434, 2012.
  19. [19] Xiaowu Chen, Xin Jin, Hongyu Wu, and Qinping Zhao. Learning templates for artistic portrait lighting analysis. IEEE Trans. Image Processing, 24(2):608–618, 2015.
  20. [20] Xiaowu Chen, Ke Wang, and Xin Jin. Single image based illumination estimation for lighting virtual object in real scene. In 12th International Conference on Computer-Aided Design and Computer Graphics, CAD/Graphics 2011, Jinan, China, September 15-17, 2011, pages 450–455, 2011.
  21. [21] X. Jin, Y. Tian, N. Liu, C. Ye, J. Chi, X. Li, and G. Zhao. Object image relighting through patch match warping and color transfer. In 2016 International Conference on Virtual Reality and Visualization (ICVRV), pages 235–241, Sep. 2016.
  22. [22] Xiaowu Chen, Xin Jin, and Ke Wang. Lighting virtual objects in a single image via coarse scene understanding. SCIENCE CHINA Information Sciences, 57(9):1–14, 2014.
  23. [23] Xin Jin, Yannan Li, Ningning Liu, Xiaodong Li, Quan Zhou, Yulu Tian, and Shiming Ge. Scene Relighting Using a Single Reference Image Through Material Constrained Layer Decomposition, pages 37–44. Artificial Intelligence and Robotics, 01, 2018.
  24. [24] Xin Jin, Yannan Li, Ningning Liu, Xiaodong Li, Xianggang Jiang, Chaoen Xiao, and Shiming Ge. Single reference image based scene relighting via material guided filtering. CoRR, abs/1708.07066, 2017.
  25. [25] Quan Zhou, Wenbin Yang, Guangwei Gao, Weihua Ou, Huimin Lu, Jie Chen, and Longin Jan Latecki. Multi-scale deep context convolutional neural networks for semantic segmentation. World Wide Web, 22(2):555–570, 2019.
  26. [26] Quan Zhou, Jie Cheng, Huimin Lu, Yawen Fan, Suofei Zhang, Xiaofu Wu, Baoyu Zheng, Weihua Ou, and Longin Jan Latecki. Learning adaptive contrast combinations for visual saliency detection. Multimedia Tools and Applications, Nov 2018.
  27. [27] Quan Zhou, Cheng Zhang, Wenbin Yu, Yawen Fan, Hu Zhu, Xiaofu Wu, Weihua Ou, Wei-Ping Zhu, and Longin Jan Latecki. Facial recognition via fast dense correspondence. Multimedia Tools Appl., 77(9):10501–10519, 2018.
  28. [28] Quan Zhou, Baoyu Zheng, Wei-Ping Zhu, and Longin Jan Latecki. Multi-scale context for scene labeling via flexible segmentation graph. Pattern Recognition, 59:312–324, 2016.
  29. [29] Seiichi Serikawa and Huimin Lu. Underwater image dehazing using joint trilateral filter. Computers & Electrical Engineering, 40(1):41–50, 2014.
  30. [30] Huimin Lu, Yujie Li, Shenglin Mu, Dong Wang, Hyoungseop Kim, and Seiichi Serikawa. Motor anomaly detection for unmanned aerial vehicles using reinforcement learning. IEEE Internet of Things Journal, 5(4):2315–2322, 2018.
  31. [31] Huimin Lu, Yujie Li, Min Chen, Hyoungseop Kim, and Seiichi Serikawa. Brain intelligence: Go beyond artificial intelligence. MONET, 23(2):368–375, 2018.
  32. [32] Huimin Lu, Dong Wang, Yujie Li, Jianru Li, Xin Li, Hyoungseop Kim, Seiichi Serikawa, and Iztok Humar. CONet: A cognitive ocean network. CoRR, abs/1901.06253, 2019.
  33. [33] Huimin Lu, Yujie Li, Tomoki Uemura, Hyoungseop Kim, and Seiichi Serikawa. Low illumination underwater light field images reconstruction using deep convolutional neural networks. Future Generation Comp. Syst., 82:142–148, 2018.