arxiv: 2606.29308 · v1 · pith:3EGJC7VXnew · submitted 2026-06-28 · 💻 cs.CV

MirrorPPR: Exemplar-Based Portrait Photo Retouching

Pith reviewed 2026-06-30 07:48 UTC · model grok-4.3

classification 💻 cs.CV
keywords exemplar-based editing · portrait retouching · diffusion transformer · structural image editing · identity preservation · data self-augmentation · curriculum learning

The pith

MirrorPPR extracts subtle retouching operations from exemplar pairs and applies them to new portrait images via a diffusion transformer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets up exemplar-based portrait photo retouching as a task where a model receives an original-retouched pair and must replicate the same structural changes on a fresh query image. Text prompts cannot specify the tiny localized adjustments to features and proportions that this task requires, so the approach focuses on learning the operations directly from examples. A Retouching Operation Extractor pulls the differences out of the pair, then a connector and LoRA modules feed those signals into a pre-trained Diffusion Transformer. An advanced self-augmentation technique builds strictly aligned training pairs across different identities, backed by the new MirrorPPR47M dataset of 47 million pairs split into simulated and professional subsets for staged curriculum learning. Experiments indicate the resulting model delivers higher retouching quality and stronger identity preservation than prior baselines.

Core claim

MirrorPPR uses a Retouching Operation Extractor to capture subtle differences from exemplar pairs, injects the representations into a pre-trained DiT through a connector and LoRA modules, and relies on a data self-augmentation paradigm to produce aligned cross-identity pairs, supported by the MirrorPPR47M dataset, to achieve accurate transfer of delicate structural retouching operations.

What carries the argument

Retouching Operation Extractor that identifies subtle differences between the original and retouched images in an exemplar pair for transfer to new queries.
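
The LoRA part of the injection path can be illustrated with a minimal sketch. It shows only the standard low-rank update, h = W x + (alpha / r) · B (A x), with W kept frozen. The matrix sizes, the rank, the scaling value, and the function names are illustrative assumptions; nothing here reflects the paper's actual extractor, connector, or adapter placement.

```python
# Minimal LoRA forward pass in pure Python (illustrative sizes, not from the paper).

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha):
    """Return W x + (alpha / r) * B (A x), where r is the LoRA rank.

    W: frozen d_out x d_in weight; A: r x d_in; B: d_out x r.
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Frozen 2x3 base weight and a rank-1 adapter.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
A = [[0.5, 0.5, 0.0]]          # r x d_in
B_zero = [[0.0], [0.0]]        # common LoRA initialisation: B starts at zero
B_trained = [[0.2], [-0.4]]    # hypothetical values after training

x = [1.0, 2.0, 3.0]
out_init = lora_forward(W, A, B_zero, x, alpha=2.0)
out_trained = lora_forward(W, A, B_trained, x, alpha=2.0)
```

With B at zero the adapted layer reproduces the frozen layer exactly, which is why a pre-trained backbone is undisturbed at the start of fine-tuning; only the small A and B matrices would carry the retouching-specific signal.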

If this is right

  • Structural edits that cannot be described in text become feasible through direct operation transfer from examples.
  • Cross-identity alignment via self-augmentation allows training on large volumes of data without manual pairing.
  • Curriculum progression from simulated to professional subsets stabilizes optimization for delicate modifications.
  • Identity preservation improves because the method focuses on operation extraction rather than global image translation.
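
The curriculum bullet can be made concrete with a small sketch of a staged sampling schedule. The linear ramp, the warm-up share, and the names `professional_fraction` and `pick_subset` are assumptions for illustration; the paper's actual schedule is not given in the text above.

```python
import random

def professional_fraction(step, total_steps, warmup=0.3):
    """Probability of drawing from the professional subset at a given step.

    Hypothetical schedule: simulated-only during the first `warmup` share of
    training, then a linear ramp to professional-only by the final step.
    """
    progress = step / max(total_steps - 1, 1)
    if progress <= warmup:
        return 0.0
    return min(1.0, (progress - warmup) / (1.0 - warmup))

def pick_subset(step, total_steps, rng):
    """Choose which subset the next training pair is drawn from."""
    p = professional_fraction(step, total_steps)
    return "professional" if rng.random() < p else "simulated"

rng = random.Random(0)
total = 1001
early = [pick_subset(s, total, rng) for s in range(0, 300)]
late = [pick_subset(total - 1, total, rng) for _ in range(50)]
```

In this sketch every early draw comes from the simulated subset and every final-step draw from the professional subset, with a gradual mix in between. A hard switch between stages would be an equally plausible reading of "staged" curriculum learning.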

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The extractor-plus-injection design could be tested on non-face domains that also need precise localized changes, such as product photography adjustments.
  • Adding a consistency loss across multiple exemplars of the same person might further reduce any residual identity leakage.
  • The 47-million-pair scale suggests the method could support few-shot adaptation to new retouching styles with minimal additional data.

Load-bearing premise

The Retouching Operation Extractor can accurately capture and represent extremely delicate and localized structural modifications from exemplar pairs, and the data self-augmentation paradigm produces strictly aligned retouching operations without misalignment across cross-identity pairs.

What would settle it

Create an exemplar pair showing one precise small change such as a 3 percent narrowing of the jawline, run the trained model on a held-out query face, and verify whether the output exhibits exactly that change with no additional alterations or identity drift.
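
One way to score such a test is to compare landmark geometry before and after editing. The sketch below assumes 2D landmarks are already available as named points; the landmark names, the tolerances, and the helper functions are hypothetical. It measures the relative jaw-width change and checks that all other landmarks stay in place.

```python
import math

def width(points, left, right):
    """Euclidean distance between two named landmarks."""
    (x1, y1), (x2, y2) = points[left], points[right]
    return math.hypot(x2 - x1, y2 - y1)

def relative_change(before, after, left, right):
    """Signed fractional change in width; -0.03 means 3 percent narrower."""
    w0 = width(before, left, right)
    return (width(after, left, right) - w0) / w0

def max_drift(before, after, ignore):
    """Largest displacement among landmarks that should not have moved."""
    return max(
        math.dist(before[k], after[k]) for k in before if k not in ignore
    )

# Toy landmarks in pixels (made-up values, not real data).
query = {"jaw_l": (100.0, 300.0), "jaw_r": (300.0, 300.0),
         "eye_l": (150.0, 180.0), "eye_r": (250.0, 180.0)}
output = {"jaw_l": (103.0, 300.0), "jaw_r": (297.0, 300.0),
          "eye_l": (150.0, 180.0), "eye_r": (250.0, 180.0)}

change = relative_change(query, output, "jaw_l", "jaw_r")
drift = max_drift(query, output, ignore={"jaw_l", "jaw_r"})
passed = abs(change - (-0.03)) < 0.005 and drift < 1.0
```

Landmark geometry covers only the "exactly that change, nothing else" half of the test. Identity drift would need a separate check, for example a face-embedding comparison between the query and the output.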

Original abstract

While text-guided image editing has made remarkable progress, it remains limited in structural portrait retouching. Textual descriptions struggle to convey fine-grained changes to facial features and body proportions. To address this gap, we introduce Exemplar-Based Portrait Photo Retouching, where the model is given an exemplar pair and tasked with inferring and applying the same retouching operations to a new query image. Existing exemplar-based editing methods primarily focus on tasks with pronounced visual transformations. In contrast, structural portrait retouching involves extremely delicate and localized modifications, making accurate extraction and transfer of these edits challenging. To tackle this, we propose MirrorPPR, a novel framework designed to capture and transfer subtle structural retouching operations. Our method uses a Retouching Operation Extractor to capture the subtle differences from the exemplar pair. The extracted representations are then injected into a pre-trained Diffusion Transformer (DiT) through a connector and Low-Rank Adaptation (LoRA) modules. Furthermore, constructing perfectly aligned cross-identity training pairs is severely hindered by operation misalignment. To overcome this, we propose an advanced data self-augmentation paradigm that ensures strictly aligned retouching operations. To alleviate data scarcity and support this novel task, we introduce MirrorPPR47M, a large-scale dataset with over 47 million retouched pairs. By structuring the dataset into simulated and professional subsets, we enable progressive curriculum learning to smoothly optimize the network. Extensive experiments demonstrate that MirrorPPR significantly outperforms existing baselines in both retouching quality and identity preservation. The project page is available at https://sjtu-deng-lab.github.io/MirrorPPR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces MirrorPPR for exemplar-based portrait photo retouching: given an exemplar pair, the model extracts subtle retouching operations and applies them to a new query image. It proposes a Retouching Operation Extractor to encode differences from the pair, injects the representations into a pre-trained Diffusion Transformer via a connector and LoRA modules, and uses a data self-augmentation paradigm to create strictly aligned cross-identity training pairs. A new 47M-pair dataset (MirrorPPR47M) with simulated and professional subsets supports curriculum learning. Experiments claim significant outperformance over baselines in retouching quality and identity preservation.

Significance. If the central claims hold, the work fills a gap in fine-grained structural portrait editing where text prompts are insufficient, offering a new task formulation, a large-scale dataset, and a curriculum-learning pipeline that could support downstream applications in photo editing and generative media. The explicit construction of a 47M-pair dataset with progressive subsets is a concrete contribution that lowers the barrier for future exemplar-based editing research.

major comments (3)
  1. [§3] §3 (Retouching Operation Extractor): the claim that the extractor faithfully encodes extremely delicate localized structural modifications (e.g., micro-adjustments to eye shape or jawline) from exemplar pairs is load-bearing for the transfer step, yet no quantitative verification (sub-pixel error, structural similarity on localized regions, or ablation on extractor resolution) is supplied to confirm it resolves the differences that text-guided methods cannot.
  2. [§4] §4 (data self-augmentation paradigm): the assertion that the paradigm produces strictly aligned retouching operations across cross-identity pairs is central to avoiding identity drift during DiT+LoRA injection, but the manuscript provides no direct measurement of alignment precision (e.g., pixel-wise operation consistency or misalignment statistics) or ablation showing that misalignment artifacts are eliminated.
  3. [§5] §5 (experiments): the headline claim of significant outperformance in both quality and identity preservation is unsupported by any reported metrics, baselines, or controls in the provided text; without these numbers the empirical validation of the extractor and augmentation assumptions cannot be assessed.
minor comments (2)
  1. [Abstract] Abstract: states 'extensive experiments demonstrate...' but supplies no quantitative results, baseline names, or dataset split details; adding one or two key numbers would make the summary self-contained.
  2. [Dataset] Dataset description: the construction criteria separating simulated versus professional subsets and the exact curriculum schedule (epoch counts, loss weighting) are not specified; these details are needed for reproducibility.
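
The localized structural-similarity measurement requested in the first major comment could look roughly like the sketch below. It computes SSIM over a single cropped window with the standard constants, rather than the usual sliding-window average, and compares a predicted output against a ground-truth retouched image inside one region. The toy images, crop boxes, and function names are assumptions for illustration, not the authors' evaluation protocol.

```python
def crop(img, top, left, h, w):
    """Extract an h x w region from a grayscale image (list of rows)."""
    return [row[left:left + w] for row in img[top:top + h]]

def ssim_single_window(a, b, dynamic_range=255.0):
    """SSIM over one window with C1 = (0.01 L)^2 and C2 = (0.03 L)^2.

    This is the per-window formula only; no sliding Gaussian window.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / n
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

# Toy 4x4 grayscale "images" (made-up values): the prediction differs from
# the target only in the top-left pixel.
target = [[10.0 * (r + c) for c in range(4)] for r in range(4)]
pred = [row[:] for row in target]
pred[0][0] += 40.0

edited = ssim_single_window(crop(target, 0, 0, 2, 2), crop(pred, 0, 0, 2, 2))
untouched = ssim_single_window(crop(target, 2, 2, 2, 2), crop(pred, 2, 2, 2, 2))
```

Scoring regions separately is the point of the exercise: a whole-image score can stay high while a small edited region, such as an eye or the jawline, is reproduced poorly.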

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of the task formulation, the MirrorPPR47M dataset, and the curriculum-learning pipeline. We address each major comment below and will revise the manuscript to provide the requested quantitative support.

Point-by-point responses
  1. Referee: [§3] §3 (Retouching Operation Extractor): the claim that the extractor faithfully encodes extremely delicate localized structural modifications (e.g., micro-adjustments to eye shape or jawline) from exemplar pairs is load-bearing for the transfer step, yet no quantitative verification (sub-pixel error, structural similarity on localized regions, or ablation on extractor resolution) is supplied to confirm it resolves the differences that text-guided methods cannot.

    Authors: We agree that quantitative verification of the Retouching Operation Extractor’s precision on subtle, localized changes is necessary to substantiate the claim. In the revised manuscript we will add sub-pixel error measurements, localized structural similarity scores on facial landmarks, and an ablation varying extractor resolution, all evaluated on held-out exemplar pairs.

    revision: yes

  2. Referee: [§4] §4 (data self-augmentation paradigm): the assertion that the paradigm produces strictly aligned retouching operations across cross-identity pairs is central to avoiding identity drift during DiT+LoRA injection, but the manuscript provides no direct measurement of alignment precision (e.g., pixel-wise operation consistency or misalignment statistics) or ablation showing that misalignment artifacts are eliminated.

    Authors: We acknowledge that direct quantitative evidence of alignment precision would strengthen confidence in the self-augmentation approach. The revision will include pixel-wise consistency metrics, misalignment statistics across the training pairs, and an ablation comparing results with and without the alignment step to demonstrate removal of drift artifacts.

    revision: yes

  3. Referee: [§5] §5 (experiments): the headline claim of significant outperformance in both quality and identity preservation is unsupported by any reported metrics, baselines, or controls in the provided text; without these numbers the empirical validation of the extractor and augmentation assumptions cannot be assessed.

    Authors: The full manuscript contains experimental comparisons, yet we accept that the quantitative results must be presented more explicitly and with clear baseline controls. The revised version will feature dedicated tables reporting standard quality and identity-preservation metrics against all listed baselines, together with statistical significance tests.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity; method and claims rest on empirical evaluation of proposed components

Full rationale

The paper introduces a new task (exemplar-based portrait retouching), a new framework (MirrorPPR with Retouching Operation Extractor + DiT+LoRA injection), a new data-augmentation paradigm, and a new dataset (MirrorPPR47M). These are presented as engineering contributions whose performance is measured by experiments against baselines. No equations, fitted parameters, or self-citations are shown to reduce the central claims to inputs by construction. The derivation chain consists of standard supervised training and transfer, with no self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; full text would be required to audit any modeling assumptions or fitted components.

pith-pipeline@v0.9.1-grok · 5837 in / 1164 out tokens · 38277 ms · 2026-06-30T07:48:04.508112+00:00 · methodology
