MirrorPPR: Exemplar-Based Portrait Photo Retouching

Fengpei Yu; Jiachun Jin; Siqi Kou; Yitao Jian; Zheng Li; Zhihong Liu; Zhijie Deng

arxiv: 2606.29308 · v1 · pith:3EGJC7VXnew · submitted 2026-06-28 · 💻 cs.CV

Zhihong Liu, Zheng Li, Jiachun Jin, Siqi Kou, Yitao Jian, Fengpei Yu, Zhijie Deng

Pith reviewed 2026-06-30 07:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords exemplar-based editing · portrait retouching · diffusion transformer · structural image editing · identity preservation · data self-augmentation · curriculum learning

The pith

MirrorPPR extracts subtle retouching operations from exemplar pairs and applies them to new portrait images via a diffusion transformer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets up exemplar-based portrait photo retouching as a task where a model receives an original-retouched pair and must replicate the same structural changes on a fresh query image. Text prompts cannot specify the tiny localized adjustments to features and proportions that this task requires, so the approach focuses on learning the operations directly from examples. A Retouching Operation Extractor pulls the differences out of the pair, then a connector and LoRA modules feed those signals into a pre-trained Diffusion Transformer. An advanced self-augmentation technique builds strictly aligned training pairs across different identities, backed by the new MirrorPPR47M dataset of 47 million pairs split into simulated and professional subsets for staged curriculum learning. Experiments indicate the resulting model delivers higher retouching quality and stronger identity preservation than prior baselines.

Core claim

MirrorPPR uses a Retouching Operation Extractor to capture subtle differences from exemplar pairs, injects the representations into a pre-trained DiT through a connector and LoRA modules, and relies on a data self-augmentation paradigm to produce aligned cross-identity pairs, supported by the MirrorPPR47M dataset, to achieve accurate transfer of delicate structural retouching operations.
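
The LoRA part of this claim can be made concrete with a minimal sketch. It shows only the standard low-rank update, y = W x + (alpha / r) · B (A x), applied to one frozen projection. The class name `LoRALinear`, the dimensions, and the random stand-in weights are illustrative assumptions; the abstract does not say which DiT layers carry LoRA modules or how the connector is built.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a low-rank update: y = W x + (alpha / r) * B (A x).

    Hypothetical illustration only; not the MirrorPPR implementation.
    """

    def __init__(self, d_in, d_out, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Stand-in for a frozen pre-trained projection inside a DiT block.
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Usual LoRA init: A small random, B zero, so the adapter starts as a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = LoRALinear(d_in=8, d_out=6)
x = [1.0] * 8
y_before = layer.forward(x)   # equals the frozen layer's output while B is zero
layer.B[0][0] = 1.0           # pretend training has updated one entry of B
y_after = layer.forward(x)    # only output coordinate 0 is affected
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained model exactly, which is one reason a low-rank adapter is a plausible vehicle for small, delicate edits.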

What carries the argument

Retouching Operation Extractor that identifies subtle differences between the original and retouched images in an exemplar pair for transfer to new queries.

If this is right

Structural edits that cannot be described in text become feasible through direct operation transfer from examples.
Cross-identity alignment via self-augmentation allows training on large volumes of data without manual pairing.
Curriculum progression from simulated to professional subsets stabilizes optimization for delicate modifications.
Identity preservation improves because the method focuses on operation extraction rather than global image translation.
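
The curriculum progression in the list above can be sketched as a sampling schedule. The abstract does not give the actual schedule, so everything here is an assumption: a simulated-only warmup followed by a linear ramp toward the professional subset, with hypothetical names `professional_fraction` and `sample_batch`.

```python
import random


def professional_fraction(step, total_steps, warmup_frac=0.5):
    """Share of each batch drawn from the professional subset.

    Assumed schedule: simulated data only during warmup, then a linear
    ramp that reaches 1.0 at the final step.
    """
    if not 0.0 <= warmup_frac < 1.0:
        raise ValueError("warmup_frac must be in [0, 1)")
    warmup = warmup_frac * total_steps
    if step < warmup:
        return 0.0
    return min(1.0, (step - warmup) / (total_steps - warmup))


def sample_batch(simulated, professional, step, total_steps, batch_size, rng):
    """Draw a batch whose mix follows the schedule above."""
    p = professional_fraction(step, total_steps)
    return [
        rng.choice(professional) if rng.random() < p else rng.choice(simulated)
        for _ in range(batch_size)
    ]


rng = random.Random(0)
early = sample_batch(["sim"], ["pro"], step=0, total_steps=100, batch_size=8, rng=rng)
late = sample_batch(["sim"], ["pro"], step=100, total_steps=100, batch_size=8, rng=rng)
```

A hard switch between stages is the other obvious choice; a ramp is used here only because it avoids an abrupt change in the data distribution.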

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The extractor-plus-injection design could be tested on non-face domains that also need precise localized changes, such as product photography adjustments.
Adding a consistency loss across multiple exemplars of the same person might further reduce any residual identity leakage.
The 47-million-pair scale suggests the method could support few-shot adaptation to new retouching styles with minimal additional data.

Load-bearing premise

The Retouching Operation Extractor can accurately capture and represent extremely delicate and localized structural modifications from exemplar pairs, and the data self-augmentation paradigm produces strictly aligned retouching operations without misalignment across cross-identity pairs.

What would settle it

Create an exemplar pair showing one precise small change such as a 3 percent narrowing of the jawline, run the trained model on a held-out query face, and verify whether the output exhibits exactly that change with no additional alterations or identity drift.
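
One way such a check could be scored is sketched below. It assumes landmark coordinates and identity embeddings are already available from some face landmark detector and some face-recognition model; the landmark names, the tolerance, and the similarity threshold are all hypothetical choices, not values from the paper.

```python
import math


def relative_change(before, after):
    """Fractional change of a measurement, e.g. -0.03 for a 3 percent reduction."""
    return (after - before) / before


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def check_transfer(query_lm, output_lm, query_emb, output_emb,
                   target=-0.03, tol=0.005, min_id_sim=0.9):
    """Compare the query image with the model output.

    jaw_ok:       jaw width changed by roughly the target amount
    untouched_ok: a control measurement (eye distance) stayed put
    identity_ok:  identity embeddings remain close
    """
    jaw = relative_change(
        math.dist(query_lm["jaw_left"], query_lm["jaw_right"]),
        math.dist(output_lm["jaw_left"], output_lm["jaw_right"]),
    )
    eyes = relative_change(
        math.dist(query_lm["eye_left"], query_lm["eye_right"]),
        math.dist(output_lm["eye_left"], output_lm["eye_right"]),
    )
    return {
        "jaw_change": jaw,
        "jaw_ok": abs(jaw - target) <= tol,
        "untouched_ok": abs(eyes) <= tol,
        "identity_ok": cosine_similarity(query_emb, output_emb) >= min_id_sim,
    }


# Made-up coordinates and embeddings, for illustration only.
query = {"jaw_left": (100, 300), "jaw_right": (300, 300),
         "eye_left": (150, 150), "eye_right": (250, 150)}
output = {"jaw_left": (103, 300), "jaw_right": (297, 300),
          "eye_left": (150, 150), "eye_right": (250, 150)}
result = check_transfer(query, output, [0.6, 0.8, 0.0], [0.6, 0.79, 0.05])
```

A single control measurement is a weak proxy for "no additional alterations"; a fuller check would compare many landmark distances or a dense difference map.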

Original abstract

While text-guided image editing has made remarkable progress, it remains limited in structural portrait retouching. Textual descriptions struggle to convey fine-grained changes to facial features and body proportions. To address this gap, we introduce Exemplar-Based Portrait Photo Retouching, where the model is given an exemplar pair and tasked with inferring and applying the same retouching operations to a new query image. Existing exemplar-based editing methods primarily focus on tasks with pronounced visual transformations. In contrast, structural portrait retouching involves extremely delicate and localized modifications, making accurate extraction and transfer of these edits challenging. To tackle this, we propose MirrorPPR, a novel framework designed to capture and transfer subtle structural retouching operations. Our method uses a Retouching Operation Extractor to capture the subtle differences from the exemplar pair. The extracted representations are then injected into a pre-trained Diffusion Transformer (DiT) through a connector and Low-Rank Adaptation (LoRA) modules. Furthermore, constructing perfectly aligned cross-identity training pairs is severely hindered by operation misalignment. To overcome this, we propose an advanced data self-augmentation paradigm that ensures strictly aligned retouching operations. To alleviate data scarcity and support this novel task, we introduce MirrorPPR47M, a large-scale dataset with over 47 million retouched pairs. By structuring the dataset into simulated and professional subsets, we enable progressive curriculum learning to smoothly optimize the network. Extensive experiments demonstrate that MirrorPPR significantly outperforms existing baselines in both retouching quality and identity preservation. The project page is available at https://sjtu-deng-lab.github.io/MirrorPPR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MirrorPPR defines a new task for exemplar-driven subtle portrait retouching and releases a large dataset, but the abstract supplies no metrics or verification that the extractor and self-augmentation actually handle the delicate changes claimed.

The paper carves out a focused task: given an exemplar pair showing tiny structural edits like minor eye or jaw adjustments, transfer exactly those operations to a new query face. This sits apart from text-guided editing, which struggles with precise proportions. They introduce a Retouching Operation Extractor to pull the delta, inject it into a pre-trained DiT through a connector and LoRA, and add a self-augmentation step to generate aligned cross-identity pairs. They also ship MirrorPPR47M, a 47-million-pair dataset split into simulated and professional subsets for curriculum training.

The dataset and task definition are the clearest additions. A large, structured collection of retouched pairs addresses a real data gap for this kind of fine-grained work, and the curriculum split is a practical way to stage training. The extractor-plus-injection design is a straightforward way to condition on operations rather than raw appearance.

The main gaps sit in the unshown mechanics. The abstract asserts that the extractor captures extremely localized structural differences and that self-augmentation produces strictly aligned operations, yet it gives no quantitative checks on sub-pixel fidelity or misalignment rates. If either piece slips, identity drift or artifacts would follow, and the outperformance claim over baselines cannot be judged without the numbers, ablations, or controls. That the extractor resolves delicate changes and that the augmentation keeps operations aligned is the load-bearing assumption here.

This work is aimed at researchers building portrait editing tools or diffusion pipelines that need operation-level control. A reader already working on exemplar or reference-based editing would get the most from the dataset and task framing. The paper deserves peer review because the task and data are new enough to warrant checking the experiments, even if the current description leaves the core claims unverified.

Referee Report

3 major / 2 minor

Summary. The paper introduces MirrorPPR for exemplar-based portrait photo retouching: given an exemplar pair, the model extracts subtle retouching operations and applies them to a new query image. It proposes a Retouching Operation Extractor to encode differences from the pair, injects the representations into a pre-trained Diffusion Transformer via a connector and LoRA modules, and uses a data self-augmentation paradigm to create strictly aligned cross-identity training pairs. A new 47M-pair dataset (MirrorPPR47M) with simulated and professional subsets supports curriculum learning. Experiments claim significant outperformance over baselines in retouching quality and identity preservation.

Significance. If the central claims hold, the work fills a gap in fine-grained structural portrait editing where text prompts are insufficient, offering a new task formulation, a large-scale dataset, and a curriculum-learning pipeline that could support downstream applications in photo editing and generative media. The explicit construction of a 47M-pair dataset with progressive subsets is a concrete contribution that lowers the barrier for future exemplar-based editing research.

major comments (3)

[§3] Retouching Operation Extractor: the claim that the extractor faithfully encodes extremely delicate localized structural modifications (e.g., micro-adjustments to eye shape or jawline) from exemplar pairs is load-bearing for the transfer step, yet no quantitative verification (sub-pixel error, structural similarity on localized regions, or ablation on extractor resolution) is supplied to confirm it resolves the differences that text-guided methods cannot.
[§4] Data self-augmentation paradigm: the assertion that the paradigm produces strictly aligned retouching operations across cross-identity pairs is central to avoiding identity drift during DiT+LoRA injection, but the manuscript provides no direct measurement of alignment precision (e.g., pixel-wise operation consistency or misalignment statistics) or ablation showing that misalignment artifacts are eliminated.
[§5] Experiments: the headline claim of significant outperformance in both quality and identity preservation is unsupported by any reported metrics, baselines, or controls in the provided text; without these numbers the empirical validation of the extractor and augmentation assumptions cannot be assessed.
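
A misalignment statistic of the kind requested in these comments could look like the sketch below. It treats a retouching operation as a set of per-landmark displacements, normalizes them by face size, and reports the mean gap between the exemplar's operation and the operation observed on the query. The landmark-based formulation, the bounding-box scale, and the function names are assumptions for illustration; the manuscript may define alignment differently.

```python
import math


def face_scale(landmarks):
    """Diagonal of the landmarks' bounding box, used to normalize for face size."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))


def normalized_displacements(src, dst):
    """Per-landmark motion from src to dst, divided by the source face scale."""
    scale = face_scale(src)
    return [((d[0] - s[0]) / scale, (d[1] - s[1]) / scale) for s, d in zip(src, dst)]


def misalignment(exemplar_src, exemplar_dst, query_src, query_dst):
    """Mean distance between the two operations; 0 means identical operations.

    Assumes the landmark lists share the same ordering across all four images.
    """
    ex = normalized_displacements(exemplar_src, exemplar_dst)
    qu = normalized_displacements(query_src, query_dst)
    return sum(math.dist(a, b) for a, b in zip(ex, qu)) / len(ex)


# Made-up landmarks: the exemplar edit pulls one point inward by one unit.
ex_src = [(0, 0), (10, 0), (0, 10), (10, 10)]
ex_dst = [(0, 0), (9, 0), (0, 10), (10, 10)]
# Query face is twice as large and shifted; the same edit is applied to it.
q_src = [(2 * x + 5, 2 * y + 5) for x, y in ex_src]
q_aligned = [(2 * x + 5, 2 * y + 5) for x, y in ex_dst]
# A misaligned edit moves a different landmark instead.
q_misaligned = [q_src[0], q_src[1], (q_src[2][0], q_src[2][1] - 2), q_src[3]]

aligned_score = misalignment(ex_src, ex_dst, q_src, q_aligned)
misaligned_score = misalignment(ex_src, ex_dst, q_src, q_misaligned)
```

Reported over a sample of training pairs, the distribution of such a score would give the misalignment statistics the second comment asks for.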

minor comments (2)

[Abstract] The abstract states 'extensive experiments demonstrate...' but supplies no quantitative results, baseline names, or dataset split details; adding one or two key numbers would make the summary self-contained.
[Dataset] The construction criteria separating simulated versus professional subsets and the exact curriculum schedule (epoch counts, loss weighting) are not specified; these details are needed for reproducibility.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of the task formulation, the MirrorPPR47M dataset, and the curriculum-learning pipeline. We address each major comment below and will revise the manuscript to provide the requested quantitative support.

Referee: [§3] Retouching Operation Extractor: the claim that the extractor faithfully encodes extremely delicate localized structural modifications (e.g., micro-adjustments to eye shape or jawline) from exemplar pairs is load-bearing for the transfer step, yet no quantitative verification (sub-pixel error, structural similarity on localized regions, or ablation on extractor resolution) is supplied to confirm it resolves the differences that text-guided methods cannot.

Authors: We agree that quantitative verification of the Retouching Operation Extractor’s precision on subtle, localized changes is necessary to substantiate the claim. In the revised manuscript we will add sub-pixel error measurements, localized structural similarity scores on facial landmarks, and an ablation varying extractor resolution, all evaluated on held-out exemplar pairs.

revision: yes

Referee: [§4] Data self-augmentation paradigm: the assertion that the paradigm produces strictly aligned retouching operations across cross-identity pairs is central to avoiding identity drift during DiT+LoRA injection, but the manuscript provides no direct measurement of alignment precision (e.g., pixel-wise operation consistency or misalignment statistics) or ablation showing that misalignment artifacts are eliminated.

Authors: We acknowledge that direct quantitative evidence of alignment precision would strengthen confidence in the self-augmentation approach. The revision will include pixel-wise consistency metrics, misalignment statistics across the training pairs, and an ablation comparing results with and without the alignment step to demonstrate removal of drift artifacts.

revision: yes

Referee: [§5] Experiments: the headline claim of significant outperformance in both quality and identity preservation is unsupported by any reported metrics, baselines, or controls in the provided text; without these numbers the empirical validation of the extractor and augmentation assumptions cannot be assessed.

Authors: The full manuscript contains experimental comparisons, yet we accept that the quantitative results must be presented more explicitly and with clear baseline controls. The revised version will feature dedicated tables reporting standard quality and identity-preservation metrics against all listed baselines, together with statistical significance tests.

revision: yes
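
The statistical significance tests promised in the last response are not specified. One common option for per-image paired scores is a sign-flip permutation test, sketched here under that assumption; the function name and the score differences are made up for illustration and are not results from the paper.

```python
from itertools import product


def sign_flip_p_value(differences):
    """Exact two-sided paired sign-flip permutation test on the mean difference.

    Under the null hypothesis each paired difference is equally likely to be
    positive or negative. All 2**n sign assignments are enumerated, so keep n
    small (about 20 or fewer) or switch to random sampling.
    """
    n = len(differences)
    observed = abs(sum(differences)) / n
    extreme = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, differences))) / n
        if stat >= observed - 1e-12:  # small slack for floating-point ties
            extreme += 1
    return extreme / 2 ** n


# Hypothetical per-image differences (method score minus baseline score).
diffs = [0.04, 0.06, 0.03, 0.05, 0.07, 0.02, 0.05, 0.04, 0.06, 0.03]
p_value = sign_flip_p_value(diffs)
```

The test makes no normality assumption, which suits bounded image-quality and identity-similarity scores, but it does require that the same query images are scored under both methods.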

Circularity Check

0 steps flagged

No significant circularity; method and claims rest on empirical evaluation of proposed components

The paper introduces a new task (exemplar-based portrait retouching), a new framework (MirrorPPR with Retouching Operation Extractor + DiT+LoRA injection), a new data-augmentation paradigm, and a new dataset (MirrorPPR47M). These are presented as engineering contributions whose performance is measured by experiments against baselines. No equations, fitted parameters, or self-citations are shown to reduce the central claims to inputs by construction. The derivation chain consists of standard supervised training and transfer, with no self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; full text would be required to audit any modeling assumptions or fitted components.

pith-pipeline@v0.9.1-grok · 5837 in / 1164 out tokens · 38277 ms · 2026-06-30T07:48:04.508112+00:00 · methodology

Reference graph

Works this paper leans on

68 extracted references · 24 canonical work pages · 15 internal anchors

[1]

Bai, S., Chen, K., Liu, X., Wang, J., Ge, W., Song, S., Dang, K., Wang, P., Wang, S., Tang, J., Zhong, H., Zhu, Y., Yang, M., Li, Z., Wan, J., Wang, P., Ding, W., Fu, Z., Xu, Y., Ye, J., Zhang, X., Xie, T., Cheng, Z., Zhang, H., Yang, Z., Xu, H., Lin, J.: Qwen2.5-vl technical report (2025),https://arxiv.org/abs/2502.13923

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

In: Proceedings of the 26th annual international conference on machine learning

Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th annual international conference on machine learning. pp. 41–48 (2009)

2009
[3]

IEEE Transactions on Information Forensics and Security11(9), 1903–1913 (2016)

Bharati, A., Singh, R., Vatsa, M., Bowyer, K.W.: Detecting facial retouching using supervised deep learning. IEEE Transactions on Information Forensics and Security11(9), 1903–1913 (2016)

1903
[4]

In: 2017 IEEE international joint conference on biometrics (IJCB)

Bharati, A., Vatsa, M., Singh, R., Bowyer, K.W., Tong, X.: Demography-based facial retouching detection using subclass supervised sparse autoencoder. In: 2017 IEEE international joint conference on biometrics (IJCB). pp. 474–482. IEEE (2017)

2017
[5]

Brack, M., Friedrich, F., Kornmeier, K., Tsaban, L., Schramowski, P., Kersting, K., Passos, A.: Ledits++: Limitless image editing using text-to-image models (2024),https://arxiv.org/abs/2311.16711

work page arXiv 2024
[6]

In: Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition

Brooks, T., Holynski, A., Efros, A.A.: Instructpix2pix: Learning to follow image editing instructions. In: Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 18392–18402 (2023)

2023
[7]

In: CVPR 2011

Bychkovsky, V., Paris, S., Chan, E., Durand, F.: Learning photographic global tonal adjustment with a database of input/output image pairs. In: CVPR 2011. pp. 97–104. IEEE (2011)

2011
[8]

Cai, H., Wang, X., Bai, Y., Zhou, T., Xu, S., Hao, Y., Cui, Z., Yang, Y., Zhu, W., Chen, Y., Tang, X., Hu, Y., Li, Z.: Idglow: Dynamic identity modulation for multi-subject generation (2026),https://arxiv.org/abs/2603.00607

work page internal anchor Pith review Pith/arXiv arXiv 2026
[9]

IEEE transactions on image processing27(4), 2049–2062 (2018)

Cai, J., Gu, S., Zhang, L.: Learning a deep single image contrast enhancer from multi-exposure images. IEEE transactions on image processing27(4), 2049–2062 (2018)

2049
[10]

In: Proceedings of the IEEE/CVF international conference on computer vision

Cao, M., Wang, X., Qi, Z., Shan, Y., Qie, X., Zheng, Y.: Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 22560–22570 (2023)

2023
[11]

arXiv:2503.13327 [cs.CV] https: //arxiv.org/abs/2503.13327 Zhu-Tian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu

Chen, L., Mao, Q., Gu, Y., Shou, M.Z.: Edit transfer: Learning image editing via vision in-context relations. arXiv preprint arXiv:2503.13327 (2025)

work page arXiv 2025
[12]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Chen, Y., Ge, Y., Tang, W., Li, Y., Ge, Y., Ding, M., Shan, Y., Liu, X.: Moto: Latent motion token as the bridging language for learning robot manipulation from videos. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19752–19763 (2025)

2025
[13]

Couairon, G., Verbeek, J., Schwenk, H., Cord, M.: Diffedit: Diffusion-based semantic image editing with mask guidance (2022),https://arxiv.org/abs/2210.11427

work page arXiv 2022
[14]

DCGM: ffhq-features-dataset: Gender, age, and emotion for flickr-faces-hq dataset (ffhq).https://github.com/DCGM/ ffhq-features-dataset(2019), accessed on June 24, 2026

2019
[15]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Deng, J., Guo, J., Xue, N., Zafeiriou, S.: Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4690–4699 (2019)

2019
[16]

Hugging Face Model Repository (2025),https://huggingface.co/ DiffSynth-Studio/Qwen-Image-Edit-2511-ICEdit-LoRA, accessed on June 24, 2026

DiffSynth-Studio: Qwen-image-edit-2511-icedit-lora. Hugging Face Model Repository (2025),https://huggingface.co/ DiffSynth-Studio/Qwen-Image-Edit-2511-ICEdit-LoRA, accessed on June 24, 2026

2025
[17]

arXiv preprint arXiv:2506.02528 (2025)

Gong, Y., Song, Y., Li, Y., Li, C., Zhang, Y.: Relationadapter: Learning and transferring visual relation with diffusion transformers. arXiv preprint arXiv:2506.02528 (2025)

work page arXiv 2025
[18]

ACM Transactions on Graphics (TOG)43(4), 1–15 (2024)

Gu, Z., Yang, S., Liao, J., Huo, J., Gao, Y.: Analogist: Out-of-the-box visual in-context learning with image diffusion model. ACM Transactions on Graphics (TOG)43(4), 1–15 (2024)

2024
[19]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16000–16009 (2022)

2022
[20]

Prompt-to-Prompt Image Editing with Cross Attention Control

Hertz, A., Mokady, R., Tenenbaum, J., Aberman, K., Pritch, Y., Cohen-Or, D.: Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[21]

Advances in neural information processing systems33, 6840–6851 (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems33, 6840–6851 (2020)

2020
[22]

Iclr1(2), 3 (2022)

Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: Lora: Low-rank adaptation of large language models. Iclr1(2), 3 (2022)

2022
[23]

ACM Transactions on Graphics (TOG)37(2), 1–17 (2018)

Hu, Y., He, H., Xu, C., Wang, B., Lin, S.: Exposure: A white-box photo post-processing framework. ACM Transactions on Graphics (TOG)37(2), 1–17 (2018)

2018
[24]

GPT-4o System Card

Hurst, A., Lerer, A., Goucher, A.P., Perelman, A., Ramesh, A., Clark, A., Ostrow, A., Welihinda, A., Hayes, A., Radford, A., et al.: Gpt-4o system card. arXiv preprint arXiv:2410.21276 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[25]

Jocher, G., Qiu, J.: Ultralytics yolo11 (2024),https://github.com/ultralytics/ultralytics, accessed on June 24, 2026

2024
[26]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4401–4410 (2019)

2019
[27]

IEEE transactions on Consumer Electronics43(1), 1–8 (1997)

Kim, Y.T.: Contrast enhancement using brightness preserving bi-histogram equalization. IEEE transactions on Consumer Electronics43(1), 1–8 (1997)

1997
[28]

In: Proceedings of the AAAI conference on artificial intelligence

Kosugi, S., Yamasaki, T.: Unpaired image enhancement featuring reinforcement-learning-controlled image editing software. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34, pp. 11296–11303 (2020)

2020
[29]

Labs, B.F.: FLUX.2: Frontier Visual Intelligence.https://bfl.ai/blog/flux-2(2025), accessed on June 24, 2026

2025
[30]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lai, B., Juefei-Xu, F., Liu, M., Dai, X., Mehta, N., Zhu, C., Huang, Z., Rehg, J.M., Lee, S., Zhang, N., et al.: Unleashing in-context learning of autoregressive models for few-shot image manipulation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 18346–18357 (2025)

2025
[31]

arXiv preprint arXiv:2312.06738 (2023)

Li, S., Singh, H., Grover, A.: Instructany2pix: Flexible visual editing via multimodal instruction following. arXiv preprint arXiv:2312.06738 (2023)

work page arXiv 2023
[32]

In: Proceedings of the 26th ACM international conference on Multimedia

Li, T., Qian, R., Dong, C., Liu, S., Yan, Q., Zhu, W., Lin, L.: Beautygan: Instance-level facial makeup trans- fer with deep generative adversarial network. In: Proceedings of the 26th ACM international conference on Multimedia. pp. 645–653 (2018)

2018
[33]

arXiv preprint arXiv:2602.03210 (2026)

Li, Z., Duan, Z., Ye, J., Chen, C., Chen, D., Li, Y., Chen, Y.: Viral: Visual in-context reasoning via analogy in diffusion transformers. arXiv preprint arXiv:2602.03210 (2026)

work page arXiv 2026
[34]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Liang, J., Zeng, H., Cui, M., Xie, X., Zhang, L.: Ppr10k: A large-scale portrait photo retouching dataset with human-region mask and group-level consistency. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 653–661 (2021)

2021
[35]

Visual Attribute Transfer through Deep Image Analogy

Liao, J., Yao, Y., Yuan, L., Hua, G., Kang, S.B.: Visual attribute transfer through deep image analogy. arXiv preprint arXiv:1705.01088 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[36]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Liu, J., Ying, Q., Qian, Z., Li, S., Zhang, R., Liu, J., Zhang, X.: Mofrr: Mixture of diffusion models for face retouching restoration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12842–12851 (2025)

2025
[37]

Step1X-Edit: A Practical Framework for General Image Editing

Liu, S., Han, Y., Xing, P., Yin, F., Wang, R., Cheng, W., Liao, J., Wang, Y., Fu, H., Han, C., Li, G., Peng, Y., Sun, Q., Wu, J., Cai, Y., Ge, Z., Ming, R., Xia, L., Zeng, X., Zhu, Y., Jiao, B., Zhang, X., Yu, G., Jiang, D.: Step1x-edit: A practical framework for general image editing. arXiv preprint arXiv:2504.17761 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[38]

arXiv preprint arXiv:2506.07992 (2025)

Lu, H., Chen, J., Yang, Z., Gnanha, A.T., Wang, F.L., Qing, L., Mao, X.: Pairedit: Learning semantic variations for exemplar-based image editing. arXiv preprint arXiv:2506.07992 (2025)

work page arXiv 2025
[39]

MediaPipe: A Framework for Building Perception Pipelines

Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.L., Yong, M.G., Lee, J., et al.: Mediapipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1906
[40]

In: ACM SIGGRAPH 2008 papers, pp

Mantiuk, R., Daly, S., Kerofsky, L.: Display adaptive tone mapping. In: ACM SIGGRAPH 2008 papers, pp. 1–10 (2008)

2008
[41]

In: Proceedings of the AAAI conference on artificial intelligence

Medin, S.C., Egger, B., Cherian, A., Wang, Y., Tenenbaum, J.B., Liu, X., Marks, T.K.: Most-gan: 3d mor- phable stylegan for disentangled face image manipulation. In: Proceedings of the AAAI conference on artificial intelligence. vol. 36, pp. 1962–1971 (2022)

1962
[42]

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.Y., Ermon, S.: Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[43]

IEEE Transactions on Image processing17(10), 1783–1794 (2008)

Mukherjee, J., Mitra, S.K.: Enhancement of color images by scaling the dct coefficients. IEEE Transactions on Image processing17(10), 1783–1794 (2008)

2008
[44]

Advances in Neural Information Processing Systems36, 9598–9613 (2023)

Nguyen, T., Li, Y., Ojha, U., Lee, Y.J.: Visual instruction inversion: Image editing via image prompting. Advances in Neural Information Processing Systems36, 9598–9613 (2023)

2023
[45]

Transfer between Modalities with MetaQueries

Pan, X., Shukla, S.N., Singh, A., Zhao, Z., Mishra, S.K., Wang, J., Xu, Z., Chen, J., Li, K., Juefei-Xu, F., et al.: Transfer between modalities with metaqueries. arXiv preprint arXiv:2504.06256 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[46]

Qwen Team: Qwen2.5 technical report (2025),https://arxiv.org/abs/2412.15115

work page internal anchor Pith review Pith/arXiv arXiv 2025
[47]

IET Biometrics9(4), 154–164 (2020)

Rathgeb, C., Botaljov, A., Stockhardt, F., Isadskiy, S., Debiasi, L., Uhl, A., Busch, C.: Prnu-based detection of facial retouching. IET Biometrics9(4), 154–164 (2020)

2020
[48]

IEEE Access8, 106373–106385 (2020)

Rathgeb, C., Satnoianu, C.I., Haryanto, N.E., Bernardo, K., Busch, C.: Differential detection of facial retouching: A multi-biometric approach. IEEE Access8, 106373–106385 (2020)

2020
[49]

In: ACM SIGGRAPH 2006 Papers, pp

Schaefer, S., McPhail, T., Warren, J.: Image deformation using moving least squares. In: ACM SIGGRAPH 2006 Papers, pp. 533–540 (2006)

2006
[50]

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Seedream, T., Chen, Y., Gao, Y., Gong, L., Guo, M., Guo, Q., Guo, Z., Hou, X., Huang, W., Huang, Y., et al.: Seedream 4.0: Toward next-generation multimodal image generation. arXiv preprint arXiv:2509.20427 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[51]

In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision

Shafaei, A., Little, J.J., Schmidt, M.: Autoretouch: Automatic professional face retouching. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 990–998 (2021)

2021
[52]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Sheynin, S., Polyak, A., Singer, U., Kirstain, Y., Zohar, A., Ashual, O., Parikh, D., Taigman, Y.: Emu edit: Precise image editing via recognition and generation tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8871–8879 (2024)

2024
[53]

In: International conference on machine learning

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilib- rium thermodynamics. In: International conference on machine learning. pp. 2256–2265. pmlr (2015)

2015
[54]

Advances in neural information processing systems32(2019)

Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems32(2019)

2019
[55]

arXiv preprint arXiv:2411.03982 (2024)

Srivastava, A., Menta, T.R., Java, A., Jadhav, A., Singh, S., Jandial, S., Krishnamurthy, B.: Reedit: Multimodal exemplar-based image editing with diffusion models. arXiv preprint arXiv:2411.03982 (2024)

work page arXiv 2024
[56]

Gemini: A Family of Highly Capable Multimodal Models

Team, G., Anil, R., Borgeaud, S., Alayrac, J.B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., Millican, K., et al.: Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[57]

LongCat-Image Technical Report

Team, M.L., Ma, H., Tan, H., Huang, J., Wu, J., He, J.Y., Gao, L., Xiao, S., Wei, X., Ma, X., Cai, X., Guan, Y., Hu, J.: Longcat-image technical report. arXiv preprint arXiv:2512.07584 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[58]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Tewari, A., Elgharib, M., Bharaj, G., Bernard, F., Seidel, H.P., Pérez, P., Zollhofer, M., Theobalt, C.: Stylerig: Rigging stylegan for 3d control over portrait images. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6142–6151 (2020)

2020
[59]

Advances in Neural Information Processing Systems36, 8542–8562 (2023)

Wang, Z., Jiang, Y., Lu, Y., He, P., Chen, W., Wang, Z., Zhou, M., et al.: In-context learning unlocked for diffusion models. Advances in Neural Information Processing Systems36, 8542–8562 (2023)

2023
[60]

IEEE transactions on image processing13(4), 600–612 (2004)

Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to struc- tural similarity. IEEE transactions on image processing13(4), 600–612 (2004)

2004
[61]

Wu, C., Li, J., Zhou, J., Lin, J., Gao, K., Yan, K., ming Yin, S., Bai, S., Xu, X., Chen, Y., Chen, Y., Tang, Z., Zhang, Z., Wang, Z., Yang, A., Yu, B., Cheng, C., Liu, D., Li, D., Zhang, H., Meng, H., Wei, H., Ni, J., Chen, K., Cao, K., Peng, L., Qu, L., Wu, M., Wang, P., Yu, S., Wen, T., Feng, W., Xu, X., Wang, Y., Zhang, Y., Zhu, Y., Wu, Y., Cai, Y., L...

[62]

Wu, C., Zheng, P., Yan, R., Xiao, S., Luo, X., Wang, Y., Li, W., Jiang, X., Liu, Y., Zhou, J., Liu, Z., Xia, Z., Li, C., Deng, H., Wang, J., Luo, K., Zhang, B., Lian, D., Wang, X., Wang, Z., Huang, T., Liu, Z.: Omnigen2: Exploration to advanced multimodal generation. arXiv preprint arXiv:2506.18871 (2025)

[63]

Xie, Z., Geng, Z., Hu, J., Zhang, Z., Hu, H., Cao, Y.: Revealing the dark secrets of masked image modeling. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14475–14485 (2023)

[64]

Xu, P., Fan, Q., Kou, F., Qin, S., Gu, H., Zhao, R., Ling, C., Wang, B.: Textualize visual prompt for image editing via diffusion bridge. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 21779–21787 (2025)

[65]

Yang, B., Gu, S., Zhang, B., Zhang, T., Chen, X., Sun, X., Chen, D., Wen, F.: Paint by example: Exemplar-based image editing with diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 18381–18391 (2023)
[66]

Yang, Y., Peng, H., Shen, Y., Yang, Y., Hu, H., Qiu, L., Koike, H., et al.: Imagebrush: Learning visual in-context instructions for exemplar-based image manipulation. Advances in Neural Information Processing Systems 36, 48723–48743 (2023)
[67]

Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)

[68]

Zhao, R., Fan, Q., Kou, F., Qin, S., Gu, H., Wu, W., Xu, P., Zhu, M., Wang, N., Gao, X.: InstructBrush: Learning Attention-based Instruction Optimization for Image Editing (2024). https://doi.org/10.48550/arXiv.2403.18660

A Dataset Details

In this section, we provide comprehensive details regarding the construction of the MirrorPPR47M dataset. This includes...
