MirrorPPR: Exemplar-Based Portrait Photo Retouching
Pith reviewed 2026-06-30 07:48 UTC · model grok-4.3
The pith
MirrorPPR extracts subtle retouching operations from exemplar pairs and applies them to new portrait images via a diffusion transformer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MirrorPPR uses a Retouching Operation Extractor to capture the subtle differences within an exemplar pair and injects the resulting representations into a pre-trained Diffusion Transformer (DiT) through a connector and LoRA modules. A data self-augmentation paradigm produces aligned cross-identity training pairs, and the MirrorPPR47M dataset supplies the training data. Together, these components are claimed to achieve accurate transfer of delicate structural retouching operations.
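For readers unfamiliar with LoRA, the sketch below shows the standard low-rank update that such modules add to a frozen layer. It is a minimal illustration, not the paper's implementation: the class name `LoRALinear`, the helper `matvec`, the initialization scale, and the choice of layer are all assumptions, and the connector that feeds extractor features into the DiT is not modeled here.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of the standard LoRA formulation; where the
    paper places its LoRA modules is not stated in this summary.
    """

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pre-trained weight, shape (d_out, d_in)
        # A starts with small random values and B with zeros, so the adapted
        # layer initially reproduces the pre-trained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]
```

Only `A` and `B` would be trained, which keeps the number of new parameters small relative to the frozen weight `W`.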
What carries the argument
The Retouching Operation Extractor, which identifies the subtle differences between the original and retouched images in an exemplar pair so that they can be transferred to new queries.
If this is right
- Structural edits that cannot be described in text become feasible through direct operation transfer from examples.
- Cross-identity alignment via self-augmentation allows training on large volumes of data without manual pairing.
- Curriculum progression from simulated to professional subsets stabilizes optimization for delicate modifications.
- Identity preservation improves because the method focuses on operation extraction rather than global image translation.
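The curriculum progression from simulated to professional data can be pictured as a sampling schedule. The sketch below is one possible form, assuming a linear ramp; the paper's actual schedule is not specified in the material reviewed here, and the function names `professional_fraction` and `sample_batch` are hypothetical.

```python
import random


def professional_fraction(step, total_steps, start=0.0, end=1.0):
    """Linearly ramp the share of professional samples over training.

    Assumption: a linear ramp. The paper does not state its schedule.
    """
    if total_steps <= 1:
        return end
    t = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + t * (end - start)


def sample_batch(simulated, professional, step, total_steps, batch_size, rng):
    """Draw a batch that mixes both subsets according to the current ramp."""
    p = professional_fraction(step, total_steps)
    return [
        rng.choice(professional) if rng.random() < p else rng.choice(simulated)
        for _ in range(batch_size)
    ]
```

Early batches then come almost entirely from the simulated subset, and late batches almost entirely from the professional subset.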
Where Pith is reading between the lines
- The extractor-plus-injection design could be tested on non-face domains that also need precise localized changes, such as product photography adjustments.
- Adding a consistency loss across multiple exemplars of the same person might further reduce any residual identity leakage.
- The 47-million-pair scale suggests the method could support few-shot adaptation to new retouching styles with minimal additional data.
Load-bearing premise
The Retouching Operation Extractor can accurately capture and represent extremely delicate and localized structural modifications from exemplar pairs, and the data self-augmentation paradigm produces strictly aligned retouching operations without misalignment across cross-identity pairs.
What would settle it
Create an exemplar pair showing one precise small change such as a 3 percent narrowing of the jawline, run the trained model on a held-out query face, and verify whether the output exhibits exactly that change with no additional alterations or identity drift.
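A check of this kind could be scripted roughly as follows. This is a sketch under stated assumptions: the jaw width is taken as the distance between two jaw landmarks supplied by some external detector, and the 0.5 percentage-point tolerance is an arbitrary placeholder. The helper names are hypothetical.

```python
import math


def width(left, right):
    """Distance in pixels between two (x, y) landmarks, e.g. jaw corners."""
    return math.dist(left, right)


def relative_change(before, after):
    """Signed relative change; -0.03 means a 3 percent narrowing."""
    return (after - before) / before


def matches_exemplar(exemplar_change, output_change, tol=0.005):
    """True when the output reproduces the exemplar's change within tol.

    Assumption: tol = 0.005 is an illustrative threshold, not a value
    taken from the paper.
    """
    return abs(exemplar_change - output_change) <= tol
```

Comparing relative rather than absolute changes keeps the test meaningful when the exemplar and query faces differ in size. Checking that nothing else changed would need a separate measurement over the remaining landmarks.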
Original abstract
While text-guided image editing has made remarkable progress, it remains limited in structural portrait retouching. Textual descriptions struggle to convey fine-grained changes to facial features and body proportions. To address this gap, we introduce Exemplar-Based Portrait Photo Retouching, where the model is given an exemplar pair and tasked with inferring and applying the same retouching operations to a new query image. Existing exemplar-based editing methods primarily focus on tasks with pronounced visual transformations. In contrast, structural portrait retouching involves extremely delicate and localized modifications, making accurate extraction and transfer of these edits challenging. To tackle this, we propose MirrorPPR, a novel framework designed to capture and transfer subtle structural retouching operations. Our method uses a Retouching Operation Extractor to capture the subtle differences from the exemplar pair. The extracted representations are then injected into a pre-trained Diffusion Transformer (DiT) through a connector and Low-Rank Adaptation (LoRA) modules. Furthermore, constructing perfectly aligned cross-identity training pairs is severely hindered by operation misalignment. To overcome this, we propose an advanced data self-augmentation paradigm that ensures strictly aligned retouching operations. To alleviate data scarcity and support this novel task, we introduce MirrorPPR47M, a large-scale dataset with over 47 million retouched pairs. By structuring the dataset into simulated and professional subsets, we enable progressive curriculum learning to smoothly optimize the network. Extensive experiments demonstrate that MirrorPPR significantly outperforms existing baselines in both retouching quality and identity preservation. The project page is available at https://sjtu-deng-lab.github.io/MirrorPPR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MirrorPPR for exemplar-based portrait photo retouching: given an exemplar pair, the model extracts subtle retouching operations and applies them to a new query image. It proposes a Retouching Operation Extractor to encode differences from the pair, injects the representations into a pre-trained Diffusion Transformer via a connector and LoRA modules, and uses a data self-augmentation paradigm to create strictly aligned cross-identity training pairs. A new 47M-pair dataset (MirrorPPR47M) with simulated and professional subsets supports curriculum learning. Experiments claim significant outperformance over baselines in retouching quality and identity preservation.
Significance. If the central claims hold, the work fills a gap in fine-grained structural portrait editing where text prompts are insufficient, offering a new task formulation, a large-scale dataset, and a curriculum-learning pipeline that could support downstream applications in photo editing and generative media. The explicit construction of a 47M-pair dataset with progressive subsets is a concrete contribution that lowers the barrier for future exemplar-based editing research.
major comments (3)
- [§3] §3 (Retouching Operation Extractor): the claim that the extractor faithfully encodes extremely delicate localized structural modifications (e.g., micro-adjustments to eye shape or jawline) from exemplar pairs is load-bearing for the transfer step, yet no quantitative verification (sub-pixel error, structural similarity on localized regions, or ablation on extractor resolution) is supplied to confirm it resolves the differences that text-guided methods cannot.
- [§4] §4 (data self-augmentation paradigm): the assertion that the paradigm produces strictly aligned retouching operations across cross-identity pairs is central to avoiding identity drift during DiT+LoRA injection, but the manuscript provides no direct measurement of alignment precision (e.g., pixel-wise operation consistency or misalignment statistics) or ablation showing that misalignment artifacts are eliminated.
- [§5] §5 (experiments): the headline claim of significant outperformance in both quality and identity preservation is unsupported by any reported metrics, baselines, or controls in the provided text; without these numbers the empirical validation of the extractor and augmentation assumptions cannot be assessed.
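The sub-pixel and alignment measurements requested in the first two comments could take the form of a mean endpoint error over landmark displacements. The sketch below is one such metric, assuming landmarks are available as (x, y) pixel coordinates from an external detector; the function names are hypothetical and nothing here comes from the manuscript.

```python
import math


def displacements(before, after):
    """Per-landmark (dx, dy) motion between two landmark sets."""
    return [(ax - bx, ay - by) for (bx, by), (ax, ay) in zip(before, after)]


def mean_endpoint_error(expected, observed):
    """Mean Euclidean distance between paired 2D displacement vectors."""
    if len(expected) != len(observed) or not expected:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(math.dist(e, o) for e, o in zip(expected, observed))
    return total / len(expected)
```

Applied to the displacements of an exemplar pair and those of the corresponding output pair, a value well below one pixel would support the claim of strictly aligned operations. This assumes the two faces have first been normalized to a common scale.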
minor comments (2)
- [Abstract] Abstract: states 'extensive experiments demonstrate...' but supplies no quantitative results, baseline names, or dataset split details; adding one or two key numbers would make the summary self-contained.
- [Dataset] Dataset description: the construction criteria separating simulated versus professional subsets and the exact curriculum schedule (epoch counts, loss weighting) are not specified; these details are needed for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of the task formulation, the MirrorPPR47M dataset, and the curriculum-learning pipeline. We address each major comment below and will revise the manuscript to provide the requested quantitative support.
Point-by-point responses
- Referee: [§3] §3 (Retouching Operation Extractor): the claim that the extractor faithfully encodes extremely delicate localized structural modifications (e.g., micro-adjustments to eye shape or jawline) from exemplar pairs is load-bearing for the transfer step, yet no quantitative verification (sub-pixel error, structural similarity on localized regions, or ablation on extractor resolution) is supplied to confirm it resolves the differences that text-guided methods cannot.
  Authors: We agree that quantitative verification of the Retouching Operation Extractor’s precision on subtle, localized changes is necessary to substantiate the claim. In the revised manuscript we will add sub-pixel error measurements, localized structural similarity scores on facial landmarks, and an ablation varying extractor resolution, all evaluated on held-out exemplar pairs.
  Revision: yes
- Referee: [§4] §4 (data self-augmentation paradigm): the assertion that the paradigm produces strictly aligned retouching operations across cross-identity pairs is central to avoiding identity drift during DiT+LoRA injection, but the manuscript provides no direct measurement of alignment precision (e.g., pixel-wise operation consistency or misalignment statistics) or ablation showing that misalignment artifacts are eliminated.
  Authors: We acknowledge that direct quantitative evidence of alignment precision would strengthen confidence in the self-augmentation approach. The revision will include pixel-wise consistency metrics, misalignment statistics across the training pairs, and an ablation comparing results with and without the alignment step to demonstrate removal of drift artifacts.
  Revision: yes
- Referee: [§5] §5 (experiments): the headline claim of significant outperformance in both quality and identity preservation is unsupported by any reported metrics, baselines, or controls in the provided text; without these numbers the empirical validation of the extractor and augmentation assumptions cannot be assessed.
  Authors: The full manuscript contains experimental comparisons, yet we accept that the quantitative results must be presented more explicitly and with clear baseline controls. The revised version will feature dedicated tables reporting standard quality and identity-preservation metrics against all listed baselines, together with statistical significance tests.
  Revision: yes
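The identity-preservation metrics and significance tests promised in the last response might look like the sketch below. It assumes identity is scored as cosine similarity between face embeddings (from a face-recognition model such as ArcFace) and that methods are compared with an exact paired sign-flip test on per-image score differences. Both choices are illustrative assumptions, not the authors' stated protocol.

```python
import math
from itertools import product


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sign_flip_p_value(diffs):
    """Exact two-sided paired sign-flip test.

    diffs holds per-image score differences (method minus baseline).
    Enumerates all 2**n sign assignments, so it suits small n only.
    """
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small slack for float rounding
            hits += 1
    return hits / total
```

For larger evaluation sets, the exhaustive enumeration would be replaced by random sign sampling, since the number of assignments doubles with each additional image.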
Circularity Check
No significant circularity; method and claims rest on empirical evaluation of proposed components
Full rationale
The paper introduces a new task (exemplar-based portrait retouching), a new framework (MirrorPPR with Retouching Operation Extractor + DiT+LoRA injection), a new data-augmentation paradigm, and a new dataset (MirrorPPR47M). These are presented as engineering contributions whose performance is measured by experiments against baselines. No equations, fitted parameters, or self-citations are shown to reduce the central claims to inputs by construction. The derivation chain consists of standard supervised training and transfer, with no self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation patterns.