SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
Pith reviewed 2026-05-10 02:06 UTC · model grok-4.3
The pith
SmartPhotoCrafter automatically edits photos by reasoning about quality deficiencies and generating targeted enhancements without user instructions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SmartPhotoCrafter formulates automatic photographic image editing as a tightly coupled reasoning-to-generation process in which an Image Critic module identifies aesthetic deficiencies and a Photographic Artist module performs targeted edits. Trained via foundation pretraining, reasoning-guided multi-edit supervision, and coordinated reinforcement learning, the model delivers photo-realistic results on restoration and retouching tasks while adhering to color- and tone-related semantics.
What carries the argument
The unified reasoning-to-generation pipeline that pairs an Image Critic for deficiency identification with a Photographic Artist for edit realization, jointly optimized via multi-stage training that includes reinforcement learning.
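As a sketch of that coupling, the flow below pairs a critic that emits a structured deficiency report with an artist that conditions its edits on that report. The module interfaces, thresholds, and directive strings are illustrative assumptions, not the paper's: the actual Image Critic is a learned quality-comprehension model and the actual Photographic Artist is a generative editor.

```python
from dataclasses import dataclass

@dataclass
class CritiqueReport:
    """Structured critic output (fields are illustrative, not from the paper)."""
    deficiencies: list[str]       # e.g. ["low exposure", "subdued contrast"]
    edit_directives: list[str]    # semantic guidance handed to the artist

def image_critic(image: dict) -> CritiqueReport:
    """Stub critic: a real one would be a learned model scoring quality dimensions."""
    report = CritiqueReport(deficiencies=[], edit_directives=[])
    if image["exposure"] < 0.4:
        report.deficiencies.append("low exposure")
        report.edit_directives.append("raise shadows, preserve highlights")
    if image["contrast"] < 0.3:
        report.deficiencies.append("subdued contrast")
        report.edit_directives.append("increase midtone contrast")
    return report

def photographic_artist(image: dict, report: CritiqueReport) -> dict:
    """Stub artist: a real one would be a conditional generative model."""
    edited = dict(image)
    if "low exposure" in report.deficiencies:
        edited["exposure"] = min(1.0, edited["exposure"] + 0.25)
    if "subdued contrast" in report.deficiencies:
        edited["contrast"] = min(1.0, edited["contrast"] + 0.2)
    return edited

def smart_photo_crafter(image: dict) -> dict:
    """Reasoning-to-generation: critique first, then targeted edits, no user prompt."""
    return photographic_artist(image, image_critic(image))

dull = {"exposure": 0.3, "contrast": 0.25}
print(smart_photo_crafter(dull))  # both exposure and contrast raised
```

The point of the sketch is the data dependency: the artist never sees a user instruction, only the critic's report, which is what the staged RL coordination would have to keep aligned.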
If this is right
- The method supports both image restoration and retouching while maintaining consistent adherence to color- and tone-related semantics.
- It achieves higher tonal sensitivity to retouching needs than existing generative models.
- Photo-realistic enhancements become possible without requiring users to supply explicit aesthetic instructions.
- A stage-specific dataset progressively builds reasoning capability, controllable generation, and cross-module collaboration.
Where Pith is reading between the lines
- The same critic-plus-artist structure with staged reinforcement learning could be adapted to other generative tasks such as video enhancement or style transfer where internal quality assessment is needed.
- If the critic's judgments prove stable across cultural or stylistic variations, the model might reduce reliance on subjective user prompts in consumer photo apps.
- Mobile-camera integration could allow real-time automatic corrections during capture by running the reasoning step on-device before final image output.
Load-bearing premise
The Image Critic can reliably detect aesthetic deficiencies and the training data plus reinforcement learning produce edits that match broad human aesthetic preferences without any explicit instructions.
What would settle it
A side-by-side human evaluation study on the same input photographs in which participants rate SmartPhotoCrafter outputs against those from instruction-based editing models or professional retouchers for realism, tonal accuracy, and overall appeal.
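Such a paired study reduces to counting per-image preferences, which an exact sign test can summarize. A minimal sketch, assuming ties are dropped and each photo pair yields one preference; the tallies below are hypothetical.

```python
from math import comb

def two_sided_sign_test(wins: int, losses: int) -> float:
    """Exact two-sided binomial test on paired preference counts,
    under H0: either system is preferred with probability 0.5."""
    n = wins + losses
    k = max(wins, losses)
    # One tail: P(X >= k) under Binomial(n, 0.5); double it for two sides.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical tallies: of 50 photo pairs, raters preferred system A 36 times.
p = two_sided_sign_test(wins=36, losses=14)
print(f"p = {p:.4f}")
```

A real study would add per-rater agreement and separate scales for realism, tonal accuracy, and appeal, but even this bare count is the kind of evidence the claim currently lacks.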
Original abstract
Traditional photographic image editing typically requires users to possess sufficient aesthetic understanding to provide appropriate instructions for adjusting image quality and camera parameters. However, this paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or inaccessible to non-expert users. In this work, we propose SmartPhotoCrafter, an automatic photographic image editing method which formulates image editing as a tightly coupled reasoning-to-generation process. The proposed model first performs image quality comprehension and identifies deficiencies by the Image Critic module, and then the Photographic Artist module realizes targeted edits to enhance image appeal, eliminating the need for explicit human instructions. A multi-stage training pipeline is adopted: (i) Foundation pretraining to establish basic aesthetic understanding and editing capabilities, (ii) Adaptation with reasoning-guided multi-edit supervision to incorporate rich semantic guidance, and (iii) Coordinated reasoning-to-generation reinforcement learning to jointly optimize reasoning and generation. During training, SmartPhotoCrafter emphasizes photo-realistic image generation, while supporting both image restoration and retouching tasks with consistent adherence to color- and tone-related semantics. We also construct a stage-specific dataset, which progressively builds reasoning and controllable generation, effective cross-module collaboration, and ultimately high-quality photographic enhancement. Experiments demonstrate that SmartPhotoCrafter outperforms existing generative models on the task of automatic photographic enhancement, achieving photo-realistic results while exhibiting higher tonal sensitivity to retouching instructions. Project page: https://github.com/vivoCameraResearch/SmartPhotoCrafter.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SmartPhotoCrafter, a unified model for automatic photographic image editing formulated as a reasoning-to-generation process. It consists of an Image Critic module that performs image quality comprehension and identifies aesthetic deficiencies, followed by a Photographic Artist module that executes targeted edits for enhancement without requiring explicit human instructions. The approach uses a three-stage training pipeline—foundation pretraining for basic aesthetic understanding, adaptation via reasoning-guided multi-edit supervision, and coordinated reinforcement learning to jointly optimize reasoning and generation—along with stage-specific datasets. Experiments are claimed to show outperformance over existing generative models in photo-realistic enhancement, with strong adherence to color- and tone-related semantics for both restoration and retouching tasks.
Significance. If the empirical results hold after controlling for model scale and data, the work could meaningfully advance automatic, instruction-free image editing by making professional-level photographic adjustments accessible to non-experts. The tight coupling of comprehension and generation through RL coordination, combined with explicit emphasis on photo-realism and tonal sensitivity, offers a coherent pipeline that addresses a practical gap in consumer photography tools. The progressive dataset construction for building cross-module collaboration is a constructive element.
minor comments (2)
- [Abstract] The abstract states that experiments demonstrate outperformance and higher tonal sensitivity but provides no quantitative metrics, baseline models, or dataset sizes; adding these details (even at a high level) would strengthen the summary for readers.
- The description of the Image Critic's deficiency identification and the RL coordination objective remains high-level; a concrete example of a reasoning trace or loss formulation would clarify how the modules interact without explicit instructions.
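On the second point, one way to make the coordination objective concrete is a scalar reward blending quality gain, compliance with the critic's directives, and realism. Nothing below is from the paper; the terms and weights are purely illustrative placeholders for whatever the authors actually optimize.

```python
def coordinated_reward(quality_gain: float,
                       directive_compliance: float,
                       realism: float,
                       w_q: float = 0.5, w_c: float = 0.3, w_r: float = 0.2) -> float:
    """Hypothetical scalar reward for jointly training critic and artist.
    - quality_gain: change in an aesthetic score after editing, in [-1, 1]
    - directive_compliance: how well the edit matches the critic's
      directives, in [0, 1]
    - realism: photo-realism score of the output, in [0, 1]
    Weights are illustrative, not taken from the paper."""
    return w_q * quality_gain + w_c * directive_compliance + w_r * realism

# A well-followed, realistic edit with modest quality gain:
print(round(coordinated_reward(0.4, 0.9, 0.8), 2))  # prints 0.63
```

Including the compliance term is what would tie the artist's policy back to the critic's reasoning; a revision spelling out the real counterpart of each term would resolve the comment.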
Simulated Author's Rebuttal
We thank the referee for the positive assessment of SmartPhotoCrafter, including the recognition of its unified reasoning-to-generation pipeline, multi-stage training, and potential impact on automatic photographic editing. We appreciate the minor-revision recommendation and will incorporate the requested clarifications in the revised manuscript.
Circularity Check
No significant circularity; empirical pipeline with no derivations
full rationale
The paper describes a procedural multi-stage training pipeline (foundation pretraining, reasoning-guided adaptation, and RL coordination) for an image editing model consisting of an Image Critic and Photographic Artist. All central claims of outperformance and tonal sensitivity are presented as empirical experimental results rather than mathematical derivations or first-principles predictions. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The method is self-contained as a descriptive architecture whose validity rests on external benchmarks and datasets, not on internal reductions to its own inputs.
However, there is noticeable low exposure, particularly in the darker areas surrounding the dog
Image Quality/Degradations Analysis: The image exhibits no significant motion blur, defocus, or distortion. However, there is noticeable low exposure, particularly in the darker areas surrounding the dog. This creates a lack of detail in the background and could be considered a form of visual interference due to shadow occlusion. There is no apparent digi...
-
[80]
The white balance seems accurate, as the dog’s fur tones look true-to-life
Color Performance&Lighting Analysis: The saturation appears moderate, with natural-looking colors. The white balance seems accurate, as the dog’s fur tones look true-to-life. However, the contrast is somewhat subdued due to the low exposure, leading to less distinction between the subject and its surroundings. The lighting is focused on the dog, creating ...
-
[81]
The focus is sharp on the dog, drawing attention directly to it
Composition&Layout Analysis: The main subject, the dog, is clearly defined and positioned slightly off-center, adhering to the rule of thirds. The focus is sharp on the dog, drawing attention directly to it. The camera viewpoint is at eye level, providing a natural perspective. The background blur is effective in isolating the subject, though it might be ...
-
[82]
The creativity lies in the use of lighting and shadow to evoke emotion
Aesthetic Impression Analysis: The tone style is somewhat muted due to the low exposure, giving the image a contemplative and serene feel. The creativity lies in the use of lighting and shadow to evoke emotion. The emotional expression is calm and introspective, with the dog appearing relaxed yet alert. The semantic richness is moderate, as the image tell...