When Recovery Matters: The Blind Spot of Surrogate Privacy in MLLM Editing
Pith reviewed 2026-06-27 22:07 UTC · model grok-4.3
The pith
Surrogate privacy methods for MLLM image editing produce edited surrogates instead of recovered source images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that surrogate-based privacy protection in MLLM editing has neglected local recovery, and that this gap can be closed by a dedicated benchmark, SPPE, together with two methods. ERMA predicts surrogate editability via instruction-aware multimodal relation modeling. C2E-S2SER performs cycle-consistent recovery, treating the surrogate editing pair as visual edit evidence and the source image as a source-preserving anchor. In experiments on SPPE and InstructPix2Pix, ERMA lifts SRCC by 13.9% and PLCC by 12.3% over the best baselines, while C2E-S2SER beats SOER on all eight source-integrity and edit-consistency metrics on SPPE.
What carries the argument
SPPE benchmark defining editability assessment and surrogate-to-source edit recovery tasks; ERMA for instruction-aware multimodal relation modeling; C2E-S2SER for cycle-consistent recovery that anchors on the source image while using the surrogate pair as edit evidence.
If this is right
- Editability can be estimated before any cloud interaction, avoiding unnecessary transmission of private images that cannot be edited consistently.
- Edited surrogates can be mapped back to private sources such that the edit effect is retained and source content is not altered beyond the intended change.
- The two-task split allows separate optimization of prediction accuracy and recovery fidelity rather than treating privacy protection as a single end-to-end process.
- Consistent gains on SRCC, PLCC and the eight integrity-consistency metrics indicate that surrogate pairs carry transferable edit information when modeled explicitly.
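For readers unfamiliar with the two correlation metrics named above, the following is a minimal pure-Python sketch of how SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) are commonly computed between predicted and reference scores. The sample values are invented for illustration and are not data from the paper; the paper's own evaluation code may differ, for example in how it handles ties.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation coefficient (SRCC): Pearson on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented example: predicted editability scores vs. reference scores.
predicted = [0.10, 0.40, 0.35, 0.80, 0.95]
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
srcc = spearman(predicted, reference)  # approximately 0.9 for this toy data
plcc = pearson(predicted, reference)
```

SRCC only looks at ordering, so it rewards a predictor that ranks surrogates correctly even if its scores are badly scaled; PLCC additionally rewards a linear relationship with the reference scores.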
Where Pith is reading between the lines
- If recovery is omitted, users may receive final images whose visual content deviates from the cloud result in ways that affect usability or intent.
- The pre-cloud editability check could be inserted into existing MLLM pipelines to decide automatically whether surrogate substitution is safe for a given instruction.
- The cycle-consistent anchoring approach may generalize to other privacy mechanisms that replace or obscure parts of an image before remote processing.
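The pre-cloud check mentioned in the list above could be wired into a pipeline roughly as follows. This is a sketch under stated assumptions: `EditRequest`, `should_send_to_cloud`, `toy_predictor`, the score range of [0, 1], and the 0.5 threshold are all hypothetical names and values introduced here, not part of the paper or of any real library. The predictor argument stands in for an ERMA-like model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EditRequest:
    surrogate_path: str  # hypothetical: local path of the surrogate image
    instruction: str     # the user's editing instruction


def should_send_to_cloud(
    request: EditRequest,
    predict_editability: Callable[[EditRequest], float],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> bool:
    """Return True when the predicted editability clears the threshold."""
    score = predict_editability(request)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"expected a score in [0, 1], got {score!r}")
    return score >= threshold


def toy_predictor(request: EditRequest) -> float:
    """Stand-in scorer for illustration only; a real system would call a trained model."""
    return 0.2 if "face" in request.instruction.lower() else 0.8


# Usage: gate the upload on the predicted score.
request = EditRequest("surrogate.png", "make the sky pink")
send = should_send_to_cloud(request, toy_predictor)
```

Keeping the predictor as a plain callable means the gate itself stays independent of whichever model produces the score, so the threshold can be tuned without touching the model.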
Load-bearing premise
The edited surrogate pair supplies reliable visual evidence of the intended edit that can be transferred back to the private source image while preserving both source integrity and edit consistency.
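As a purely illustrative sketch, one generic way to write a cycle-consistency constraint of this kind is shown below. The paper's actual losses are not reproduced on this page, so every symbol here is an assumption introduced for illustration: x_s is the private source, x_p the surrogate, y_p the edited surrogate, and R a recovery map conditioned on a before/after pair.

```latex
\hat{y}_s = R\big(x_s \mid x_p \rightarrow y_p\big), \qquad
\mathcal{L}_{\mathrm{cyc}} = \big\lVert R\big(\hat{y}_s \mid y_p \rightarrow x_p\big) - x_s \big\rVert_1
```

Read this way, the forward pass transfers the edit evidenced by the surrogate pair onto the source, and the reverse pass applies the inverted pair to the result; if the transfer preserved source content, the reverse pass should land back on the original source.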
What would settle it
A controlled test on a new set of source-surrogate pairs in which C2E-S2SER fails to exceed SOER on at least one of the eight source-integrity and edit-consistency metrics.
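One way to make this test concrete: an "all eight metrics" claim is falsified as soon as a single metric fails to improve, with the direction of improvement taken into account per metric. The sketch below assumes placeholder metric names (`metric_1` to `metric_8`) and invented scores, since the eight metrics are not named on this page; `beats_on_all` is a hypothetical helper, not code from the paper.

```python
def beats_on_all(candidate, baseline, higher_is_better):
    """Return (all_better, failures) for a strict per-metric comparison.

    Each argument maps metric name -> value (or -> bool for the direction).
    A tie counts as a failure, because the claim is strict outperformance.
    """
    if not (candidate.keys() == baseline.keys() == higher_is_better.keys()):
        raise ValueError("all three mappings must share the same metric names")
    failures = [
        name
        for name in sorted(candidate)
        if not (
            candidate[name] > baseline[name]
            if higher_is_better[name]
            else candidate[name] < baseline[name]
        )
    ]
    return not failures, failures


# Placeholder metric names and invented scores, for illustration only.
names = [f"metric_{i}" for i in range(1, 9)]
higher_is_better = {name: True for name in names}
higher_is_better["metric_8"] = False  # e.g. a distance, where lower is better

baseline = {name: 0.50 for name in names}
candidate = {name: 0.60 for name in names}
candidate["metric_8"] = 0.40

ok, failed = beats_on_all(candidate, baseline, higher_is_better)

# A single tie is enough to break the "all eight" claim.
tied = dict(candidate, metric_3=0.50)
ok_tied, failed_tied = beats_on_all(tied, baseline, higher_is_better)
```

A check like this says nothing about statistical significance; it only encodes the pass/fail logic of the stated claim.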
Original abstract
Multimodal Large Language Models (MLLMs) enable flexible instruction-driven image editing, but privacy risks arise when user images expose diverse and user-specific private content. Canonical privacy protection strategies typically substitute sensitive regions with surrogate content before cloud editing. Yet, the resulting output is often an edited surrogate rather than the desired edited source image, neglecting the local recovery in both design and evaluation scope. To this end, we introduce SPPE (Surrogate-based Privacy-Preserving Editing), the first recovery-oriented benchmark covering 36 fine-grained privacy categories and 65 editing instructions. It defines two complementary tasks: 1) editability assessment, which estimates before cloud interaction whether a surrogate can induce an edit consistent with the original image; and 2) surrogate-to-source edit recovery, which evaluates whether the edited surrogate can be transferred back to the private source with the edit effect preserved. We address each task with a dedicated method: ERMA predicts surrogate editability through instruction-aware multimodal relation modeling, while C2E-S2SER performs cycle-consistent recovery by using the surrogate editing pair as visual edit evidence and the source image as a source-preserving anchor. Experiments on SPPE and InstructPix2Pix show consistent improvements on both tasks. For editability assessment, ERMA improves over the best-performing baselines by 13.9% in SRCC and 12.3% in PLCC. For surrogate-to-source edit recovery, C2E-S2SER outperforms SOER across all 8 source integrity and edit consistency metrics on SPPE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the SPPE benchmark for surrogate-based privacy-preserving editing in MLLMs, spanning 36 fine-grained privacy categories and 65 editing instructions. It defines two tasks—editability assessment (via ERMA, using instruction-aware multimodal relation modeling) and surrogate-to-source edit recovery (via C2E-S2SER, using cycle-consistent recovery with the surrogate pair as visual evidence and the source as anchor)—and reports that ERMA improves SRCC by 13.9% and PLCC by 12.3% over baselines while C2E-S2SER outperforms SOER on all 8 source integrity and edit consistency metrics on SPPE (and shows gains on InstructPix2Pix).
Significance. If the transfer step in C2E-S2SER is validated, the work fills a genuine gap by shifting focus from surrogate substitution alone to post-editing recovery of the private source, which is load-bearing for practical privacy pipelines in instruction-driven MLLM editing. The benchmark itself is a clear contribution as the first recovery-oriented resource in this setting.
major comments (1)
- [Abstract] Abstract (surrogate-to-source edit recovery task description): The headline claim that C2E-S2SER outperforms SOER across all 8 metrics rests on the unverified assumption that edit effects encoded in the surrogate editing pair (after region substitution) transfer reliably to the private source without semantic drift or integrity loss across 36 categories; no loss formulations, cycle-consistency equations, or ablations isolating this transfer step are referenced, leaving the central recovery claim dependent on an assumption whose security is not demonstrated.
minor comments (1)
- [Abstract] Abstract: experimental design details (baseline implementations, statistical significance testing, data selection criteria, and potential confounds) are absent, which prevents immediate assessment of the reported metric gains.
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's report. We address the major comment regarding the abstract and the validation of the transfer step in C2E-S2SER.
Point-by-point responses
- Referee: [Abstract] Abstract (surrogate-to-source edit recovery task description): The headline claim that C2E-S2SER outperforms SOER across all 8 metrics rests on the unverified assumption that edit effects encoded in the surrogate editing pair (after region substitution) transfer reliably to the private source without semantic drift or integrity loss across 36 categories; no loss formulations, cycle-consistency equations, or ablations isolating this transfer step are referenced, leaving the central recovery claim dependent on an assumption whose security is not demonstrated.
  Authors: The abstract is a concise summary and therefore omits detailed equations. The cycle-consistency loss formulations and equations for C2E-S2SER are defined in Section 3.2, with the surrogate pair serving as visual evidence and the source as anchor. Ablations isolating the transfer step, including checks for semantic drift and integrity across all 36 categories, appear in Section 4.3 and the supplement. The consistent gains on all 8 metrics over SOER on SPPE (and InstructPix2Pix) supply empirical support for reliable transfer. We will revise the abstract to explicitly note the cycle-consistent formulation.
  Revision: partial
Circularity Check
No significant circularity in derivation chain
full rationale
The paper introduces the SPPE benchmark and two methods (ERMA for editability assessment via instruction-aware multimodal relation modeling, and C2E-S2SER for cycle-consistent surrogate-to-source recovery) without any equations, mathematical derivations, fitted parameters presented as predictions, or self-referential definitions. Claims of improvement (e.g., 13.9% SRCC on editability, outperformance on 8 metrics for recovery) are empirical evaluations against baselines on SPPE and InstructPix2Pix. No load-bearing self-citations, uniqueness theorems, or ansatzes smuggled via prior work are described. The chain consists of standard multimodal modeling and benchmark evaluation and is self-contained against external benchmarks.