Off-the-shelf Vision Models Benefit Image Manipulation Localization
Pith reviewed 2026-05-10 16:36 UTC · model grok-4.3
The pith
Off-the-shelf vision models contain extractable clues for locating image manipulations that a lightweight adapter can isolate without retraining the base network.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that general-purpose vision models for tasks such as image generation and segmentation already hold manipulation-specific information alongside their semantic representations. The ReVi adapter disentangles semantic redundancy from this information and selectively strengthens the manipulation component, allowing effective localization by training only the adapter on top of frozen base models.
What carries the argument
The ReVi adapter: a trainable module, inspired by robust principal component analysis, that disentangles semantic redundancy from the manipulation-specific information embedded in frozen off-the-shelf vision models and selectively enhances the latter.
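The RPCA framing behind the adapter can be made concrete: a feature matrix M is split into a low-rank component L (shared semantic structure) and a sparse component S (localized deviations, the candidate manipulation cues). Below is a minimal NumPy sketch of that separation via alternating singular-value and soft thresholding; the function names and the tau and lam settings are illustrative conventions from the RPCA literature, not the paper's learned operator.

```python
import numpy as np

def svt(X, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    # Element-wise soft thresholding: proximal operator of the L1 norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(M, lam=None, tau=1.0, n_iter=100):
    """Alternately split M into a low-rank part L plus a sparse part S."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))  # classic RPCA weighting
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S, tau)            # absorb coherent, low-rank structure
        S = shrink(M - L, lam * tau)   # absorb localized, sparse deviations
    return L, S
```

On a synthetic rank-one matrix with a few injected spikes, L recovers the smooth structure while S isolates the spikes; the trainable adapter is only inspired by this decomposition and does not reduce to these fixed thresholding steps.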
If this is right
- IML systems can reuse existing vision networks for forgery localization instead of training entirely new architectures.
- Only a small adapter needs updating when switching base models or adapting to new manipulation types.
- Semantic understanding and manipulation detection can share training signals rather than requiring separate model families.
- Detection performance improves by amplifying hidden manipulation cues already present in semantic feature spaces.
Where Pith is reading between the lines
- The same adapter pattern could transfer general models to related forensic tasks such as deepfake detection or camera-source identification.
- If the disentanglement succeeds across diverse base models, manipulation artifacts likely occupy a consistent low-rank structure in feature spaces.
- The approach invites tests on video sequences or 3D manipulations to check whether the semantic-manipulation link generalizes beyond still images.
Load-bearing premise
Off-the-shelf general vision models already embed manipulation-specific information that can be reliably separated from semantic features by the adapter without redesigning or retraining the base models.
What would settle it
Apply the adapter to a new frozen vision model, train only on standard IML training sets, and evaluate localization metrics on held-out benchmarks such as CASIA or NIST; if the results fall below those of fully retrained task-specific networks, the claim that general priors suffice is falsified.
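The falsification test above hinges on standard localization metrics. IML benchmarks typically report pixel-level F1 (often alongside AUC); the following is a minimal sketch of that metric, with binarization of soft prediction maps and benchmark loading omitted.

```python
import numpy as np

def pixel_f1(pred_mask, gt_mask):
    """Pixel-level F1 between binary predicted and ground-truth masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = np.count_nonzero(pred & gt)    # manipulated pixels found
    fp = np.count_nonzero(pred & ~gt)   # false alarms
    fn = np.count_nonzero(~pred & gt)   # manipulated pixels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

Averaging this score over a held-out benchmark, and comparing against fully retrained task-specific networks on the same split, is the comparison the claim stands or falls on.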
Original abstract
Image manipulation localization (IML) and general vision tasks are typically treated as two separate research directions due to the fundamental differences between manipulation-specific and semantic features. In this paper, however, we bridge this gap by introducing a fresh perspective: these two directions are intrinsically connected, and general semantic priors can benefit IML. Building on this insight, we propose a novel trainable adapter (named ReVi) that repurposes existing off-the-shelf general-purpose vision models (e.g., image generation and segmentation networks) for IML. Inspired by robust principal component analysis, the adapter disentangles semantic redundancy from manipulation-specific information embedded in these models and selectively enhances the latter. Unlike existing IML methods that require extensive model redesign and full retraining, our method relies on the off-the-shelf vision models with frozen parameters and only fine-tunes the proposed adapter. The experimental results demonstrate the superiority of our method, showing the potential for scalable IML frameworks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that off-the-shelf general-purpose vision models (e.g., segmentation and generation networks) intrinsically embed manipulation-specific information alongside semantic features; a novel trainable adapter (ReVi), inspired by robust principal component analysis, can disentangle semantic redundancy from this manipulation-specific information, selectively enhance the latter, and thereby repurpose the frozen backbones for superior image manipulation localization (IML) without full model redesign or retraining.
Significance. If the empirical gains are reproducible and the disentanglement mechanism is verified, the work would be significant for demonstrating an intrinsic link between general semantic priors and IML, enabling scalable forensic pipelines that leverage existing pretrained models rather than training specialized detectors from scratch. The adapter-based efficiency and the RPCA-inspired separation framing are potentially valuable contributions if supported by mechanistic evidence.
major comments (2)
- [Abstract] The assertion of experimental superiority is made without any reported metrics, baselines, datasets, or implementation details, preventing assessment of whether the ReVi adapter's gains actually arise from the claimed disentanglement.
- [Method] (ReVi adapter) The central claim that manipulation-specific signals are pre-embedded in the frozen general-vision features and selectively enhanced by the RPCA-inspired low-rank/sparse separation lacks explicit verification, such as rank analysis of the low-rank component, feature attribution, or an ablation of the disentanglement operator. Without these, the gains could equally be explained by standard transfer learning on semantic context rather than by isolation of pre-existing manipulation cues.
minor comments (2)
- [Abstract] The acronym 'ReVi' is introduced in the abstract without expansion or a one-sentence description of its components.
- [Method] Notation for the low-rank and sparse components in the adapter should be defined consistently with the RPCA reference to avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and have prepared revisions to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract] The assertion of experimental superiority is made without any reported metrics, baselines, datasets, or implementation details, preventing assessment of whether the ReVi adapter's gains actually arise from the claimed disentanglement.
Authors: We agree that the abstract, as currently written, is too high-level to allow immediate assessment of the quantitative claims. The full manuscript reports all metrics, baselines, datasets, and implementation details in the experiments section. To address the concern directly, we will revise the abstract to concisely include the primary datasets, main baselines, and key performance deltas while preserving its brevity. revision: yes
-
Referee: [Method] (ReVi adapter) The central claim that manipulation-specific signals are pre-embedded in the frozen general-vision features and selectively enhanced by the RPCA-inspired low-rank/sparse separation lacks explicit verification, such as rank analysis of the low-rank component, feature attribution, or an ablation of the disentanglement operator. Without these, the gains could equally be explained by standard transfer learning on semantic context rather than by isolation of pre-existing manipulation cues.
Authors: The referee correctly identifies that the current manuscript does not provide direct mechanistic verification of the disentanglement. We will add (i) an ablation that replaces the RPCA-inspired operator with a standard linear adapter to isolate its contribution, (ii) rank analysis of the low-rank and sparse components across layers, and (iii) feature attribution or activation visualizations comparing the enhanced manipulation cues before and after the adapter. These additions will clarify the distinction from generic transfer learning on frozen backbones. revision: yes
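For the promised rank analysis, one simple and common diagnostic is the effective rank of the recovered low-rank component: the exponential of the Shannon entropy of the normalized singular values. A low value across layers would support the claim that the semantic component is genuinely low-rank. The sketch below is an illustrative diagnostic, not the authors' stated protocol.

```python
import numpy as np

def effective_rank(X, eps=1e-12):
    """Effective rank: exp of the entropy of the normalized singular values.
    Low values indicate the matrix is close to low-rank."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    p = s / max(s.sum(), eps)
    p = p[p > eps]  # drop (near-)zero singular values
    return float(np.exp(-np.sum(p * np.log(p))))
```

Comparing this quantity for the adapter's low-rank output against the raw frozen features, layer by layer, would distinguish genuine low-rank semantic structure from an arbitrary split of the feature space.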
Circularity Check
No significant circularity; the new adapter is an independent contribution
Full rationale
The paper proposes a novel trainable ReVi adapter that repurposes frozen off-the-shelf vision models for image manipulation localization by disentangling semantic redundancy from manipulation-specific information, drawing inspiration from RPCA. This constitutes an original architectural and training contribution evaluated through experiments, rather than any derivation that reduces to its own inputs by definition, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or steps in the abstract or described method exhibit self-definitional loops, uniqueness imported from prior author work, or ansatz smuggled via citation. The central insight that semantic priors benefit IML is framed as a motivating hypothesis tested via the adapter's empirical performance, not a self-proving statement. The approach is therefore self-contained with independent content against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: General semantic priors from vision models contain extractable manipulation-specific information that can be disentangled and enhanced by a trainable adapter.
invented entities (1)
- ReVi adapter: no independent evidence