Off-the-shelf Vision Models Benefit Image Manipulation Localization
Pith reviewed 2026-05-10 16:36 UTC · model grok-4.3
The pith
Off-the-shelf vision models contain extractable clues for locating image manipulations that a lightweight adapter can isolate without retraining the base network.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that general-purpose vision models for tasks such as image generation and segmentation already hold manipulation-specific information alongside their semantic representations. The ReVi adapter disentangles semantic redundancy from this information and selectively strengthens the manipulation component, allowing effective localization by training only the adapter on top of frozen base models.
What carries the argument
The ReVi adapter: a trainable module, inspired by robust principal component analysis, that disentangles semantic redundancy from the manipulation-specific information embedded in frozen off-the-shelf vision models and selectively enhances the latter.
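The RPCA framing behind the adapter can be made concrete: a feature matrix M is split into a low-rank component L (shared semantic structure) and a sparse component S (localized deviations, the candidate manipulation cues). Below is a minimal NumPy sketch of that separation via alternating singular-value and soft thresholding; the function names and the tau and lam settings are illustrative conventions from the RPCA literature, not the paper's learned operator.

```python
import numpy as np

def svt(X, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    # Element-wise soft thresholding: proximal operator of the L1 norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(M, lam=None, tau=1.0, n_iter=100):
    """Alternately split M into a low-rank part L plus a sparse part S."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))  # classic RPCA weighting
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S, tau)            # absorb coherent, low-rank structure
        S = shrink(M - L, lam * tau)   # absorb localized, sparse deviations
    return L, S
```

On a synthetic rank-one matrix with a few injected spikes, L recovers the smooth structure while S isolates the spikes; the trainable adapter is only inspired by this decomposition and does not reduce to these fixed thresholding steps.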
If this is right
- IML systems can reuse existing vision networks for forgery localization instead of training entirely new architectures.
- Only a small adapter needs updating when switching base models or adapting to new manipulation types.
- Semantic understanding and manipulation detection can share training signals rather than requiring separate model families.
- Detection performance improves by amplifying hidden manipulation cues already present in semantic feature spaces.
Where Pith is reading between the lines
- The same adapter pattern could transfer general models to related forensic tasks such as deepfake detection or camera-source identification.
- If the disentanglement succeeds across diverse base models, manipulation artifacts likely occupy a consistent low-rank structure in feature spaces.
- The approach invites tests on video sequences or 3D manipulations to check whether the semantic-manipulation link generalizes beyond still images.
Load-bearing premise
Off-the-shelf general vision models already embed manipulation-specific information that can be reliably separated from semantic features by the adapter without redesigning or retraining the base models.
What would settle it
Apply the adapter to a new frozen vision model, train only on standard IML training sets, and evaluate localization metrics on held-out benchmarks such as CASIA or NIST; if the results fall below those of fully retrained task-specific networks, the claim that general priors suffice is falsified.
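The falsification test above hinges on standard localization metrics. IML benchmarks typically report pixel-level F1 (often alongside AUC); the following is a minimal sketch of that metric, with binarization of soft prediction maps and benchmark loading omitted.

```python
import numpy as np

def pixel_f1(pred_mask, gt_mask):
    """Pixel-level F1 between binary predicted and ground-truth masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = np.count_nonzero(pred & gt)    # manipulated pixels found
    fp = np.count_nonzero(pred & ~gt)   # false alarms
    fn = np.count_nonzero(~pred & gt)   # manipulated pixels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

Averaging this score over a held-out benchmark, and comparing against fully retrained task-specific networks on the same split, is the comparison the claim stands or falls on.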
Original abstract
Image manipulation localization (IML) and general vision tasks are typically treated as two separate research directions due to the fundamental differences between manipulation-specific and semantic features. In this paper, however, we bridge this gap by introducing a fresh perspective: these two directions are intrinsically connected, and general semantic priors can benefit IML. Building on this insight, we propose a novel trainable adapter (named ReVi) that repurposes existing off-the-shelf general-purpose vision models (e.g., image generation and segmentation networks) for IML. Inspired by robust principal component analysis, the adapter disentangles semantic redundancy from manipulation-specific information embedded in these models and selectively enhances the latter. Unlike existing IML methods that require extensive model redesign and full retraining, our method relies on the off-the-shelf vision models with frozen parameters and only fine-tunes the proposed adapter. The experimental results demonstrate the superiority of our method, showing the potential for scalable IML frameworks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that off-the-shelf general-purpose vision models (e.g., segmentation and generation networks) intrinsically embed manipulation-specific information alongside semantic features; a novel trainable adapter (ReVi), inspired by robust principal component analysis, can disentangle semantic redundancy from this manipulation-specific information, selectively enhance the latter, and thereby repurpose the frozen backbones for superior image manipulation localization (IML) without full model redesign or retraining.
Significance. If the empirical gains are reproducible and the disentanglement mechanism is verified, the work would be significant for demonstrating an intrinsic link between general semantic priors and IML, enabling scalable forensic pipelines that leverage existing pretrained models rather than training specialized detectors from scratch. The adapter-based efficiency and the RPCA-inspired separation framing are potentially valuable contributions if supported by mechanistic evidence.
major comments (2)
- [Abstract] The assertion of experimental superiority is made without any reported metrics, baselines, datasets, or implementation details, preventing assessment of whether the ReVi adapter's gains actually arise from the claimed disentanglement.
- [Method] (ReVi adapter) The central claim that manipulation-specific signals are pre-embedded in the frozen general-vision features and selectively enhanced by the RPCA-inspired low-rank/sparse separation lacks explicit verification, such as rank analysis of the low-rank component, feature attribution, or an ablation of the disentanglement operator. Without these, the gains could equally be explained by standard transfer learning on semantic context rather than by isolation of pre-existing manipulation cues.
minor comments (2)
- [Abstract] The acronym 'ReVi' is introduced in the abstract without expansion or a one-sentence description of its components.
- [Method] Notation for the low-rank and sparse components in the adapter should be defined consistently with the RPCA reference to avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and have prepared revisions to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract] The assertion of experimental superiority is made without any reported metrics, baselines, datasets, or implementation details, preventing assessment of whether the ReVi adapter's gains actually arise from the claimed disentanglement.
Authors: We agree that the abstract, as currently written, is too high-level to allow immediate assessment of the quantitative claims. The full manuscript reports all metrics, baselines, datasets, and implementation details in the experiments section. To address the concern directly, we will revise the abstract to concisely include the primary datasets, main baselines, and key performance deltas while preserving its brevity. revision: yes
-
Referee: [Method] (ReVi adapter) The central claim that manipulation-specific signals are pre-embedded in the frozen general-vision features and selectively enhanced by the RPCA-inspired low-rank/sparse separation lacks explicit verification, such as rank analysis of the low-rank component, feature attribution, or an ablation of the disentanglement operator. Without these, the gains could equally be explained by standard transfer learning on semantic context rather than by isolation of pre-existing manipulation cues.
Authors: The referee correctly identifies that the current manuscript does not provide direct mechanistic verification of the disentanglement. We will add (i) an ablation that replaces the RPCA-inspired operator with a standard linear adapter to isolate its contribution, (ii) rank analysis of the low-rank and sparse components across layers, and (iii) feature attribution or activation visualizations comparing the enhanced manipulation cues before and after the adapter. These additions will clarify the distinction from generic transfer learning on frozen backbones. revision: yes
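For the promised rank analysis, one simple and common diagnostic is the effective rank of the recovered low-rank component: the exponential of the Shannon entropy of the normalized singular values. A low value across layers would support the claim that the semantic component is genuinely low-rank. The sketch below is an illustrative diagnostic, not the authors' stated protocol.

```python
import numpy as np

def effective_rank(X, eps=1e-12):
    """Effective rank: exp of the entropy of the normalized singular values.
    Low values indicate the matrix is close to low-rank."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    p = s / max(s.sum(), eps)
    p = p[p > eps]  # drop (near-)zero singular values
    return float(np.exp(-np.sum(p * np.log(p))))
```

Comparing this quantity for the adapter's low-rank output against the raw frozen features, layer by layer, would distinguish genuine low-rank semantic structure from an arbitrary split of the feature space.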
Circularity Check
No significant circularity; the new adapter is an independent contribution
Full rationale
The paper proposes a novel trainable ReVi adapter that repurposes frozen off-the-shelf vision models for image manipulation localization by disentangling semantic redundancy from manipulation-specific information, drawing inspiration from RPCA. This constitutes an original architectural and training contribution evaluated through experiments, rather than any derivation that reduces to its own inputs by definition, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or steps in the abstract or described method exhibit self-definitional loops, uniqueness imported from prior author work, or ansatz smuggled via citation. The central insight that semantic priors benefit IML is framed as a motivating hypothesis tested via the adapter's empirical performance, not a self-proving statement. The approach is therefore self-contained with independent content against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: General semantic priors from vision models contain extractable manipulation-specific information that can be disentangled and enhanced by a trainable adapter.
invented entities (1)
- ReVi adapter: no independent evidence