ReAlign: Generalizable Image Forgery Detection via Reasoning-Aligned Representation
Pith reviewed 2026-05-20 19:26 UTC · model grok-4.3
The pith
Aligning visual features with LLM reasoning texts creates a lightweight yet generalizable detector for AI-generated image forgeries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ReAlign distills high-quality reasoning texts generated by a GRPO-optimized LLM into a lightweight AI-generated image (AIGI) detector via contrastive learning. It inherits the generalization ability and semantic sensitivity of the reasoning-text representations while remaining efficient and lightweight for deployment. Training uses a tailored joint optimization strategy that combines a contrastive loss for image-text alignment with a classification loss for forgery discrimination.
What carries the argument
Reasoning-aligned representation created by contrastive learning between image embeddings and LLM-generated reasoning texts about forgery.
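The page does not give the exact form of the joint objective (the referee report asks the authors to clarify it), so the following is only a minimal sketch under stated assumptions: a symmetric InfoNCE loss over paired image and reasoning-text embeddings, a binary cross-entropy loss on a real/fake logit, and a hypothetical weight `lam` that balances the two. The function names, the temperature value, and the weighting scheme are illustrative and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th image should match the i-th reasoning text.

    Assumption: the paper's alignment loss may differ from this CLIP-style form.
    """
    img = [l2_normalize(v) for v in img_emb]
    txt = [l2_normalize(v) for v in txt_emb]
    n = len(img)
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: rows of the similarity matrix.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: columns of the similarity matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)


def classification_loss(fake_logits, labels):
    """Binary cross-entropy on logits, with label 1 = fake and 0 = real."""
    total = 0.0
    for z, y in zip(fake_logits, labels):
        # Stable log(1 + exp(z)); BCE-with-logits equals softplus(z) - y * z.
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - y * z
    return total / len(labels)


def joint_loss(img_emb, txt_emb, fake_logits, labels, lam=1.0):
    """Classification loss plus lam times the alignment loss (lam is hypothetical)."""
    cls = classification_loss(fake_logits, labels)
    con = contrastive_loss(img_emb, txt_emb, temperature=0.07)
    return cls + lam * con
```

In this sketch, setting `lam=0.0` reduces the objective to the classification loss alone, which corresponds to the no-text baseline the referee report asks for.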
If this is right
- ReAlign outperforms state-of-the-art detectors in accuracy and generalization on benchmarks like AIGCDetectBenchmark.
- It handles complex, high-fidelity forgeries from modern generative models effectively.
- The method remains efficient and lightweight compared to full LLM-based approaches.
- Joint optimization of alignment and classification losses improves both semantic understanding and forgery discrimination.
Where Pith is reading between the lines
- This could allow forgery detection to scale to new generative models without retraining large systems.
- Similar alignment techniques might enhance other visual tasks with semantic reasoning from text.
- Future work could explore using different types of reasoning texts or optimizing the LLM specifically for visual artifact description.
Load-bearing premise
The reasoning texts generated by the LLM carry generalization and semantic sensitivity that can be transferred effectively to the visual model through contrastive alignment.
What would settle it
Evaluating the detector on a dataset of forgeries where the LLM's reasoning texts do not highlight the actual visual inconsistencies would show no improvement over standard visual-only models.
Original abstract
The rise of AI-generated images (AIGIs) poses growing challenges for digital authenticity, prompting the need for efficient, generalizable image forgery detection systems. Existing methods, whether non-LLM-based or LLM-based, exhibit distinct advantages and limitations. While non-LLM-based models offer efficient low-level artifact detection, they often lack semantic understanding. Conversely, LLM-based methods provide strong semantic reasoning and explainability but are computationally intensive and less sensitive to subtle visual artifacts. Moreover, the true contribution of explanatory reasoning texts to forgery detection performance remains unclear. In this work, we investigate the intrinsic value and potential of LLM-generated reasoning texts, considering it a source of generalization and semantic-error sensitivity. Based on these findings, we propose ReAlign, a novel framework that distills high-quality reasoning texts generated by a GRPO-optimized LLM into a lightweight AIGI detector via contrastive learning. ReAlign effectively inherits the generalization ability and semantic sensitivity capability of reasoning textual representations, while remaining efficient and lightweight for deployment. Moreover, ReAlign adopts a tailored joint optimization strategy that integrates contrastive loss for image-text alignment and classification loss for accurate forgery discrimination. Experimental results on AIGCDetectBenchmark, AIGI-Holmes, and our newly constructed UltraSynth-10k demonstrate that ReAlign consistently outperforms existing state-of-the-art detectors in both accuracy and generalization, particularly when facing complex, high-fidelity forgeries from modern generative models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ReAlign, a framework for generalizable AIGI forgery detection that distills reasoning texts generated by a GRPO-optimized LLM into a lightweight visual detector via contrastive learning. It combines image-text contrastive alignment with classification loss to transfer generalization and semantic sensitivity from the textual representations, while remaining efficient. Experiments claim consistent outperformance over prior detectors on AIGCDetectBenchmark, AIGI-Holmes, and the newly introduced UltraSynth-10k benchmark, particularly for high-fidelity forgeries.
Significance. If the results and ablations hold under full scrutiny, the work offers a practical bridge between low-level artifact detectors and semantically rich but heavy LLM approaches, potentially improving deployment in real-world authenticity verification. The construction of UltraSynth-10k and the explicit investigation of reasoning text value are constructive contributions to the field.
major comments (2)
- [Abstract and Methods (distillation pipeline)] The central hypothesis that LLM-generated reasoning texts supply transferable generalization and semantic-error sensitivity (Abstract) is load-bearing for the performance claims, yet the manuscript provides no ablations isolating reasoning text quality versus generic captions or no-text baselines; without these, the outperformance on the three benchmarks cannot be confidently attributed to the proposed distillation mechanism.
- [Experiments] Experimental results section: reported gains on AIGCDetectBenchmark, AIGI-Holmes, and UltraSynth-10k lack error bars, multiple random seeds, or statistical significance tests, undermining the generalization claim especially given the review's note on absent full protocols.
minor comments (2)
- [Method] Clarify the precise form of the joint contrastive-plus-classification objective and any weighting hyperparameters in the optimization strategy.
- [Conclusion] Add a dedicated limitations paragraph discussing potential failure modes when the GRPO-optimized LLM produces low-quality reasoning on novel forgery types.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's potential significance. We address each major comment below with clarifications and commitments to revisions that strengthen the attribution of results to the proposed mechanism.
- Referee: [Abstract and Methods (distillation pipeline)] The central hypothesis that LLM-generated reasoning texts supply transferable generalization and semantic-error sensitivity (Abstract) is load-bearing for the performance claims, yet the manuscript provides no ablations isolating reasoning text quality versus generic captions or no-text baselines; without these, the outperformance on the three benchmarks cannot be confidently attributed to the proposed distillation mechanism.
  Authors: We agree that isolating the contribution of reasoning texts is important for substantiating the central hypothesis. The manuscript does compare ReAlign against prior non-LLM and LLM-based detectors and includes a joint optimization analysis, but it lacks explicit ablations against generic captions or no-text baselines. In the revision we will add these experiments: (1) replacing reasoning texts with generic captions from a standard VLM such as BLIP, and (2) a no-text baseline that uses only the classification loss on image features. These additions will directly test whether the observed gains on AIGCDetectBenchmark, AIGI-Holmes, and UltraSynth-10k stem from the semantic-error sensitivity of the GRPO-optimized reasoning texts.
  revision: yes
- Referee: [Experiments] Experimental results section: reported gains on AIGCDetectBenchmark, AIGI-Holmes, and UltraSynth-10k lack error bars, multiple random seeds, or statistical significance tests, undermining the generalization claim especially given the review's note on absent full protocols.
  Authors: We acknowledge that the current single-run results limit confidence in the generalization claims. In the revised manuscript we will rerun all main experiments and ablations with at least three random seeds, report mean accuracy and standard deviation (error bars), and include statistical significance tests (paired t-tests or Wilcoxon signed-rank tests) against the strongest baselines. We will also expand the experimental protocols section with complete hyperparameter tables, training schedules, and data splits to address the noted absence of full protocols.
  revision: yes
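The multi-seed reporting promised in the rebuttal can be illustrated with a small sketch. It assumes per-seed accuracies for the proposed detector and one baseline, paired by seed, and computes the mean, the sample standard deviation, and the paired t statistic with its degrees of freedom. The helper names and the accuracy values are placeholders for illustration only; they are not results from the paper.

```python
import math
import statistics


def summarize(accs):
    """Return (mean, sample standard deviation) of per-seed accuracies."""
    return statistics.mean(accs), statistics.stdev(accs)


def paired_t_statistic(ours, baseline):
    """Paired t statistic and degrees of freedom for seed-matched runs."""
    if len(ours) != len(baseline):
        raise ValueError("runs must be paired one-to-one by seed")
    diffs = [a - b for a, b in zip(ours, baseline)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


# Placeholder accuracies for three seeds (hypothetical, not from the paper).
ours_acc = [0.91, 0.93, 0.92]
base_acc = [0.88, 0.89, 0.90]

mean_ours, std_ours = summarize(ours_acc)
t_stat, dof = paired_t_statistic(ours_acc, base_acc)
print(f"ours: {mean_ours:.3f} +/- {std_ours:.3f}, t = {t_stat:.2f}, dof = {dof}")
```

With only three seeds the test has two degrees of freedom, so the t statistic has to be compared against a correspondingly wide critical value; more seeds would tighten the estimate.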
Circularity Check
No significant circularity
The paper's core pipeline relies on an external GRPO-optimized LLM to generate reasoning texts, which are then distilled into a lightweight visual detector through contrastive alignment plus joint classification loss. No derivation step, equation, or performance claim is shown to reduce by construction to a fitted parameter or self-defined quantity within the paper itself; the generalization and semantic-sensitivity benefits are presented as an empirical hypothesis tested on three external benchmarks. The framework is self-contained against those benchmarks and does not invoke load-bearing self-citations or uniqueness theorems that collapse the central result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM-generated reasoning texts serve as a source of generalization and semantic-error sensitivity for image forgery detection
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: ReAlign leverages reasoning textual representations aligned with visual features through contrastive learning and a designed joint optimization strategy
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.