CHROMA: Detecting AI-Generated Images through Inter-Channel Color-Space Correlations
Pith reviewed 2026-06-27 18:23 UTC · model grok-4.3
The pith
Inter-channel color correlations differ systematically between real photos and AI-generated images and can be used to improve detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that inter-channel color correlations exhibit systematic, generator-specific differences from real photographs; augmenting standard RGB inputs with the corresponding correlation maps lets a fixed CNN backbone achieve competitive real-versus-generated discrimination and improved robustness under limited multi-generator supervision.
What carries the argument
Inter-channel correlation maps: pairwise Pearson correlations computed between color channels in multiple color spaces and concatenated as extra input channels to a CNN classifier.
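This summary does not give the exact map construction (window size, normalization, or which color spaces are stacked), and the referee raises the same gap. The following is therefore only a minimal sketch under stated assumptions. It computes a local Pearson correlation in a k x k sliding window, with k = 7 chosen arbitrarily, on RGB only. A small epsilon keeps flat windows finite, so they map to a correlation near 0. The names `box_mean`, `local_corr` and `chroma_input` are hypothetical and are not taken from the CHROMA repository.

```python
import numpy as np
from itertools import combinations


def box_mean(x: np.ndarray, k: int) -> np.ndarray:
    """Mean of each k x k window centred on a pixel (k odd, reflect padding)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="reflect")
    # Summed-area table with a leading zero row and column.
    s = np.zeros((xp.shape[0] + 1, xp.shape[1] + 1))
    s[1:, 1:] = xp.cumsum(axis=0).cumsum(axis=1)
    h, w = x.shape
    win = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return win / (k * k)


def local_corr(a: np.ndarray, b: np.ndarray, k: int = 7, eps: float = 1e-8) -> np.ndarray:
    """Windowed Pearson correlation between two single-channel images."""
    ma, mb = box_mean(a, k), box_mean(b, k)
    cov = box_mean(a * b, k) - ma * mb
    va = np.clip(box_mean(a * a, k) - ma**2, 0.0, None)
    vb = np.clip(box_mean(b * b, k) - mb**2, 0.0, None)
    # eps keeps flat (constant) windows finite; their correlation comes out near 0.
    return np.clip(cov / np.sqrt(va * vb + eps), -1.0, 1.0)


def chroma_input(rgb: np.ndarray, k: int = 7) -> np.ndarray:
    """(H, W, 3) image in [0, 1] -> (6, H, W): R, G, B plus maps for RG, RB, GB."""
    chans = [rgb[..., c].astype(np.float64) for c in range(3)]
    maps = [local_corr(chans[i], chans[j], k) for i, j in combinations(range(3), 2)]
    return np.stack(chans + maps, axis=0)


img = np.random.default_rng(0).random((32, 32, 3))
print(chroma_input(img).shape)  # (6, 32, 32)
```

A Lab (or other color-space) version would add three more maps in the same way. The stacked array could then feed a CNN whose first convolution accepts the extra input channels.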
If this is right
- Correlation-augmented inputs improve real-versus-generated discrimination.
- The detector shows greater robustness when trained with only a few samples from additional generators.
- Performance remains competitive with recent detectors while using a simple CNN and modest training budget.
- RGB and Lab color spaces yield the most visible separation in correlation distributions.
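The distribution analysis behind the last bullet reduces each image to a handful of scalar correlations. The sketch below shows one way to compute them for RGB and Lab. It assumes whole-image Pearson correlation per channel pair and the standard sRGB (D65) to CIE L*a*b* conversion. This summary does not say whether the authors correlate whole images, patches, or crops. Histogramming these six numbers over a real set and a generated set is one way to visualize the claimed separation. All function names are illustrative.

```python
import numpy as np
from itertools import combinations

# Standard linear-sRGB -> CIE XYZ matrix and reference white for D65.
RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                       [0.2126729, 0.7151522, 0.0721750],
                       [0.0193339, 0.1191920, 0.9503041]])
WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb: np.ndarray) -> np.ndarray:
    """(..., 3) sRGB values in [0, 1] -> CIE L*a*b* under a D65 white point."""
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = (linear @ RGB_TO_XYZ.T) / WHITE_D65
    d = 6 / 29
    f = np.where(xyz > d**3, np.cbrt(xyz), xyz / (3 * d**2) + 4 / 29)
    lightness = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def pairwise_corr(img: np.ndarray) -> np.ndarray:
    """Whole-image Pearson r for channel pairs (0,1), (0,2), (1,2)."""
    r = np.corrcoef(img.reshape(-1, 3).T)  # 3 x 3; NaN if a channel is constant
    return np.array([r[i, j] for i, j in combinations(range(3), 2)])


def correlation_features(rgb: np.ndarray) -> np.ndarray:
    """Six scalars: RG, RB, GB correlations, then L*a*, L*b*, a*b* correlations."""
    return np.concatenate([pairwise_corr(rgb), pairwise_corr(srgb_to_lab(rgb))])
```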
Where Pith is reading between the lines
- This cue may arise because current generative objectives do not explicitly penalize mismatches in cross-channel statistics.
- Forcing a generator to reproduce real-image correlation distributions could serve as a test of whether the cue can be removed.
- The same maps could be examined in video frames to check temporal consistency of the signal.
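The second bullet's test could be posed as an extra training term for the generator. Below is a hypothetical sketch of such a term. It assumes that matching the mean and spread of per-image RGB pair correlations is an adequate proxy for "reproducing the distribution"; a stricter version might use a full distributional distance. It is written in NumPy for readability. Training a generator with it would require the same arithmetic in an autodiff framework. `channel_corr` and `correlation_matching_penalty` are illustrative names.

```python
import numpy as np

PAIRS = [(0, 1), (0, 2), (1, 2)]  # (R,G), (R,B), (G,B)


def channel_corr(batch: np.ndarray) -> np.ndarray:
    """Per-image Pearson r for each channel pair; batch is (N, H, W, 3) -> (N, 3)."""
    x = batch.reshape(batch.shape[0], -1, 3).astype(np.float64)
    x = x - x.mean(axis=1, keepdims=True)
    x = x / (x.std(axis=1, keepdims=True) + 1e-8)
    # With standardised channels, r is the mean product over pixels.
    return np.stack([(x[..., i] * x[..., j]).mean(axis=1) for i, j in PAIRS], axis=1)


def correlation_matching_penalty(generated: np.ndarray, real: np.ndarray) -> float:
    """Squared gaps in the mean and standard deviation of per-image correlations."""
    rg, rr = channel_corr(generated), channel_corr(real)
    mean_gap = np.sum((rg.mean(axis=0) - rr.mean(axis=0)) ** 2)
    std_gap = np.sum((rg.std(axis=0) - rr.std(axis=0)) ** 2)
    return float(mean_gap + std_gap)
```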
Load-bearing premise
The observed systematic differences in inter-channel correlation distributions between real and generated images will hold for generators and datasets not seen during the limited multi-generator training regime.
What would settle it
Testing the detector on images from a previously unseen generative model whose inter-channel correlation distributions match those of real photographs and finding no performance gain over an unaugmented baseline.
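This test comes down to comparing two detectors on the same held-out images. The pure-Python sketch below assumes ROC AUC as the metric; this summary does not say which metric the authors report. A gain near zero on a generator whose correlation statistics match real photographs would be the refuting outcome. Scores are assumed to be higher for "more likely generated".

```python
def roc_auc(real_scores, fake_scores):
    """ROC AUC as the Mann-Whitney probability that a fake outscores a real image
    (ties count one half)."""
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            wins += 1.0 if f > r else (0.5 if f == r else 0.0)
    return wins / (len(fake_scores) * len(real_scores))


def augmentation_gain(baseline, augmented):
    """Each argument is (real_scores, fake_scores) on the unseen generator."""
    return roc_auc(*augmented) - roc_auc(*baseline)


# Toy scores for illustration only, not results from the paper.
baseline = ([0.4, 0.6], [0.5, 0.7])
augmented = ([0.1, 0.2], [0.8, 0.9])
print(augmentation_gain(baseline, augmented))  # 0.25
```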
Original abstract
The rapid adoption of diffusion and large-scale generative models has made it increasingly challenging to distinguish synthetic imagery from real photographs. While automated detectors have been proposed, their generalization to unseen generators remains brittle. To address this limitation, we investigate inter-channel color correlations, a lightweight and underexploited forensic cue. We first demonstrate that LPIPS, a widely used perceptual metric, exhibits inconsistent responses to perturbations that selectively alter channel dependence across different color-space parameterizations, indicating that cross-channel statistics are not uniformly constrained by common perceptual training objectives. Motivated by this, we analyze the distributions of pairwise inter-channel correlation features across multiple color spaces. Our analysis reveals systematic, generator-specific differences in these distributions, with RGB and Lab color spaces providing the most apparent separation between real and generated images. Building on this, we introduce Chroma, a detector of AI-generated images which augments standard RGB inputs with inter-channel correlation maps and employs a fixed CNN backbone trained with a modest computational budget. We assess its robustness under both single-generator training and a limited multi-generator supervision regime, where only a few samples from additional generators are available. Across a standard benchmark protocol, correlation-augmented inputs improve real-vs-generated discrimination and robustness, yielding performance competitive with recent detectors while maintaining a simple architecture and training procedure. Code is available at https://github.com/JPSoteloSilva/CHROMA
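The LPIPS experiment in the abstract needs perturbations that change how channels co-vary without changing any channel on its own. The sketch below shows one simple construction, offered as an illustration rather than the authors' procedure. It blends one channel toward another, then rescales so that the channel's mean and standard deviation are restored. Running it after converting to another color space (for example Lab) and converting back would give variants for different parameterizations. The LPIPS comparison itself is left out; the reference implementation is distributed as the `lpips` Python package. The function name is hypothetical.

```python
import numpy as np


def shift_channel_dependence(img: np.ndarray, src: int, dst: int, t: float) -> np.ndarray:
    """Move channel `dst` toward channel `src` by fraction t in [0, 1].

    The mean and standard deviation of every channel are preserved; only the
    correlations between `dst` and the other channels change. No clipping is
    applied, so values may leave the original range for large t.
    """
    out = img.astype(np.float64).copy()
    mean_d = out[..., dst].mean()
    s = out[..., src] - out[..., src].mean()
    d = out[..., dst] - mean_d
    # Scale src to dst's spread so t blends two signals of equal energy.
    mixed = (1.0 - t) * d + t * s * (d.std() / (s.std() + 1e-12))
    mixed *= d.std() / (mixed.std() + 1e-12)
    out[..., dst] = mixed + mean_d
    return out
```

Sweeping t and recording a perceptual distance between the original and perturbed image in each color space would produce the perturbation-magnitude-versus-delta numbers the referee requests.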
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that pairwise inter-channel correlations in color spaces (particularly RGB and Lab) exhibit systematic differences between real photographs and images from diffusion/generative models. Motivated by inconsistent LPIPS responses to channel-dependence perturbations, it augments standard RGB inputs with these correlation maps, trains a fixed CNN backbone under single-generator and limited multi-generator regimes, and reports improved real-vs-generated discrimination and robustness that remains competitive with recent detectors while using a simple architecture and modest training budget.
Significance. If the generalization claim holds, the work supplies a lightweight, underexploited forensic cue that can be added to existing CNN detectors without architectural overhaul or heavy compute, addressing a key brittleness in current AI-image detectors. The emphasis on limited-supervision robustness and the public code release are concrete strengths.
major comments (2)
- [Experiments / robustness evaluation] The central robustness claim (limited multi-generator regime yielding gains on unseen generators) rests on the assumption that the observed correlation-distribution differences are largely generator-agnostic rather than artifacts of the specific training generators' color pipelines. The manuscript demonstrates separation on a small set of generators and augments inputs accordingly, but does not report performance on a fully held-out generator family never seen even in the “few samples” regime; this is load-bearing for the generalization argument.
- [Abstract and §4 (results)] The abstract and method description state that correlation-augmented inputs improve discrimination, yet the provided text supplies no quantitative metrics (accuracy, AUC, dataset sizes, number of generators, or statistical significance of distribution shifts). Without these numbers or ablation tables, the degree of support for the performance claims cannot be assessed.
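The sketch below shows one way to attach significance to the distribution shifts the referee asks about. First compute a per-image correlation feature (for example the RGB R-G correlation) for a real set and a generated set. Then compare the two samples with a two-sample Kolmogorov-Smirnov statistic and a permutation p-value. The choice of test is an assumption, not something the manuscript is reported to use; `scipy.stats.ks_2samp` provides an off-the-shelf version.

```python
import numpy as np


def ks_statistic(x, y) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup |F_x - F_y|."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    fx = np.searchsorted(x, grid, side="right") / len(x)  # empirical CDF of x
    fy = np.searchsorted(y, grid, side="right") / len(y)  # empirical CDF of y
    return float(np.abs(fx - fy).max())


def permutation_pvalue(x, y, n_perm: int = 2000, seed: int = 0) -> float:
    """Share of label shuffles whose D is at least the observed D (add-one smoothed)."""
    rng = np.random.default_rng(seed)
    observed = ks_statistic(x, y)
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```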
minor comments (2)
- [Method] Clarify the exact procedure for computing and normalizing the inter-channel correlation maps (e.g., window size, handling of constant patches) before they are concatenated as additional input channels.
- [Introduction / motivation] The LPIPS inconsistency is presented as motivation; a short quantitative example (specific perturbation magnitudes and resulting LPIPS deltas across color spaces) would make the link to cross-channel statistics more concrete.
Simulated Author's Rebuttal
We thank the referee for the thoughtful comments, which highlight important aspects of our robustness claims and presentation of results. We address each major comment below and outline revisions where appropriate.
Point-by-point responses
Referee: [Experiments / robustness evaluation] The central robustness claim (limited multi-generator regime yielding gains on unseen generators) rests on the assumption that the observed correlation-distribution differences are largely generator-agnostic rather than artifacts of the specific training generators' color pipelines. The manuscript demonstrates separation on a small set of generators and augments inputs accordingly, but does not report performance on a fully held-out generator family never seen even in the “few samples” regime; this is load-bearing for the generalization argument.
Authors: We agree that a fully held-out generator family (unseen even with few samples) would provide stronger support for the generator-agnostic nature of the correlation cues. Our experiments evaluate the limited multi-generator regime as stated, where a small number of samples from additional generators are included during training. To directly address this point, we will add experiments on one or more completely unseen generator families in the revised version, reporting performance under the single-generator and limited-supervision settings for comparison. revision: yes
Referee: [Abstract and §4 (results)] The abstract and method description state that correlation-augmented inputs improve discrimination, yet the provided text supplies no quantitative metrics (accuracy, AUC, dataset sizes, number of generators, or statistical significance of distribution shifts). Without these numbers or ablation tables, the degree of support for the performance claims cannot be assessed.
Authors: We acknowledge that the abstract and early results description would benefit from explicit quantitative metrics to allow immediate assessment of the claims. In the revision we will update the abstract to include key figures such as accuracy/AUC improvements, the number of generators and dataset sizes used, and reference to the ablation tables already present in §4. We will also ensure the method section cross-references these numbers explicitly. revision: yes
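The fully held-out experiment promised in the first response can be organized as a leave-one-family-out loop. This pure-Python sketch rests on illustrative assumptions. One generator is the fully supervised source. Every other generator outside the held-out family contributes `k_few` images. The held-out family contributes nothing to training and all of its images to testing. Real images, which would be split separately, are omitted. The generator and family names in the check are placeholders.

```python
import random


def held_out_splits(samples, family_of, primary, k_few=16, seed=0):
    """Yield (held_out_family, train_ids, test_ids) for each family other than
    the primary generator's.

    samples:   dict mapping generator name -> list of generated-image ids
    family_of: dict mapping generator name -> family name
    primary:   generator trained with full supervision
    """
    rng = random.Random(seed)
    families = sorted({family_of[g] for g in samples} - {family_of[primary]})
    for held in families:
        train = list(samples[primary])
        for g in sorted(samples):
            if g == primary or family_of[g] == held:
                continue  # the held-out family is never seen, not even a few shots
            train += rng.sample(samples[g], min(k_few, len(samples[g])))
        test = [x for g in sorted(samples) if family_of[g] == held for x in samples[g]]
        yield held, train, test
```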
Circularity Check
No circularity: purely empirical observation of correlation distributions
Full rationale
The paper's core contribution rests on direct empirical measurement of pairwise inter-channel correlation statistics across color spaces (RGB, Lab, etc.) on real vs. generated images, followed by input augmentation of a standard CNN. No equations, parameters, or predictions are defined in terms of themselves; the separation is reported as an observed data property rather than derived by construction from fitted inputs or prior self-citations. The method is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from the authors' own prior work.