arxiv: 2606.08864 · v1 · pith:HYZ3A5DOnew · submitted 2026-06-07 · 💻 cs.CV · cs.LG

CHROMA: Detecting AI-Generated Images through Inter-Channel Color-Space Correlations

Pith reviewed 2026-06-27 18:23 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords: AI-generated image detection · inter-channel color correlations · image forensics · color space analysis · diffusion model detection · CNN-based classifier · forensic cue

The pith

Inter-channel color correlations differ systematically between real photos and AI-generated images and can be used to improve detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that pairwise correlations between color channels vary in generator-specific ways across color spaces, with RGB and Lab showing the clearest separation from real-image statistics. It motivates the analysis by showing that LPIPS responds inconsistently to perturbations that selectively alter channel dependence, which indicates that common perceptual training objectives do not uniformly constrain cross-channel statistics. The authors build Chroma by computing correlation maps from the input image and feeding them alongside the original RGB channels into a fixed CNN backbone trained on a modest computational budget. Under both single-generator training and limited multi-generator supervision, where only a few samples from additional generators are available, the augmented inputs improve real-versus-generated discrimination and robustness while keeping the architecture simple. A reader would care because the cue is lightweight and directly measurable, and because adapting to additional generators takes only a few samples in the reported regime.

Core claim

The central claim is that inter-channel color correlations exhibit systematic, generator-specific differences from real photographs; augmenting standard RGB inputs with the corresponding correlation maps lets a fixed CNN backbone achieve competitive real-versus-generated discrimination and improved robustness under limited multi-generator supervision.

What carries the argument

Inter-channel correlation maps: pairwise Pearson correlations computed between color channels in multiple color spaces and concatenated as extra input channels to a CNN classifier.
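A minimal sketch of how such maps could be computed, using only the Python standard library. The summary above does not fix the window size, border handling, or the treatment of constant patches, so the `radius` parameter, the clipped windows, and the zero-fill convention are all assumptions made for illustration, as are the function names. This is not the authors' implementation.

```python
import math

def local_correlation_map(a, b, radius=1, eps=1e-8):
    """Pearson correlation of channels a and b in a window around each pixel.

    a, b: 2-D lists of floats with the same shape. The window spans
    (2*radius+1)^2 pixels and is clipped at the image border. Windows where
    either channel is (near-)constant get 0.0; this is an assumed convention.
    """
    h, w = len(a), len(a[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            va, vb = [], []
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    va.append(a[j][i])
                    vb.append(b[j][i])
            n = len(va)
            mean_a, mean_b = sum(va) / n, sum(vb) / n
            cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(va, vb))
            var_a = sum((p - mean_a) ** 2 for p in va)
            var_b = sum((q - mean_b) ** 2 for q in vb)
            denom = math.sqrt(var_a * var_b)
            out[y][x] = cov / denom if denom > eps else 0.0
    return out

def augment_with_correlations(r, g, b, radius=1):
    """Stack the three original channels with the three pairwise correlation maps."""
    return [r, g, b,
            local_correlation_map(r, g, radius),
            local_correlation_map(r, b, radius),
            local_correlation_map(g, b, radius)]

# Toy 3x3 image: G is an affine copy of R, B is R reversed.
r = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
g = [[2 * v + 0.05 for v in row] for row in r]
b = [[1.0 - v for v in row] for row in r]
channels = augment_with_correlations(r, g, b)  # 6 channels: R, G, B, RG, RB, GB
```

In the toy image the R-G map is close to +1 everywhere and the R-B and G-B maps are close to -1, since Pearson correlation is invariant to positive affine rescaling and flips sign under negation. The same routine would apply to Lab channels after a color-space conversion, which is omitted here.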

If this is right

  • Correlation-augmented inputs improve real-versus-generated discrimination.
  • The detector shows greater robustness when trained with only a few samples from additional generators.
  • Performance remains competitive with recent detectors while using a simple CNN and modest training budget.
  • RGB and Lab color spaces yield the most visible separation in correlation distributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This cue may arise because current generative objectives do not explicitly penalize mismatches in cross-channel statistics.
  • Forcing a generator to reproduce real-image correlation distributions could serve as a test of whether the cue can be removed.
  • The same maps could be examined in video frames to check temporal consistency of the signal.

Load-bearing premise

The observed systematic differences in inter-channel correlation distributions between real and generated images will hold for generators and datasets not seen during the limited multi-generator training regime.

What would settle it

Testing the detector on images from a previously unseen generative model whose inter-channel correlation distributions match those of real photographs and finding no performance gain over an unaugmented baseline.
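One way to check whether a generator's correlation distribution "matches" that of real photographs is a 1-D Wasserstein distance between per-image correlation features, the same kind of distance named in the Figure 4 caption. The sketch below is a plain standard-library version of that distance, computed as the area between the two empirical CDFs. The function name and the sample values are hypothetical placeholders, not measurements from the paper.

```python
import bisect

def wasserstein_1d(u, v):
    """Wasserstein-1 distance between the empirical distributions of u and v.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F is the
    empirical CDF. Sample sizes may differ.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    # Both CDFs are constant on [left, right), so the integral is a finite sum.
    for left, right in zip(points, points[1:]):
        f_u = bisect.bisect_right(u, left) / len(u)
        f_v = bisect.bisect_right(v, left) / len(v)
        total += abs(f_u - f_v) * (right - left)
    return total

# Placeholder per-image R-G correlation features (made-up values for illustration).
real_feats = [0.90, 0.92, 0.95]
matched_generator = [0.90, 0.92, 0.95]   # distribution identical to real
shifted_generator = [0.70, 0.72, 0.75]   # distribution shifted by 0.2

d_same = wasserstein_1d(real_feats, matched_generator)
d_far = wasserstein_1d(real_feats, shifted_generator)
```

A distance near zero for an unseen generator would mark it as a candidate for the test described above: if the augmented detector still outperforms the RGB-only baseline on such a generator, the gain cannot be attributed to this distributional gap alone.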

Figures

Figures reproduced from arXiv: 2606.08864 by Juan Pablo Sotelo, Marina Gardella, Pablo Musé.

Figure 1: Color-correlation cues. RAISE-1K real image [8] vs. a visually matched GPT-Image 1 replica [23]; the split Lab correlation map exposes structured chromatic and texture differences under matched content.
Figure 2: RAISE-1k-averaged LPIPS distance between an original RGB image and …
Figure 3: Log-density estimates of pairwise inter-channel correlation features across …
Figure 4: Mean Wasserstein distance matrices between inter-channel correla…
Abstract

The rapid adoption of diffusion and large-scale generative models has made it increasingly challenging to distinguish synthetic imagery from real photographs. While automated detectors have been proposed, their generalization to unseen generators remains brittle. To address this limitation, we investigate inter-channel color correlations, a lightweight and underexploited forensic cue. We first demonstrate that LPIPS, a widely used perceptual metric, exhibits inconsistent responses to perturbations that selectively alter channel dependence across different color-space parameterizations, indicating that cross-channel statistics are not uniformly constrained by common perceptual training objectives. Motivated by this, we analyze the distributions of pairwise inter-channel correlation features across multiple color spaces. Our analysis reveals systematic, generator-specific differences in these distributions, with RGB and Lab color spaces providing the most apparent separation between real and generated images. Building on this, we introduce Chroma, a detector of AI-generated images which augments standard RGB inputs with inter-channel correlation maps and employs a fixed CNN backbone trained with a modest computational budget. We assess its robustness under both single-generator training and a limited multi-generator supervision regime, where only a few samples from additional generators are available. Across a standard benchmark protocol, correlation-augmented inputs improve real-vs-generated discrimination and robustness, yielding performance competitive with recent detectors while maintaining a simple architecture and training procedure. Code is available at https://github.com/JPSoteloSilva/CHROMA

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that pairwise inter-channel correlations in color spaces (particularly RGB and Lab) exhibit systematic differences between real photographs and images from diffusion/generative models. Motivated by inconsistent LPIPS responses to channel-dependence perturbations, it augments standard RGB inputs with these correlation maps, trains a fixed CNN backbone under single-generator and limited multi-generator regimes, and reports improved real-vs-generated discrimination and robustness that remains competitive with recent detectors while using a simple architecture and modest training budget.

Significance. If the generalization claim holds, the work supplies a lightweight, underexploited forensic cue that can be added to existing CNN detectors without architectural overhaul or heavy compute, addressing a key brittleness in current AI-image detectors. The emphasis on limited-supervision robustness and the public code release are concrete strengths.

major comments (2)
  1. [Experiments / robustness evaluation] The central robustness claim (limited multi-generator regime yielding gains on unseen generators) rests on the assumption that the observed correlation-distribution differences are largely generator-agnostic rather than artifacts of the specific training generators' color pipelines. The manuscript demonstrates separation on a small set of generators and augments inputs accordingly, but does not report performance on a fully held-out generator family never seen even in the “few samples” regime; this is load-bearing for the generalization argument.
  2. [Abstract and §4 (results)] The abstract and method description state that correlation-augmented inputs improve discrimination, yet the provided text supplies no quantitative metrics (accuracy, AUC, dataset sizes, number of generators, or statistical significance of distribution shifts). Without these numbers or ablation tables, the degree of support for the performance claims cannot be assessed.
minor comments (2)
  1. [Method] Clarify the exact procedure for computing and normalizing the inter-channel correlation maps (e.g., window size, handling of constant patches) before they are concatenated as additional input channels.
  2. [Introduction / motivation] The LPIPS inconsistency is presented as motivation; a short quantitative example (specific perturbation magnitudes and resulting LPIPS deltas across color spaces) would make the link to cross-channel statistics more concrete.
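To make the idea of a perturbation that "selectively alters channel dependence" concrete, here is a small sketch under stated assumptions. It circularly shifts one channel, which changes the cross-channel correlation while leaving that channel's own value distribution exactly unchanged. The perturbations actually used in the paper are not described in this summary, so this is an illustrative stand-in, and no LPIPS score is computed here. All function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0

def roll_rows(channel, shift):
    """Circularly shift every row of a 2-D channel to the right by `shift` pixels."""
    rolled = []
    for row in channel:
        k = shift % len(row)
        rolled.append(row[-k:] + row[:-k] if k else list(row))
    return rolled

def flatten(channel):
    """Flatten a 2-D channel into a 1-D list, row by row."""
    return [v for row in channel for v in row]

# Toy 4x4 image: R and G are the same horizontal ramp, so they start fully correlated.
r = [[0.0, 0.25, 0.5, 0.75] for _ in range(4)]
g = [list(row) for row in r]

g_shifted = roll_rows(g, 2)  # perturb only the G channel

before = pearson(flatten(r), flatten(g))
after = pearson(flatten(r), flatten(g_shifted))
```

On this ramp the R-G correlation moves from +1 to a negative value, yet the sorted pixel values of G are identical before and after the shift. A quantitative version of the referee's request would report the LPIPS delta for perturbations of this kind at several magnitudes and in each color space.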

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful comments, which highlight important aspects of our robustness claims and presentation of results. We address each major comment below and outline revisions where appropriate.

  1. Referee: [Experiments / robustness evaluation] The central robustness claim (limited multi-generator regime yielding gains on unseen generators) rests on the assumption that the observed correlation-distribution differences are largely generator-agnostic rather than artifacts of the specific training generators' color pipelines. The manuscript demonstrates separation on a small set of generators and augments inputs accordingly, but does not report performance on a fully held-out generator family never seen even in the “few samples” regime; this is load-bearing for the generalization argument.

    Authors: We agree that a fully held-out generator family (unseen even with few samples) would provide stronger support for the generator-agnostic nature of the correlation cues. Our experiments evaluate the limited multi-generator regime as stated, where a small number of samples from additional generators are included during training. To directly address this point, we will add experiments on one or more completely unseen generator families in the revised version, reporting performance under the single-generator and limited-supervision settings for comparison. revision: yes

  2. Referee: [Abstract and §4 (results)] The abstract and method description state that correlation-augmented inputs improve discrimination, yet the provided text supplies no quantitative metrics (accuracy, AUC, dataset sizes, number of generators, or statistical significance of distribution shifts). Without these numbers or ablation tables, the degree of support for the performance claims cannot be assessed.

    Authors: We acknowledge that the abstract and early results description would benefit from explicit quantitative metrics to allow immediate assessment of the claims. In the revision we will update the abstract to include key figures such as accuracy/AUC improvements, the number of generators and dataset sizes used, and reference to the ablation tables already present in §4. We will also ensure the method section cross-references these numbers explicitly. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical observation of correlation distributions

The paper's core contribution rests on direct empirical measurement of pairwise inter-channel correlation statistics across color spaces (RGB, Lab, etc.) on real vs. generated images, followed by input augmentation of a standard CNN. No equations, parameters, or predictions are defined in terms of themselves; the separation is reported as an observed data property rather than derived by construction from fitted inputs or prior self-citations. The method is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from the authors' own prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described; the approach relies on standard CNN training and established color-space conversions.


Reference graph

Works this paper leans on

32 extracted references · 1 canonical work pages

  [1] Bammey, Q.: Synthbuster: Towards detection of diffusion model generated images. IEEE Open Journal of Signal Processing pp. 1–9 (01 2023)

  [2] Bhargava, K.: Stylegan-stylegan2 deepfake face images. Kaggle dataset, https://www.kaggle.com/datasets/kshitizbhargava/deepfake-face-images

  [3] Bird, J.J., Lotfi, A.: Cifake: Image classification and explainable identification of ai-generated synthetic images. IEEE Access 12, 15642–15650 (2024)

  [4] Chai, L., Bau, D., Lim, J., Chan, S.W., Isola, P.: What makes fake images detectable? understanding properties that generalize. In: European Conference on Computer Vision (ECCV). pp. 103–120. Springer (2020)

  [5] Chu, B., Xu, X., Wang, X., Zhang, Y., You, W., Zhou, L.: Fire: Robust detection of diffusion-generated images via frequency-guided reconstruction error. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 12830–12839 (2025)

  [6] Corvi, R., Cozzolino, D., Poggi, G., Nagano, K., Verdoliva, L.: Intriguing properties of synthetic images: from generative adversarial networks to diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 973–982 (2023)

  [7] Cozzolino, D., Poggi, G., Corvi, R., Nießner, M., Verdoliva, L.: Raising the bar of ai-generated image detection with clip. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4356–4366 (2024)

  [8] Dang-Nguyen, D.T., Pasquini, C., Conotter, V., Boato, G.: Raise: a raw images dataset for digital image forensics. Proceedings of the 6th ACM Multimedia Systems Conference (2015)

  [9] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 248–255 (2009)

  [10] Dhariwal, P., Nichol, A.Q.: Diffusion models beat GANs on image synthesis. In: Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems (2021)

  [11] Epstein, D.C., Jain, I., Wang, O., Zhang, R.: Online detection of ai-generated images. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 382–392 (2023)

  [12] Gragnaniello, D., Cozzolino, D., Marra, F., Poggi, G., Verdoliva, L.: Are gan generated images easy to detect? a critical analysis of the state-of-the-art. In: IEEE International Conference on Multimedia and Expo (ICME). pp. 1–6. IEEE (2021)

  [13] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016)

  [14] Huang, Z., Hu, J., Li, X., He, Y., Zhao, X., Peng, B., Wu, B., Huang, X., Cheng, G.: Sida: Social media image deepfake detection, localization and explanation with large multimodal model. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 28831–28841 (2025)

  [15] Jia, Z., Huang, C., Zhu, Y., Fei, H., Duan, X., Yuan, Z., Deng, Y., Zhang, J., Zhang, J., Zhou, J.: Secret lies in color: Enhancing ai-generated images detection with color distribution analysis. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 13445–13454 (2025)

  [16] Karageorgiou, D., Papadopoulos, S., Kompatsiaris, I., Gavves, E.: Any-resolution ai-generated image detection by spectral learning. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 18706–18717 (2025)

  [17] Li, H., Li, B., Tan, S., Huang, J.: Identification of deep network generated images using disparities in color components. Signal Processing 174, 107616 (2020)

  [18] Li, Y., Bammey, Q., Gardella, M., Nikoukhah, T., Morel, J.M., Colom, M., Von Gioi, R.G.: Masksim: Detection of synthetic images by masked spectrum similarity analysis. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 3855–3865 (2024)

  [19] Liu, Y., Wei, X., Wang, S., Shi, Y.Q.: Detecting generated images by real images. In: European Conference on Computer Vision (ECCV). pp. 87–103. Springer (2022)

  [20] Lorenz, P., Durall, R.L., Keuper, J.: Detecting images generated by deep diffusion models using their local intrinsic dimensionality. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 448–459 (2023)

  [21] Mandelli, S., Bestagini, P., Tubaro, S.: Detecting gan-generated images by orthogonal training of multiple cnns. In: IEEE International Conference on Image Processing (ICIP). pp. 1301–1305. IEEE (2022)

  [22] Ojha, U., Li, Y., Lee, Y.J.: Towards universal fake image detectors that generalize across generative models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 24480–24489 (2023)

  [23] OpenAI: GPT Image models: gpt-image-1 (2025), https://platform.openai.com/docs/guides/image-generation, OpenAI API documentation (GPT Image family includes gpt-image-1). Accessed: 2025-12-27

  [24] Park, J., Owens, A.: Community forensics: Using thousands of generators to train fake image detectors. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 8245–8257 (2025)

  [25] Pontorno, O., Guarnera, L., Battiato, S.: Deepfeaturex-sn: Generalization of deepfake detection via contrastive learning. Multimedia Tools and Applications 84, 47721–47740 (07 2025)

  [26] Popescu, A.C., Farid, H.: Statistical tools for digital forensics. In: Proceedings of the 6th International Conference on Information Hiding. pp. 128–147. IH'04, Springer-Verlag, Berlin, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30114-1_10

  [27] Tan, C., Zhang, L., Lin, W.: Rethinking the up-sampling operations in cnn-based deepfake image detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

  [28] Tan, C., Zhao, Y., Zhang, L., Song, H., Ma, L., Guo, Y., Wang, S., Lin, W.: Learning on gradients: Generalized artifacts representation for gan-generated images detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 12105–12114 (2023)

  [29] Uhlenbrock, L., Cozzolino, D., Moussa, D., Verdoliva, L., Riess, C.: Did you note my palette? unveiling synthetic images through color statistics. In: Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security. pp. 47–52. IH&MMSec '24, Association for Computing Machinery, New York, NY, USA (2024)

  [30] Wang, R., Ni, C., Wang, X., Jiang, L., Liu, Z.: Dire for diffusion-generated image detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 22405–22415 (2023)

  [31] Wang, S.Y., Wang, O., Zhang, R., Owens, A., Efros, A.A.: Cnn-generated images are surprisingly easy to spot... for now. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 8695–8704 (2020)

  [32] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)