DCT-CompCNN: A Novel Image Classification Network Using JPEG Compressed DCT Coefficients

Bulla Rajesh; Mohammed Javed; Ratnesh; Shubham Srivastava

arxiv: 1907.11503 · v1 · pith:YI6CNYW2new · submitted 2019-07-26 · 💻 cs.CV · cs.LG

DCT-CompCNN: A Novel Image Classification Network Using JPEG Compressed DCT Coefficients

Bulla Rajesh , Mohammed Javed , Ratnesh , Shubham Srivastava This is my paper

Pith reviewed 2026-05-24 15:57 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords image classificationJPEG compressionDCT coefficientsconvolutional neural networksDCT-CompCNNResNet-50Dog vs Cat datasetCIFAR-10

0 comments

The pith

CNNs trained on modified JPEG DCT coefficients outperform standard RGB-input models on image classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates training CNNs directly on JPEG compressed data instead of raw RGB images by altering how DCT coefficients are presented as input. It proposes the DCT-CompCNN architecture to process these modified coefficients and tests the idea on the Dog Vs Cat and CIFAR-10 datasets. Both the new architecture and ResNet-50 show higher accuracy when using the compressed DCT inputs than when using conventional RGB inputs. The work establishes that a suitable change to the coefficient representation lets networks learn effectively from already-compressed files.

Core claim

CNNs can be trained with JPEG compressed DCT coefficients after modifying their input representation, and this yields better classification performance than the conventional approach of feeding RGB pixel images into the network, as measured on the Dog Vs Cat and CIFAR-10 datasets with both ResNet-50 and the proposed DCT-CompCNN.

What carries the argument

DCT-CompCNN, the CNN architecture adapted to accept a modified version of JPEG DCT coefficients as its input representation.

If this is right

CNN classifiers can operate directly on compressed JPEG files without first decompressing to RGB.
The input modification improves accuracy on both the Dog Vs Cat and CIFAR-10 benchmarks.
Existing networks such as ResNet-50 benefit from the same DCT-coefficient input change.
Compressed representations can serve as a higher-performing alternative to raw pixels for image classification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same input-representation change might extend to other lossy or lossless image formats to test whether compressed-domain learning is broadly useful.
Classification pipelines could avoid full decompression steps, lowering memory and compute costs on resource-limited devices.
Testing the modification on larger or more diverse datasets would clarify how far the accuracy gain generalizes.

Load-bearing premise

That a suitable modification to the input representation of JPEG DCT coefficients exists which lets CNNs learn from the compressed data and beat RGB models.

What would settle it

Retraining the identical architectures on unmodified DCT coefficients or without the described input change and finding equal or lower accuracy than the RGB baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 1907.11503 by Bulla Rajesh, Mohammed Javed, Ratnesh, Shubham Srivastava.

**Figure 3.** Figure 3: Second approach employed in this paper i.e. the un-quantized JPEG compressed image is partially [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Our proposed DCT-CompCNN model for CIFAR-10 Dataset. III. Experimental Results As discussed in the previous section, we have done experimentation with both the approaches. In our first approach, we have quantized JPEG Compressed Image, which is partially decoded as well as dequantized, and then after some transformation it is fed to the network. Whereas in our second approach, we have an un-quantized Compr… view at source ↗

read the original abstract

The popularity of Convolutional Neural Network (CNN) in the field of Image Processing and Computer Vision has motivated researchers and industrialist experts across the globe to solve different challenges with high accuracy. The simplest way to train a CNN classifier is to directly feed the original RGB pixels images into the network. However, if we intend to classify images directly with its compressed data, the same approach may not work better, like in case of JPEG compressed images. This research paper investigates the issues of modifying the input representation of the JPEG compressed data, and then feeding into the CNN. The architecture is termed as DCT-CompCNN. This novel approach has shown that CNNs can also be trained with JPEG compressed DCT coefficients, and subsequently can produce a better performance in comparison with the conventional CNN approach. The efficiency of the modified input representation is tested with the existing ResNet-50 architecture and the proposed DCT-CompCNN architecture on a public image classification datasets like Dog Vs Cat and CIFAR-10 datasets, reporting a better performance

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper claims better CNN performance on JPEG DCT coefficients but never describes the input modification that supposedly makes it work.

read the letter

The core takeaway is that this work asserts CNNs can be trained directly on JPEG DCT coefficients to beat standard RGB models on Dog vs Cat and CIFAR-10, using both ResNet-50 and a new DCT-CompCNN, yet supplies no account of how the DCT blocks are turned into a usable tensor. The abstract frames this as a novel approach that investigates input representation changes, but the details are absent. The concept of compressed-domain networks is not new, so the contribution is really the specific architecture and the dataset tests. That is a legitimate extension for people who want to skip decompression in bandwidth-limited pipelines, and the practical motivation is reasonable. What the paper does well is flag the workflow issue and test the idea on two common datasets. The soft spot is not minor. Without any description of block flattening, chroma handling, coefficient scaling, normalization, or whether the input stays spatial or becomes per-block features, the performance claim has no verifiable basis. The stress-test note correctly identifies this as the load-bearing gap; any reported gains could stem from unstated preprocessing or training differences rather than the DCT route itself. The abstract also gives zero numbers, no architecture diagram, and no training protocol, which leaves the central empirical assertion unsupported. There is no code, data, or formal verification to fall back on. This paper would interest a narrow group working on efficient compressed-domain vision, but the missing technical steps make it impossible to assess or replicate. It does not show the clear engagement needed for serious consideration. I would not send it to peer review in this state.

Referee Report

3 major / 1 minor

Summary. The paper proposes DCT-CompCNN, a CNN that operates directly on modified JPEG DCT coefficients instead of RGB pixels. It claims that suitable input modifications allow both the proposed architecture and ResNet-50 to be trained on compressed data and to outperform conventional RGB-input CNNs on the Dog vs Cat and CIFAR-10 datasets.

Significance. If the performance claims were substantiated with reproducible experiments, the work would be relevant to compressed-domain vision pipelines that avoid explicit decompression. At present the manuscript supplies no quantitative results, no input-tensor construction details, and no training protocol, so the significance cannot be evaluated.

major comments (3)

[Abstract] Abstract: the central claim that the method 'produce[s] a better performance in comparison with the conventional CNN approach' is unsupported by any accuracy, precision, or runtime numbers, any tables, or any error bars.
[Abstract] Abstract and introduction: the 'issues of modifying the input representation of the JPEG compressed data' are stated to have been investigated, yet no description is given of tensor construction (8×8 block flattening or stacking, chroma handling, DC/AC scaling, normalization, or spatial vs. per-block layout).
[Abstract] Abstract: no architecture diagram, layer specification, or training protocol (optimizer, learning-rate schedule, data augmentation, or loss) is supplied for either DCT-CompCNN or the ResNet-50 baseline, preventing assessment of whether reported gains arise from the claimed DCT representation or from unstated differences in preprocessing or hyperparameters.

minor comments (1)

[Abstract] Abstract contains minor grammatical issues ('industrialist experts', 'like in case of JPEG compressed images') that should be revised for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract must be strengthened with quantitative results, explicit input construction details, architecture specifications, and training protocols to allow proper evaluation of the claims. The revised manuscript will incorporate these elements to address all points raised.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the method 'produce[s] a better performance in comparison with the conventional CNN approach' is unsupported by any accuracy, precision, or runtime numbers, any tables, or any error bars.

Authors: We acknowledge this limitation in the current abstract. Although experimental results demonstrating improved accuracy on Dog vs Cat and CIFAR-10 are present in the manuscript's results section, we will revise the abstract to include specific accuracy figures, comparisons, and references to tables/figures with error bars. This will directly substantiate the performance claim. revision: yes
Referee: [Abstract] Abstract and introduction: the 'issues of modifying the input representation of the JPEG compressed data' are stated to have been investigated, yet no description is given of tensor construction (8×8 block flattening or stacking, chroma handling, DC/AC scaling, normalization, or spatial vs. per-block layout).

Authors: We will add a detailed subsection on input tensor construction in the methods section (and reference it from the abstract/introduction). This will explicitly describe 8×8 block processing, chroma handling, DC/AC coefficient scaling, normalization, and the tensor layout (spatial vs. per-block). revision: yes
Referee: [Abstract] Abstract: no architecture diagram, layer specification, or training protocol (optimizer, learning-rate schedule, data augmentation, or loss) is supplied for either DCT-CompCNN or the ResNet-50 baseline, preventing assessment of whether reported gains arise from the claimed DCT representation or from unstated differences in preprocessing or hyperparameters.

Authors: We agree these details are necessary for reproducibility. The revision will include an architecture diagram, layer-by-layer specifications for DCT-CompCNN, and the full training protocol (optimizer, learning-rate schedule, augmentation, loss) for both models. This will allow readers to confirm the source of any performance differences. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical claim with no derivation chain

full rationale

The paper is an empirical study claiming improved CNN performance on JPEG DCT inputs after unspecified modifications, tested on Dog vs Cat and CIFAR-10 with ResNet-50 and a proposed architecture. No equations, fitted parameters, uniqueness theorems, or self-citations appear in the provided text. The central assertion rests on experimental replication rather than any internal reduction of a prediction to its inputs by construction. This matches the default expectation of no significant circularity for non-derivational work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical machine-learning study; the abstract mentions no mathematical axioms, no free parameters beyond standard network weights, and no invented physical or theoretical entities.

pith-pipeline@v0.9.0 · 5714 in / 1227 out tokens · 24285 ms · 2026-05-24T15:57:00.358419+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1]

Meng-Che Chuang, Jenq-Neng Hwang and Kresimir Williams, A Feature Learning and Object Recognition Framework for Underwater Fish Images, IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 25, NO. 4, pp. 1862-72 APRIL 2016

work page 2016
[2]

33-40, 2014

Meng-Che Chuang, Jenq-Neng Hwang, Kresimir Williams, Supervised and Unsupervised Feature Extraction Methods for Underwater Fish Species Recognition, IEEE Conference Publications, pp. 33-40, 2014

work page 2014
[3]

Hanguen Kim, Jungmo Koo, Donghoonkim, Sungwoo Jung, Jae-Uk Shin, Serin Lee, Hyun Myung, Image- Based Monitoring of Jellyfish Using Deep Learning Architecture, IEEE sensors journal, vol. 16, no. 8

work page
[4]

17, 2018, pp.1307-1319

Muthukrishnan Ramprasath, M.Vijay Anand, Shanmuga sundaram Hariharan, Image Classification using Convolutional Neural Networks, International Journal of Pure and Applied Mathematics, Volume 119, No. 17, 2018, pp.1307-1319

work page 2018
[5]

Lai ZhiFei, Deng HuiFang, Medical Image Classification Based on Deep Features Extracted by Deep Model and Statistic Feature Fusion with Multilayer Perceptron, Hindawi Computational Intelligence and Neuroscience, 2018, pp.1-13

work page 2018
[6]

Sorwar G, Abraham Ajith and Dooley L.S., Texture classification based on DCT and soft computing, 10th IEEE International Conference on Fuzzy Systems, vol 3, 2002, pp.545 – 548

work page 2002
[7]

G. K. Wallace, ”The JPEG still picture compression standard,” in IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii-xxxiv, Feb. 1992

work page 1992
[8]

Qacimy Bouchra El, Kerroum Mounir Ait and Ahmed Hammouch, Feature Extraction based on DCT for Handwritten Digit Recognition, International Journal of Computer Science Issues, Volume 11, Issue 6, No 2, November 2014, pp. 27-33

work page 2014
[9]

https : //docs.microsoft.com/en-us/openspecs/windows_protocols/ms-rdprfx/b550d1b5-f7d9-4a0c-9141- b3dca9d7f525 (July 2019)

work page 2019
[10]

Defee, ”Pattern recognition in compressed DCT domain,” 2004 International Conference on Image Processing, 2004

Daidi Zhong and I. Defee, ”Pattern recognition in compressed DCT domain,” 2004 International Conference on Image Processing, 2004. ICIP, Singapore, 2004, pp. 2031-2034 Vol. 3

work page 2004
[11]

Dan Fu and Gabriel Guimaraes Using compression to speed up image classification in artificial neural networks, 2016

work page 2016
[12]

Ulicny Matej and Rozenn Dahyot On using CNN with DCT based Image Data, 19th Irish Machine Vision and Image Processing Conference, 2017, pp. 44-51

work page 2017
[13]

Gueguen, Lionel, Alex Sergeev, Rosanne Liu and Jason Yosinski, ”Faster Neural Networks Straight from JPEG”, ICLR (2018)

work page 2018
[14]

Rectified linear units improve restricted Boltzmann machines,

V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn. (ICML). NewYork, NY, USA, Omnipress, 2010, pp. 807-814

work page 2010
[15]

K. He, X. Zhang, S. Ren and J. Sun,”Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778

work page 2016
[16]

ImageNet Classification with Deep Convolutional Neural Networks

Krizhevsky Alex, Sutskever Ilya, E. Hinton Geoffrey, “ImageNet Classification with Deep Convolutional Neural Networks” , Neural Information Processing Systems, 2012

work page 2012
[17]

Image classification on CIFAR 10 Dataset,

Wang Yan Yan, “Image classification on CIFAR 10 Dataset,” International Journal of Scientific Research Engineering Technology (IJSRET), Vol 7, June 2018

work page 2018
[18]

https://www.kaggle.com/c/dogs-vs-cats (June 2019)

work page 2019
[19]

Murean Horea, Oltean Mihai, ”Fruit recognition from images using deep learning”, Acta Universitatis Sapientiae, Informatica,2018, vol.10, pp. 26-42

work page 2018
[20]

A review on document image analysis techniques directly in the compressed domain

M Javed, P. Nagabhushan, B.B. Chaudhuri, “A review on document image analysis techniques directly in the compressed domain”, Artificial Intelligence Review Volume 50, 539–568, 2018

work page 2018

[1] [1]

Meng-Che Chuang, Jenq-Neng Hwang and Kresimir Williams, A Feature Learning and Object Recognition Framework for Underwater Fish Images, IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 25, NO. 4, pp. 1862-72 APRIL 2016

work page 2016

[2] [2]

33-40, 2014

Meng-Che Chuang, Jenq-Neng Hwang, Kresimir Williams, Supervised and Unsupervised Feature Extraction Methods for Underwater Fish Species Recognition, IEEE Conference Publications, pp. 33-40, 2014

work page 2014

[3] [3]

Hanguen Kim, Jungmo Koo, Donghoonkim, Sungwoo Jung, Jae-Uk Shin, Serin Lee, Hyun Myung, Image- Based Monitoring of Jellyfish Using Deep Learning Architecture, IEEE sensors journal, vol. 16, no. 8

work page

[4] [4]

17, 2018, pp.1307-1319

Muthukrishnan Ramprasath, M.Vijay Anand, Shanmuga sundaram Hariharan, Image Classification using Convolutional Neural Networks, International Journal of Pure and Applied Mathematics, Volume 119, No. 17, 2018, pp.1307-1319

work page 2018

[5] [5]

Lai ZhiFei, Deng HuiFang, Medical Image Classification Based on Deep Features Extracted by Deep Model and Statistic Feature Fusion with Multilayer Perceptron, Hindawi Computational Intelligence and Neuroscience, 2018, pp.1-13

work page 2018

[6] [6]

Sorwar G, Abraham Ajith and Dooley L.S., Texture classification based on DCT and soft computing, 10th IEEE International Conference on Fuzzy Systems, vol 3, 2002, pp.545 – 548

work page 2002

[7] [7]

G. K. Wallace, ”The JPEG still picture compression standard,” in IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii-xxxiv, Feb. 1992

work page 1992

[8] [8]

Qacimy Bouchra El, Kerroum Mounir Ait and Ahmed Hammouch, Feature Extraction based on DCT for Handwritten Digit Recognition, International Journal of Computer Science Issues, Volume 11, Issue 6, No 2, November 2014, pp. 27-33

work page 2014

[9] [9]

https : //docs.microsoft.com/en-us/openspecs/windows_protocols/ms-rdprfx/b550d1b5-f7d9-4a0c-9141- b3dca9d7f525 (July 2019)

work page 2019

[10] [10]

Defee, ”Pattern recognition in compressed DCT domain,” 2004 International Conference on Image Processing, 2004

Daidi Zhong and I. Defee, ”Pattern recognition in compressed DCT domain,” 2004 International Conference on Image Processing, 2004. ICIP, Singapore, 2004, pp. 2031-2034 Vol. 3

work page 2004

[11] [11]

Dan Fu and Gabriel Guimaraes Using compression to speed up image classification in artificial neural networks, 2016

work page 2016

[12] [12]

Ulicny Matej and Rozenn Dahyot On using CNN with DCT based Image Data, 19th Irish Machine Vision and Image Processing Conference, 2017, pp. 44-51

work page 2017

[13] [13]

Gueguen, Lionel, Alex Sergeev, Rosanne Liu and Jason Yosinski, ”Faster Neural Networks Straight from JPEG”, ICLR (2018)

work page 2018

[14] [14]

Rectified linear units improve restricted Boltzmann machines,

V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn. (ICML). NewYork, NY, USA, Omnipress, 2010, pp. 807-814

work page 2010

[15] [15]

K. He, X. Zhang, S. Ren and J. Sun,”Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778

work page 2016

[16] [16]

ImageNet Classification with Deep Convolutional Neural Networks

Krizhevsky Alex, Sutskever Ilya, E. Hinton Geoffrey, “ImageNet Classification with Deep Convolutional Neural Networks” , Neural Information Processing Systems, 2012

work page 2012

[17] [17]

Image classification on CIFAR 10 Dataset,

Wang Yan Yan, “Image classification on CIFAR 10 Dataset,” International Journal of Scientific Research Engineering Technology (IJSRET), Vol 7, June 2018

work page 2018

[18] [18]

https://www.kaggle.com/c/dogs-vs-cats (June 2019)

work page 2019

[19] [19]

Murean Horea, Oltean Mihai, ”Fruit recognition from images using deep learning”, Acta Universitatis Sapientiae, Informatica,2018, vol.10, pp. 26-42

work page 2018

[20] [20]

A review on document image analysis techniques directly in the compressed domain

M Javed, P. Nagabhushan, B.B. Chaudhuri, “A review on document image analysis techniques directly in the compressed domain”, Artificial Intelligence Review Volume 50, 539–568, 2018

work page 2018