Convolutional Neural Networks on Randomized Data
Pith reviewed 2026-05-24 16:18 UTC · model grok-4.3
The pith
Randomizing image pixels destroys local hierarchies and leaves standard CNNs unable to capture the resulting long-range correlations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By randomizing image pixels the hierarchical structure of the data is destroyed and long range correlations are introduced which standard CNNs are not able to capture. Their classification accuracy is heavily dependent on the class similarities as well as the pixel randomization process. Dilated convolutions are able to recover some of the pixel correlations and improve the performance.
What carries the argument
The pixel randomization process, which removes local correlations while introducing long-range dependencies that fixed-size convolutional kernels cannot efficiently span.
If this is right
- Standard CNN accuracy falls on permuted images relative to the original data.
- The size of the accuracy drop depends on the degree of similarity between classes.
- Different randomization procedures produce measurably different performance levels.
- Dilated convolutions raise accuracy by spanning some of the newly created long-range correlations.
Where Pith is reading between the lines
- The result suggests CNNs' advantage on images stems from an inductive bias toward locality rather than from a general ability to discover any statistical pattern.
- Architectures lacking a strong locality bias may prove more suitable for tasks whose natural correlations are global rather than local.
- The same randomization test could be applied to non-image data reshaped into grids to check whether the effect is specific to visual hierarchies.
Load-bearing premise
The observed drops in CNN accuracy are caused specifically by the destruction of local hierarchical correlations rather than by other experimental factors such as training procedure or dataset choice.
What would settle it
A standard CNN achieving near-original accuracy on randomized-pixel images even when the classes have low visual similarity would falsify the claim.
Figures
read the original abstract
Convolutional Neural Networks (CNNs) are build specifically for computer vision tasks for which it is known that the input data is a hierarchical structure based on locally correlated elements. The question that naturally arises is what happens with the performance of CNNs if one of the basic properties of the data is removed, e.g. what happens if the image pixels are randomly permuted? Intuitively one expects that the convolutional network performs poorly in these circumstances in contrast to a multilayer perceptron (MLPs) whose classification accuracy should not be affected by the pixel randomization. This work shows that by randomizing image pixels the hierarchical structure of the data is destroyed and long range correlations are introduced which standard CNNs are not able to capture. We show that their classification accuracy is heavily dependent on the class similarities as well as the pixel randomization process. We also indicate that dilated convolutions are able to recover some of the pixel correlations and improve the performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that randomizing image pixels destroys the hierarchical local structure of natural images and introduces long-range correlations that standard CNNs cannot capture, in contrast to MLPs whose performance should remain unaffected. It reports that CNN classification accuracy on such data depends heavily on class similarities and the details of the randomization process, and that dilated convolutions can recover some correlations to improve performance.
Significance. If the central experimental claims hold after addressing controls, the work would illustrate the dependence of CNN inductive biases on local pixel correlations and motivate architectures with expanded receptive fields for non-standard data distributions. No machine-checked proofs, parameter-free derivations, or reproducible code are described.
major comments (2)
- [Abstract] Abstract: the claim that CNN accuracy 'is heavily dependent on the class similarities as well as the pixel randomization process' is presented without any datasets, quantitative metrics, error bars, number of trials, or statistical controls, so the data cannot be verified to support the stated qualitative findings.
- [Abstract] Abstract: the attribution of performance changes specifically to destruction of hierarchical local correlations (rather than to unisolated factors such as global vs. class-conditional permutation, altered input statistics, or differing CNN/MLP training regimes) is not supported by any described controls, making the central contrast with MLPs and the dilated-convolution claim load-bearing but unverified.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on the abstract. We address each point below and indicate the revisions we will make to improve clarity and verifiability while preserving the manuscript's core contributions.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that CNN accuracy 'is heavily dependent on the class similarities as well as the pixel randomization process' is presented without any datasets, quantitative metrics, error bars, number of trials, or statistical controls, so the data cannot be verified to support the stated qualitative findings.
Authors: We agree that the abstract is currently qualitative and does not reference specific datasets or metrics. The full manuscript reports experiments on CIFAR-10 and MNIST using global pixel permutations, with classification accuracies compared across class-similarity groupings and multiple randomization seeds. Results are averaged over several independent trials. We will revise the abstract to name the datasets, note the use of accuracy metrics with standard deviations, and indicate that findings are based on repeated trials. revision: yes
-
Referee: [Abstract] Abstract: the attribution of performance changes specifically to destruction of hierarchical local correlations (rather than to unisolated factors such as global vs. class-conditional permutation, altered input statistics, or differing CNN/MLP training regimes) is not supported by any described controls, making the central contrast with MLPs and the dilated-convolution claim load-bearing but unverified.
Authors: The manuscript applies the same global random permutation to all images and trains both CNNs and MLPs under matched optimization settings on the identical permuted inputs, so that the MLP serves as a control showing that the data remains classifiable when local structure is removed. We will add explicit discussion in the revised manuscript clarifying that the permutation is global (not class-conditional) and that input statistics are held constant across models. Additional ablation with dilated convolutions is already present in the experiments; we will expand the text to better isolate the receptive-field effect from other factors. revision: partial
Circularity Check
No circularity: experimental results on pixel randomization contain no derivations or self-referential reductions
full rationale
The paper reports empirical measurements of CNN accuracy on images whose pixels have been randomly permuted. No equations, parameter fits, or derivation chains appear in the provided text. The central claims are direct observations (accuracy depends on class similarities and randomization process; dilated convolutions improve results) rather than quantities defined in terms of themselves or obtained by fitting then relabeling as prediction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The work is therefore self-contained against external benchmarks; any concerns about missing controls belong to correctness or experimental design, not circularity.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Natural images possess a hierarchical structure based on locally correlated pixels
- domain assumption Random pixel permutation destroys this local hierarchy and introduces long-range correlations
Reference graph
Works this paper leans on
-
[1]
J. Duarte et al. Fast inference of deep neural networks in FPGAs for particle physics. JINST, 13(07):P07027, 2018
work page 2018
- [2]
-
[3]
I. Goodfellow, Y . Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www. deeplearningbook.org
work page 2016
-
[4]
CryptoDL: Deep Neural Networks over Encrypted Data
E. Hesamifard, H. Takabi, and M. Ghasemi. Cryptodl: Deep neural networks over encrypted data. CoRR, abs/1711.05189, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[5]
D. P. Kingma and J. Ba. Adam: A method for stochas- tic optimization. CoRR, abs/1412.6980, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[6]
A. Krizhevsky. Learning multiple layers of features from tiny images. 2009
work page 2009
- [7]
-
[8]
H. W. Lin, M. Tegmark, and D. Rolnick. Why does deep and cheap learning work so well? Journal of Statistical Physics, 168(6):1223–1247, Sep 2017
work page 2017
-
[9]
C. F. Madrazo, I. H. Cacha, L. L. Iglesias, and J. M. de Lucas. Application of a convolutional neural net- work for image classification to the analysis of colli- sions in high energy physics. CoRR, abs/1708.07034, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[10]
Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan and A. Zisserman. Very deep convo- lutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[11]
H. Xiao, K. Rasul, and R. V ollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learn- ing algorithms, 2017
work page 2017
-
[12]
Y . B. Y . LeCun, L. Bottou and P. Haffner. Gradient- based learning applied to document recognition. Pro- ceedings of the IEEE, 86(11):2278-2324, November 1998
work page 1998
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.