pith. sign in

arxiv: 1907.10935 · v1 · pith:EWCNC7WWnew · submitted 2019-07-25 · 💻 cs.CV

Convolutional Neural Networks on Randomized Data

Pith reviewed 2026-05-24 16:18 UTC · model grok-4.3

classification 💻 cs.CV
keywords convolutional neural networkspixel randomizationimage classificationdilated convolutionshierarchical structurelong-range correlationspermuted pixelsclass similarity
0
0 comments X

The pith

Randomizing image pixels destroys local hierarchies and leaves standard CNNs unable to capture the resulting long-range correlations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests what occurs when the local pixel correlations that define natural images are removed through random permutation. It reports that standard convolutional networks then lose accuracy because they cannot efficiently model the long-range dependencies that randomization creates. Results vary strongly with how similar the target classes are to one another and with the details of the permutation process. Dilated convolutions, by expanding receptive fields, recover some of those correlations and raise performance. A reader would care because the experiment directly probes whether CNNs succeed on images primarily by exploiting hierarchical local structure.

Core claim

By randomizing image pixels the hierarchical structure of the data is destroyed and long range correlations are introduced which standard CNNs are not able to capture. Their classification accuracy is heavily dependent on the class similarities as well as the pixel randomization process. Dilated convolutions are able to recover some of the pixel correlations and improve the performance.

What carries the argument

The pixel randomization process, which removes local correlations while introducing long-range dependencies that fixed-size convolutional kernels cannot efficiently span.

If this is right

  • Standard CNN accuracy falls on permuted images relative to the original data.
  • The size of the accuracy drop depends on the degree of similarity between classes.
  • Different randomization procedures produce measurably different performance levels.
  • Dilated convolutions raise accuracy by spanning some of the newly created long-range correlations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result suggests CNNs' advantage on images stems from an inductive bias toward locality rather than from a general ability to discover any statistical pattern.
  • Architectures lacking a strong locality bias may prove more suitable for tasks whose natural correlations are global rather than local.
  • The same randomization test could be applied to non-image data reshaped into grids to check whether the effect is specific to visual hierarchies.

Load-bearing premise

The observed drops in CNN accuracy are caused specifically by the destruction of local hierarchical correlations rather than by other experimental factors such as training procedure or dataset choice.

What would settle it

A standard CNN achieving near-original accuracy on randomized-pixel images even when the classes have low visual similarity would falsify the claim.

Figures

Figures reproduced from arXiv: 1907.10935 by Cristian Ivan.

Figure 3
Figure 3. Figure 3: Left panel: random sample of Fashion-MNIST images; [PITH_FULL_IMAGE:figures/full_fig_p002_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Accuracy of a CNN and MLP trained on Fashion-MNIST [PITH_FULL_IMAGE:figures/full_fig_p002_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Left panel: random sample of CIFAR10 images; right [PITH_FULL_IMAGE:figures/full_fig_p003_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Example of a natural image (top-left) and its patch-wise [PITH_FULL_IMAGE:figures/full_fig_p003_7.png] view at source ↗
Figure 6
Figure 6. Figure 6: Accuracy of a CNN and MLP running on CIFAR10 im [PITH_FULL_IMAGE:figures/full_fig_p003_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: CNN classification accuracy as a function of the number [PITH_FULL_IMAGE:figures/full_fig_p004_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Examples of a natural image and its pixel-wise random [PITH_FULL_IMAGE:figures/full_fig_p004_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: Classification accuracy of the VGG16-like CNN, MLP [PITH_FULL_IMAGE:figures/full_fig_p005_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Mean (upper row) and standard deviation (lower row) [PITH_FULL_IMAGE:figures/full_fig_p006_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Confusion matrix for CNN on natural Fashion-MNIST [PITH_FULL_IMAGE:figures/full_fig_p006_13.png] view at source ↗
Figure 15
Figure 15. Figure 15: Confusion matrix for MLP on natural CIFAR10 images [PITH_FULL_IMAGE:figures/full_fig_p007_15.png] view at source ↗
read the original abstract

Convolutional Neural Networks (CNNs) are build specifically for computer vision tasks for which it is known that the input data is a hierarchical structure based on locally correlated elements. The question that naturally arises is what happens with the performance of CNNs if one of the basic properties of the data is removed, e.g. what happens if the image pixels are randomly permuted? Intuitively one expects that the convolutional network performs poorly in these circumstances in contrast to a multilayer perceptron (MLPs) whose classification accuracy should not be affected by the pixel randomization. This work shows that by randomizing image pixels the hierarchical structure of the data is destroyed and long range correlations are introduced which standard CNNs are not able to capture. We show that their classification accuracy is heavily dependent on the class similarities as well as the pixel randomization process. We also indicate that dilated convolutions are able to recover some of the pixel correlations and improve the performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that randomizing image pixels destroys the hierarchical local structure of natural images and introduces long-range correlations that standard CNNs cannot capture, in contrast to MLPs whose performance should remain unaffected. It reports that CNN classification accuracy on such data depends heavily on class similarities and the details of the randomization process, and that dilated convolutions can recover some correlations to improve performance.

Significance. If the central experimental claims hold after addressing controls, the work would illustrate the dependence of CNN inductive biases on local pixel correlations and motivate architectures with expanded receptive fields for non-standard data distributions. No machine-checked proofs, parameter-free derivations, or reproducible code are described.

major comments (2)
  1. [Abstract] Abstract: the claim that CNN accuracy 'is heavily dependent on the class similarities as well as the pixel randomization process' is presented without any datasets, quantitative metrics, error bars, number of trials, or statistical controls, so the data cannot be verified to support the stated qualitative findings.
  2. [Abstract] Abstract: the attribution of performance changes specifically to destruction of hierarchical local correlations (rather than to unisolated factors such as global vs. class-conditional permutation, altered input statistics, or differing CNN/MLP training regimes) is not supported by any described controls, making the central contrast with MLPs and the dilated-convolution claim load-bearing but unverified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the abstract. We address each point below and indicate the revisions we will make to improve clarity and verifiability while preserving the manuscript's core contributions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that CNN accuracy 'is heavily dependent on the class similarities as well as the pixel randomization process' is presented without any datasets, quantitative metrics, error bars, number of trials, or statistical controls, so the data cannot be verified to support the stated qualitative findings.

    Authors: We agree that the abstract is currently qualitative and does not reference specific datasets or metrics. The full manuscript reports experiments on CIFAR-10 and MNIST using global pixel permutations, with classification accuracies compared across class-similarity groupings and multiple randomization seeds. Results are averaged over several independent trials. We will revise the abstract to name the datasets, note the use of accuracy metrics with standard deviations, and indicate that findings are based on repeated trials. revision: yes

  2. Referee: [Abstract] Abstract: the attribution of performance changes specifically to destruction of hierarchical local correlations (rather than to unisolated factors such as global vs. class-conditional permutation, altered input statistics, or differing CNN/MLP training regimes) is not supported by any described controls, making the central contrast with MLPs and the dilated-convolution claim load-bearing but unverified.

    Authors: The manuscript applies the same global random permutation to all images and trains both CNNs and MLPs under matched optimization settings on the identical permuted inputs, so that the MLP serves as a control showing that the data remains classifiable when local structure is removed. We will add explicit discussion in the revised manuscript clarifying that the permutation is global (not class-conditional) and that input statistics are held constant across models. Additional ablation with dilated convolutions is already present in the experiments; we will expand the text to better isolate the receptive-field effect from other factors. revision: partial

Circularity Check

0 steps flagged

No circularity: experimental results on pixel randomization contain no derivations or self-referential reductions

full rationale

The paper reports empirical measurements of CNN accuracy on images whose pixels have been randomly permuted. No equations, parameter fits, or derivation chains appear in the provided text. The central claims are direct observations (accuracy depends on class similarities and randomization process; dilated convolutions improve results) rather than quantities defined in terms of themselves or obtained by fitting then relabeling as prediction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The work is therefore self-contained against external benchmarks; any concerns about missing controls belong to correctness or experimental design, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about image structure and the effect of permutation; no free parameters or invented entities are introduced in the provided abstract.

axioms (2)
  • domain assumption Natural images possess a hierarchical structure based on locally correlated pixels
    Invoked in the opening sentence of the abstract as the reason CNNs are built for vision tasks.
  • domain assumption Random pixel permutation destroys this local hierarchy and introduces long-range correlations
    Stated as the direct consequence of the randomization process in the abstract.

pith-pipeline@v0.9.0 · 5673 in / 1429 out tokens · 29607 ms · 2026-05-24T16:18:28.507013+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 4 internal anchors

  1. [1]

    Duarte et al

    J. Duarte et al. Fast inference of deep neural networks in FPGAs for particle physics. JINST, 13(07):P07027, 2018

  2. [2]

    Erhan, Y

    D. Erhan, Y . Bengio, A. Courville, and P. Vincent. Vi- sualizing higher-layer features of a deep network.Uni- versity of Montreal, 1341(3):1, 2009

  3. [3]

    Goodfellow, Y

    I. Goodfellow, Y . Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www. deeplearningbook.org

  4. [4]

    CryptoDL: Deep Neural Networks over Encrypted Data

    E. Hesamifard, H. Takabi, and M. Ghasemi. Cryptodl: Deep neural networks over encrypted data. CoRR, abs/1711.05189, 2017

  5. [5]

    D. P. Kingma and J. Ba. Adam: A method for stochas- tic optimization. CoRR, abs/1412.6980, 2014

  6. [6]

    Krizhevsky

    A. Krizhevsky. Learning multiple layers of features from tiny images. 2009

  7. [7]

    LeCun and C

    Y . LeCun and C. Cortes. MNIST handwritten digit database. 2010

  8. [8]

    H. W. Lin, M. Tegmark, and D. Rolnick. Why does deep and cheap learning work so well? Journal of Statistical Physics, 168(6):1223–1247, Sep 2017

  9. [9]

    C. F. Madrazo, I. H. Cacha, L. L. Iglesias, and J. M. de Lucas. Application of a convolutional neural net- work for image classification to the analysis of colli- sions in high energy physics. CoRR, abs/1708.07034, 2017

  10. [10]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    K. Simonyan and A. Zisserman. Very deep convo- lutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014

  11. [11]

    H. Xiao, K. Rasul, and R. V ollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learn- ing algorithms, 2017

  12. [12]

    Y . B. Y . LeCun, L. Bottou and P. Haffner. Gradient- based learning applied to document recognition. Pro- ceedings of the IEEE, 86(11):2278-2324, November 1998