
arxiv: 2605.23645 · v1 · pith:IRCKLKJBnew · submitted 2026-05-22 · 💻 cs.LG · cs.AI

Learning Through Noise: Why Subliminal Learning Works and When It Fails

Pith reviewed 2026-05-25 05:12 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords subliminal learning · knowledge distillation · output heads · compatible heads · task-unrelated noise · MNIST experiments · neural network transfer · auxiliary heads

The pith

Subliminal learning from noise depends on compatible output heads, not on matched initializations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that students can acquire task knowledge from teachers by training only on task-unrelated noise inputs and outputs. This transfer happens because auxiliary heads for the noise and class heads for the task stay compatible between models. Experiments on MNIST keep this compatibility while randomizing hidden layers, removing or adding layers, and switching from MLP to CNN architectures. With both heads aligned, students approach, and in favorable regimes match, teacher-level accuracy. The setting also yields a theory of the mechanism and upper bounds on when the transfer fails.

Core claim

Subliminal learning is governed by compatible output heads. Splitting outputs into an auxiliary head for task-unrelated noise and a class head for classification allows transfer of a recoverable teacher signal even with random hidden-layer initializations or architectural changes. When class heads remain compatible as well, students trained solely on noise inputs can approach and sometimes match teacher performance on the original task.
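The split-head mechanism can be illustrated with a toy linear sketch in Python. Everything here is an assumption made for illustration and not the paper's MNIST setup: a single linear map `W_T` stands in for the teacher's hidden layers, the heads `Omega_A` and `Omega_C` are frozen, the dimensions are arbitrary, and a closed-form least-squares solve (the hypothetical helper `distill_on_noise`) replaces gradient training. The sketch shows a student that only sees noise inputs and teacher aux logits, once with a compatible aux head and once with an unrelated one.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes (assumptions): input dim, latent dim, aux neurons, classes, noise samples.
D, d, m, n_cls, N = 20, 8, 16, 10, 200

# Toy teacher: one linear "hidden layer" followed by two output heads.
W_T = rng.normal(size=(d, D)) / np.sqrt(D)
Omega_A = rng.normal(size=(m, d)) / np.sqrt(d)      # aux head
Omega_C = rng.normal(size=(n_cls, d)) / np.sqrt(d)  # class head

def distill_on_noise(student_aux_head, X_noise, teacher_aux_logits):
    """Minimum-norm least-squares W for student_aux_head @ W @ X_noise ~= teacher_aux_logits."""
    return np.linalg.pinv(student_aux_head) @ teacher_aux_logits @ np.linalg.pinv(X_noise)

# The student never sees task data: only noise inputs and the teacher's aux logits.
X_noise = rng.normal(size=(D, N))
Y_aux = Omega_A @ W_T @ X_noise

# Compatible case: the student uses the teacher's aux head.
W_S = distill_on_noise(Omega_A, X_noise, Y_aux)

# Incompatible case: the student uses an unrelated random aux head.
Omega_A_bad = rng.normal(size=(m, d)) / np.sqrt(d)
W_bad = distill_on_noise(Omega_A_bad, X_noise, Y_aux)

# Compare class predictions through the shared class head on fresh inputs.
X_test = rng.normal(size=(D, 1000))

def agreement(W):
    """Fraction of test inputs on which student and teacher pick the same class."""
    teacher_pred = np.argmax(Omega_C @ W_T @ X_test, axis=0)
    student_pred = np.argmax(Omega_C @ W @ X_test, axis=0)
    return float(np.mean(teacher_pred == student_pred))

print("compatible aux head:  ", agreement(W_S))
print("incompatible aux head:", agreement(W_bad))
```

In this linear toy the compatible student recovers the teacher map exactly because the aux head has at least as many neurons as latent dimensions (m ≥ d) and there are at least as many noise samples as input dimensions. Neither guarantee carries over automatically to nonlinear networks.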

What carries the argument

Compatible output heads (auxiliary head for noise signals plus class head for classification) that keep the teacher signal recoverable in the student.

If this is right

  • Subliminal learning persists without shared or matched initializations between teacher and student.
  • Students reach near teacher accuracy on the task when both auxiliary and class heads stay compatible.
  • Upper bounds on failure can be derived from the head-compatibility condition alone.
  • Architecture modifications such as layer removal, addition, or MLP-to-CNN switches do not block transfer if heads remain compatible.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Design choices that enforce head compatibility could be used to control unintended bias transfer in distillation pipelines.
  • The same compatibility principle might explain limits on knowledge transfer in other settings where inputs are replaced by noise or synthetic data.
  • Testing the bounds on larger image or language models would show whether head compatibility remains the dominant constraint outside the MNIST regime.

Load-bearing premise

The controlled MNIST setup with explicitly split auxiliary and class heads isolates head compatibility as the decisive factor and supports general upper bounds independent of task or data.

What would settle it

Finding reliable subliminal transfer when the auxiliary or class heads are made incompatible, or finding no transfer when the heads are kept compatible across the tested architecture changes, would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.23645 by Belén Hidalgo-Ogalde, Roman D. Ventzke, Valentin Neuhaus, Vincent C. Brockers, Viola Priesemann.

Figure 1: Subliminal learning transfers task information through task-unrelated noise via compatible teacher–student output heads. (a) We study models whose latent representation is connected to an aux head Ω_A producing task-unrelated auxiliary logits and a class head Ω_C producing task logits. (b) A teacher model is first trained on labeled MNIST [16] using the class head. (c) The trained teacher is then queried on …
Figure 2: Subliminal learning is robust to hidden-layer random initialization but fragile to output-head incompatibility. We test which shared student–teacher components are required for subliminal learning by randomly reinitializing different parts of the student before training. Orange regions indicate reinitialized components, red bars show student accuracy after training, and the blue bar shows teacher accuracy.…
Figure 3: Compatible output heads are sufficient for subliminal learning across architectural and dataset changes, but student capacity and task complexity determine recoverability. We test subliminal learning when teacher and student share only the aux and class heads, keeping the teacher setup fixed while varying the student hidden architecture or the task. (a) Varying the student first hidden-layer dimens…
Figure 4: Subliminal learning depends jointly on aux-head capacity and noise samples, with clear regimes of bottleneck, recovery, and saturation. We vary the number of auxiliary neurons m and the number of noise samples N seen per student epoch to test how recoverability depends on aux-head capacity and noise exposure. (a) Student accuracy across the (m, N) plane. Increasing either m or N improves subliminal learnin…
Figure 5: Output-head perturbations reveal compatibility limits for subliminal learning and validate theory-derived robustness bounds. Gaussian noise of strength δ is added to either the student’s class or aux head before auxiliary training. (a–d) Perturbing either head reduces student accuracy and teacher–student head similarity, indicating that both readout and aux-head compatibility are required for successful si…
Figure 6: Shared initialization alone does not guarantee subliminal learning; instead, excessive latent dimensionality drives head drift and eventually breaks the effect. We vary the shared latent dimension d of teacher and student while keeping architecture and initialization otherwise identical. (a) Teacher accuracy quickly saturates with increasing d, whereas student accuracy first improves and then collapses at l…
Figure 7: Perturbing the student’s aux head reduces the similarity between teacher and student hidden-layer updates, forcing the student to learn a rotated teacher representation. Without perturbation, we observe an alignment of teacher and student weight changes, as presented in eq. (19). Training was performed for a single epoch to capture changes at the beginning of training. The theory describing an upper bound for…
Figure 8: The class head remains stable during initial training of a randomly initialized teacher. (a) After random weight initialization, class-head vectors ω_c are statistically approximately orthogonal, while the latent representation of the training data forms an unstructured point cloud with inseparable classes. (b) After some training, symmetry is broken and the representation adapts in the directions of their …
Figure 9: Increasing d_latent amplifies the effective supervised signal at the class head by making the frozen latent representation more linearly separable. Teacher accuracy with only the class head trainable, plotted against latent dimension d_latent. Since all earlier layers remain fixed at initialization, performance improvements must arise from the readout alone. Larger latent spaces therefore provide a more favo…
Figure 10: For training a CNN student, we test different resolution levels of spatially correlated Perlin …
Figure 11: We fix the aux-head weights during training, to separate the effect of self-correction and …
Figure 12: Output-head perturbations reveal compatibility limits for subliminal learning. The scale of the perturbation strength goes beyond the scale of ≈ 0.062, which is the average weight scale of the network. The decrease in accuracy and cosine similarity observed in fig. 5 continues beyond this level.
Figure 13: Fixed-head ablations identify aux-head drift as the dominant high-dimensional failure mode. We repeat the latent-dimension sweep while freezing different output heads. (a,e,i) In the baseline, student accuracy collapses at large d as both teacher class-head drift and student aux-head drift increase. (b,f,j) Fixing the aux head strongly reduces the high-d collapse even though the teacher class head still d…
Figure 14: The decrease in accuracy also observed for the MNIST case is caused by the size of the aux head. We repeat the latent-dimension sweep performed in fig. 6 and fig. 13 for a setup trained on the balanced EMNIST with n = 47 classes. The dip in accuracy after d = m indicates that it is caused not by the number of classes but by the number of auxiliary neurons.
Original abstract

In the context of artificial neural networks, subliminal learning refers to the transfer of task-relevant knowledge or unintended biases from teacher to student models through distillation on task-unrelated input–output pairs. Prior explanations tie this effect to shared or closely matched teacher–student initialization. We show that a closely matched initialization is not necessary. Instead, subliminal learning is governed by compatible output heads. Using a controlled MNIST setting, we split outputs into an auxiliary head (for auxiliary, task-unrelated noise signals) and a class head (for classification) to demonstrate that subliminal learning occurs, even when we randomly initialize hidden layers and remove layers, add new layers, or change the architecture (MLP-to-CNN). Compatible auxiliary heads enable transfer of a recoverable teacher signal, bringing the student's representations closer to the teacher's. When the class heads remain compatible as well, students trained only on task-unrelated noise can approach, and in favorable regimes match, teacher-level task performance. Our setting enables us to develop a theory that explains the mechanism of subliminal learning and to derive upper bounds on when subliminal learning fails. Together, our results turn subliminal learning from a surprising transfer effect into a theoretically grounded mechanism with predictable limits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that subliminal learning—the transfer of task-relevant knowledge via distillation on task-unrelated input-output pairs—is governed by compatible output heads rather than shared or matched initializations. In a controlled MNIST setup, outputs are partitioned into an auxiliary head (for task-unrelated noise) and a class head; experiments show transfer occurs even after random initialization of hidden layers, layer removal/addition, or MLP-to-CNN architecture changes. Compatible auxiliary heads bring student representations closer to the teacher's, and when class heads are also compatible, students can approach or match teacher task performance. The setting is used to derive a theory of the mechanism and upper bounds on failure conditions.

Significance. If the central claim and bounds hold beyond the specific construction, the work would convert subliminal learning from an empirical curiosity into a mechanistically understood process with testable limits, with potential implications for distillation, knowledge transfer, and bias propagation in neural networks. The controlled isolation of head compatibility and the attempt to derive bounds are positive features.

major comments (2)
  1. [theory / upper-bounds section] Theory/upper-bounds derivation (referenced in abstract as enabling 'upper bounds on when subliminal learning fails'): the bounds are obtained inside the MNIST split-head construction, where auxiliary noise signals are defined to be task-unrelated and the heads are explicitly factored. The derivation therefore relies on properties (e.g., orthogonality between auxiliary signals and class logits) that are introduced by the architectural partition itself; it is not shown that the same bounds remain valid for standard distillation pipelines that lack this explicit factorization, undermining the claim that the bounds are task- and distribution-independent.
  2. [experimental results on architecture transfer] Experimental claims (§ on architecture changes and performance matching): the demonstration that students match teacher performance when class heads remain compatible is shown only inside the auxiliary/class-head split. Because the split is an additional modeling assumption not present in conventional distillation, the results do not yet establish that head compatibility (rather than the split itself) is the governing factor in general settings.
minor comments (2)
  1. Clarify the precise mathematical definition of 'head compatibility' (e.g., whether it is measured by cosine similarity of weight matrices, logit correlation, or another metric) and state it before the experimental sections.
  2. The abstract states that 'closely matched initialization is not necessary'; the manuscript should explicitly contrast the random-initialization regime against a matched-initialization baseline in the same figure or table to make the comparison quantitative.
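The candidate compatibility metrics named in minor comment 1 can be made concrete with a short Python sketch. The function names, array shapes, and δ values below are hypothetical, and the page does not say which definition the paper actually uses, so this is only one possible reading: cosine similarity of flattened head weights, Pearson correlation of the logits produced on a batch of latent vectors, and an additive Gaussian perturbation of strength δ of the kind described in the Figure 5 caption.

```python
import numpy as np

def head_cosine_similarity(head_t, head_s):
    """Cosine similarity between two head weight matrices, treated as flat vectors."""
    a, b = head_t.ravel(), head_s.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def logit_correlation(head_t, head_s, Z):
    """Pearson correlation of the logits both heads produce on latent vectors Z (columns)."""
    logits_t = (head_t @ Z).ravel()
    logits_s = (head_s @ Z).ravel()
    return float(np.corrcoef(logits_t, logits_s)[0, 1])

def perturb_head(head, delta, rng):
    """Add i.i.d. Gaussian noise with standard deviation delta to every weight."""
    return head + delta * rng.normal(size=head.shape)

rng = np.random.default_rng(0)
head = rng.normal(size=(10, 64)) / np.sqrt(64)  # hypothetical class head: 10 classes, d = 64
Z = rng.normal(size=(64, 500))                  # hypothetical latent batch

# Both metrics should fall as the perturbation strength grows.
for delta in (0.0, 0.05, 0.5):
    noisy = perturb_head(head, delta, rng)
    print(delta, head_cosine_similarity(head, noisy), logit_correlation(head, noisy, Z))
```

Weight-space cosine similarity needs no data, while logit correlation depends on the latent distribution it is evaluated on, which is one reason the definition should be fixed before the experiments are read.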

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below, clarifying the intended scope of our controlled construction while agreeing that explicit statements about its limitations are needed.

Point-by-point responses
  1. Referee: [theory / upper-bounds section] Theory/upper-bounds derivation (referenced in abstract as enabling 'upper bounds on when subliminal learning fails'): the bounds are obtained inside the MNIST split-head construction, where auxiliary noise signals are defined to be task-unrelated and the heads are explicitly factored. The derivation therefore relies on properties (e.g., orthogonality between auxiliary signals and class logits) that are introduced by the architectural partition itself; it is not shown that the same bounds remain valid for standard distillation pipelines that lack this explicit factorization, undermining the claim that the bounds are task- and distribution-independent.

    Authors: We agree that the upper bounds and mechanistic derivation are obtained inside the split-head MNIST construction, where the explicit auxiliary/class factorization introduces the orthogonality and task-unrelated signal properties used in the proofs. The manuscript does not demonstrate that identical bounds hold verbatim in unfactored standard distillation pipelines. In revision we will (i) qualify the abstract and theory section to state that the bounds characterize failure modes within this controlled isolation of head compatibility, and (ii) add a limitations paragraph explaining that the construction provides a tractable setting for deriving explicit limits rather than claiming immediate task- and distribution-independence for arbitrary pipelines. This revision will be made. revision: yes

  2. Referee: [experimental results on architecture transfer] Experimental claims (§ on architecture changes and performance matching): the demonstration that students match teacher performance when class heads remain compatible is shown only inside the auxiliary/class-head split. Because the split is an additional modeling assumption not present in conventional distillation, the results do not yet establish that head compatibility (rather than the split itself) is the governing factor in general settings.

    Authors: The split-head construction is deliberately introduced to hold all other variables fixed while varying only head compatibility, thereby isolating it from initialization and architecture effects. The reported architecture-transfer and performance-matching results therefore hold under this controlled isolation. We do not claim the split itself is present in conventional distillation; rather, the experiments show that once heads are compatible, transfer occurs even after random hidden-layer re-initialization, layer addition/removal, and MLP-to-CNN changes. In revision we will add a dedicated discussion paragraph that (a) reiterates the role of the split as an experimental control and (b) sketches how head-compatibility diagnostics could be applied in unfactored pipelines. No new experiments are planned for this revision. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained within controlled setting

full rationale

The paper uses a controlled MNIST setup with explicitly split auxiliary and class heads to demonstrate that subliminal learning depends on head compatibility (rather than initialization) and to derive a theory plus upper bounds on failure within that framework. The abstract states the setting 'enables us to develop a theory... and to derive upper bounds,' without claiming task- or distribution-independent generality. No equations, self-citations, or reductions are quoted that would make any prediction equivalent to its inputs by construction. The central claims remain experimentally grounded in the described architecture rather than tautological or fitted-by-definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract only; no information available on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5778 in / 1086 out tokens · 24754 ms · 2026-05-25T05:12:49.918853+00:00 · methodology



Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 6 internal anchors

  [1] Takeo Watanabe, José E. Náñez, and Yuka Sasaki. Perceptual learning without perception. Nature, 413:844–848, 2001. doi: 10.1038/35101601

  [2] Aaron R. Seitz and Takeo Watanabe. Is subliminal learning really passive? Nature, 422:36, 2003. doi: 10.1038/422036a

  [3] Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Sören Mindermann, Jacob Hilton, Samuel Marks, and Owain Evans. Language models transmit behavioural traits through hidden signals in data. Nature, 652(8110):615–621, 2026

  [4] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

  [5] Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, and Owain Evans. Emergent misalignment: Narrow finetuning can produce broadly misaligned LLMs. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 4043–4068, 2025

  [6] Evan Hubinger et al. Sleeper agents: Training deceptive LLMs that persist through safety training. arXiv preprint arXiv:2401.05566, 2024

  [7] Alireza Mohammadshahi and Yani Ioannou. What is left after distillation? How knowledge transfer impacts fairness and bias. Transactions on Machine Learning Research, 2025

  [8] Alexandra Souly, Javier Rando, Ed Chapman, Xander Davies, Burak Hasircioglu, Ezzeldin Shereen, Carlos Mougan, Vasilios Mavroudis, Erik Jones, Chris Hicks, et al. Poisoning attacks on LLMs require a near-constant number of poison samples. arXiv preprint arXiv:2510.07192, 2025

  [9] Chayanon Kitkana and Shivam Arora. Sustained gradient alignment mediates subliminal learning in a multi-step setting: Evidence from MNIST auxiliary logit distillation experiment. In ICLR 2026 Workshop on Scientific Methods for Understanding Deep Learning (Sci4DL), 2026

  [10] Ishaq Aden-Ali, Noah Golowich, Allen Liu, Abhishek Shetty, Ankur Moitra, and Nika Haghtalab. Subliminal effects in your data: A general mechanism via log-linearity. arXiv preprint arXiv:2602.04863, 2026

  [11] Simon Schrodi, Elias Kempf, Fazl Barez, and Thomas Brox. Towards understanding subliminal learning: When and how hidden biases transfer. arXiv preprint arXiv:2509.23886, 2025

  [12] Amir Zur, Zhuofan Ying, Alexander Russell Loftus, Kerem Şahin, Steven Yu, Lucia Quirke, Tamar Rott Shaham, Natalie Shapira, Hadas Orgad, and David Bau. Token entanglement in subliminal learning. In Mechanistic Interpretability Workshop at NeurIPS 2025, 2025

  [13] George Morgulis and John Hewitt. Subliminal steering: Stronger encoding of hidden signals. arXiv preprint arXiv:2604.25783, 2026

  [14] Raphael Gontijo Lopes, Stefano Fenu, and Thad Starner. Data-free knowledge distillation for deep neural networks. arXiv preprint arXiv:1710.07535, 2017

  [15] Hongxu Yin, Pavlo Molchanov, Jose M. Alvarez, Zhizhong Li, Arun Mallya, Derek Hoiem, Niraj K. Jha, and Jan Kautz. Dreaming to distill: Data-free knowledge transfer via DeepInversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8712–8721, 2020

  [16] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

  [17] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre Van Schaik. EMNIST: Extending MNIST to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE, 2017

  [18] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019

  [19] Taehyeon Kim, Jaehoon Oh, NakYil Kim, Sangwook Cho, and Se-Young Yun. Comparing Kullback-Leibler divergence and mean squared error loss in knowledge distillation. arXiv preprint arXiv:2105.08919, 2021

  [20] Ken Perlin. An image synthesizer. ACM SIGGRAPH Computer Graphics, 19(3):287–296, 1985

  [21] Mor Geva, Avi Caciularu, Kevin Wang, and Yoav Goldberg. Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 30–45, 2022

  [22] Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, et al. Toy models of superposition. arXiv preprint arXiv:2209.10652, 2022

  [23] Andrew Lee, Melanie Weber, Fernanda Viégas, and Martin Wattenberg. Shared global and local geometry of language model embeddings. arXiv preprint arXiv:2503.21073, 2025

  [24] Charles Goddard and Fernando Fernandes Neto. Training-free tokenizer transplantation via orthogonal matching pursuit. arXiv preprint arXiv:2506.06607, 2025

  [25] Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, and Jeffrey Pennington. Wide neural networks of any depth evolve as linear models under gradient descent. Curran Associates Inc., Red Hook, NY, USA, 2019

  [26] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015

  [27] Vardan Papyan, X. Y. Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 117(40):24652–24663, 2020. doi: 10.1073/pnas.2015509117. URL https://www.pnas.org/doi/abs/10.1073/pnas.2015509117

  [28] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1321–1330. PMLR, 06–11 Aug 2017. URL https://proceedings.mlr.press/v70/guo17a.html

  [29] Piyush Raikwar and Deepak Mishra. Discovering and overcoming limitations of noise-engineered data-free knowledge distillation. Advances in Neural Information Processing Systems, 35:4902–4912, 2022

  [30] Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Feature visualization. Distill, 2(11):e7, 2017

  [31] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations, 2018

  [32] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

Appendix excerpts

A Mathematical Background

A.1 Subliminal Learning Setting

We consider a "black box" neural network model $f_\theta$ that maps an input vector $x^{(i)} \in \mathbb{R}^D$ into a latent space $\mathbb{R}^d$. For convenience we shall call these latent representations $z^{(i)} = f_\theta(x^{(i}$...

  • The student network needs to learn the latent representation of the teacher sufficiently well. It needs to generalize the prediction of the teacher latent output from noise inputs to the data samples $x$:
    $f_{\theta^{(S,\mathrm{final})}}(x) \approx f_{\theta^{(T,\mathrm{final})}}(x)$. (9)

  • The final class head of the teacher and the student class head need to be sufficiently close:
    $\Omega_C^{(T,\mathrm{final})} \approx \Omega_C^{(S,\mathrm{init})}$. (10)
    If the student has learned the teacher’s latent output and the class head of both is similar enough, the student’s classification probabilities will be close to the teacher’s. Conversely, having an incorrect class head will degrade...

... $=: \beta \in \mathcal{O}(1)$, independent of $d$. Importantly, for high latent dimensions $d \gg 1$, these random vectors become pairwise (approximately) orthogonal since their cosine similarity scales as $1/\sqrt{d}$. Hence, for typical initializations and $m \ll d$, $\Omega_A^\top \Omega_A$ will effectively become a random orthogonal projection of the latent space onto an $m$-dimensional subspace (up to a the c...
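The statement that random vectors in a high-dimensional latent space are approximately orthogonal, with cosine similarity of order 1/√d, can be checked numerically. The Python sketch below is an illustration under assumptions that are not taken from the paper: it uses i.i.d. standard Gaussian vectors, a hypothetical helper `rms_cosine`, and arbitrary values of d. For such vectors the mean squared cosine equals 1/d, so the root-mean-square cosine multiplied by √d should stay close to 1.

```python
import numpy as np

def rms_cosine(d, n_pairs=2000, seed=0):
    """Root-mean-square cosine similarity over n_pairs independent Gaussian vector pairs."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_pairs, d))
    v = rng.normal(size=(n_pairs, d))
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return float(np.sqrt(np.mean(cos ** 2)))

# The last column should hover near 1 if the 1/sqrt(d) scaling holds.
for d in (16, 64, 256, 1024):
    r = rms_cosine(d)
    print(f"d={d:5d}  rms cosine={r:.4f}  rms*sqrt(d)={r * np.sqrt(d):.3f}")
```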