From Latent Space to Training Data: Explainable Specialization in Minimal MLPs

Enrique Alba; Ezequiel Lopez-Rubio

arxiv: 2605.25939 · v1 · pith:5IA5LPD6new · submitted 2026-05-25 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-29 22:22 UTC · model grok-4.3

keywords: coverage regularization · prototype specialization · minimal MLPs · structural losses · reconstruction error · Gaussian activations · one-hidden-layer networks · latent space

The pith

Coverage regularization yields the lowest reconstruction error and raises prototype specialization in minimal one-hidden-layer MLPs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether structural losses added during training can induce hidden-neuron specialization in minimal MLPs whose hidden width equals the number of training points, and whether that specialization improves reconstruction of the original data from the learned weights. Experiments on one-dimensional uniform samples with Gaussian activations compare coverage, separation, and overlap losses against ordinary fitting. Coverage regularization produces the lowest mean reconstruction error at every tested size from N=3 to N=100 and increases the ratio of specialized prototype usage, while overlap penalties raise error by expelling prototypes outside the convex hull and separation gives mixed results. The work concludes that any repulsive structural loss must be paired with a compatible attractor, or the latent geometry collapses into a degenerate equilibrium.

Core claim

In Gaussian-activation MLPs of width N on N-point uniform one-dimensional datasets, coverage regularization minimizes reconstruction error at all tested N and elevates the prototype-usage specialization ratio relative to the standard baseline; overlap penalties are systematically harmful because they drive the optimizer to equilibria in which prototype centers lie outside the convex hull of the inputs, separation shows inconsistent effects, and coverage acts as the necessary attractor that prevents such expulsion.

What carries the argument

Coverage regularization loss, which attracts hidden-neuron prototypes toward the training samples and thereby counters the expulsion produced by repulsive losses.
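
As a rough illustration of this attractor role, the sketch below assumes a coverage term equal to the mean squared distance from each training input to its nearest prototype. The paper's exact loss is not reproduced on this page, so `coverage_loss` and its form are hypothetical; only the prototype definition x̂_j = −b_j/w_j is taken from the Figure 3 caption.

```python
import numpy as np

def prototypes(w, b):
    """Prototype centers x_hat_j = -b_j / w_j for a 1D-input hidden layer."""
    return -b / w

def coverage_loss(x, w, b):
    """Hypothetical coverage term (assumed form, not the paper's definition):
    mean squared distance from each training input to its nearest prototype."""
    p = prototypes(w, b)
    d2 = (x[:, None] - p[None, :]) ** 2   # shape (N, N): inputs by prototypes
    return d2.min(axis=1).mean()

x = np.array([0.1, 0.5, 0.9])
w = np.ones(3)
inside = coverage_loss(x, w, b=-x)            # prototypes sit on the inputs
expelled = coverage_loss(x, w, b=-(x + 5.0))  # prototypes pushed far outside [0, 1]
print(inside, expelled)
```

Under this assumed form the term is zero when every input has a prototype on top of it and grows as prototypes drift away from the data, which is why a loss of this kind cannot reward expulsion.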

If this is right

Coverage regularization can be added to training to improve both reconstruction accuracy and neuron specialization in minimal MLPs.
Repulsive losses such as overlap or separation without a compensating attractor lead to degenerate equilibria that increase reconstruction error.
Separation penalties produce only mixed gains and become harmful at large temperature or without an attractor term.
The stable pattern across N from 3 to 100 in 480 runs indicates that the coverage-attractor principle applies across a wide range of minimal network sizes on this data regime.
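
The run count can be cross-checked against the figure captions: eight masks and 10 runs per cell imply six dataset sizes for 480 runs. The snippet below does that arithmetic. The three-bit mask labels follow the 000, 010, 100, and 111 codes used in the captions; the bit order (overlap, coverage, separation) is an inference from those labels, not something the page states.

```python
from itertools import product

TOTAL_RUNS = 480      # stated in the abstract
RUNS_PER_CELL = 10    # Figure 1 caption: SEM over 10 runs per cell

# Assumed bit order: (overlap, coverage, separation).
masks = ["".join(bits) for bits in product("01", repeat=3)]
n_sizes = TOTAL_RUNS // (len(masks) * RUNS_PER_CELL)

print(masks)    # ['000', '001', '010', '011', '100', '101', '110', '111']
print(n_sizes)  # 6
```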

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same balancing of repulsive and attractive losses could be tested in autoencoders or other models that recover prototypes from latent weights.
If the expulsion mechanism persists in higher dimensions, coverage-style terms would become essential when designers add separation losses to encourage distinct prototypes.
Direct visualization of prototype positions at larger N confirms the geometric mechanism and could be extended to monitor training dynamics in real time.

Load-bearing premise

The observed expulsion of prototypes outside the convex hull under overlap and separation losses is a general mechanism rather than an artifact of one-dimensional uniform sampling combined with Gaussian activations.

What would settle it

Repeating the full set of controlled runs on two-dimensional or non-uniformly sampled data and checking whether overlap and separation losses still produce prototype centers outside the convex hull.
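
One way such a check could be run in two dimensions is sketched below. It assumes prototype centers are available as points in input space (how they would be defined for d>1 is not specified here) and tests hull membership with a Delaunay triangulation of the training inputs. `outside_hull` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np
from scipy.spatial import Delaunay

def outside_hull(train_x, centers):
    """Boolean mask: True where a center lies outside the convex hull of train_x."""
    tri = Delaunay(train_x)
    # find_simplex returns -1 for points not contained in any simplex.
    return tri.find_simplex(centers) < 0

rng = np.random.default_rng(0)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Synthetic training inputs: the unit-square corners plus uniform samples.
train_x = np.vstack([corners, rng.uniform(0.0, 1.0, size=(46, 2))])
centers = np.array([[0.5, 0.5], [3.0, 3.0]])  # one inside, one expelled

mask = outside_hull(train_x, centers)
print(mask, mask.mean())  # per-center flag and fraction of expelled centers
```

Tracking the expelled fraction per mask and per N would give a direct analogue of the one-dimensional prototype-position plot.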

Figures

Figures reproduced from arXiv: 2605.25939 by Enrique Alba, Ezequiel Lopez-Rubio.

**Figure 1.** Mean reconstruction error versus N for all eight masks; error bars are SEM over 10 runs per cell. The four coverage-favorable masks (Std, Sep, Cov, Cov+Sep) form a lower band; the four overlap-active masks (Ovl, Ovl+Sep, Ovl+Cov, Full) form a higher band. Coverage-only is the lowest at every N.

**Figure 2.** Heatmap of mean reconstruction error across masks and dataset sizes.

**Figure 3.** Prototype positions x̂_j = −b_j/w_j for one median-error run of each of three masks at N = 100. The shaded region is the input range [0, 1]. Marker size is proportional to the prototype's mean activation across the training inputs. Under 000 and 010 a working subset of prototypes remains inside the input range; under 100 all 100 prototypes are expelled.

**Figure 5.** Mean specialization ratio versus N for all eight masks; error bars are SEM. Coverage and Cov+Sep raise specialization above the standard baseline at every N; overlap-active masks collapse toward zero as N grows.

**Figure 6.** Heatmap of mean specialization ratio across masks and dataset sizes.

**Figure 7.** Representative reconstructions at N = 5 for masks 000 (Std), 010 (Cov), and 111 (Full). Originals o_i in blue, reconstructions r_i in orange. Arrows at the right edge mark prototypes expelled outside the plotted range, with their x value. E is the Hungarian-matching reconstruction error.
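
The captions above define prototypes as x̂_j = −b_j/w_j and name a Hungarian-matching reconstruction error E. A minimal sketch of how such a metric might be computed follows. It assumes E is the mean absolute difference between originals and their optimally matched prototypes, which is a guess at the normalization rather than the paper's stated definition; `hungarian_error` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_error(originals, w, b):
    """Match each original o_i to a distinct prototype -b_j / w_j and
    return the mean absolute matched distance (assumed normalization)."""
    recon = -b / w
    cost = np.abs(originals[:, None] - recon[None, :])  # pairwise |o_i - r_j|
    rows, cols = linear_sum_assignment(cost)            # optimal one-to-one matching
    return cost[rows, cols].mean()

originals = np.array([0.2, 0.5, 0.8])
w = np.array([2.0, 2.0, 2.0])
b = np.array([-1.6, -0.4, -1.0])   # prototypes 0.8, 0.2, 0.5: a permutation of the originals
print(hungarian_error(originals, w, b))
```

Because the matching is one-to-one, the error is invariant to the order of the hidden neurons, and an expelled prototype necessarily contributes a large matched distance.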

Original abstract

We here study whether training biases can make hidden neurons specialize in minimal one-hidden-layer MLPs, and whether such specialization improves prototype-based reconstruction of the training dataset from the learned weights. We consider Gaussian-activation MLPs of width equal to dataset size and compare three structural losses that respectively encourage coverage of the training samples, separation between neuron-induced prototypes, and low overlap of hidden responses, against the standard fitting baseline. Experiments on uniformly sampled one-dimensional datasets show a stable pattern from N = 3 to N = 100 across 480 controlled runs. Coverage regularization gives the lowest mean reconstruction error at every tested size and raises the prototype-usage specialization ratio relative to the standard baseline, while separation has mixed effects and overlap penalties are systematically harmful. We show that the harm is not an optimization failure: overlap-active approaches fit the data as well as overlap-free ones but route the optimizer to a degenerate equilibrium in which prototype centers are pushed outside the convex hull of the training inputs. Coverage cannot reward this expulsion and acts as an attractor: separation admits it only at large temperature and overlap admits it at the nominal hyperparameter choice. A direct {\tau}-sweep on the separation-only mask and a prototype-position visualization at N = 100 confirm the mechanism. The findings yield a simple design principle for prototype-recoverability-aware training: every repulsive structural loss must be compensated by a compatible attractor, or it will collapse the latent geometry it was meant to refine.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Coverage regularization improves reconstruction in these 1D width-N MLPs while overlap penalties expel prototypes, but the design principle rests on untested assumptions about the data distribution.

The core observation is that coverage regularization gives lower mean reconstruction error than the baseline at every N from 3 to 100 and raises the specialization ratio, while overlap penalties are consistently harmful and separation is mixed. The paper shows this pattern holds across 480 runs and supplies a mechanistic account: overlap routes the optimizer to equilibria where prototypes sit outside the convex hull even though the data fit remains good, and coverage prevents that expulsion.

The work does a clean job of comparing the three structural losses in a tightly controlled setting and using tau sweeps plus N=100 position visualizations to separate optimization failure from equilibrium choice. That part is useful and directly supports the claim that the harm is geometric rather than a training artifact.

The soft spot is the narrow regime. All results use 1D uniform sampling on [0,1], Gaussian activations, and width exactly equal to dataset size. The proposed design principle, that every repulsive loss needs a compensating attractor, follows from the observed expulsion, but the paper supplies no runs on d>1 or non-uniform inputs. If expulsion does not occur under those conditions, the general rule does not follow. The reconstruction metric and any data-exclusion rules are also not visible in the abstract, so a referee would need to check those definitions.

This paper is for researchers working on regularization for prototype recovery or interpretability in small MLPs. A reader who wants a controlled empirical comparison in this exact setting will find the patterns and the visualization evidence worth seeing. The thinking is straightforward and the experiments are reproducible in principle.

I would send it to peer review. The study is small but the controls are reasonable, and a referee could usefully ask for the missing higher-dimensional checks and metric details.

Referee Report

1 major / 1 minor

Summary. The paper studies specialization in minimal one-hidden-layer MLPs (Gaussian activations, width = N) on 1D uniform data. It compares coverage, separation, and overlap structural losses to a standard baseline across N=3 to 100 (480 runs). Coverage yields lowest reconstruction error and higher prototype-usage ratios; separation is mixed; overlap is harmful by expelling prototypes outside the convex hull. Tau-sweeps and N=100 visualizations show the mechanism, yielding the design principle that every repulsive structural loss requires a compensating attractor.

Significance. If the empirical patterns hold, the work supplies a concrete mechanistic account of how structural losses interact with latent geometry in small networks, including a falsifiable design rule for prototype-recoverability. The scale (480 controlled runs), direct visualization of prototype positions, and identification of a non-optimization failure mode are strengths that make the observations reproducible and interpretable.

major comments (1)

[Abstract] Abstract and experimental sections: the design principle ('every repulsive structural loss must be compensated by a compatible attractor') is stated without qualification, yet all supporting evidence (reconstruction errors, specialization ratios, expulsion mechanism) is obtained exclusively on 1D uniform sampling with Gaussian activations and width=N. The stress-test correctly flags that the expulsion equilibrium and coverage advantage may be artifacts of this regime; without tests on d>1 or non-uniform distributions the principle does not follow from the reported data.

minor comments (1)

[Abstract] Abstract contains unreplaced LaTeX ('{\tau}-sweep'); ensure all math renders correctly in the final version.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We agree that the design principle is stated too generally given the scope of the experiments and will revise the manuscript to qualify the claims as observations from the 1D uniform regime while noting the need for further validation.

Referee: [Abstract] Abstract and experimental sections: the design principle ('every repulsive structural loss must be compensated by a compatible attractor') is stated without qualification, yet all supporting evidence (reconstruction errors, specialization ratios, expulsion mechanism) is obtained exclusively on 1D uniform sampling with Gaussian activations and width=N. The stress-test correctly flags that the expulsion equilibrium and coverage advantage may be artifacts of this regime; without tests on d>1 or non-uniform distributions the principle does not follow from the reported data.

Authors: We acknowledge that all reported results, including reconstruction errors, specialization ratios, and the expulsion mechanism, are obtained exclusively on 1D uniform data with Gaussian activations and width=N. The design principle is an extrapolation from these controlled experiments. We will revise the abstract and experimental sections to present the principle as an empirical observation and falsifiable hypothesis specific to this regime, rather than a general claim. We will also expand the discussion of the stress-test to more explicitly flag the potential for regime-specific artifacts and the need for future tests in d>1 and non-uniform settings. Revision: yes.

Circularity Check

0 steps flagged

No circularity; purely empirical measurements of error and specialization ratios.

The paper reports controlled experiments measuring reconstruction error and prototype-usage ratios on the training data across N=3 to 100. The design principle is presented as an interpretation of observed optimizer behavior under different losses, not as a mathematical derivation or fitted quantity that reduces to its own inputs by construction. No equations, self-citations, or ansatzes are invoked in a load-bearing way that would create circularity. The central claims rest on direct experimental outcomes rather than re-deriving inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central empirical claim rests on the modeling choice that width exactly equals dataset size and on the definition of the three structural losses; no new mathematical axioms or invented entities are introduced.

free parameters (1)

loss coefficients for coverage, separation, overlap
The relative weights of the three structural losses are chosen and not derived; they control the observed specialization and reconstruction behavior.
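
To make the role of these coefficients concrete, a masked objective of the following form would be consistent with the description above. The weights $\lambda_{\mathrm{ovl}}, \lambda_{\mathrm{cov}}, \lambda_{\mathrm{sep}}$ and the mask bits $m_{\mathrm{ovl}}, m_{\mathrm{cov}}, m_{\mathrm{sep}} \in \{0, 1\}$ are notation introduced here for illustration; the paper's exact formulation is not reproduced on this page.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{fit}}
  + m_{\mathrm{ovl}}\,\lambda_{\mathrm{ovl}}\,\mathcal{L}_{\mathrm{ovl}}
  + m_{\mathrm{cov}}\,\lambda_{\mathrm{cov}}\,\mathcal{L}_{\mathrm{cov}}
  + m_{\mathrm{sep}}\,\lambda_{\mathrm{sep}}\,\mathcal{L}_{\mathrm{sep}}
```

In this notation the standard baseline corresponds to all three bits set to zero, and the full mask to all three set to one.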
