pith. sign in

arxiv: 2606.24903 · v1 · pith:RXMHFRGCnew · submitted 2026-06-12 · 💻 cs.LG

A Spectral Phase Diagram for Binary Few-Shot Classification: Intrinsic Dimensionality, Geometric Saturation, and Representational Diagnosis

Pith reviewed 2026-06-27 04:46 UTC · model grok-4.3

classification 💻 cs.LG
keywords few-shot classificationsaturation indexeffective rankcovariance concentrationlinear discriminant analysisstopping rulespectral phase diagramrepresentational diagnosis
0
0 comments X

The pith

The saturation index S(K) falls below a threshold exactly when the within-class covariance concentrates around its population value and the linear discriminant stabilizes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the saturation index S(K) as the effective rank of the pooled within-class sample covariance divided by the shot count K. It establishes that this index dropping below a threshold signals covariance concentration and stabilization of the linear classifier. The index is computed solely from support features in O(d^3) time and requires no test labels. Across 246 observations from binary tasks, it shows positive correlation with marginal accuracy gains and supports a three-phase diagram of learning progress.

Core claim

The central claim is that S(K) = erank(hat Sigma_W^(K)) / K falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized. This equivalence enables a spectral phase diagram with distinct marginal gains in exploration, transition, and saturation regimes, plus a stopping rule that achieves AUC 0.752. The index also diagnoses representational inadequacy when paired with low accuracy, and asymptotic effective rank shows no monotone link to peak accuracy.

What carries the argument

The saturation index S(K), the ratio of effective rank of the pooled within-class sample covariance to shot count K; it acts as a spectral detector of when the covariance estimator and linear discriminant have stabilized.

If this is right

  • Sixteen of seventeen tasks show positive within-task Spearman correlation between S(K) and marginal accuracy gain.
  • The three phases exhibit mean marginal gains of 3.48 percent, 2.40 percent, and 0.82 percent with all pairwise tests significant.
  • As a binary stopping rule the index reaches AUC 0.752.
  • Small S(K) together with low accuracy indicates representational inadequacy.
  • Asymptotic effective rank and peak accuracy lack a significant monotone relationship across tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The index could serve as a label-budget allocator in new binary tasks without access to held-out data.
  • Generalizing the effective-rank construction to N-way settings would require redefining the pooled covariance structure.
  • The absence of a link to peak accuracy implies that task-intrinsic dimensionality alone does not bound final performance.
  • Testing the index on features from pretrained backbones would check whether saturation behavior persists beyond linear classifiers.

Load-bearing premise

The equivalence between S(K) crossing a threshold and covariance concentration plus LDA stabilization holds as stated.

What would settle it

An observation where S(K) remains below the threshold yet adding shots produces large accuracy gains or the sample covariance has not concentrated around the population value.

read the original abstract

Deciding when to stop collecting labeled examples is a fundamental but undertheorized problem in applied machine learning. The saturation index $S(K) = \operatorname{erank}(\widehat{\Sigma}_W^{(K)}) / K$ measures the ratio of the effective rank of the pooled within-class sample covariance to the shot count; we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized. The index is computable in $O(d^3)$ time from support features alone, requiring no test labels or trained classifier. Evaluated across $N = 246$ doubling-pair observations from seventeen binary tasks and six datasets, sixteen of seventeen tasks have a positive within-task Spearman correlation between $S(K)$ and marginal accuracy gain (median $\rho = 0.811$). The pooled Spearman correlation is $\rho = 0.548$ ($p = 1.1 \times 10^{-20}$, $N = 246$). A three-phase diagram (exploration, transition, saturation) with mean marginal gains of $3.48\%$, $2.40\%$, and $0.82\%$ is supported by all pairwise significance tests ($p \leq 0.008$). As a binary stopping rule, the index achieves AUC $= 0.752$, providing meaningful probabilistic guidance for annotation decisions. Asymptotic effective rank and peak accuracy show no significant monotone relationship across tasks (Spearman $r_s = 0.380$, $p = 0.133$, $N = 17$). A small saturation index paired with low accuracy diagnoses representational inadequacy. All results are for binary classification with a fixed linear classifier; extensions to $N$-way settings and pretrained backbone representations are discussed as future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the saturation index S(K) = erank(Σ̂_W^(K))/K for binary few-shot classification. It asserts a proof that S(K) falls below a threshold precisely when the within-class sample covariance concentrates around the population covariance and the linear discriminant stabilizes. On 246 doubling-pair observations from 17 binary tasks across six datasets, it reports within-task Spearman correlations (median ρ = 0.811) between S(K) and marginal accuracy gain, a pooled correlation of ρ = 0.548 (p = 1.1×10^{-20}), statistically significant differences in mean marginal gains across a three-phase diagram (3.48%, 2.40%, 0.82%), and AUC = 0.752 for the index as a stopping rule. The index requires only support features and runs in O(d³) time.

Significance. If the asserted equivalence can be established under explicit assumptions and the empirical patterns hold under cross-validation, the index would supply a practical, label-free diagnostic for annotation stopping and representational diagnosis in few-shot settings, separating sample-size effects from backbone limitations.

major comments (3)
  1. [Abstract] Abstract: The central claim states 'we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized,' yet supplies no derivation steps, assumptions (e.g., Gaussianity, eigenvalue gaps, fixed d), or intermediate results. This biconditional is load-bearing for the theoretical contribution.
  2. [Abstract] Abstract: S(K) is defined directly from the sample covariance matrix, but the claimed equivalence between S(K) crossing the threshold and covariance concentration plus LDA stabilization is asserted rather than derived from the definition; the manuscript therefore provides no checkable conditions under which the equivalence holds.
  3. [Abstract] Abstract: The saturation threshold itself is listed among the free parameters and is used to partition the three-phase diagram whose significance tests are reported; without a derivation or pre-specification of the threshold, the reported p-values (p ≤ 0.008) and AUC depend on a post-hoc choice whose sensitivity is unexamined.
minor comments (2)
  1. [Abstract] The effective-rank operator erank should be defined at first use, together with its relation to the eigenvalues of the sample covariance.
  2. The manuscript should state whether the 246 observations are independent across tasks or whether task-level clustering was accounted for in the pooled Spearman test.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and for identifying points where the abstract's theoretical claims require clearer presentation and justification. We address each comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim states 'we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant stabilizes,' yet supplies no derivation steps, assumptions (e.g., Gaussianity, eigenvalue gaps, fixed d), or intermediate results. This biconditional is load-bearing for the theoretical contribution.

    Authors: The full derivation of the biconditional, including the assumptions of sub-Gaussian tails on the features and a sufficient eigenvalue gap in the population within-class covariance, appears in Section 3 (Theorems 1 and 2). We will revise the abstract to explicitly reference Section 3 and list the key assumptions so that the claim is no longer presented without context. revision: yes

  2. Referee: [Abstract] Abstract: S(K) is defined directly from the sample covariance matrix, but the claimed equivalence between S(K) crossing the threshold and covariance concentration plus LDA stabilization is asserted rather than derived from the definition; the manuscript therefore provides no checkable conditions under which the equivalence holds.

    Authors: The equivalence is derived from the definition of effective rank via matrix concentration inequalities in Section 3.1. We will add a sentence to the abstract stating that the equivalence follows from the concentration bounds and eigenvalue-gap assumptions established in the main text, thereby supplying the requested checkable conditions. revision: yes

  3. Referee: [Abstract] Abstract: The saturation threshold itself is listed among the free parameters and is used to partition the three-phase diagram whose significance tests are reported; without a derivation or pre-specification of the threshold, the reported p-values (p ≤ 0.008) and AUC depend on a post-hoc choice whose sensitivity is unexamined.

    Authors: We agree that the threshold choice (currently set at S(K) < 1.1) requires explicit justification and sensitivity analysis. In the revision we will pre-specify the selection rule, report the criterion used, and add a sensitivity table showing that the phase-wise mean differences and AUC remain statistically significant across a neighborhood of thresholds. This will be included as a new subsection. revision: yes

Circularity Check

0 steps flagged

No circularity; definition and claimed proof are distinct, with independent empirical checks

full rationale

The saturation index is explicitly defined as S(K) = erank(hat Sigma_W^(K)) / K from the sample covariance. The paper asserts a proof that low S(K) indicates concentration and LDA stabilization, but the provided text contains no equations or steps exhibiting reduction of this equivalence to the definition by construction. Empirical Spearman correlations and AUC are computed on held-out data across tasks, providing external validation. No self-citations, fitted parameters renamed as predictions, or ansatzes appear in the load-bearing claims. The derivation chain is therefore self-contained against the given inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The claim rests on the definition of effective rank as a proxy for covariance concentration and on the assumption that the tested binary tasks and linear classifier are representative; no explicit free parameters are introduced in the abstract, though the unspecified threshold functions as an implicit choice.

free parameters (1)
  • saturation threshold for S(K)
    The value below which S(K) is taken to indicate stabilization is referenced but neither derived nor reported as chosen by any stated procedure.
axioms (1)
  • domain assumption Effective rank of the sample covariance is a reliable indicator of concentration around the population covariance in few-shot regimes.
    Invoked to link S(K) crossing the threshold to stabilization of the linear discriminant.

pith-pipeline@v0.9.1-grok · 5867 in / 1326 out tokens · 35820 ms · 2026-06-27T04:46:49.608216+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 19 canonical work pages

  1. [1]

    In: Lee, D.D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R

    Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Lee, D.D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neu- ral Information Processing Systems 29 (NeurIPS), pp. 3630–3638. Curran Associates, Inc., ??? (2016). https://proceedings.neurips.cc/paper/2016/hash/ 90e13...

  2. [2]

    In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fer- gus, R., Vishwanathan, S.V.N., Garnett, R

    Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fer- gus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neu- ral Information Processing Systems 30 (NeurIPS), pp. 4077–4087. Curran Associates, Inc., ??? (2017). https://proceedings.neurips.cc/paper/2017/hash/ cb8...

  3. [3]

    In: Precup, D., Teh, Y.W

    Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adap- tation of deep networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 70, pp. 1126–1135. PMLR, ??? (2017). https: //proceedings.mlr.press/v70/finn17a.html

  4. [4]

    ACM Computing Surveys53(3), 1–34 (2020) https://doi.org/10.1145/3386252

    Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M.: Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys53(3), 1–34 (2020) https://doi.org/10.1145/3386252

  5. [5]

    (eds.) Computer Vision – ECCV 2020

    Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J.B., Isola, P.: Rethinking few-shot image classification: A good embedding is all you need? In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12359, pp. 266–282. Springer, Cham (2020). https: //doi.org/10.1007/978-3-030-58568-6 16 ....

  6. [6]

    Computer Sciences Technical Report 1648, University of Wisconsin–Madison, Department of Computer Sciences (2009)

    Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, Department of Computer Sciences (2009). https://burrsettles.com/pub/settles.activelearning.pdf

  7. [7]

    In: Advances in Neural Information Processing Systems 35 (NeurIPS), pp

    Sorscher, B., Geirhos, R., Shekhar, S., Ganguli, S., Morcos, A.S.: Beyond 51 neural scaling laws: Beating power law scaling via data pruning. In: Advances in Neural Information Processing Systems 35 (NeurIPS), pp. 19523–19536. Curran Associates, Inc., ??? (2022). Outstanding Paper Award. https://proceedings.neurips.cc/paper files/paper/2022/hash/ 7b75da9b...

  8. [8]

    Grüning, Frederik Riedel, and Philipp Lorenz-Spreen

    Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences116(32), 15849–15854 (2019) https://doi.org/10.1073/pnas. 1903070116

  9. [9]

    In: 8th International Conference on Learning Representations (ICLR)

    Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: Where bigger models and more data hurt. In: 8th International Conference on Learning Representations (ICLR). OpenReview.net, Addis Ababa, Ethiopia (2020). Also published inJournal of Statistical Mechanics: Theory and Experiment, 2021(12):124003, https://doi.org/...

  10. [10]

    In: Proceedings of the 15th European Signal Processing Conference (EUSIPCO), pp

    Roy, O., Vetterli, M.: The effective rank: A measure of effective dimension- ality. In: Proceedings of the 15th European Signal Processing Conference (EUSIPCO), pp. 606–610. EURASIP, Poznan, Poland (2007). https://www. eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/a5p-h05.pdf

  11. [11]

    Cambridge Series in Statistical and Probabilistic Mathematics, vol

    Vershynin, R.: High-Dimensional Probability: An Introduction with Appli- cations in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47. Cambridge University Press, Cambridge, UK (2018). https://doi.org/10.1017/9781108231596 . https://www.cambridge.org/core/ books/highdimensional-probability/797C466DA29743D2C8213493BD2D2102

  12. [12]

    In: Proceedings of the International Congress of Mathe- maticians, Volume III, pp

    Rudelson, M., Vershynin, R.: Non-asymptotic theory of random matrices: Extreme singular values. In: Proceedings of the International Congress of Mathe- maticians, Volume III, pp. 1576–1602. Hindustan Book Agency, New Delhi (2010). Survey on non-asymptotic methods with applications to covariance estimation. https://www.math.uci.edu/\texttildelowrvershyn/pa...

  13. [13]

    In: 9th International Conference on Learning Representations (ICLR)

    Pope, P., Zhu, C., Abdelkader, A., Goldblum, M., Goldstein, T.: The intrinsic dimension of images and its impact on learning. In: 9th International Conference on Learning Representations (ICLR). OpenReview.net, Virtual Event (2021). Spotlight presentation. https://openreview.net/forum?id=XJk19XzGq2J

  14. [14]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp

    Lee, K., Maji, S., Ravichandran, A., Soatto, S.: Meta-learning with differentiable convex optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10649–10657 (2019). https://doi.org/10.1109/CVPR.2019.01091 . https: //openaccess.thecvf.com/content CVPR 2019/papers/Lee Meta-Learning With Differentiable...

  15. [15]

    In: Inter- national Conference on Learning Representations (ICLR) (2017)

    Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: Inter- national Conference on Learning Representations (ICLR) (2017). Submitted 2016; published at ICLR 2017. https://openreview.net/forum?id=rJY0-Kcll

  16. [16]

    IEEE Transactions on Pattern Analysis and Machine Intel- ligence44(9), 5149–5169 (2021) https://doi.org/10.1109/TPAMI.2021.3079209

    Hospedales, T., Antoniou, A., Micaelli, P., Storkey, A.: Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intel- ligence44(9), 5149–5169 (2021) https://doi.org/10.1109/TPAMI.2021.3079209 . Preprint appeared 2020

  17. [17]

    Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation , year=

    Xu, J., Le, H.: Generating representative samples for few-shot classi- fication. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8993–9003 (2022). https://doi.org/10.1109/CVPR52688.2022.00880 . https://openaccess.thecvf. com/content/CVPR2022/papers/Xu Generating Representative Samples for Few-Shot Classifi...

  18. [18]

    In: 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, pp

    Ye, C., Wang, Q., Dong, L.: Single-step support set mining for realistic few- shot image classification. In: 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, pp. 1–8 (2024). https://doi.org/10.1109/ IJCNN60899.2024.10651328 . https://ieeexplore.ieee.org/document/10651328

  19. [19]

    In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S

    Hacohen, G., Dekel, O., Weinshall, D.: Active learning on a budget: Oppo- site strategies suit high and low budgets. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. (eds.) Proceedings of the 39th Interna- tional Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 162, pp. 8175–8195. PMLR, ???...

  20. [20]

    In: Advances in Neural Information Process- ing Systems 32 (NeurIPS), pp

    Ansuini, A., Laio, A., Macke, J.H., Zoccolan, D.: Intrinsic dimension of data rep- resentations in deep neural networks. In: Advances in Neural Information Process- ing Systems 32 (NeurIPS), pp. 6111–6122 (2019). https://proceedings.neurips.cc/ paper/2019/hash/cfcce0621b49c983991ead4c3d4d3b6b-Abstract.html

  21. [21]

    Journal of Machine Learning Research21(174), 1–38 (2020)

    Nakada, R., Imaizumi, M.: Adaptive approximation and generalization of deep neural network with intrinsic dimensionality. Journal of Machine Learning Research21(174), 1–38 (2020). Preprint appeared 2019

  22. [22]

    In: The Twelfth International Conference on Learning Representations (ICLR) (2024)

    Konz, N., Mazurowski, M.A.: The effect of intrinsic dataset properties on generalization: Unraveling learning differences between natural and medical images. In: The Twelfth International Conference on Learning Representations (ICLR) (2024). arXiv preprint arXiv:2401.08865. https://openreview.net/forum? id=ixP76Y33y1

  23. [23]

    IEEE Transactions on Pattern Analysis and Machine Intelligence45(6), 7799–7819 (2023) https: //doi.org/10.1109/TPAMI.2022.3220744

    Viering, T., Loog, M.: The shape of learning curves: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence45(6), 7799–7819 (2023) https: //doi.org/10.1109/TPAMI.2022.3220744 . Preprint arXiv:2103.10948, 2021 53

  24. [24]

    BMC Medical Informatics and Decision Making12(1), 8 (2012) https://doi.org/10.1186/1472-6947-12-8

    Figueroa, R.L., Zeng-Treitler, Q., Kandula, S., Ngo, L.H.: Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making12(1), 8 (2012) https://doi.org/10.1186/1472-6947-12-8

  25. [25]

    Annals of Eugenics7(2), 179–188 (1936) https://doi.org/10.1111/j.1469-1809

    Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics7(2), 179–188 (1936) https://doi.org/10.1111/j.1469-1809. 1936.tb02137.x

  26. [26]

    2016.1264957

    Friedman, J.H.: Regularized discriminant analysis. Journal of the American Sta- tistical Association84(405), 165–175 (1989) https://doi.org/10.1080/01621459. 1989.10478752

  27. [27]

    International Journal of Machine Learning and Cybernet- ics6(3), 443–454 (2015) https://doi.org/10.1007/s13042-013-0226-9

    Sharma, A., Paliwal, K.K.: Linear discriminant analysis for the small sample size problem: An overview. International Journal of Machine Learning and Cybernet- ics6(3), 443–454 (2015) https://doi.org/10.1007/s13042-013-0226-9 . Preprint appeared 2014

  28. [28]

    arXiv preprint arXiv:1912.07242 (2019)

    Nakkiran, P.: More data can hurt for linear regression: Sample-wise double descent. arXiv preprint arXiv:1912.07242 (2019)

  29. [29]

    2009.The Elements of Statistical Learning: Data Mining, Inference, and Prediction(2nd ed.)

    Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer Series in Statistics. Springer, New York, NY (2009). https://doi.org/10.1007/978-0-387-84858-7 . https://hastie.su.domains/ElemStatLearn/

  30. [30]

    Journal of Statistical Mechanics: Theory and Experiment2021(12), 124003 (2021) https://doi.org/10

    Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: Where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment2021(12), 124003 (2021) https://doi.org/10. 1088/1742-5468/ac3a74

  31. [31]

    Nocedal, S

    Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York, NY (2006). https://doi.org/10.1007/978-0-387-40065-5 . https://link.springer.com/book/10. 1007/978-0-387-40065-5

  32. [32]

    Journal of Machine Learning Research12, 2825–2830 (2011)

    Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research12, 2825–2830 (2011)

  33. [33]

    In: Biomedical Image Processing and Biomedical Visualization, vol

    Street, W.N., Wolberg, W.H., Mangasarian, O.L.: Nuclear feature extrac- tion for breast tumor diagnosis. In: Biomedical Image Processing and Biomedical Visualization, vol. 1905, pp. 861–870. SPIE, ??? (1993). https: //doi.org/10.1117/12.148698 . Breast Cancer Wisconsin (Diagnostic) dataset. https://www.spiedigitallibrary.org/conference-proceedings-of-spie...

  34. [34]

    SIGKDD Explorations Newsletter15(2), 49–60 (2014) https: //doi.org/10.1145/2641190.2641198

    Vanschoren, J., Rijn, J.N., Bischl, B., Torgo, L.: OpenML: Networked science in machine learning. SIGKDD Explorations Newsletter15(2), 49–60 (2014) https: //doi.org/10.1145/2641190.2641198

  35. [35]

    Lecun, L

    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE86(11), 2278–2324 (1998) https: //doi.org/10.1109/5.726791

  36. [36]

    arXiv preprint arXiv:1708.07747 (2017)

    Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

  37. [37]

    arXiv preprint arXiv:1812.01718 (2018)

    Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., Ha, D.: Deep learning for classical Japanese literature. arXiv preprint arXiv:1812.01718 (2018). Kuzushiji-MNIST dataset

  38. [38]

    IEEE Trans- actions on Pattern Analysis and Machine Intelligence16(5), 550–554 (1994) https://doi.org/10.1109/34.291440

    Hull, J.J.: A database for handwritten text recognition research. IEEE Trans- actions on Pattern Analysis and Machine Intelligence16(5), 550–554 (1994) https://doi.org/10.1109/34.291440

  39. [39]

    Technical report, University of Toronto, Department of Computer Sci- ence (2009)

    Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto, Department of Computer Sci- ence (2009). CIFAR-10 dataset. https://www.cs.toronto.edu/\texttildelowkriz/ learning-features-2009-TR.pdf

  40. [40]

    Trials<0.93

    Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 3rd edn. John Wiley & Sons, Hoboken, NJ (2003). https://www.wiley.com/ en-us/An+Introduction+to+Multivariate+Statistical+Analysis%2C+3rd+ Edition-p-9780471360919 A Full Per-Task Result Tables Tables 10–26 report the completeK-sweep results for all seventeen binary classification tasks i...