
arxiv: 2110.01729 · v6 · submitted 2021-10-04 · 📊 stat.ML · cs.LG

Stochastic tensor space feature theory with applications to robust machine learning

Pith reviewed 2026-05-24 12:56 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords stochastic tensor spaces · Karhunen-Loeve expansion · multilevel orthogonal subspaces · feature extraction · Alzheimer's disease · machine learning classification · blood plasma data · subspace methods

The pith

Modeling data as random fields in stochastic tensor spaces yields multilevel orthogonal subspaces whose projections sharply improve machine learning class separation on Alzheimer's blood data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Multilevel Orthogonal Subspace Karhunen-Loeve feature theory that treats training data as instances of random fields in a Bochner space. It starts with the nominal class and builds a hierarchy of orthogonal subspaces to isolate anomalous components belonging to other classes. Projection coefficients of input data onto these subspaces become new features that produce clearer separation surfaces for classifiers. On the Alzheimer's Disease Neuroimaging Initiative blood plasma dataset, these features deliver much higher accuracy for distinguishing cognitive normal, mild cognitive impairment, and dementia stages than Gradient Boosting, RUS Boost, Random Forest, or Neural Networks.

Core claim

The authors observe that separate machine learning classes can reside predominantly in mostly distinct subspaces. They apply the Karhunen-Loeve expansion to the first (nominal) class to construct a multilevel orthogonal subspace, treating the second class as an outlier of the first; the projection coefficients of the data onto these subspaces become new features. The abstract reports that these features give classifiers dramatic accuracy gains on the blood plasma dataset for predicting Alzheimer's disease stages.
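The idea of turning projections onto a nominal-class subspace into features can be sketched in a few lines. This is a simplified, single-level illustration that uses ordinary PCA (the discrete Karhunen-Loeve expansion) on synthetic data; it is not the paper's multilevel construction, and the function names, dimensions, and noise levels are all assumptions made for the example.

```python
import numpy as np

def fit_nominal_basis(X_nominal, n_components):
    """Estimate the mean and leading KL (PCA) directions of the nominal class."""
    mean = X_nominal.mean(axis=0)
    # Rows of Vt are orthonormal eigenvectors of the sample covariance.
    _, _, Vt = np.linalg.svd(X_nominal - mean, full_matrices=False)
    return mean, Vt[:n_components]

def kl_features(X, mean, basis):
    """Return in-subspace coefficients and the residual norm for each row of X."""
    centered = X - mean
    coeffs = centered @ basis.T            # projection onto the nominal subspace
    residual = centered - coeffs @ basis   # component orthogonal to that subspace
    return coeffs, np.linalg.norm(residual, axis=1)

# Hypothetical synthetic data: the nominal class lies near a k-dimensional subspace.
rng = np.random.default_rng(0)
d, k = 20, 3
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
nominal = rng.standard_normal((500, k)) @ Q[:, :k].T + 0.01 * rng.standard_normal((500, d))
# The second class carries an extra component along a direction outside that subspace.
anomalous = (rng.standard_normal((50, k)) @ Q[:, :k].T
             + 1.0 * Q[:, k]
             + 0.01 * rng.standard_normal((50, d)))

mean, basis = fit_nominal_basis(nominal, k)
_, res_nominal = kl_features(nominal, mean, basis)
_, res_anomalous = kl_features(anomalous, mean, basis)
print("largest nominal residual:", res_nominal.max())
print("smallest anomalous residual:", res_anomalous.min())
```

In this toy setting the residual norm alone separates the two classes, which is the behaviour the core claim relies on; real data need not be this clean.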

What carries the argument

Multilevel Orthogonal Subspace (MOS) Karhunen-Loeve feature theory, which uses a hierarchical expansion of the nominal class in stochastic tensor spaces to produce projection coefficients as robust ML features.
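For orientation, the classical Karhunen-Loeve expansion makes the term "projection coefficients" concrete. The notation below (\bar{u}, \lambda_k, \phi_k, Y_k) is assumed for illustration rather than taken from the paper, and it shows only the standard single-level expansion, not the multilevel hierarchy.

```latex
% Classical KL expansion of a square-integrable random field u(x, \omega) on a domain U,
% with mean \bar{u} and covariance eigenpairs (\lambda_k, \phi_k), \lambda_1 \ge \lambda_2 \ge \dots > 0.
u(x,\omega) = \bar{u}(x) + \sum_{k \ge 1} \sqrt{\lambda_k}\, \phi_k(x)\, Y_k(\omega),
\qquad
Y_k(\omega) = \frac{1}{\sqrt{\lambda_k}} \int_U \bigl(u(x,\omega) - \bar{u}(x)\bigr)\, \phi_k(x)\, \mathrm{d}x,
\qquad
\mathbb{E}[Y_k] = 0, \quad \mathbb{E}[Y_j Y_k] = \delta_{jk}.
```

Truncating the sum after K terms defines the subspace spanned by \phi_1, ..., \phi_K; the integrals against \phi_k are the coefficients that would serve as features in this single-level picture.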

If this is right

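  • Machine learning classifiers trained on the new projection coefficients achieve large accuracy increases on the blood plasma Alzheimer's dataset; for CN vs AD, the Figure 8 caption reports a rise from 48.3% to 89.11% using an RBF SVM on the multilevel features.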
  • High-accuracy prediction of Alzheimer's stages becomes possible from a non-invasive blood test.
  • The method supplies a general procedure for constructing robust features when classes occupy distinct subspaces.
  • The same subspace construction can be used for anomalous signal detection in other classification problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be applied to other biomarker or medical imaging datasets where subtle class differences exist.
  • Hierarchical subspace levels might be tuned to capture finer anomaly scales in complex data.
  • The extracted coefficients could serve as interpretable inputs to neural network architectures.

Load-bearing premise

Separate machine learning classes can reside predominantly in mostly distinct subspaces.
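One way to probe this premise on a given dataset is to compute principal angles between the leading principal subspaces of each class: angles near 90 degrees indicate nearly distinct subspaces, angles near zero indicate heavy overlap. The sketch below is an assumption-laden illustration on synthetic data with hypothetical helper names; the paper itself is not known to report this diagnostic.

```python
import numpy as np

def leading_subspace(X, n_components):
    """Orthonormal basis (as columns) for the top principal directions of X."""
    centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:n_components].T

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Hypothetical synthetic classes living in mutually orthogonal 2-D subspaces of R^10.
rng = np.random.default_rng(1)
d = 10
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
class_a = rng.standard_normal((200, 2)) @ Q[:, :2].T
class_b = rng.standard_normal((200, 2)) @ Q[:, 2:4].T

angles_distinct = principal_angles(leading_subspace(class_a, 2), leading_subspace(class_b, 2))
angles_same = principal_angles(leading_subspace(class_a, 2), leading_subspace(class_a[:100], 2))
print("distinct subspaces (degrees):", np.degrees(angles_distinct))
print("same subspace (degrees):", np.degrees(angles_same))
```

The arccos-based formula loses precision for very small angles; SciPy's scipy.linalg.subspace_angles is a more numerically careful routine for the same quantity.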

What would settle it

Applying the projection coefficients as features to the Alzheimer's blood plasma dataset and obtaining no accuracy improvement over Gradient Boosting, RUS Boost, Random Forest, or Neural Networks would falsify the central claim.
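A test of this kind could follow the protocol sketched here: fit the subspace on nominal training rows only, build the feature for both splits, and compare held-out accuracy against the same classifier on raw features. Everything in the sketch is an assumption for illustration: the data are synthetic, a nearest-centroid rule stands in for the classifiers named above, and the single-level residual norm stands in for the paper's multilevel coefficients.

```python
import numpy as np

def residual_norms(X, mean, basis):
    """Norm of the part of each row of X orthogonal to the rows of `basis`."""
    centered = X - mean
    return np.linalg.norm(centered - (centered @ basis.T) @ basis, axis=1)

def nearest_centroid_accuracy(F_train, y_train, F_test, y_test):
    """Held-out accuracy of a nearest-centroid rule (a stand-in baseline classifier)."""
    centroids = np.stack([F_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(F_test[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(dists.argmin(axis=1) == y_test))

def make_class(rng, n, Q, k, anomalous):
    """Hypothetical generator: both classes share a k-dim subspace; class 1 adds a +/-1 off-subspace term."""
    X = rng.standard_normal((n, k)) @ Q[:, :k].T + 0.05 * rng.standard_normal((n, Q.shape[0]))
    if anomalous:
        X += rng.choice([-1.0, 1.0], size=(n, 1)) * Q[:, k]
    return X

rng = np.random.default_rng(2)
d, k = 20, 3
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
X_train = np.vstack([make_class(rng, 400, Q, k, False), make_class(rng, 200, Q, k, True)])
y_train = np.r_[np.zeros(400, dtype=int), np.ones(200, dtype=int)]
X_test = np.vstack([make_class(rng, 200, Q, k, False), make_class(rng, 100, Q, k, True)])
y_test = np.r_[np.zeros(200, dtype=int), np.ones(100, dtype=int)]

# Fit the basis on nominal TRAINING rows only, so no test data leaks into the features.
nominal_train = X_train[y_train == 0]
mean = nominal_train.mean(axis=0)
_, _, Vt = np.linalg.svd(nominal_train - mean, full_matrices=False)
basis = Vt[:k]

acc_raw = nearest_centroid_accuracy(X_train, y_train, X_test, y_test)
acc_kl = nearest_centroid_accuracy(
    residual_norms(X_train, mean, basis)[:, None], y_train,
    residual_norms(X_test, mean, basis)[:, None], y_test,
)
print(f"raw features: {acc_raw:.3f}   residual feature: {acc_kl:.3f}")
```

The synthetic classes share a mean, so the raw-feature baseline has little to work with by design; on real data the comparison could go either way, which is exactly what makes it a falsification test.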

Figures

Figures reproduced from arXiv: 2110.01729 by Dingning Liu, Julio Enrique Castrillon-Candas, Kaili Shi, Mark Kon, Sicheng Yang, the Alzheimer's Disease Neuroimaging Initiative, Xiaoling Zhang.

Figure 1: Illustrative example of binary classification with blue and orange dots. For (a) we have that our data are well separated, with blue dots representing the first class and orange dots the second. Due to the separation of the data it is in principle easy to construct a decision boundary. (b) For this case the data classes are mixed, leading to complex boundary decision surfaces that are hard to build, yieldi…
Figure 2: Illustrative example of the separation between the projection coefficients of the nominal class and large anomalous signals based on the coefficients $d^l_k$. (a) The orange (nominal class) and blue dots (signal anomaly of the alternative class) correspond to the original data in the feature space. These observation points are mixed with each other, which makes it hard to build a decision surface. (b) Aft…
Figure 3: MOS KL training framework for binary classification using SVM. With a slight abuse of notation the map $\Phi : L^2(U) \to \bigoplus_{k \in \mathbb{N}_0} S_k$ corresponds to the transformation of the signal $u(x, \omega)$ into the spaces $\bigoplus_{k \in \mathbb{N}_0} S_k$ and so provides the projection coefficients. The MOS are built from the classes where more data is available, in this case from the data of class A; $N_T < m_1$ samples are chosen ($m^A_1, \dots, m^A_{N_T}$)…
Figure 4: Semi-synthetic test classification results for unbalanced data sets. Accuracy and precision with respect to the number of nested multilevel $S_0 \oplus S_1 \oplus \cdots \oplus S_{\mathrm{Level}}$. The semi-synthetic data AA for class A is generated for $N_A$ = 150, 250, 450, 1500 and 10000 realizations of class A using model (4). Similarly, class B dataset $B_{N_B}$ is generated with $N_B$ = 100 realizations with model (4). Since the size of $B_{N_B}$ is $N_B$ …
Figure 5: Accuracy and prediction comparison results between MSVM RBF, SVM RBF, and WSVM RBF as the number of samples in the training datasets increases for class A ($A_{N_A}$). The accuracy and precision for the MSVM RBF classifier (multilevel filtered datasets) are plotted for Level = 6 and with respect to the sample size of the datasets $A_{N_A}$. Notice that as the dataset $A_{N_A}$ becomes more unbalanced, the accuracy and p…
Figure 6: Semi-synthetic test classification results for modified unbalanced data sets. Test #1 is repeated but the realizations of the original datasets $A_{N_A}$, $A^V_{N_A}$, $B_{N_B}$ and $B^V_{\tilde{N}_B}$ are updated as $u^A_M(x, \omega_k) \leftarrow \sin(18\, u^A_M(x, \omega_k))$ and $u^B_M(x, \omega_k) \leftarrow \sin(18\, u^B_M(x, \omega_k))$. (a) The accuracy of all of the methods drops somewhat. However, it is clear that the MSVM RBF method outperforms and is more robust towards the in…
Figure 7: Semi-synthetic test classification results for modified unbalanced data sets. Test #1 is repeated but the realizations of the original datasets $A_{N_A}$, $A^V_{N_A}$, $B_{N_B}$ and $B^V_{\tilde{N}_B}$ are updated as $u^A_M(x, \omega_k) \leftarrow \sin(20\, u^A_M(x, \omega_k))$ and $u^B_M(x, \omega_k) \leftarrow \sin(20\, u^B_M(x, \omega_k))$. (a) For this dataset we observe that the MSVM RBF method significantly outperforms the best results from WSVM RBF and linear methods. In particula…
Figure 8: Comparison test for CN (Cognitive Normal subjects) vs AD participants. Accuracy and ROC curves (for the last level) for Multilevel features (with Radial SVM) compared to SVM with the original features are shown. The accuracy for the Multilevel features is plotted for each nested level. It is observed that the accuracy increases from 48.3% to 89.11% by using RBF SVM with the multilevel features. For the sa…
Original abstract

In this paper we develop a Multilevel Orthogonal Subspace (MOS) Karhunen-Loeve feature theory based on stochastic tensor spaces, for the construction of robust machine learning features. Training data are treated as instances of a random field within a relevant Bochner space. Our key observation is that separate machine learning classes can reside predominantly in mostly distinct subspaces. Using the Karhunen-Loeve expansion and a hierarchical expansion of the first (nominal) class, a MOS is constructed to detect anomalous signal components, treating the second class as an outlier of the first. The projection coefficients of the input data into these subspaces are then used to train a Machine Learning (ML) classifier. These coefficients become new features from which much clearer separation surfaces can arise for the underlying classes. Tests in the blood plasma dataset (Alzheimer's Disease Neuroimaging Initiative) show dramatic increases in accuracy. This contrast to popular ML methods such as Gradient Boosting, RUS Boost, Random Forest and Neural Networks. We show that with a non-invasive blood test, high-accuracy results can be obtained for predicting AD stages such as cognitive normal, mild cognitive impairment and dementia.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper develops a Multilevel Orthogonal Subspace (MOS) Karhunen-Loeve feature theory based on stochastic tensor spaces. Training data are modeled as random fields in a Bochner space; a hierarchical KL expansion of the nominal class constructs an MOS to detect outlier signal components from other classes, with the resulting projection coefficients serving as new features for an ML classifier. The central empirical claim is that this yields dramatic accuracy gains on the ADNI blood-plasma dataset for distinguishing cognitive normal, mild cognitive impairment, and dementia stages, outperforming Gradient Boosting, RUS Boost, Random Forest, and Neural Networks.

Significance. If the accuracy gains are reproducible and the subspace-separation premise holds for the data, the work would supply a theoretically grounded feature-construction technique that exploits distinct class subspaces via KL expansions, potentially improving robustness in medical classification tasks where non-invasive biomarkers are used.

major comments (2)
  1. [Abstract] Abstract: the claim of 'dramatic increases in accuracy' on the ADNI dataset supplies no numerical values, error bars, dataset sizes, cross-validation protocol, or baseline numbers, rendering the central performance claim impossible to evaluate.
  2. [Abstract] Abstract (key observation paragraph): the premise that 'separate machine learning classes can reside predominantly in mostly distinct subspaces' is stated without any supporting calculation (principal angles, subspace overlap norms, or pre-/post-projection accuracy deltas) for the blood-plasma features; this premise is load-bearing because the MOS construction and outlier detection rely on it to produce clearer separation surfaces.
minor comments (1)
  1. [Abstract] The sentence 'This contrast to popular ML methods...' is grammatically incomplete and should be rephrased for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the presentation of our results.

point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of 'dramatic increases in accuracy' on the ADNI dataset supplies no numerical values, error bars, dataset sizes, cross-validation protocol, or baseline numbers, rendering the central performance claim impossible to evaluate.

    Authors: We agree that the abstract should be self-contained with respect to the central empirical claim. The main text reports specific accuracy values, dataset details from the ADNI blood-plasma cohort, the cross-validation protocol, and comparisons against Gradient Boosting, RUS Boost, Random Forest, and Neural Networks. In the revised manuscript we will move these quantitative details into the abstract so that the performance claim can be evaluated directly from it. revision: yes

  2. Referee: [Abstract] Abstract (key observation paragraph): the premise that 'separate machine learning classes can reside predominantly in mostly distinct subspaces' is stated without any supporting calculation (principal angles, subspace overlap norms, or pre-/post-projection accuracy deltas) for the blood-plasma features; this premise is load-bearing because the MOS construction and outlier detection rely on it to produce clearer separation surfaces.

    Authors: The premise is foundational and is supported in the body of the paper by the observed accuracy gains after projection onto the MOS. However, we acknowledge that the abstract itself contains no explicit supporting metrics such as principal angles or overlap norms. We will revise the abstract (and, if space permits, add a short supporting sentence in the introduction) to reference the pre- versus post-projection accuracy deltas already computed in the results section, thereby making the load-bearing assumption more transparent. revision: partial

Circularity Check

0 steps flagged

No circularity: derivation proceeds from KL expansions to empirical classifier accuracy

full rationale

The paper defines the MOS construction explicitly from the hierarchical KL expansion of the nominal class in the Bochner space, projects the second class as an outlier, and feeds the resulting coefficients into a downstream ML classifier whose accuracy is measured on held-out ADNI data. No equation reduces a reported performance number to a fitted parameter or label-derived quantity by construction, and no self-citation is invoked as a load-bearing uniqueness or ansatz step. The subspace-distinctness observation is an explicit modeling assumption whose empirical consequences are tested rather than presupposed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Ledger populated exclusively from statements in the abstract; full paper may contain additional fitted parameters or background results not visible here.

axioms (1)
  • domain assumption: separate machine learning classes can reside predominantly in mostly distinct subspaces
    Explicitly stated as the key observation that enables construction of the MOS to treat the second class as an outlier.
invented entities (1)
  • Multilevel Orthogonal Subspace (MOS) · no independent evidence
    purpose: To detect anomalous signal components by treating the second class as an outlier of the first via hierarchical KL expansion
    Introduced as the central construct for generating the new projection-coefficient features.

pith-pipeline@v0.9.0 · 5754 in / 1370 out tokens · 51226 ms · 2026-05-24T12:56:45.535024+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distribution-Free Stochastic Analysis and Robust Multilevel Vector Field Anomaly Detection

    stat.ML 2022-07 unverdicted novelty 6.0

    Presents a distribution-free multilevel vector field anomaly detection technique based on Karhunen-Loeve expansions that forms hypothesis tests without distributional assumptions and detects subtle anomalies missed by...
