Quantum Kernel Advantage over Classical Collapse in Medical Foundation Model Embeddings
Pith reviewed 2026-05-08 03:52 UTC · model grok-4.3
The pith
Quantum support vector machines avoid classical majority-class collapse on imbalanced chest X-ray insurance classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In binary insurance classification on MIMIC-CXR using PCA-q features from MedSigLIP-448, RAD-DINO, and ViT-patch32, the quantum kernel in QSVM produces a higher effective rank than the linear kernel. This prevents the collapse to majority-class prediction that occurs with the classical linear SVM regardless of the regularization parameter C. Across every tested qubit count and embedding source, untuned QSVM records higher minority-class F1 than untuned linear SVM (mean gain of 0.293 at q=11 on MedSigLIP-448), and it still outperforms a C-tuned RBF SVM in all seven Tier-2 comparisons.
What carries the argument
The quantum kernel Gram matrix, obtained by applying a quantum feature map to the PCA-reduced q-dimensional embeddings from the foundation models. Its eigenspectrum yields an effective rank of up to 69.80 (at q=11), while the classical linear kernel's rank stays low and the classical collapse persists for every value of C.
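The effective-rank comparison can be illustrated with a short sketch. This page does not state which definition of effective rank the paper uses, so the entropy-based definition below (exponential of the Shannon entropy of the trace-normalized eigenvalues) is an assumption, and the Gram matrices are toy stand-ins rather than the paper's data.

```python
import numpy as np

def effective_rank(gram: np.ndarray) -> float:
    """Entropy-based effective rank of a symmetric PSD Gram matrix (assumed definition)."""
    # Symmetrize to suppress floating-point asymmetry before the eigendecomposition.
    eigvals = np.linalg.eigvalsh((gram + gram.T) / 2.0)
    eigvals = np.clip(eigvals, 0.0, None)   # drop tiny negative round-off
    p = eigvals / eigvals.sum()             # trace-normalize the spectrum
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))               # toy stand-in for PCA-q features, q = 4
linear_gram = X @ X.T                       # linear kernel: rank is at most q
identity_gram = np.eye(100)                 # flat spectrum: effective rank equals n
print(effective_rank(linear_gram), effective_rank(identity_gram))
```

Under this definition a linear kernel on q-dimensional inputs cannot exceed an effective rank of q, which is one way to read the reported gap between the two kernels.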
If this is right
- The classical linear kernel collapses to majority-class prediction on 90-100 percent of seeds at every qubit count and remains C-invariant.
- QSVM maintains non-trivial recall and wins minority F1 in all 18 Tier-1 configurations (17 at p < 0.001, 1 at p < 0.01).
- At q=11 with MedSigLIP-448, mean QSVM F1 reaches 0.343 versus 0.050 for the linear kernel.
- Under Tier 2, untuned QSVM still wins all seven tested configurations against C-tuned RBF SVM with mean gain 0.068.
- A full qubit sweep shows architecture-dependent concentration onset across the three embedding models.
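The collapse and minority-class F1 figures in the list above can be made concrete with a minimal sketch. The labels and predictions are hypothetical toy data, and the helper names `minority_f1` and `is_collapsed` are illustrative, not taken from the paper's code.

```python
def minority_f1(y_true, y_pred, minority=1):
    """F1 score computed for the minority class only."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != minority and p == minority)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p != minority)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def is_collapsed(y_pred, minority=1):
    """True when the classifier never predicts the minority class."""
    return all(p != minority for p in y_pred)

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical 90/10 imbalanced test set.
y_true = [0] * 90 + [1] * 10
collapsed = [0] * 100                               # majority-class collapse
partial = [0] * 85 + [1] * 5 + [1] * 4 + [0] * 6    # 5 false positives, 4 true positives

print(accuracy(y_true, collapsed), minority_f1(y_true, collapsed))
print(accuracy(y_true, partial), minority_f1(y_true, partial))
```

In this toy case the collapsed classifier has the higher accuracy but a minority-class F1 of zero, which is why minority-class F1 rather than accuracy is the informative metric here.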
Where Pith is reading between the lines
- If the rank advantage survives on hardware, quantum kernels could reduce reliance on extensive hyperparameter search for class-imbalanced medical tasks.
- The same mechanism might apply to other high-dimensional outputs from foundation models beyond radiographs.
- Measuring whether the effective-rank gap closes under realistic noise would show whether the observed separation is limited by hardware or fundamental to the kernel construction.
- Extending the comparison to multi-class or regression versions of the same embeddings could show whether the collapse-avoidance property generalizes.
Load-bearing premise
That noiseless simulation of the quantum kernel after PCA reduction gives a representative test of advantage that would hold on real hardware for this medical task and dataset.
What would settle it
Running the identical QSVM pipeline on current noisy quantum hardware and observing that its minority-class F1 falls below the tuned classical RBF SVM for the same embeddings and qubit counts.
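Short of a hardware run, the direction of the noise effect can be sketched with a simplified model. The global depolarizing channel below is an assumption chosen for illustration and is not the paper's noise model (the paper reports noiseless simulation only); the Gaussian-shaped kernel is a toy stand-in for noiseless fidelity values.

```python
import numpy as np

def depolarized_kernel(K, p, n_qubits):
    """Kernel values under a global depolarizing channel of strength p (assumed model)."""
    d = 2 ** n_qubits
    # Measuring the all-zeros projector on (1 - p) * rho + p * I / d
    # rescales every ideal value and shifts it toward 1 / d.
    return (1.0 - p) * np.asarray(K) + p / d

def offdiag_std(K):
    """Spread of the off-diagonal kernel values."""
    mask = ~np.eye(K.shape[0], dtype=bool)
    return float(K[mask].std())

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
ideal = np.exp(-0.5 * sq_dist)   # toy stand-in for noiseless kernel values in (0, 1]
noisy = depolarized_kernel(ideal, p=0.3, n_qubits=4)
print(offdiag_std(ideal), offdiag_std(noisy))
```

Under this model the off-diagonal spread shrinks by exactly the factor (1 - p), a simple form of concentration. Whether real device noise behaves this way for the paper's circuits is the open question.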
Original abstract
We provide evidence of quantum kernel advantage under noiseless simulation in binary insurance classification on MIMIC-CXR chest radiographs using quantum support vector machines (QSVM) with frozen embeddings from three medical foundation models (MedSigLIP-448, RAD-DINO, ViT-patch32). We propose a two-tier fair comparison framework in which both classifiers receive identical PCA-q features. At Tier 1 (untuned QSVM vs. untuned linear SVM, C = 1 both sides), QSVM wins minority-class F1 in all 18 tested configurations (17 at p < 0.001, 1 at p < 0.01). The classical linear kernel collapses to majority-class prediction on 90-100% of seeds at every qubit count, while QSVM maintains non-trivial recall. At q = 11 (MedSigLIP-448 plateau center), QSVM achieves mean F1 = 0.343 vs. classical F1 = 0.050 (F1 gain = +0.293, p < 0.001) without hyperparameter tuning. Under Tier 2 (untuned QSVM vs. C-tuned RBF SVM), QSVM wins all seven tested configurations (mean gain +0.068, max +0.112). Eigenspectrum analysis reveals quantum kernel effective rank reaches 69.80 at q = 11, far exceeding linear kernel rank, while classical collapse remains C-invariant. A full qubit sweep reveals architecture-dependent concentration onset across models. Code: https://github.com/sebasmos/qml-medimage
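A noiseless fidelity kernel of the kind the abstract describes can be simulated directly with state vectors. The sketch below is a minimal illustration: the single-qubit RY angle-encoding feature map is a hypothetical choice (this page does not specify the paper's feature map or circuit depth), and the helper names are illustrative.

```python
from functools import reduce
import numpy as np

def feature_state(x):
    """State vector of a product feature map: one RY(x_i) rotation per qubit (assumed map)."""
    qubits = [np.array([np.cos(a / 2.0), np.sin(a / 2.0)]) for a in x]
    return reduce(np.kron, qubits)           # length 2**q for q features

def fidelity_kernel(XA, XB):
    """Noiseless kernel K[i, j] = |<phi(a_i)|phi(b_j)>|^2."""
    SA = np.stack([feature_state(x) for x in XA])
    SB = np.stack([feature_state(x) for x in XB])
    return np.abs(SA @ SB.conj().T) ** 2

rng = np.random.default_rng(0)
Z = rng.uniform(0.0, np.pi, size=(6, 3))     # toy PCA-q features scaled to angles, q = 3
K = fidelity_kernel(Z, Z)
print(K.shape, K.diagonal())
```

A Gram matrix built this way can be passed to a standard SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. A product map like this one is classically easy to evaluate, so it only illustrates the pipeline shape, not the paper's kernel.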
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims empirical evidence of quantum kernel advantage in noiseless simulations of QSVMs applied to PCA-reduced embeddings from medical foundation models (MedSigLIP-448, RAD-DINO, ViT-patch32) for binary insurance classification on the MIMIC-CXR dataset. Using a two-tier comparison framework with identical PCA-q features for quantum and classical models, it reports that untuned QSVM outperforms untuned linear SVM (C=1) in minority-class F1 across all 18 configurations (with large gains, e.g., +0.293 at q=11 for MedSigLIP-448), while classical kernels collapse to majority-class predictions; QSVM also beats C-tuned RBF SVM in all 7 tested cases. Eigenspectrum analysis shows quantum kernels achieve much higher effective rank (~69.8 at q=11) than classical ones, explaining the non-collapse, with architecture-dependent concentration in qubit sweeps. Code is provided for reproducibility.
Significance. If the results hold under the stated conditions, the work offers a clear, reproducible demonstration of how quantum kernels can mitigate the collapse problem in imbalanced medical classification tasks where classical kernels fail, supported by consistent statistical significance across models and seeds. The two-tier design, effective-rank explanation, and linked code are strengths that make the empirical claims more credible than typical quantum ML benchmarks. This could encourage targeted follow-up on quantum methods for healthcare embeddings, though the noiseless scope limits immediate practical impact.
major comments (2)
- [Results and eigenspectrum analysis] The central results rest on noiseless simulation of the quantum kernel; while the paper scopes its claims appropriately, the effective-rank advantage (reaching 69.80 at q=11) and F1 gains may not persist under realistic noise or hardware constraints, which could induce concentration or rank reduction not captured here. A brief analysis or caveat on this point in the discussion would strengthen the interpretation of the qubit-sweep results.
- [Methods] Full experimental details on embedding extraction from the foundation models, exact train/test splits of MIMIC-CXR, and the precise implementation of the quantum kernel (e.g., feature map and circuit depth) are referenced only via the code repository. These should be summarized in the methods section to allow verification of the 18 configurations and the PCA-q reduction without external access, as they are load-bearing for reproducing the reported F1 values and p-values.
minor comments (2)
- [Results] Ensure that all 18 Tier-1 and 7 Tier-2 configurations are explicitly tabulated or referenced to specific figures/tables, including the exact q values and models tested, to improve clarity of the 'all configurations' claim.
- [Abstract and results] The abstract states 'architecture-dependent concentration onset' but does not specify the onset qubit counts per model; adding this detail or a reference to the relevant figure would aid readers.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. Both major points have been addressed by adding a targeted caveat in the Discussion and expanding the Methods section with the requested experimental details.
point-by-point responses
- Referee: [Results and eigenspectrum analysis] The central results rest on noiseless simulation of the quantum kernel; while the paper scopes its claims appropriately, the effective-rank advantage (reaching 69.80 at q=11) and F1 gains may not persist under realistic noise or hardware constraints, which could induce concentration or rank reduction not captured here. A brief analysis or caveat on this point in the discussion would strengthen the interpretation of the qubit-sweep results.
  Authors: We agree that an explicit caveat strengthens interpretation of the qubit-sweep results. We have added a concise paragraph in the Discussion section noting that the reported effective-rank advantage and F1 gains are obtained under noiseless simulation and that hardware noise could induce additional concentration or rank reduction not captured in the present experiments. This addition clarifies the scope without changing the core empirical claims. Revision: yes.
- Referee: [Methods] Full experimental details on embedding extraction from the foundation models, exact train/test splits of MIMIC-CXR, and the precise implementation of the quantum kernel (e.g., feature map and circuit depth) are referenced only via the code repository. These should be summarized in the methods section to allow verification of the 18 configurations and the PCA-q reduction without external access, as they are load-bearing for reproducing the reported F1 values and p-values.
  Authors: We accept this recommendation. The revised Methods section now includes a self-contained summary of the embedding extraction pipelines for MedSigLIP-448, RAD-DINO, and ViT-patch32; the precise MIMIC-CXR train/test split (including patient-level stratification and seed handling); and the quantum feature map together with circuit depth and PCA-q reduction procedure. These additions enable direct verification of all 18 configurations and reported statistics without external code access. Revision: yes.
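The patient-level split with seed handling mentioned in the response can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the 20 percent test fraction, and the patient identifiers are hypothetical, and the sketch groups by patient only, without the label stratification the actual procedure may apply.

```python
import random

def patient_level_split(patient_ids, test_fraction=0.2, seed=0):
    """Split image indices so that no patient appears in both train and test."""
    patients = sorted(set(patient_ids))          # sorted for a deterministic starting order
    random.Random(seed).shuffle(patients)        # seed-controlled shuffle
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx

# Hypothetical data: 10 patients with 3 images each.
patient_ids = ["p%02d" % (i // 3) for i in range(30)]
train_idx, test_idx = patient_level_split(patient_ids, test_fraction=0.2, seed=0)
print(len(train_idx), len(test_idx))
```

Splitting by patient rather than by image keeps images from one person out of both partitions, and fixing the seed makes each of the repeated runs reproducible.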
Circularity Check
No significant circularity
The paper reports empirical performance comparisons (minority-class F1 scores across 18 configurations) and direct eigenspectrum measurements (effective rank of quantum vs. classical kernels) on PCA-reduced embeddings from medical foundation models. It claims no derivation chain, first-principles prediction, or ansatz that reduces, by the paper's own equations, to fitted inputs or self-citations. The central observations (QSVM non-collapse, a quantum-kernel effective rank of ~69.8 at q=11, architecture-dependent concentration) are independent experimental outputs, not constructed from the performance metrics or prior self-citations. Code and per-seed statistics are provided, so the results can be checked without relying on the paper's own framing.
Axiom & Free-Parameter Ledger
free parameters (2)
- C=1
- q (PCA components / qubits)
axioms (2)
- domain assumption: Noiseless quantum simulation faithfully represents the ideal quantum kernel matrix for the given feature map.
- domain assumption: PCA-q reduction preserves the relevant discriminative information equally for quantum and classical kernels.
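The shared PCA-q step behind the second assumption can be sketched in a few lines. This is a minimal illustration, assuming PCA is fitted on the training embeddings only and then applied unchanged to the test embeddings; the array sizes and function names are hypothetical.

```python
import numpy as np

def fit_pca(X_train, q):
    """Return the training mean and the top-q principal directions."""
    mean = X_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:q]

def apply_pca(X, mean, components):
    """Project embeddings onto the fitted q-dimensional subspace."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
emb_train = rng.normal(size=(200, 64))    # toy stand-in for frozen embeddings
emb_test = rng.normal(size=(50, 64))
mean, comps = fit_pca(emb_train, q=11)    # q doubles as the qubit count
Z_train = apply_pca(emb_train, mean, comps)
Z_test = apply_pca(emb_test, mean, comps)
print(Z_train.shape, Z_test.shape)
```

Feeding the same `Z_train` and `Z_test` to both the quantum and the classical kernel is what makes the comparison like for like: any performance gap then comes from the kernel, not from the features.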