Hybrid Quantum-Classical k-Means Clustering via Quantum Feature Maps
Pith reviewed 2026-05-10 18:25 UTC · model grok-4.3
The pith
Quantum kernels from feature maps enhance k-means clustering stability and accuracy on standard datasets
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By embedding classical data points into quantum states via feature maps and using the resulting quantum kernel as the similarity measure, the modified k-means algorithm achieves improved clustering stability and competitive accuracy relative to the classical version. The SU2 map specifically delivers 88.6% accuracy on Iris and 91.0% on breast cancer while using shallow, NISQ-feasible circuits.
What carries the argument
The quantum kernel: the inner product between two quantum states obtained by applying a feature map to classical data points. It replaces the classical Euclidean distance in the k-means objective.
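To make the kernel idea concrete, here is a minimal sketch that simulates a feature map with NumPy statevectors and evaluates the kernel between two points. Several things are assumptions for illustration: the toy map (one RY rotation per feature, no entanglement) is not the SU2 or ZZ circuit from the paper, the function names are hypothetical, and the kernel is taken as the squared modulus of the inner product (the state fidelity), a common convention that the source does not spell out.

```python
import numpy as np

def feature_state(x):
    """Statevector of a toy product feature map: RY(x_i) applied to |0> on qubit i.

    Assumption: this map is a stand-in for illustration. It is not the SU2 or
    ZZ circuit evaluated in the paper and it contains no entangling gates.
    """
    state = np.array([1.0])
    for angle in x:
        qubit = np.array([np.cos(angle / 2.0), np.sin(angle / 2.0)])
        state = np.kron(state, qubit)
    return state

def quantum_kernel(x, y):
    """Fidelity-style kernel |<phi(x)|phi(y)>|^2 between two encoded points."""
    overlap = np.vdot(feature_state(x), feature_state(y))
    return float(np.abs(overlap) ** 2)

def kernel_matrix(X):
    """Gram matrix with entries K[i, j] = quantum_kernel(X[i], X[j])."""
    states = np.array([feature_state(x) for x in X])
    return np.abs(states @ states.conj().T) ** 2
```

For this particular toy map the kernel reduces to a product of cos^2((x_i - y_i) / 2) terms, so it is classically easy to evaluate. An entangling map would not factorize this way, which is where a quantum device or simulator would come in.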
If this is right
- The approach operates successfully on shallow circuits feasible for current NISQ hardware.
- Clustering stability is improved over the classical algorithm on the tested datasets.
- Competitive accuracies are achieved, including 88.6% on Iris and 91.0% on breast cancer with the SU2 feature map.
- Quantum kernels create a richer similarity landscape than traditional distance metrics for clustering.
Where Pith is reading between the lines
- This hybrid method could be applied to other unsupervised learning tasks where capturing non-linear structures is key.
- Testing on higher-dimensional or more complex datasets might reveal where the quantum advantage in separability becomes more pronounced.
- Comparing performance across additional feature maps could help optimize the choice for specific data types.
Load-bearing premise
The chosen quantum feature maps produce a similarity measure that genuinely captures cluster structure better than Euclidean distance, leading to measurable gains in stability and accuracy.
What would settle it
Repeating the clustering experiments on the Iris and breast cancer datasets over many random initializations and finding no statistically significant improvement in stability or accuracy over classical k-means would falsify the central claim.
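One way such a check could be run is a permutation test on per-run accuracies. The sketch below assumes each method has already produced one accuracy value per random initialization. The choice of a permutation test and the function name are illustrative assumptions, not something the paper or this review prescribes.

```python
import numpy as np

def permutation_test(acc_a, acc_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in mean accuracy.

    acc_a, acc_b: per-run accuracies of two methods (for example quantum-kernel
    and classical k-means), one value per random initialization.
    Returns the observed mean difference and an estimated p-value.
    """
    rng = np.random.default_rng(seed)
    acc_a = np.asarray(acc_a, dtype=float)
    acc_b = np.asarray(acc_b, dtype=float)
    observed = acc_a.mean() - acc_b.mean()
    pooled = np.concatenate([acc_a, acc_b])
    n_a = acc_a.size
    count = 0
    for _ in range(n_permutations):
        # Reassign runs to the two methods at random and recompute the gap.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```

A large p-value would mean the observed gap is consistent with chance variation between initializations. Stability could be compared in the same spirit by looking at the spread of each accuracy array, for example its standard deviation.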
Original abstract
Clustering is one of the most fundamental tasks in machine learning, and the k-means clustering algorithm is perhaps one of the most widely used clustering algorithms. However, it suffers from several limitations, such as sensitivity to centroid initialization, difficulty capturing non-linear structure, and poor performance in high-dimensional spaces. Recent work has proposed improved initialization strategies and quantum-assisted distance computation, but the similarity metric itself has largely remained classical. In this study, we propose a quantum-enhanced variant of k-means that replaces the Euclidean distance with a quantum kernel derived from the inner product between feature-mapped quantum states. Using the Iris dataset, we use multiple quantum feature maps, including entangled SU2 and ZZ circuits, to embed classical data into a higher-dimensional Hilbert space where cluster structures become more separable. We will also be testing using another dataset, namely the breast cancer dataset. Similarity between data points is computed through the inner product between two states. Our results show that this approach achieves improved clustering stability and competitive accuracy compared to the classical algorithm, with the SU2 feature map yielding an accuracy of 88.6 % on the Iris dataset and 91.0 % on the breast cancer dataset, despite operating on NISQ-feasible shallow circuits. These findings suggest that quantum kernels provide a richer similarity landscape than traditional distance metrics, offering a promising path toward more robust unsupervised learning in the NISQ era.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a hybrid quantum-classical k-means algorithm that replaces Euclidean distance with a quantum kernel obtained from the inner product of states prepared by SU2 and ZZ feature maps. It evaluates the approach on the Iris and breast-cancer datasets and reports improved clustering stability together with accuracies of 88.6 % (Iris) and 91.0 % (breast cancer) using shallow, NISQ-feasible circuits.
Significance. If the quantum kernels can be shown to supply a demonstrably richer similarity measure than classical kernels on the same data, the work would provide a concrete, reproducible example of a NISQ-era unsupervised-learning primitive. The manuscript already supplies explicit circuit descriptions and public datasets, which are positive attributes for reproducibility.
major comments (2)
- [Results] The experimental evaluation (implicitly the Results section) reports point accuracies of 88.6 % and 91.0 % but supplies neither error bars, number of random initializations, nor any description of data preprocessing or convergence criteria. Without these statistics the central performance claim cannot be assessed.
- [Methodology / Experiments] The paper compares the quantum-kernel k-means only to classical Euclidean k-means. Because any positive-definite kernel (e.g., classical RBF) can induce a non-linear embedding, the absence of a classical-kernel baseline leaves open whether the observed stability and accuracy gains are attributable to the quantum feature maps or simply to kernelization in general. This comparison is load-bearing for the claim that the quantum kernels provide a “richer similarity landscape.”
minor comments (1)
- [Abstract] The abstract uses future tense (“we will also be testing”) while simultaneously presenting numerical results; the tense should be made consistent.
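The classical-kernel baseline requested in the second major comment can be sketched as kernel k-means on an RBF Gram matrix. This is a minimal illustration under stated assumptions: the function names are hypothetical, `gamma` is passed in as a fixed value rather than tuned, and the random label initialization is the simplest possible choice. Because the routine only needs a precomputed Gram matrix, a quantum kernel matrix could be passed to it unchanged, which makes the comparison like for like.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gram matrix with entries K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kernel_kmeans(K, n_clusters, max_iter=100, seed=0):
    """Kernel k-means on a precomputed Gram matrix K; returns cluster labels.

    The squared feature-space distance from point i to the mean of cluster C is
    K[i, i] - 2 * mean_{j in C} K[i, j] + mean_{j, l in C} K[j, l].
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)  # random initial assignment
    for _ in range(max_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster stays unreachable in this sketch
            cross = K[:, members].sum(axis=1) / size
            within = K[np.ix_(members, members)].sum() / size ** 2
            dist[:, c] = np.diag(K) - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels
```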
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of our work. We address each major comment below and commit to revisions that strengthen the experimental reporting and comparative analysis.
- Referee: [Results] The experimental evaluation (implicitly the Results section) reports point accuracies of 88.6 % and 91.0 % but supplies neither error bars, number of random initializations, nor any description of data preprocessing or convergence criteria. Without these statistics the central performance claim cannot be assessed.
  Authors: We agree that the absence of statistical details limits the interpretability of the reported accuracies. In the revised manuscript we will report error bars derived from 20 independent runs with different random centroid initializations, explicitly state the number of initializations and the selection of the best result, describe the preprocessing pipeline (including standardization to zero mean and unit variance), and specify the convergence criteria (maximum of 300 iterations or centroid displacement below 1e-4). These additions will allow proper assessment of the 88.6 % (Iris) and 91.0 % (breast-cancer) figures.
  Revision: yes
- Referee: [Methodology / Experiments] The paper compares the quantum-kernel k-means only to classical Euclidean k-means. Because any positive-definite kernel (e.g., classical RBF) can induce a non-linear embedding, the absence of a classical-kernel baseline leaves open whether the observed stability and accuracy gains are attributable to the quantum feature maps or simply to kernelization in general. This comparison is load-bearing for the claim that the quantum kernels provide a “richer similarity landscape.”
  Authors: The referee correctly identifies that a classical-kernel baseline is needed to isolate the contribution of the quantum feature maps. While the manuscript focuses on contrasting the quantum kernel against the standard Euclidean metric used in classical k-means, we acknowledge that this does not rule out generic kernelization effects. In the revised version we will add a kernel k-means baseline employing the classical RBF kernel with bandwidth tuned via cross-validation on the same datasets. We will also discuss how the SU2 and ZZ maps introduce entanglement-induced correlations that are not directly replicated by classical RBF kernels, thereby clarifying the specific advantage of the quantum approach on NISQ hardware.
  Revision: yes
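The reporting protocol described in the first response (standardization to zero mean and unit variance, 20 random initializations, a cap of 300 iterations, a centroid-displacement tolerance of 1e-4) can be sketched with plain Lloyd's k-means. The function names are hypothetical, and the sketch reports the mean and standard deviation across runs rather than selecting a best run. It illustrates the protocol and is not the authors' implementation.

```python
import itertools
import numpy as np

def standardize(X):
    """Rescale each feature to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (X - X.mean(axis=0)) / std

def kmeans(X, k, rng, max_iter=300, tol=1e-4):
    """Lloyd's algorithm; stops at max_iter or when no centroid moves more than tol."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift < tol:
            break
    return labels

def clustering_accuracy(y_true, labels, k):
    """Best accuracy over all one-to-one relabelings of the k clusters."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array(perm)[labels]
        best = max(best, float(np.mean(mapped == y_true)))
    return best

def repeated_runs(X, y_true, k, n_runs=20, seed=0):
    """Mean and standard deviation of accuracy over n_runs random initializations."""
    rng = np.random.default_rng(seed)
    Xs = standardize(X)
    accs = [clustering_accuracy(y_true, kmeans(Xs, k, rng), k) for _ in range(n_runs)]
    return float(np.mean(accs)), float(np.std(accs))
```

Trying every relabeling is only practical for a small number of clusters, which holds for the two datasets discussed here. A larger `k` would call for an assignment solver instead.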
Circularity Check
No circularity: empirical evaluation on public datasets is independent of internal definitions
The paper proposes replacing Euclidean distance in k-means with a quantum kernel obtained from inner products of states prepared by SU2 and ZZ feature maps, then reports direct accuracy and stability measurements on the Iris (88.6 % with SU2) and breast-cancer (91.0 %) datasets. These quantities are obtained by executing the algorithm on fixed external data and comparing against classical Euclidean k-means; they are not obtained by fitting parameters inside the paper's own equations and then re-using those fitted values as “predictions.” No self-citation is invoked to justify a uniqueness theorem or an ansatz, and the central claim rests on observable performance numbers rather than on any reduction of the form “Eq. X equals the input data by construction.” The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Quantum feature maps embed classical data into a Hilbert space where inner-product kernels improve cluster separability