arxiv: 2604.07873 · v1 · submitted 2026-04-09 · 🪐 quant-ph

Hybrid Quantum-Classical k-Means Clustering via Quantum Feature Maps

Pith reviewed 2026-05-10 18:25 UTC · model grok-4.3

classification 🪐 quant-ph
keywords hybrid quantum-classical · k-means clustering · quantum feature maps · quantum kernels · NISQ · clustering stability · Iris dataset · breast cancer dataset

The pith

Quantum kernels from feature maps enhance k-means clustering stability and accuracy on standard datasets

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hybrid quantum-classical version of k-means clustering by replacing the Euclidean distance with a quantum kernel computed from inner products of states prepared by quantum feature maps. Data from the Iris and breast cancer datasets is embedded into a higher-dimensional Hilbert space using maps like SU2 and ZZ to make clusters more separable. Results indicate better stability and accuracies of 88.6 percent for Iris and 91.0 percent for breast cancer using shallow circuits suitable for NISQ devices. This matters because it points to a way quantum computing can assist machine learning tasks without requiring error-corrected large-scale hardware.

Core claim

By embedding classical data points into quantum states via feature maps and using the resulting quantum kernel for similarity, the modified k-means algorithm achieves improved clustering stability and competitive accuracy relative to the classical version, with the SU2 map specifically delivering 88.6% accuracy on Iris and 91.0% on breast cancer even on shallow NISQ-feasible circuits.

What carries the argument

The quantum kernel, defined as the fidelity (the squared modulus of the inner product) between the two quantum states obtained by applying a feature map to a pair of classical data points. It replaces the classical Euclidean distance in the k-means objective.
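
A minimal sketch of such a kernel may help make the definition concrete. It assumes a simplified product feature map (one RY rotation per feature, no entangling gates), which is not the paper's SU2 or ZZ circuit, and it simulates the states classically. The function names `feature_state` and `fidelity_kernel` are hypothetical. The kernel value is the squared overlap quoted from the paper in the Figure 2 caption.

```python
import math
from functools import reduce


def kron(a, b):
    """Kronecker product of two state vectors stored as flat lists."""
    return [x * y for x in a for y in b]


def feature_state(x):
    """Assumed toy feature map: apply RY(x_k) to qubit k, starting from |0...0>.

    Each qubit ends in cos(x_k / 2)|0> + sin(x_k / 2)|1>; the full state is
    the tensor product over all features (4 qubits for 4 features).
    """
    qubits = [[math.cos(v / 2), math.sin(v / 2)] for v in x]
    return reduce(kron, qubits)


def fidelity_kernel(x, y):
    """K(x, y) = |<psi(x)|psi(y)>|^2 computed from explicit state vectors."""
    psi_x, psi_y = feature_state(x), feature_state(y)
    overlap = sum(a.conjugate() * b for a, b in zip(psi_x, psi_y))
    return abs(overlap) ** 2
```

For this particular toy map the kernel has the closed form of a product of cos²((x_k − y_k)/2) terms, so it is classically easy to evaluate. Entangling maps such as ZZ do not factorize this way, which is part of the motivation for using them.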

If this is right

  • The approach operates successfully on shallow circuits feasible for current NISQ hardware.
  • Clustering stability is improved over the classical algorithm on the tested datasets.
  • Competitive accuracies are achieved, including 88.6% on Iris and 91.0% on breast cancer with the SU2 feature map.
  • Quantum kernels create a richer similarity landscape than traditional distance metrics for clustering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This hybrid method could be applied to other unsupervised learning tasks where capturing non-linear structures is key.
  • Testing on higher-dimensional or more complex datasets might reveal where the quantum advantage in separability becomes more pronounced.
  • Comparing performance across additional feature maps could help optimize the choice for specific data types.

Load-bearing premise

The chosen quantum feature maps produce a similarity measure that genuinely captures cluster structure better than Euclidean distance, leading to measurable gains in stability and accuracy.
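
One standard way to relate a fidelity kernel to a distance is sketched below as a worked identity. It is offered as context, not as the authors' stated procedure: the material reproduced here does not say how the kernel value is converted into the quantity that k-means minimizes. The symbols ρ(x) and d_K are introduced for illustration.

```latex
\begin{aligned}
\rho(x) &= |\psi(x)\rangle\langle\psi(x)|, \qquad
K(x_i, x_j) = \operatorname{Tr}\!\left[\rho(x_i)\,\rho(x_j)\right]
            = |\langle\psi(x_i)|\psi(x_j)\rangle|^2, \\
d_K^2(x_i, x_j) &= \lVert \rho(x_i) - \rho(x_j) \rVert_{\mathrm{HS}}^2
  = K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j)
  = 2\bigl(1 - K(x_i, x_j)\bigr).
\end{aligned}
```

Because K(x, x) = 1 for pure states, ranking pairs by small d_K is the same as ranking them by large K. Whether that ranking tracks the true cluster structure better than Euclidean distance is the empirical question the premise depends on.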

What would settle it

Repeating the clustering experiments on the Iris and breast cancer datasets many times and observing that the quantum version shows no statistically significant improvement in stability or accuracy compared to classical k-means would falsify the central claim.
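
A test of this kind could be sketched as follows, assuming per-run accuracies have already been collected for both methods. The helper names `summarize` and `permutation_test` are hypothetical, the permutation test is one reasonable choice among several, and no accuracy values from the paper are used.

```python
import random
import statistics


def summarize(accuracies):
    """Mean and sample standard deviation of accuracies over repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def permutation_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Returns the estimated p-value: the share of random relabelings whose
    absolute mean difference is at least the observed one, with the usual
    +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    n_a = len(scores_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

A large p-value across many initializations would count against the claimed improvement; a small one would support it, provided both methods share the same preprocessing and initialization scheme.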

Figures

Figures reproduced from arXiv: 2604.07873 by Alisha Baba, Muhammad Faryad, Muhammad Siddique, Syed M. Abdullah.

Figure 1. ZZ feature map with linear entanglement.
Figure 2. Efficient SU2 feature map.
    3.4 Quantum Kernel Computation: For a given feature map U_ϕ(x), each data point is encoded as |ψ(x)⟩ = U_ϕ(x)|0⟩^{⊗4}. To measure the similarity between any two samples x_i and x_j, we computed the fidelity K(x_i, x_j) = |⟨ψ(x_i)|ψ(x_j)⟩|^2.
    3.5 Quantum-assisted K-means: The pseudocode for the quantum-assisted quantum k-means algorithm is presented in Algorithm 2. Since we have tried mu…
Figure 3. Ground truth and the predicted labels for the Iris dataset using the highest-accuracy feature map.
Figure 4. Ground truth and the predicted labels for the breast cancer dataset using the highest-accuracy feature map.
    The final labels and ground truth obtained for the highest-accuracy feature map are presented in Figs. 3 and 4. The centroids were not placed correctly during the first iterations, but with each successive iteration, the centroids were rightly placed at the center of each cluster. This demonstrates …
Figure 6. Confusion matrix for the breast cancer dataset using the highest-accuracy feature map.
    We also compared the clustering accuracy of quantum k-means with that of baseline classical k-means. As can be seen from Tables 4 and 5, quantum k-means provides better accuracies than classical baseline k-means.

Original abstract

Clustering is one of the most fundamental tasks in machine learning, and the k-means clustering algorithm is perhaps one of the most widely used clustering algorithms. However, it suffers from several limitations, such as sensitivity to centroid initialization, difficulty capturing non-linear structure, and poor performance in high-dimensional spaces. Recent work has proposed improved initialization strategies and quantum-assisted distance computation, but the similarity metric itself has largely remained classical. In this study, we propose a quantum-enhanced variant of k-means that replaces the Euclidean distance with a quantum kernel derived from the inner product between feature-mapped quantum states. Using the Iris dataset, we use multiple quantum feature maps, including entangled SU2 and ZZ circuits, to embed classical data into a higher-dimensional Hilbert space where cluster structures become more separable. We will also be testing using another dataset, namely the breast cancer dataset. Similarity between data points is computed through the inner product between two states. Our results show that this approach achieves improved clustering stability and competitive accuracy compared to the classical algorithm, with the SU2 feature map yielding an accuracy of 88.6 % on the Iris dataset and 91.0 % on the breast cancer dataset, despite operating on NISQ-feasible shallow circuits. These findings suggest that quantum kernels provide a richer similarity landscape than traditional distance metrics, offering a promising path toward more robust unsupervised learning in the NISQ era.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a hybrid quantum-classical k-means algorithm that replaces Euclidean distance with a quantum kernel obtained from the inner product of states prepared by SU2 and ZZ feature maps. It evaluates the approach on the Iris and breast-cancer datasets and reports improved clustering stability together with accuracies of 88.6 % (Iris) and 91.0 % (breast cancer) using shallow, NISQ-feasible circuits.

Significance. If the quantum kernels can be shown to supply a demonstrably richer similarity measure than classical kernels on the same data, the work would provide a concrete, reproducible example of a NISQ-era unsupervised-learning primitive. The manuscript already supplies explicit circuit descriptions and public datasets, which are positive attributes for reproducibility.

major comments (2)
  1. [Results] The experimental evaluation (implicitly the Results section) reports point accuracies of 88.6 % and 91.0 % but supplies neither error bars, number of random initializations, nor any description of data preprocessing or convergence criteria. Without these statistics the central performance claim cannot be assessed.
  2. [Methodology / Experiments] The paper compares the quantum-kernel k-means only to classical Euclidean k-means. Because any positive-definite kernel (e.g., classical RBF) can induce a non-linear embedding, the absence of a classical-kernel baseline leaves open whether the observed stability and accuracy gains are attributable to the quantum feature maps or simply to kernelization in general. This comparison is load-bearing for the claim that the quantum kernels provide a “richer similarity landscape.”
minor comments (1)
  1. [Abstract] The abstract uses future tense (“we will also be testing”) while simultaneously presenting numerical results; the tense should be made consistent.
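
The first major comment turns partly on how a clustering "accuracy" is scored, since cluster indices are arbitrary and must be matched to class labels before counting hits. The material reproduced here does not say which convention the authors used. The sketch below shows one common convention, the best accuracy over all one-to-one relabelings, under the assumption that both label sets use the integers 0 to k-1. The function name `clustering_accuracy` is hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids, n_clusters):
    """Best accuracy over all one-to-one maps from cluster ids to class labels.

    Brute force over k! permutations, which is cheap for k = 2 or 3.
    Assumes labels and cluster ids are integers in range(n_clusters).
    """
    best = 0.0
    for perm in permutations(range(n_clusters)):
        hits = sum(1 for t, c in zip(true_labels, cluster_ids) if perm[c] == t)
        best = max(best, hits / len(true_labels))
    return best
```

For larger k the brute-force search would usually be replaced by an assignment solver, but for the two- and three-class datasets discussed here the permutation loop is sufficient.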

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of our work. We address each major comment below and commit to revisions that strengthen the experimental reporting and comparative analysis.

point-by-point responses
  1. Referee: [Results] The experimental evaluation (implicitly the Results section) reports point accuracies of 88.6 % and 91.0 % but supplies neither error bars, number of random initializations, nor any description of data preprocessing or convergence criteria. Without these statistics the central performance claim cannot be assessed.

    Authors: We agree that the absence of statistical details limits the interpretability of the reported accuracies. In the revised manuscript we will report error bars derived from 20 independent runs with different random centroid initializations, explicitly state the number of initializations and the selection of the best result, describe the preprocessing pipeline (including standardization to zero mean and unit variance), and specify the convergence criteria (maximum of 300 iterations or centroid displacement below 1e-4). These additions will allow proper assessment of the 88.6 % (Iris) and 91.0 % (breast-cancer) figures. revision: yes

  2. Referee: [Methodology / Experiments] The paper compares the quantum-kernel k-means only to classical Euclidean k-means. Because any positive-definite kernel (e.g., classical RBF) can induce a non-linear embedding, the absence of a classical-kernel baseline leaves open whether the observed stability and accuracy gains are attributable to the quantum feature maps or simply to kernelization in general. This comparison is load-bearing for the claim that the quantum kernels provide a “richer similarity landscape.”

    Authors: The referee correctly identifies that a classical-kernel baseline is needed to isolate the contribution of the quantum feature maps. While the manuscript focuses on contrasting the quantum kernel against the standard Euclidean metric used in classical k-means, we acknowledge that this does not rule out generic kernelization effects. In the revised version we will add a kernel k-means baseline employing the classical RBF kernel with bandwidth tuned via cross-validation on the same datasets. We will also discuss how the SU2 and ZZ maps introduce entanglement-induced correlations that are not directly replicated by classical RBF kernels, thereby clarifying the specific advantage of the quantum approach on NISQ hardware. revision: yes
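
The two commitments above can be sketched together as a classical-kernel baseline: standardize the features, build an RBF Gram matrix, and run kernel k-means with 20 restarts and at most 300 iterations. This is an illustration under stated assumptions, not the authors' revised code. The bandwidth `gamma` is a fixed placeholder (the promised cross-validated value is not available), the stopping rule is label stability because kernel k-means has no explicit centroid whose displacement could be compared with 1e-4, and all function names are hypothetical.

```python
import math
import random


def standardize(rows):
    """Scale each feature to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def rbf_gram(rows, gamma):
    """Gram matrix of the classical RBF kernel exp(-gamma * ||x - y||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in rows] for x in rows]


def cluster_distances(gram, labels, k):
    """Squared feature-space distance from every point to every cluster mean."""
    n = len(gram)
    members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
    # (1/|C|^2) * sum of K over all pairs in C, computed once per cluster.
    within = [sum(gram[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
              for m in members]
    # Empty clusters get infinite distance so they attract no points.
    return [[gram[i][i] - 2 * sum(gram[i][j] for j in m) / len(m) + within[c]
             if m else float("inf")
             for c, m in enumerate(members)] for i in range(n)]


def kernel_kmeans(gram, k, max_iter=300, seed=0):
    """Kernel k-means from a random labeling; stops when labels stop changing."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(len(gram))]
    for _ in range(max_iter):
        table = cluster_distances(gram, labels, k)
        new_labels = [row.index(min(row)) for row in table]
        if new_labels == labels:
            break
        labels = new_labels
    table = cluster_distances(gram, labels, k)
    objective = sum(row[lab] for row, lab in zip(table, labels))
    return labels, objective


def best_of_restarts(gram, k, n_restarts=20):
    """Keep the lowest-objective result over n_restarts random initializations."""
    return min((kernel_kmeans(gram, k, seed=s) for s in range(n_restarts)),
               key=lambda result: result[1])
```

Because the routine only consumes a Gram matrix, the same loop could be run on a quantum fidelity kernel and on the RBF kernel, which is the like-for-like comparison the referee asks for.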

Circularity Check

0 steps flagged

No circularity: empirical evaluation on public datasets is independent of internal definitions

The paper proposes replacing Euclidean distance in k-means with a quantum kernel obtained from inner products of states prepared by SU2 and ZZ feature maps, then reports direct accuracy and stability measurements on the Iris (88.6 % with SU2) and breast-cancer (91.0 %) datasets. These quantities are obtained by executing the algorithm on fixed external data and comparing against classical Euclidean k-means; they are not obtained by fitting parameters inside the paper's own equations and then re-using those fitted values as “predictions.” No self-citation is invoked to justify a uniqueness theorem or an ansatz, and the central claim rests on observable performance numbers rather than on any reduction of the form “Eq. X equals the input data by construction.” The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that quantum feature maps can produce a more separable similarity structure than Euclidean distance; no free parameters are fitted inside the paper and no new entities are postulated.

axioms (1)
  • domain assumption: Quantum feature maps embed classical data into a Hilbert space where inner-product kernels improve cluster separability
    Invoked when the paper replaces Euclidean distance with the quantum kernel derived from feature-mapped states.


