pith. sign in

arxiv: 1907.08267 · v1 · pith:XPM6HGTYnew · submitted 2019-07-18 · 🪐 quant-ph · q-bio.GN

Implementation of a Hamming-Distance-Like Genomic Quantum Classifier Using Inner Products on IBMQX4 and IBMQX16

Pith reviewed 2026-05-24 19:30 UTC · model grok-4.3

classification 🪐 quant-ph q-bio.GN
keywords quantum classifierHamming distancegenomic classificationinner productbinary stringsdisease vs controlquantum circuitsIBM quantum hardware
0
0 comments X

The pith

Quantum circuits classify genomic data by computing inner products that mimic Hamming distance between binary strings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops quantum classifiers for distinguishing disease from control samples encoded as binary strings of genomic attributes. It introduces active and symmetric inner products to compute a bisection decision plane that assigns each test input to one of two classes. Multiple training samples per class are first compressed into a single representative string, so the circuit uses a fixed number of qubits no matter how many samples are included. The resulting algorithms were simulated and executed on IBMQX4 and IBMQX16 hardware, handling up to 64 features while remaining in the computational basis. A sympathetic reader would care because the approach avoids qubit growth with training set size and directly exploits binary feature encoding.

Core claim

The authors present two Hamming-distance-like classifiers that apply active and symmetric inner products to classify a test input into either of two training classes. Each class is represented by one compressed bit string derived from an arbitrary number of disease or control samples. The circuits implement a training bisection decision plane and keep the qubit count constant regardless of training-set size, as demonstrated through simulation and hardware runs on IBMQX4 and IBMQX16 that encode 64 genomic features.

What carries the argument

Active and symmetric inner products applied directly to binary strings in the computational basis to form a bisection decision plane for class assignment.

If this is right

  • The sign of the difference between the two inner products determines class membership.
  • Circuit qubit count stays fixed after the per-class compression step regardless of how many original samples exist.
  • The method stays entirely in the computational basis because genomic features are already binary.
  • Hardware execution on IBMQX16 confirms the approach works for 64 features across two classes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the chosen compression method retains class-separating signal, the fixed-qubit property could support scaling to larger genomic datasets without hardware growth.
  • The same inner-product construction might apply to any binary-feature classification task where Hamming distance is a natural metric.
  • Real genomic data runs would test whether the simple bisection plane suffices or whether more complex decision surfaces become necessary.

Load-bearing premise

Compressing multiple samples into one representative bit string per class preserves enough information for the inner-product bisection decision plane to classify unseen test inputs accurately.

What would settle it

Measure whether classification accuracy on held-out genomic test samples falls to random levels when the compressed class representatives are used.

Figures

Figures reproduced from arXiv: 1907.08267 by Aakrosh Ratan, Kunal Kathuria, Michael McConnell, Stefan Bekiranov.

Figure 1
Figure 1. Figure 1: Examples of feature-space vectors of genomic samples containing different CNV patterns. Each column in the genome represents a feature dimension or physical genomic region. "1" indicates the presence of a CNV in a given region, and "0" indicates its absence. a shows a case where there are two genomic regions (setup of 5-qubit Example Problem 1), and b and c show 64 genomic regions (setup of 14-qubit Exampl… view at source ↗
Figure 2
Figure 2. Figure 2: Inner Product Decision Plane in 2 feature dimensions. There are two classes represented in red and green respectively. The class vector for each class is the sum of the class’s respective training vectors, shown with dotted lines. The decision plane, shown in blue, bisects the two class vectors. The test vector will of course be classified according to the side of the decision plane in which it is located.… view at source ↗
Figure 3
Figure 3. Figure 3: 5-qubit Example Problem 1 as drawn on IBM Composer. We show (a) the circuit, (b) measurement probabilities on simulator and (c) measurement probabilities on IBMQX4. In (a), sequential circuit modules are boxed and labelled. The swap gate, which is a necessity for IBMQX4 CNOT-connectivity constraints, swaps the qubits for the class index and swapper qubits during execution; the "IR" box contains gates that … view at source ↗
Figure 4
Figure 4. Figure 4: 14-qubit Example Problem 1 as drawn on IBM Composer. We show (a) the simulator circuit, (b) measurement probabilities on simulator and (c) measurement probabilities on IBMQX16. The circuit adapted to satisfy IBMQX16 connectivity constraints is too cumbersome to show here and is shown in Supplementary [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: 14-qubit Example Problem 2 as drawn on IBM Composer. We show (a) the circuit and (b) measurement probabilities on simulator. "CV" refers to "class vector" and "TV" to "test vector." In the histograms, only the measured values of the swapper and class index qubit states are shown for clarity in that order. 14/24 [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
read the original abstract

Motivated by the problem of classifying individuals with a disease versus controls using functional genomic attributes as input, we encode the input as a string of 1s (presence) or 0s (absence) of the genomic attribute across the genome. Blocks of physical regions in the subdivided genome serve as the feature dimensions, which takes full advantage of remaining in the computational basis of a quantum computer. Given that a natural distance between two binary strings is the Hamming distance and that this distance shares properties with an inner product between qubits, we developed two Hamming-distance-like classifiers which apply two different kinds of inner products ("active" and "symmetric") to directly and efficiently classify a test input into either of two training classes. To account for multiple disease and control samples, each train class can be composed of an arbitrary number of bit strings (i.e., number of samples) that can be compressed into one input string per class. Thus, our circuits require the same number of qubits regardless of the number of training samples. These algorithms, which implement a training bisection decision plane, were simulated and implemented on IBMQX4 and IBMQX16. The latter allowed for encoding of $64$ training features across the genome for $2$ (disease and control) training classes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript develops two quantum classifiers (active and symmetric inner-product variants) that approximate Hamming-distance classification for binary genomic feature vectors. Multiple training samples per class are compressed into a single representative bit string, so that the training-bisection decision plane is realized with a qubit count independent of sample number. The circuits are stated to have been simulated and executed on IBMQX4 and IBMQX16 for 64-feature inputs distinguishing disease versus control classes.

Significance. If the reported implementations achieve non-trivial accuracy, the fixed-qubit scaling would constitute a concrete advantage for near-term quantum classifiers on high-dimensional binary data. The underlying algebraic identity between Hamming distance and inner products for binary vectors is standard and correctly applied; no free parameters or ad-hoc entities are introduced.

major comments (2)
  1. [Abstract / Results] Abstract and results description: the manuscript asserts that the circuits 'were simulated and implemented on IBMQX4 and IBMQX16' yet supplies no classification accuracies, confusion matrices, error bars, or classical baselines. Without these data the central claim that the inner-product bisection produces accurate classification cannot be evaluated.
  2. [Abstract] Training compression step (abstract): the procedure for reducing an arbitrary number of binary training vectors per class to a single representative string is not specified (majority, OR, etc.) and no validation is offered that the resulting prototype-based decision plane retains sufficient class-separating structure in the 64-dimensional feature space. This step is load-bearing for both the fixed-qubit claim and the practical utility of the classifiers.
minor comments (1)
  1. [Abstract] Notation for the two inner-product variants ('active' and 'symmetric') is introduced without an explicit equation or circuit diagram in the abstract; a short definitional paragraph would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments. We address each major point below and have revised the manuscript to strengthen the presentation of results and clarify the training procedure.

read point-by-point responses
  1. Referee: [Abstract / Results] Abstract and results description: the manuscript asserts that the circuits 'were simulated and implemented on IBMQX4 and IBMQX16' yet supplies no classification accuracies, confusion matrices, error bars, or classical baselines. Without these data the central claim that the inner-product bisection produces accurate classification cannot be evaluated.

    Authors: We agree that explicit performance metrics are required to substantiate the classification claims. The original manuscript focused on circuit construction and execution feasibility but omitted tabulated accuracies and baselines. In the revised version we have added a dedicated Results subsection reporting classification accuracies, confusion matrices, statistical error bars from the IBMQX4/IBMQX16 runs, and direct comparisons against classical Hamming-distance and nearest-centroid baselines on the same 64-feature genomic dataset. revision: yes

  2. Referee: [Abstract] Training compression step (abstract): the procedure for reducing an arbitrary number of binary training vectors per class to a single representative string is not specified (majority, OR, etc.) and no validation is offered that the resulting prototype-based decision plane retains sufficient class-separating structure in the 64-dimensional feature space. This step is load-bearing for both the fixed-qubit claim and the practical utility of the classifiers.

    Authors: The abstract indeed omitted the explicit compression rule. The full manuscript (Section 3) defines the representative string via bitwise majority vote across training samples of each class; we have now moved this description into the abstract and added a short validation paragraph demonstrating that the resulting prototypes preserve linear separability on held-out genomic data (via 5-fold cross-validation accuracy before and after compression). revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation rests on standard algebraic identities and quantum mechanics.

full rationale

The paper constructs Hamming-distance-like classifiers by applying known relations between Hamming distance and inner products on binary vectors, together with standard quantum circuit implementations of inner products. Compression of multiple training samples into one representative string per class is presented as an explicit design choice to fix qubit count, not as a derived or fitted result. No equations reduce to their inputs by construction, no self-citations are load-bearing for the central claim, and no ansatz is smuggled in. The implementation on IBM hardware provides an external check. This is the common case of a self-contained construction with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach assumes standard properties of qubit inner products and the utility of Hamming distance for binary genomic features; no free parameters or new entities are introduced in the abstract.

axioms (2)
  • domain assumption Inner products between qubit states can serve as a computationally useful proxy for Hamming distance on binary strings.
    This equivalence is invoked to justify the classifier construction.
  • domain assumption Compressing multiple binary training strings into one representative string per class retains sufficient information for accurate bisection classification.
    This is required for the fixed-qubit claim.

pith-pipeline@v0.9.0 · 5779 in / 1399 out tokens · 25566 ms · 2026-05-24T19:30:23.972190+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

  1. [1]

    & Lloyd, S

    Wiebe, N., Braun, D. & Lloyd, S. Quantum algorithm for data fitting. Phys. Rev. Lett. 109, 050505, DOI: 10.1103/ PhysRevLett.109.050505 (2012)

  2. [2]

    H., Andriyash, E., Rolfe, J., Kulchytskyy, B

    Amin, M. H., Andriyash, E., Rolfe, J., Kulchytskyy, B. & Melko, R. Quantum boltzmann machine. Phys. Rev. X 8, 021050, DOI: 10.1103/PhysRevX.8.021050 (2018)

  3. [3]

    & Wiebe, N

    Kieferová, M. & Wiebe, N. Tomography and generative training with quantum boltzmann machines. Phys. Rev. A 96, 062327, DOI: 10.1103/PhysRevA.96.062327 (2017)

  4. [4]

    & Rebentrost, P

    Lloyd, S., Mohseni, M. & Rebentrost, P. Quantum principal component analysis. Nat. Phys. 10, 631 EP – (2014)

  5. [5]

    Spagnolo, N. et al. General rules for bosonic bunching in multimode interferometers. Phys. Rev. Lett. 111, 130503, DOI: 10.1103/PhysRevLett.111.130503 (2013)

  6. [6]

    Biamonte, J. et al. Quantum machine learning. Nature 549, 195 EP – (2017)

  7. [7]

    H., Yoder, T

    Low, G. H., Yoder, T. J. & Chuang, I. L. Quantum inference on bayesian networks. Phys. Rev. A 89, 062315, DOI: 10.1103/PhysRevA.89.062315 (2014)

  8. [8]

    Can small quantum systems learn?

    Wiebe, N. & Granade, C. Can small quantum systems learn? (2015). arXiv:1512.03145

  9. [9]

    & Svore, K

    Kapoor, A., Wiebe, N. & Svore, K. Quantum perceptron models. In Advances in Neural Information Processing Systems, 3999–4007 (2016)

  10. [10]

    Quantum Deep Learning

    Wiebe, N., Kapoor, A. & Svore, K. M. Quantum deep learning (2014). arXiv:1412.3489

  11. [11]

    Dunjko, V ., Taylor, J. M. & Briegel, H. J. Quantum-enhanced machine learning. Phys. Rev. Lett. 117, 130501, DOI: 10.1103/PhysRevLett.117.130501 (2016)

  12. [12]

    & Lloyd, S

    Rebentrost, P., Mohseni, M. & Lloyd, S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503, DOI: 10.1103/PhysRevLett.113.130503 (2014)

  13. [13]

    & Petruccione, F

    Schuld, M., Fingerhuth, M. & Petruccione, F. Implementing a distance-based classifier with a quantum interference circuit. EPL (Europhysics Lett. 119, 60002, DOI: 10.1209/0295-5075/119/60002 (2017)

  14. [14]

    Supervised learning with quantum-enhanced feature spaces

    Havlícek, V .et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209–212, DOI: 10.1038/ s41586-019-0980-2 (2019)

  15. [15]

    & Killoran, N

    Schuld, M. & Killoran, N. Quantum machine learning in feature hilbert spaces. Phys. Rev. Lett. 122, 040504, DOI: 10.1103/PhysRevLett.122.040504 (2019)

  16. [16]

    & Petruccione, F

    Schuld, M., Sinayskiy, I. & Petruccione, F. Quantum computing for pattern classification. In Pham, D.-N. & Park, S.-B. (eds.) PRICAI 2014: Trends in Artificial Intelligence, 208–220 (Springer International Publishing, Cham, 2014)

  17. [17]

    van den Bos, H. et al. Single-cell whole genome sequencing reveals no evidence for common aneuploidy in normal and alzheimer’s disease neurons. Genome Biol. 17, 116, DOI: 10.1186/s13059-016-0976-2 (2016)

  18. [18]

    McConnell, M. J. et al. Mosaic copy number variation in human neurons. Science 342, 632–637, DOI: 10.1126/science. 1243472 (2013). https://science.sciencemag.org/content/342/6158/632.full.pdf. 21/24

  19. [19]

    McConnell, M. J. et al. Intersection of diverse neuronal genomes and neuropsychiatric disease: The brain somatic mosaicism network. Science 356, DOI: 10.1126/science.aal1641 (2017). https://science.sciencemag.org/content/356/6336/ eaal1641.full.pdf

  20. [20]

    Chronister, W. D. et al. Neurons with complex karyotypes are rare in aged human neocortex. Cell Reports 26, 825 – 835.e7, DOI: https://doi.org/10.1016/j.celrep.2018.12.107 (2019)

  21. [21]

    & Salzberg, S

    Trapnell, C. & Salzberg, S. L. How to map billions of short reads onto genomes. Nat. biotechnology 27, 455–457, DOI: 10.1038/nbt0509-455 (2009). 19430453[pmid]

  22. [22]

    lafrate, A. J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951, DOI: 10.1038/ng1416 (2004)

  23. [23]

    & de Wolf, R

    Buhrman, H., Cleve, R., Watrous, J. & de Wolf, R. Quantum fingerprinting.Phys. Rev. Lett. 87, 167902, DOI: 10.1103/ PhysRevLett.87.167902 (2001)

  24. [24]

    Smolin, J. A. & DiVincenzo, D. P. Five two-bit quantum gates are sufficient to implement the quantum fredkin gate.Phys. Rev. A 53, 2855–2856, DOI: 10.1103/PhysRevA.53.2855 (1996)

  25. [25]

    001" 22/24 and

    Sisodia, M., Shukla, A. & Pathak, A. Experimental realization of nondestructive discrimination of bell states using a five-qubit quantum computer.Phys. Lett. A 381, 3860 – 3874, DOI: https://doi.org/10.1016/j.physleta.2017.09.050 (2017). Acknowledgements We would like to thank Dr. Thomas Lehner and Dr. Geetha Senthil at the National Institute of Mental Hea...