
arXiv: 2606.28655 · v1 · pith:JFIXOMLQnew · submitted 2026-06-27 · 🪐 quant-ph · cs.LG · q-bio.BM

Exploring the Effects of Entanglement on Quantum Machine Learning of Pathogen Epitope-Receptor Binding

Pith reviewed 2026-06-30 10:10 UTC · model grok-4.3

classification 🪐 quant-ph · cs.LG · q-bio.BM
keywords quantum machine learning · entanglement · feature map · epitope binding · quantum neural network · overfitting · hybrid QNN · PRRS

The pith

High-entanglement ZZ feature map reduces training overfit in hybrid QNN for epitope binding classification while keeping competitive test accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether the number and connectivity of entangling gates in the feature-map stage of a parameterized quantum circuit change the generalization behavior of a hybrid quantum neural network. It applies the models to a fixed task: labeling 9-mer epitopes as strong or weak binders to a receptor, using docking data for Porcine Reproductive and Respiratory Syndrome virus. Four feature-map variants are compared against a classical CNN baseline on an 80-example dataset split 40:30:30 into training, validation, and test subsets. The all-to-all ZZ map produces the lowest training area under the accuracy curve (AUAC) and the highest test-to-training AUAC ratio, which the paper reads as a sign of less overfitting. A reader would care because the result points to entanglement topology as a tunable design choice that may improve generalization on small biological datasets without altering the rest of the workflow.

Core claim

Among the four feature-map configurations tested in the hybrid Embedding-QNN workflow, the high-entanglement all-to-all ZZ feature map yields the lowest training AUAC together with the highest test/training AUAC ratio while maintaining test-set accuracy competitive with both the classical CNN benchmark and the other quantum maps; the paper interprets this pattern as evidence that entanglement topology influences overfitting on this N=80 epitope-receptor binding task.
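The source names AUAC (area under the accuracy curve) without stating how the curve is built. As a minimal sketch, assume accuracy is recorded once per epoch and integrated with the trapezoidal rule, then divided by the epoch span so that a perfect curve scores 1.0. The function names `auac` and `auac_ratio` are hypothetical, and the two curves are toy placeholders, not values from the paper.

```python
from typing import Sequence


def auac(accuracies: Sequence[float]) -> float:
    """Normalized area under an accuracy-vs-epoch curve (trapezoidal rule).

    Assumption: one accuracy value per epoch, with unit spacing between epochs.
    """
    if len(accuracies) < 2:
        raise ValueError("need at least two points to integrate")
    # Sum the area of each trapezoid between consecutive epochs.
    area = sum((a + b) / 2.0 for a, b in zip(accuracies, accuracies[1:]))
    # Divide by the number of intervals so the result lies in [0, 1].
    return area / (len(accuracies) - 1)


def auac_ratio(test_curve: Sequence[float], train_curve: Sequence[float]) -> float:
    """Test/training AUAC ratio; values nearer 1 suggest a smaller train-test gap."""
    return auac(test_curve) / auac(train_curve)


# Toy curves for illustration only (NOT data from the paper).
train_curve = [0.5, 0.75, 1.0]
test_curve = [0.5, 0.5, 0.75]

train_auac = auac(train_curve)
ratio = auac_ratio(test_curve, train_curve)
```

Under this assumed definition, a model that memorizes the training set quickly accumulates a large training AUAC, so a lower training AUAC paired with a higher ratio is consistent with the reduced-overfit reading given above.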

What carries the argument

The ZZ feature map with all-to-all two-qubit entangling gates placed in the embedding stage before the variational quantum neural network layers.
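To make "all-to-all" versus "nearest-neighbour" concrete, the sketch below enumerates the qubit pairs each pattern would couple. It is plain Python and does not reproduce the authors' circuits. The qubit count of 9 is an assumption, chosen because it yields the 36 unique pairs listed for the ZZ map in the extracted topology table further down this page, and a linear chain is assumed for the nearest-neighbour case.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[int, int]


def all_to_all_pairs(n_qubits: int) -> List[Pair]:
    """Every unordered qubit pair: the coupling graph is complete."""
    return list(combinations(range(n_qubits), 2))


def nearest_neighbour_pairs(n_qubits: int) -> List[Pair]:
    """Adjacent qubits on an open linear chain (assumed layout)."""
    return [(q, q + 1) for q in range(n_qubits - 1)]


N_QUBITS = 9  # assumption, see the note above

full = all_to_all_pairs(N_QUBITS)
chain = nearest_neighbour_pairs(N_QUBITS)

print(f"all-to-all pairs:        {len(full)}")
print(f"nearest-neighbour pairs: {len(chain)}")
```

The pair count grows quadratically for the all-to-all pattern and linearly for the chain, which is one reason the two topologies differ so sharply in two-qubit gate cost.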

If this is right

  • Entanglement topology in the feature map functions as an independent design variable that can be adjusted to lower training-set overfit on sparse biological classification problems.
  • The ZZ configuration preserves test accuracy while lowering the training AUAC, implying improved generalization relative to the low-entanglement and non-entangling maps on this task.
  • The same pattern of results would be expected to appear in other small-scale epitope or receptor-binding datasets if the entanglement effect is robust.
  • Further evaluation with noise models or actual hardware runs is required before claiming practical advantage on NISQ devices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the advantage persists on larger epitope libraries, entanglement topology could become a standard hyperparameter in quantum screening pipelines for vaccine design.
  • Testing the same maps on datasets with different sequence lengths or binding thresholds would reveal whether the effect is specific to 9-mers or generalizes across molecular representations.
  • Pairing the ZZ map with classical post-processing layers might amplify or cancel the observed generalization benefit.
  • The result leaves open whether an intermediate entanglement density between the tested low and high patterns would produce an even better ratio.

Load-bearing premise

The observed differences in AUAC ratios are caused by the entanglement topology of each feature map rather than by uncontrolled factors such as random seed, optimizer choice, or the specific 40:30:30 split on the N=80 dataset.

What would settle it

Re-running the identical workflow across several independent random seeds and at least two different train-validation-test partitions and finding that the ZZ map no longer produces the highest test/training AUAC ratio would falsify the claim that entanglement topology drives the reduced overfit.
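The re-run described above needs reproducible partitions that differ from one seed to the next. The following is a minimal sketch of a seeded 40:30:30 index split for N=80. The helper `split_indices` is hypothetical, and the split is unstratified for brevity; the source does not say whether the original split was stratified by label.

```python
import random
from typing import Dict, List, Tuple

Split = Tuple[List[int], List[int], List[int]]


def split_indices(n_samples: int, seed: int,
                  fractions: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> Split:
    """Shuffle sample indices with a fixed seed and cut them into train/val/test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # local RNG, so global random state is untouched
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# One partition per seed; each would feed the same training workflow.
partitions: Dict[int, Split] = {seed: split_indices(80, seed) for seed in range(5)}

for seed, (train, val, test) in partitions.items():
    print(seed, len(train), len(val), len(test))
```

Each partition would then be trained with several independent initialization seeds per feature map, so that split variance and optimizer variance can be reported separately.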

Figures

Figures reproduced from arXiv: 2606.28655 by Aspen Erlandsson Brisebois, Brook Byrns, Christophe Pere, Connor Burbridge, Gordon Broderick, Heather L. Wilson, Luis Pablo Gonzalez Dominguez, Shivansi Prajapati, Steven Rayan, Sureesh Tikoo, Zahed Khatooni.

Figure 1. One-hot encoding scheme applied to the epitope RVPILRTVF. Because in vivo measurements and high-fidelity computational screens are resource-intensive, biological datasets of this kind are often sparse and limited in scope. To test performance under a deliberately conservative data regime, we partitioned the 80 example epitopes into training, validation, and test subsets using a 40:30:30 split [Kjeldsberg et…
Figure 2. CNN Classifier Benchmark Model. The hybrid QML architecture (…
Figure 3. QML High-Level Architecture. Since the one-hot encoding of the 9-mer epitopes produces an input space of 180 elements (9 x 20), we first perform dimensional reduction using a sparse autoencoder to generate a 20 x 1 learned embedding vector [Gallifant et al., 2025]. Though not necessarily optimal this dimensional reduction provides a compact learned representation suitable for loading epitope characteristics…
Figure 8. Training and test-set accuracy for the classical CNN benchmark (blue) versus the Embedding-QNN architecture (orange) under four feature-map configurations after retaining only runs with validation-set accuracy above 85%: Z feature map without feature-map entanglement (A), high-entanglement all-to-all ZZ feature map (B), Z feature map with low-depth interleaved entanglement (C), and Z feature map with high-…
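The captions for Figures 1 and 3 describe a one-hot encoding that turns each 9-mer into 9 x 20 = 180 input elements. The sketch below illustrates that encoding for the epitope RVPILRTVF, assuming the 20 standard amino acids in alphabetical one-letter order. The column ordering is not stated in the source, and `AMINO_ACIDS` and `one_hot` are illustrative names.

```python
from typing import List

# Assumed column order: the 20 standard residues, alphabetical by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide: str) -> List[List[int]]:
    """Return one row per residue, each row holding a single 1 among 20 columns."""
    rows = []
    for residue in peptide:
        row = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for a non-standard residue code.
        row[AMINO_ACIDS.index(residue)] = 1
        rows.append(row)
    return rows


encoded = one_hot("RVPILRTVF")
# Flatten row by row into the 180-element input vector.
flat = [value for row in encoded for value in row]

print(len(encoded), len(encoded[0]), len(flat))
```

Only 9 of the 180 entries are non-zero, which is the sparsity that motivates the autoencoder reduction to a 20 x 1 embedding described in the Figure 3 caption.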
Original abstract

Parameterized quantum circuits (PQCs) provide a flexible substrate for hybrid quantum machine learning (QML), but their practical value on Noisy Intermediate-Scale Quantum (NISQ) devices remains an empirical question, especially because training depth and scale can introduce optimization challenges such as barren plateaus. Here we study how the number and topology of two-qubit entangling gates in the feature-map stage influence a fixed hybrid QNN workflow for classifying strong versus weak epitope-receptor binding in Porcine Reproductive and Respiratory Syndrome (PRRS) vaccine design. The dataset consists of docking-derived binding affinities for N=80 9-mer epitopes, labeled as Strong or Weak binding, and partitioned into training, validation, and test subsets using a 40:30:30 split. We compare a classical CNN benchmark with a hybrid Embedding-QNN architecture under four feature-map configurations: a non-entangling Z feature map, an all-to-all high-entanglement ZZ feature map, and two interleaved nearest-neighbour entanglement patterns of low and high depth. Among the configurations tested, the high-entanglement ZZ feature map is seen to provide the strongest evidence of reduced training-set overfit, with a lower training area under the accuracy curve (AUAC) and the highest test/training AUAC ratio, while preserving competitive test-set accuracy. These results do not establish a general QML advantage, but they suggest that feature-map entanglement topology is a meaningful design variable for sparse biological screening tasks and warrants further evaluation with additional metrics, larger datasets, and noise-aware or hardware-based experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript empirically compares four feature-map variants (non-entangling Z, high-entanglement all-to-all ZZ, and two interleaved nearest-neighbour patterns) inside a fixed hybrid Embedding-QNN workflow for binary classification of strong vs. weak epitope-receptor binding on an N=80 docking-derived dataset of 9-mer epitopes, using a single 40:30:30 train/val/test split. It reports that the high-entanglement ZZ map yields the lowest training AUAC, the highest test/training AUAC ratio, and competitive test accuracy, suggesting that entanglement topology can mitigate overfitting in this sparse biological screening task without establishing a general QML advantage.

Significance. If the observed AUAC differences can be shown to arise specifically from entanglement topology rather than uncontrolled stochasticity, the result would usefully highlight feature-map design as a controllable variable for NISQ-era QML on small biological datasets. The work supplies concrete, reproducible metrics on an explicit workflow and four variants but does not claim broad superiority over classical methods.

major comments (2)
  1. [Abstract] Abstract and central empirical claim: the attribution of reduced training AUAC and elevated test/training AUAC ratio specifically to the high-entanglement ZZ topology rests on a single 40:30:30 split of N=80 samples with no reported multiple random seeds, fixed-seed sweeps, or k-fold cross-validation. On this scale, PQC training stochasticity (initialization, optimizer path, barren-plateau effects) could produce the observed ordering without any topological cause.
  2. [Abstract] Abstract: no error bars, standard deviations, or statistical significance tests accompany the reported AUAC values or ratios, so it is impossible to assess whether the differences between the four feature maps exceed the variability expected from the small dataset and single partition.
minor comments (2)
  1. [Abstract] The abstract states that results 'do not establish a general QML advantage' yet the title and framing emphasize entanglement effects; a brief clarification of scope in the introduction would help readers.
  2. Notation for AUAC (area under the accuracy curve) is introduced without an explicit definition or reference to how the curve is constructed from the validation or test predictions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We agree with the concerns regarding the statistical robustness of our empirical results and will revise the manuscript to address these issues by incorporating multiple runs and statistical measures.

Point-by-point responses
  1. Referee: [Abstract] Abstract and central empirical claim: the attribution of reduced training AUAC and elevated test/training AUAC ratio specifically to the high-entanglement ZZ topology rests on a single 40:30:30 split of N=80 samples with no reported multiple random seeds, fixed-seed sweeps, or k-fold cross-validation. On this scale, PQC training stochasticity (initialization, optimizer path, barren-plateau effects) could produce the observed ordering without any topological cause.

    Authors: We fully acknowledge this limitation. Our current study used a single data split, and the observed differences could indeed be influenced by training stochasticity. In the revised version, we will conduct experiments with multiple random seeds (at least 5-10) for each feature map, reporting average AUAC values along with standard deviations. This will allow us to better attribute any consistent differences to the entanglement topology rather than random variation. We will also explore the feasibility of k-fold cross-validation given the computational constraints.

    Revision: yes

  2. Referee: [Abstract] Abstract: no error bars, standard deviations, or statistical significance tests accompany the reported AUAC values or ratios, so it is impossible to assess whether the differences between the four feature maps exceed the variability expected from the small dataset and single partition.

    Authors: We agree that the absence of error bars and statistical tests makes it difficult to evaluate the significance of the results. We will update the manuscript to include error bars based on multiple runs and perform statistical significance tests (such as t-tests) between the different feature maps to determine if the observed differences are statistically meaningful.

    Revision: yes
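The simulated rebuttal proposes per-seed means, standard deviations, and t-tests. As a rough sketch of what that comparison could look like, the code below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom using only the standard library. The per-seed ratios are made-up placeholders, not results from the paper, and `welch_t` is a hypothetical helper; converting the statistic to a p-value would need a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    # Squared standard errors of each sample mean.
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Placeholder test/training AUAC ratios over five seeds (NOT data from the paper).
ratios_map_a = [0.70, 0.72, 0.74, 0.76, 0.78]
ratios_map_b = [0.60, 0.62, 0.64, 0.66, 0.68]

t_stat, dof = welch_t(ratios_map_a, ratios_map_b)
print(f"map A: {mean(ratios_map_a):.3f} +/- {stdev(ratios_map_a):.3f}")
print(f"map B: {mean(ratios_map_b):.3f} +/- {stdev(ratios_map_b):.3f}")
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

With four feature maps there are six pairwise comparisons, so a multiple-comparison correction would be worth considering alongside the raw tests.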

Circularity Check

0 steps flagged

No significant circularity: empirical measurements only


The paper reports direct empirical AUAC values computed on a fixed 40:30:30 split of an N=80 dataset for four explicitly defined feature-map circuits. No derived quantity is obtained by fitting a parameter to one subset and then relabeling a closely related quantity as a prediction, nor is any central result obtained by self-citation to an unverified uniqueness theorem or ansatz. The observed training/test AUAC ratios are computed quantities from the same evaluation procedure applied to each circuit; they do not reduce to the input definitions by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study is purely empirical; it introduces no new mathematical axioms, free parameters beyond standard circuit training, or postulated entities. All modeling choices (dataset labeling, split, AUAC) are conventional and stated in the abstract.



Reference graph

Works this paper leans on

20 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1]

    Exploring the Effects of Entanglement on Quantum Machine Learning of Pathogen Epitope-Receptor Binding Aspen Erlandsson Brisebois1,2, Luis Pablo Gonzalez Dominguez1,3,4, Shivansi Prajapati4,5, Zahed Khatooni1, Heather L. Wilson1, Connor Burbridge6, Brook Byrns6, Sureesh Tikoo1,7, Christophe Pere8, Steven Rayan3,4*, Gordon Broderick1,3,4* 1 Vaccine and Inf...

  2. [2]

    Figure 1: One-hot encoding scheme applied to the epitope RVPILRTVF Because in vivo measurements and high-fidelity computational screens are resource-intensive, biological datasets of this kind are often sparse and limited in scope. To test performance under a deliberately conservative data regime, we partitioned the 80 example epitopes into training, valid...

  3. [3]

    consists of an initial classical feature-embedding stage for dimensional reduction, followed by a parameterized QNN circuit comprising a feature map, a quantum convolutional stage, a quantum pooling stage, a variational ansatz stage, qubit measurement, and a final classical output layer. The design is therefore hybrid throughout: the embedding and output w...

  4. [4]

    While the feature-map configurations vary, the quantum convolutional, pooling, RealAmplitudes ansatz, and classical output components are held constant across all four QNN experiments. The PyTorch [Imambi et al., 2021] Python library was used for classical parameter training and evaluation, with the Qiskit Machine Learning [Sahin et al., 2025] TorchConnect...

  5. [5]

    Topological properties of entanglement patterns

    Feature-map configuration | Graph Diameter | Avg. Node Degree | Multiplicity | Recip. | Directed steps (unique pairs)
    Z feature map baseline (1 rep; no feature-map entanglement) | 0 | 0.00 | 0 | 0 | 0
    ZZ feature map high entanglement (2 reps; all-to-all) | 1 | 4.00 | 4 | 0 | 144 (36)
    Z feature map + low-depth interleaved entanglement (1 p...

  6. [6]

    Thus, the present data support feature-map entanglement topology as a useful design variable for further study, not as a standalone explanation of broad generalization advantage

    similarly frame the approximation-generalization trade-off in quantum-information terms, underscoring that finite-data limitations cannot be bypassed merely by choosing a quantum model. Thus, the present data support feature-map entanglement topology as a useful design variable for further study, not as a standalone explanation of broad generalization advan...

  7. [7]

    Generalization in quantum machine learning from few training data

    Caro MC, Huang HY, Cerezo M, Sharma K, Sornborger A, Cincio L, Coles PJ. Generalization in quantum machine learning from few training data. Nature Communications. 2022 Aug 22;13(1):4919. Gil-Fuster E, Eisert J, Bravo-Prieto C. Understanding quantum machine learning also requires rethinking generalization. Nature Communications. 2024 Mar 13;15(1):2277. Ba...

  8. [8]

    Hybrid Quantum Neural Networks for Efficient Protein-Ligand Binding Affinity Prediction

    Jeong SG, Moon KH, Hwang WJ. Hybrid Quantum Neural Networks for Efficient Protein-Ligand Binding Affinity Prediction. arXiv preprint arXiv:2509.11046. 2025 Sep

  9. [9]

    Molecular architecture and dynamics of SARS-CoV-2 envelope by integrative modeling

    Pezeshkian W, Grünewald F, Narykov O, Lu S, Arkhipova V, Solodovnikov A, Wassenaar TA, Marrink SJ, Korkin D. Molecular architecture and dynamics of SARS-CoV-2 envelope by integrative modeling. Structure. 2023 Apr 6;31(4):492-503. Zhang N, Qi J, Feng S, Gao F, Liu J, Pan X, Chen R, Li Q, Chen Z, Li X, Xia C. Crystal structure of swine major histocompatibi...

  10. [10]

    2009 Apr;73(4):307-315

    Tissue Antigens. 2009 Apr;73(4):307-315. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, et al. Accurate structure prediction of biomolecular interactions with AlphaFold

  11. [11]

    2024 Jun 13;630(8016):493-500

    Nature. 2024 Jun 13;630(8016):493-500. Jiang L, Zhang K, Zhu K, Wang Y, Kang Y, Hou T. Revisiting Protein-Protein Docking: A Systematic Evaluation Framework. Journal of Chemical Information and Modeling. 2025 Sep

  12. [12]

    Evaluation of Structure Prediction and Molecular Docking Tools for Therapeutic Peptides in Clinical Use and Trials Targeting Coronary Artery Disease

    Alotaiq N, Dermawan D. Evaluation of Structure Prediction and Molecular Docking Tools for Therapeutic Peptides in Clinical Use and Trials Targeting Coronary Artery Disease. International Journal of Molecular Sciences. 2025 Jan 8;26(2):462. Honorato RV, Trellet ME, Jiménez-García B, Schaarschmidt JJ, Giulini M, Reys V, Koukos PI, Rodrigues JP, Karaca E,...

  13. [13]

    Sparse autoencoder features for classifications and transferability

    Gallifant J, Chen S, Sasse K, Aerts H, Hartvigsen T, Bitterman DS. Sparse autoencoder features for classifications and transferability. arXiv preprint arXiv:2502.11367. 2025 Feb

  14. [14]

    Modeling Feature Maps for Quantum Machine Learning

    Singh N, Pokhrel SR. Modeling Feature Maps for Quantum Machine Learning. arXiv preprint arXiv:2501.08205. 2025 Jan

  15. [15]

    Circuit-centric quantum classifiers

    Schuld M, Bocharov A, Svore KM, Wiebe N. Circuit-centric quantum classifiers. Physical Review A. 2020 Mar;101(3):032308. Havlíček V, Córcoles AD, Temme K, Harrow AW, Kandala A, Chow JM, Gambetta JM. Supervised learning with quantum-enhanced feature spaces. Nature. 2019 Mar 14;567(7747):209-212. Anand A. On the power of interleaved low-depth quantum and cl...

  16. [16]

    p. 87-104. Sahin ME, Altamura E, Wallis O, Wood SP, Dekusar A, Millar DA, Imamichi T, Matsuo A, Mensa S. Qiskit Machine Learning: an open-source library for quantum machine learning tasks at scale on quantum hardware and classical simulators. arXiv preprint arXiv:2505.17756. 2025 May

  17. [17]

    Identifying Protein Co-regulatory Network Logic by Solving B-SAT Problems through Gate-based Quantum Computing

    Powell MJD. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal. 1964 Jan 1;7(2):155-162. Li S, Xia Y, Xu Z. Simultaneous perturbation stochastic approximation: towards one-measurement per iteration. Numerical Algorithms. 2023 Nov;94(3):1085-1101. Brisebois AE, Broderick J, Kh...

  18. [18]

    Implementing Grover’s algorithm on the IBM quantum computers

    Mandviwalla A, Ohshiro K, Ji B. Implementing Grover’s algorithm on the IBM quantum computers. In: 2018 IEEE International Conference on Big Data (Big Data). IEEE; 2018 Dec

  19. [19]

    2531-2537

    p. 2531-2537. Abane A, Cubeddu M, Mai VS, Battou A. Entanglement routing in quantum networks: A comprehensive survey. IEEE Transactions on Quantum Engineering. 2025 Feb

  20. [20]

    To Entanglement and Beyond: Explaining Superior Generalizability of Quantum Neural Networks

    Park J. To Entanglement and Beyond: Explaining Superior Generalizability of Quantum Neural Networks. Proceedings of Quantum Techniques in Machine Learning (QTML2024), University of Melbourne, Melbourne, Australia; 2024 Nov 25-29. Acknowledgment This work was supported by the University of Saskatchewan’s Centre for Quantum Topology and Its Applications (qu...