Effective Dimension Governs Generalization in Quantum Kernel Vision Models

Delu Zeng; Jian Xu; John Paisley; Qibin Zhao

arxiv: 2606.20183 · v1 · pith:US4EKK5Znew · submitted 2026-06-18 · 💻 cs.LG


Pith reviewed 2026-06-26 18:03 UTC · model grok-4.3

keywords quantum kernels · effective dimension · generalization · quantum vision models · entanglement · quantum noise · kernel classifiers

The pith

The effective dimension of the quantum feature kernel explains why entanglement and noise both improve generalization in quantum vision models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that two separate empirical observations in quantum vision models (better generalization from more uniform entanglement, and better generalization from injected noise) are both controlled by a single quantity: the effective dimension of the noise-shaped quantum feature kernel. In the overfitting regime where these models are typically trained, shrinking this effective dimension functions as a form of ridge-like regularization that reduces capacity without changing the underlying feature map. The authors supply an exact spectral decomposition for the depolarized kernel, a contraction analysis for amplitude damping, and a capacity-alignment risk bound that together turn the two anecdotes into one measurable design principle. Because the monotone contraction under entanglement is verified empirically rather than proven in general, the result applies most directly to the kernel-classifier setting and ansatze studied.
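To make "effective dimension" concrete, here is a minimal sketch. The page does not reproduce the paper's definition of d_eff, so the participation ratio used below is an assumption, chosen because it equals 1 on a rank-one kernel. The ridge degrees-of-freedom function is the textbook quantity, included only to show the analogy with ridge regularization. The function names are illustrative.

```python
import numpy as np

def effective_dimension(K):
    """Participation-ratio effective dimension (ASSUMED definition, not
    confirmed by the source): (sum_i lam_i)^2 / sum_i lam_i^2."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clip round-off negatives
    return lam.sum() ** 2 / (lam ** 2).sum()

def ridge_degrees_of_freedom(K, reg):
    """Textbook ridge degrees of freedom: sum_i lam_i / (lam_i + reg)."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return float(np.sum(lam / (lam + reg)))

# Two toy Gram matrices on n = 8 points.
K_flat = np.eye(8)        # flat spectrum: every direction carries equal weight
K_low = np.ones((8, 8))   # rank one: all points look identical to the kernel

d_flat = effective_dimension(K_flat)
d_low = effective_dimension(K_low)
dof_flat = ridge_degrees_of_freedom(K_flat, reg=1.0)
dof_low = ridge_degrees_of_freedom(K_low, reg=1.0)
```

Under this assumed definition the flat spectrum has effective dimension 8 and the rank-one kernel has effective dimension 1; the ridge degrees of freedom shrink in the same direction.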

Core claim

Both the benefit of entanglement structure and the benefit of quantum noise are manifestations of a single measurable quantity: the effective dimension d_eff of the (noise-shaped) quantum feature kernel. In an overfitting regime, contracting d_eff acts as ridge-like regularization. An exact decomposition of the depolarized kernel K_p = (1-p)^2 K + p(2-p)/D 1 1^T shows d_eff(K_p) approaches 1; amplitude damping contracts d_eff and lifts test accuracy by up to +13 percent along an inverted-U curve whose sign flips between over- and under-fitting regimes; a kernel-machine capacity bound and capacity/alignment risk decomposition complete the account.
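The decomposition can be recovered in a few lines. The sketch below assumes the depolarizing channel acts globally as ρ_p(x) = (1-p) ρ(x) + p I/D and that the kernel is the Hilbert–Schmidt overlap K_ij = Tr[ρ(x_i) ρ(x_j)]; both assumptions are consistent with the formula quoted above, but the paper's own proof is not reproduced on this page.

```latex
\begin{aligned}
(K_p)_{ij}
  &= \mathrm{Tr}\big[\rho_p(x_i)\,\rho_p(x_j)\big] \\
  &= (1-p)^2\,\mathrm{Tr}\big[\rho(x_i)\rho(x_j)\big]
     + \tfrac{p(1-p)}{D}\,\mathrm{Tr}\,\rho(x_i)
     + \tfrac{p(1-p)}{D}\,\mathrm{Tr}\,\rho(x_j)
     + \tfrac{p^2}{D^2}\,\mathrm{Tr}\,I \\
  &= (1-p)^2 K_{ij} + \tfrac{2p(1-p) + p^2}{D}
     \qquad (\mathrm{Tr}\,\rho = 1,\ \mathrm{Tr}\,I = D) \\
  &= (1-p)^2 K_{ij} + \tfrac{p(2-p)}{D}.
\end{aligned}
```

At p = 1 only the constant term survives, so K_1 = (1/D) 1 1^T is rank one, which is the sense in which the effective dimension collapses to 1.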

What carries the argument

The effective dimension d_eff of the (noise-shaped) quantum feature kernel, which entanglement structure and quantum noise both move as control knobs.

If this is right

Along the depolarizing channel the kernel admits an exact closed-form decomposition whose effective dimension collapses to 1 by construction.
Amplitude damping produces a non-monotonic accuracy curve whose peak occurs at an intermediate noise level that matches an explicit spectral-filtering frontier.
The sign of the noise effect reverses when the base model moves from overfitting to underfitting, confirming the regularization interpretation.
A capacity bound derived from the kernel spectrum directly limits the risk gap once d_eff is known.
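The amplitude-damping item above can be explored numerically with a small sketch. It assumes independent single-qubit amplitude damping with the standard Kraus operators, random pure states standing in for the feature map ρ(x), and a participation-ratio d_eff; none of these choices is confirmed by the source. It only checks the two endpoints (γ = 0 leaves the kernel unchanged, γ = 1 collapses it to rank one) and makes no claim about monotonicity in between.

```python
import numpy as np
from itertools import product

def gram(rhos):
    """Hilbert-Schmidt Gram matrix K_ij = Tr[rho_i rho_j]."""
    return np.einsum("iab,jba->ij", rhos, rhos).real

def participation_ratio(K):
    """ASSUMED d_eff definition: (sum lam)^2 / sum lam^2."""
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

def amplitude_damp(rhos, gamma, n_qubits):
    """Apply single-qubit amplitude damping independently to every qubit."""
    e0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    e1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    out = np.zeros_like(rhos)
    for ops in product((e0, e1), repeat=n_qubits):
        E = ops[0]
        for op in ops[1:]:
            E = np.kron(E, op)        # tensor-product Kraus operator
        out += E @ rhos @ E.conj().T  # broadcasts over the batch of states
    return out

rng = np.random.default_rng(0)
n_qubits, n_points = 2, 12
D = 2 ** n_qubits
# Random pure states are a stand-in for rho(x), not the paper's feature map.
psi = rng.normal(size=(n_points, D)) + 1j * rng.normal(size=(n_points, D))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
rho = np.einsum("ni,nj->nij", psi, psi.conj())

d_eff_by_gamma = {
    g: participation_ratio(gram(amplitude_damp(rho, g, n_qubits)))
    for g in (0.0, 0.5, 1.0)
}
```

At γ = 1 every state is mapped to the all-zeros basis state, so the Gram matrix becomes the all-ones matrix and the assumed d_eff equals 1.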

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers could target a desired d_eff directly rather than searching over entanglement patterns or noise rates.
The same spectral mechanism may apply to other kernel-based quantum models outside vision, provided they remain in the overfitting regime.
If a general proof of monotone contraction under entanglement were found, the empirical verification step could be removed from future analyses.

Load-bearing premise

The models operate in an overfitting regime where contracting effective dimension improves generalization, and the observed monotone contraction of d_eff under entanglement extends beyond the tested cases.

What would settle it

A controlled experiment that varies entanglement or noise while holding measured d_eff fixed and still observes a change in test accuracy would falsify the claim that d_eff is the governing quantity.
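One way such a matched comparison could be organized is sketched below. The record fields, the bin width, and the numbers are all hypothetical placeholders, not results from the paper: runs are grouped by binned d_eff, and the spread of test accuracy inside each bin is reported.

```python
from collections import defaultdict

def within_bin_accuracy_spread(runs, bin_width=0.25):
    """Group runs by binned d_eff; return max-min test accuracy per bin
    that contains at least two runs."""
    bins = defaultdict(list)
    for run in runs:
        bins[round(run["d_eff"] / bin_width)].append(run["test_acc"])
    return {idx * bin_width: max(accs) - min(accs)
            for idx, accs in bins.items() if len(accs) >= 2}

# Placeholder records (made up for illustration only).
runs = [
    {"knob": "topology", "d_eff": 2.05, "test_acc": 0.70},
    {"knob": "noise",    "d_eff": 2.10, "test_acc": 0.71},
    {"knob": "topology", "d_eff": 5.05, "test_acc": 0.60},
    {"knob": "noise",    "d_eff": 5.10, "test_acc": 0.50},
]
spread = within_bin_accuracy_spread(runs)
```

A large spread inside a bin, beyond run-to-run variance, would count against d_eff being the sole governing quantity; how large is "large" is a judgment the sketch leaves open.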

Figures

Figures reproduced from arXiv: 2606.20183 by Delu Zeng, Jian Xu, John Paisley, Qibin Zhao.

**Figure 1.** Overview. A quantum feature map ρ(x) induces a Hilbert–Schmidt kernel whose spectrum is summarized by the effective dimension d_eff. Entanglement topology and an injected noise channel are two knobs that both move d_eff; within the entangled regime, generalization is a function of d_eff alone, and the sign of its effect is set by the bias–variance regime.

**Figure 2.** Noise reshapes the quantum feature geometry by contracting the spectrum.

**Figure 3.** Test accuracy as a function of d_eff across the 4 × 6 grid (ansatz × noise rate, 25% label noise). Among the three entangled ansatze (chain/ring/all-to-all), points whose d_eff is set by topology and those whose d_eff is set by noise collapse onto a single curve (R² = 0.92). The unentangled product circuit (gray ×) lies well below the curve at every d_eff: entanglement is a precondition, after which d_eff govern…

**Figure 4.** Example images from the three vision benchmarks used (each reduced to …

**Figure 5.** Real IBM Heron (ibm kawasaki) in an overfitting regime. Left: hardware-measured features shrink toward 0 relative to ideal (slope 0.72), the device-noise contraction. Middle/right: this contracts d_eff (4.06→3.38) and improves test accuracy (0.863→0.900), realizing the noise-as-regularization mechanism on silicon.

**Figure 6.** The d_eff–accuracy relation flips sign between regimes. Left (overfitting; expressive circuit, 25% noisy labels, train acc 1.0): contracting d_eff via noise raises test accuracy (ρ = −0.79). Right (underfitting; 3-qubit product map, clean labels, train acc 0.85): the same contraction lowers test accuracy (ρ = +0.91). Arrows mark increasing noise.

**Figure 7.** The four entangling topologies (shown for …

**Figure 8.** One data-re-uploading layer (RING entangling block, amplitude damping A_γ). The kernel uses ρ(x) after L such layers via Eq. 2. [Circuit diagram: qubits q0 to q3, each with RY and RZ(θ_i) gates, a block labeled "all-to-all CNOTs", then A_γ on every qubit.]

**Figure 9.** QViT-like trained map (shown on 4 qubits): trainable angle encoding, an all-to-all entangling block, optional injected A_γ, repeated ×L; the per-qubit ⟨Z_i⟩ feed a trained linear head.

From Appendix B (Proofs): Throughout, $\rho(x)\in\mathbb{C}^{D\times D}$ ($D=2^{n_q}$) are density matrices, $\phi(x)=\mathrm{vec}\,\rho(x)\in\mathbb{C}^{D^2}$ are their vectorizations, and $K\in\mathbb{R}^{n\times n}$ is the Gram matrix $K_{ij}=\mathrm{Tr}[\rho(x_i)\rho(x_j)]=\langle\phi(x_i),\phi(x_j)\rangle$ with eigenvalues $\lambda_1\ge\cdots\ge\lambda_n\ge 0$ …

**Figure 10.** QCNN-like trained map. Left: two brick-pattern layers of two-qubit convolutions U (here one layer on pairs (0, 1), (2, 3), (4, 5) and the next on (1, 2), (3, 4), with the ring closure (5, 0) omitted for the drawing), optional A_γ, then per-qubit readout into a linear head. Right: the convolution block U(θ).

From Appendix B (Proofs): … using $\mathrm{Tr}\,\rho=1$ and $\mathrm{Tr}\,I=D$. The last three terms sum to $\tfrac{p(2-p)}{D}$ …

Original abstract

Recent quantum vision models (quantum vision transformers and quantum convolutional networks) report two striking but unexplained empirical phenomena: (i) ansatze with more, or more uniformly distributed, entanglement generalize better, and (ii) injecting quantum noise can improve test accuracy rather than degrade it. These observations are currently treated as curiosities, discovered by grid search and explained, if at all, by hand. We show that both are manifestations of a single, measurable quantity: the \emph{effective dimension} $d_{\rm eff}$ of the (noise-shaped) quantum feature kernel. Working primarily with quantum-kernel vision models (a quantum feature map read out by a kernel classifier), we give a spectral account in which entanglement structure and quantum noise are two knobs that move $d_{\rm eff}$; in an overfitting regime, contracting $d_{\rm eff}$ acts as ridge-like regularization. We analyze the mechanism: an \emph{exact} decomposition of the depolarized kernel $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$ with $d_{\rm eff}(K_p)\to1$, a contraction result (and its boundary) for amplitude damping, a kernel-machine capacity bound, and a capacity/alignment risk decomposition; the monotone contraction operative in our entangled experiments is verified empirically, not proven in general. Along the one-parameter depolarizing family the collapse is instead exact by construction; we use it only to confirm the kernel decomposition to machine precision and at up to $12$ qubits, not as evidence for $d_{\rm eff}$. Amplitude damping contracts $d_{\rm eff}$ and lifts test accuracy by up to $+13\%$ along an inverted-U sweet spot; the effect's sign flips between the over- and under-fitting regimes; noise injection matches an explicit spectral-filtering frontier. Our results organize two reported anecdotes into a single measurable principle for designing quantum-vision models.
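One elementary way to see why a contracted kernel spectrum behaves like ridge regularization is the identity cK(cK + λI)^{-1} = K(K + (λ/c)I)^{-1}: scaling the kernel by c = (1-p)^2 at a fixed ridge λ gives the same in-sample kernel ridge fit as keeping K and raising the ridge to λ/c. The sketch below checks this numerically. It deliberately ignores the constant offset term of the depolarized kernel, uses a random PSD matrix as a stand-in for K, and is an illustration rather than the paper's capacity bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reg, p = 20, 0.1, 0.4

# Random PSD matrix as a stand-in for the noiseless Gram matrix K.
A = rng.normal(size=(n, 5))
K = A @ A.T
y = rng.choice([-1.0, 1.0], size=n)  # arbitrary +/-1 labels

def krr_fit(K, y, reg):
    """In-sample kernel ridge predictions K (K + reg I)^{-1} y."""
    return K @ np.linalg.solve(K + reg * np.eye(len(y)), y)

c = (1 - p) ** 2
fit_scaled_kernel = krr_fit(c * K, y, reg)    # contracted kernel, same ridge
fit_stronger_ridge = krr_fit(K, y, reg / c)   # original kernel, larger ridge
gap = np.max(np.abs(fit_scaled_kernel - fit_stronger_ridge))
```

The two fits agree up to floating-point error, so in this simplified setting the noise rate p and the ridge strength are interchangeable knobs.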

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

The paper ties two quantum kernel phenomena to effective dimension via an exact depolarizing decomposition and empirical contraction results, but the entanglement part stays observational.

The main takeaway is that effective dimension d_eff of the noise-shaped kernel organizes why more entanglement or certain noise can improve generalization in these vision models. The authors decompose the depolarized kernel exactly as K_p = (1-p)^2 K + p(2-p)/D 1 1^T, show d_eff collapses to 1, and give a capacity/alignment risk split that treats d_eff contraction like ridge regularization in the overfitting regime.

What holds up is the spectral account for the depolarizing family and the amplitude-damping contraction (with an inverted-U accuracy lift up to +13%). Those pieces are concrete and the kernel decomposition checks to machine precision at 12 qubits. The risk bound and alignment term give a clean way to see the regularization effect.
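A machine-precision check of this kind is easy to reproduce at small scale. The sketch below uses 3 qubits rather than 12, random pure states as a stand-in for the feature map, and a global depolarizing channel ρ_p = (1-p)ρ + p I/D; these are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_qubits, n_points, p = 3, 10, 0.3
D = 2 ** n_qubits

# Random pure states stand in for rho(x).
psi = rng.normal(size=(n_points, D)) + 1j * rng.normal(size=(n_points, D))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
rho = np.einsum("ni,nj->nij", psi, psi.conj())

def gram(rhos):
    """Hilbert-Schmidt Gram matrix K_ij = Tr[rho_i rho_j]."""
    return np.einsum("iab,jba->ij", rhos, rhos).real

K = gram(rho)

# Direct route: depolarize every state, then recompute the Gram matrix.
rho_p = (1 - p) * rho + p * np.eye(D) / D
K_direct = gram(rho_p)

# Closed form: K_p = (1-p)^2 K + p(2-p)/D * 1 1^T.
K_formula = (1 - p) ** 2 * K + p * (2 - p) / D * np.ones((n_points, n_points))

max_err = np.max(np.abs(K_direct - K_formula))
```

The two routes should agree to within floating-point round-off, since the closed form is an algebraic identity under the stated channel.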

The softer part is the entanglement claim. The monotone contraction of d_eff under entanglement is verified only in the reported experiments, not derived in general. That makes the unification rest on the specific ansatze and data regimes tested rather than a property that follows from the kernel spectrum alone. If those experiments sit in a narrow overfitting window, the story may not travel to other quantum-kernel setups.

The work is aimed at people building or analyzing quantum feature maps for kernel classifiers. It turns two grid-search anecdotes into a measurable design knob, which is useful even if the entanglement mechanism needs a tighter proof. I would send it to review; the exact decomposition and the risk decomposition are solid enough to justify referee time, though the authors should clarify how far the empirical contraction generalizes.

Referee Report

1 major / 2 minor

Summary. The paper claims that the effective dimension d_eff of the (noise-shaped) quantum feature kernel unifies two empirical phenomena in quantum vision models: (i) ansatze with more or more uniform entanglement generalize better, and (ii) certain quantum noise injections improve test accuracy. Both act by contracting d_eff, which provides ridge-like regularization in overfitting regimes. Support includes an exact decomposition of the depolarized kernel K_p = (1-p)^2 K + p(2-p)/D 1 1^T with d_eff(K_p) -> 1, a contraction result (with boundary) for amplitude damping, a kernel-machine capacity bound, a capacity/alignment risk decomposition, and empirical verification (not a general proof) of monotone d_eff contraction under the tested entangled ansatze. Experiments show noise lifting accuracy by up to +13% along an inverted-U, with sign flip between over- and under-fitting regimes.

Significance. If the central claim holds, the work converts two reported anecdotes into a single measurable spectral principle for ansatz and noise design in quantum-kernel vision models. Credit is given for the exact kernel decomposition (confirmed to machine precision at up to 12 qubits), the capacity bound, and the reproducible empirical contraction results along the depolarizing family.

major comments (1)

[Abstract and entanglement-experiment section] The unification, in which both entanglement structure and noise act via d_eff contraction, rests on the monotone contraction under entanglement being verified empirically rather than derived from a general spectral property. While the paper correctly scopes the claim to the tested ansatze, this makes the load-bearing mechanism dependent on the specific regimes and data rather than a derived property that would extend to the broader class of quantum-kernel vision models.

minor comments (2)

Clarify in the main text how the overfitting regime is identified in each experiment (e.g., via training vs. test gap thresholds) so that the ridge-regularization interpretation can be directly checked against the reported accuracy curves.
The capacity/alignment risk decomposition is central; ensure the precise statement of the bound (including any constants or assumptions on the feature map) is stated explicitly rather than referenced only in passing.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading, the positive assessment of the exact kernel decomposition and capacity bound, and the recommendation for minor revision. We respond to the single major comment below.

Referee: [Abstract and entanglement-experiment section] The unification, in which both entanglement structure and noise act via d_eff contraction, rests on the monotone contraction under entanglement being verified empirically rather than derived from a general spectral property. While the paper correctly scopes the claim to the tested ansatze, this makes the load-bearing mechanism dependent on the specific regimes and data rather than a derived property that would extend to the broader class of quantum-kernel vision models.

Authors: We agree that the monotone contraction of d_eff under the entangled ansatze is verified empirically for the tested families rather than derived from a general spectral property. The manuscript already states this limitation explicitly in the abstract and main text: 'the monotone contraction operative in our entangled experiments is verified empirically, not proven in general.' The unification claim is therefore scoped to the quantum-kernel vision models, ansatze, and overfitting regimes studied, where both entanglement structure and noise are shown (via the exact depolarizing decomposition, amplitude-damping contraction, capacity bound, and empirical results) to contract d_eff and act as ridge-like regularization. While a general spectral theorem applicable to arbitrary ansatze would be desirable, the present work converts the two empirical phenomena into a single measurable quantity within the considered class. No further revision to the scoping language appears necessary.

Revision: no

Circularity Check

0 steps flagged

No significant circularity; explicit decompositions and stated empirical verification keep derivation self-contained

The paper supplies an exact kernel decomposition K_p = (1-p)^2 K + p(2-p)/D 1 1^T with d_eff(K_p) -> 1 for the depolarizing channel (used only to verify the decomposition to machine precision, not as evidence for d_eff), a contraction result with boundary for amplitude damping, a kernel-machine capacity bound, and a capacity/alignment risk decomposition. It explicitly states that the monotone contraction of d_eff under entanglement 'is verified empirically, not proven in general.' No quoted step reduces a claimed prediction or uniqueness result to a fitted input, self-citation, or definitional equivalence. The unification of entanglement and noise effects under d_eff is therefore organized from these independent analyses plus experiments rather than forced by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the definition of effective dimension, the exact kernel decomposition under depolarizing noise, and the empirical observation of monotone contraction under amplitude damping. Full list of free parameters and axioms cannot be extracted from the abstract alone.

axioms (1)

[standard math] Exact decomposition of the depolarized kernel K_p = (1-p)^2 K + p(2-p)/D 1 1^T
Stated directly in the abstract as the basis for d_eff contraction.

pith-pipeline@v0.9.1-grok · 5893 in / 1262 out tokens · 37324 ms · 2026-06-26T18:03:03.086840+00:00 · methodology

