pith.

arxiv: 2602.16623 · v2 · submitted 2026-02-18 · 🪐 quant-ph

Scalable Quantum Machine Learning via Multi-layer Fully-Connected Variational Quantum Circuits

Pith reviewed 2026-05-15 21:09 UTC · model grok-4.3

classification 🪐 quant-ph
keywords variational quantum circuits · quantum machine learning · scalable quantum models · fully-connected architecture · linear parameter scaling · tabular regression · PDE approximation

The pith

Multi-layer fully-connected variational quantum circuits scale trainable parameters linearly with input dimension while matching or beating monolithic VQCs and matched deep neural networks on regression, classification, and PDE tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Multi-Layer Fully-Connected Variational Quantum Circuits (FC-VQC) to resolve the expressivity-trainability dilemma of standard variational quantum circuits. High-dimensional inputs are partitioned into fixed-size local VQC blocks whose outputs are combined across layers by deterministic mixing rules rather than trainable quantum gates. This keeps every quantum circuit small enough for efficient simulation while making the total number of trainable parameters grow only linearly with input dimension. On tabular regression, tabular classification, and spatio-temporal BSDE/PDE approximation benchmarks, FC-VQC outperforms monolithic single-block VQCs and reaches competitive or better accuracy than structure-matched deep neural networks despite using substantially fewer parameters.

Core claim

FC-VQC decomposes high-dimensional inputs into fixed-size local variational quantum circuit blocks connected by deterministic block-mixing rules. Each quantum computation remains local to its block, and the number of trainable quantum parameters scales linearly with input dimension. Across tabular regression, tabular classification, and spatio-temporal BSDE/PDE approximation tasks, this architecture improves performance over monolithic VQC baselines and achieves competitive or improved performance relative to structure-matched deep neural network baselines while using substantially fewer trainable parameters.

What carries the argument

Multi-layer fully-connected variational quantum circuit (FC-VQC) architecture that partitions inputs into fixed-size local VQC blocks linked by deterministic block-mixing rules to propagate information without extra trainable quantum parameters.
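To make the block-and-mix wiring concrete, here is a minimal NumPy sketch of a forward pass under stated assumptions. The gate set (RY angle encoding, then K layers of RY rotations followed by a ring of CNOTs), the Pauli-Z readout, and the interleaving rule `interleave_mix` are hypothetical choices made for illustration; the paper's block map f_Θ and its mixing rules may differ. Each block is simulated as a small q-qubit state vector, so simulation cost depends on q rather than on the input dimension, and the only trainable quantities are the per-block angles.

```python
import numpy as np

def ry(theta):
    # Single-qubit RY rotation (real-valued matrix)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, wire):
    # Apply a 2x2 gate to one qubit axis of a (2,)*q state tensor
    state = np.tensordot(gate, state, axes=([1], [wire]))
    return np.moveaxis(state, 0, wire)

def apply_cnot(state, control, target):
    # Flip the target qubit on the slice where the control qubit is |1>
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    axis = target if target < control else target - 1  # target axis inside the slice
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=axis).copy()
    return state

def z_expectations(state):
    # <Z_i> = P(qubit i = 0) - P(qubit i = 1) for every qubit i
    probs = np.abs(state) ** 2
    out = []
    for i in range(state.ndim):
        marginal = np.moveaxis(probs, i, 0).reshape(2, -1).sum(axis=1)
        out.append(marginal[0] - marginal[1])
    return np.array(out)

def vqc_block(x, theta):
    # One q-qubit block: angle encoding, then K layers of RY + CNOT ring (assumed ansatz)
    q = x.shape[0]
    state = np.zeros((2,) * q)
    state[(0,) * q] = 1.0
    for i in range(q):
        state = apply_1q(state, ry(x[i]), i)
    for layer in theta:  # theta has shape (K, q)
        for i in range(q):
            state = apply_1q(state, ry(layer[i]), i)
        if q > 1:
            for i in range(q):
                state = apply_cnot(state, i, (i + 1) % q)
    return z_expectations(state)  # q real outputs in [-1, 1]

def interleave_mix(h, num_blocks):
    # Hypothetical fixed mixing rule: new block b takes features b, b+B, b+2B, ...
    # so each new block reads outputs of several different previous blocks.
    return [h[b::num_blocks] for b in range(num_blocks)]

def fc_vqc_forward(x, params, q):
    # params[l][b]: (K, q) angles of block b in layer l; assumes len(x) == B * q
    num_blocks = len(params[0])
    blocks = [x[b * q:(b + 1) * q] for b in range(num_blocks)]  # contiguous partition
    for layer_params in params:
        h = np.concatenate([vqc_block(xb, th) for xb, th in zip(blocks, layer_params)])
        blocks = interleave_mix(h, num_blocks)
    return h

rng = np.random.default_rng(0)
q, B, K, L = 3, 4, 2, 3
x = rng.uniform(-np.pi, np.pi, size=B * q)
params = [[rng.uniform(0, 2 * np.pi, size=(K, q)) for _ in range(B)] for _ in range(L)]
out = fc_vqc_forward(x, params, q)
print(out.shape)  # (12,)
```

The mixing step has no trainable parameters: it only reorders the concatenated block outputs, so every new block reads features produced by several previous blocks.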

If this is right

  • Trainable parameters grow linearly rather than quadratically or worse with input dimension.
  • Each local quantum block stays small enough for classical simulation and gradient-based optimization on near-term hardware.
  • The same modular pattern delivers gains on both supervised learning tasks and differential-equation approximation.
  • Parameter count drops substantially relative to deep neural networks of matched width and depth.
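The linear-scaling bullet can be checked with simple arithmetic. The sketch below assumes a hypothetical fixed-width layout in which every layer has ceil(d/q) blocks of q qubits and each block has K variational layers with three angles per qubit; the paper's configurations may use different block counts per layer and a different gate set. A width-d multilayer perceptron with L dense layers is included for contrast.

```python
import math

def fc_vqc_param_count(d, q, K, L, angles_per_qubit=3):
    # Hypothetical layout: every layer has ceil(d / q) blocks of q qubits and each
    # block has K variational layers with `angles_per_qubit` angles per qubit.
    num_blocks = math.ceil(d / q)
    return L * num_blocks * K * q * angles_per_qubit

def dense_mlp_param_count(d, L):
    # L fully connected layers of width d (weights + biases): grows as d**2
    return L * (d * d + d)

for d in (12, 24, 48, 96):
    print(d, fc_vqc_param_count(d, q=3, K=3, L=3), dense_mlp_param_count(d, L=3))
```

With q, K, and L fixed, doubling d doubles the FC-VQC count, while the dense count roughly quadruples. How this compares with the paper's structure-matched DNN baselines depends on how those baselines were sized.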

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same block-and-fixed-mixing pattern could be applied to other families of quantum circuits to improve their scaling.
  • Success of fixed mixing rules implies that explicit learned cross terms may be less necessary in quantum ML than often assumed.
  • Larger input dimensions become practical on hardware limited to small qubit counts if local blocks fit within available qubits.

Load-bearing premise

The fixed deterministic block-mixing rules preserve enough expressivity to capture cross-block correlations without needing additional trainable quantum parameters or post-processing adjustments.

What would settle it

A dataset in which cross-block correlations are essential and cannot be recovered by the fixed mixing rules, causing FC-VQC accuracy to fall below both monolithic VQCs and matched DNNs even after increasing local block size.
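One way to probe this criterion is a synthetic regression target that no single block can predict. The sketch below is a hypothetical stress test, not an experiment from the paper: y is the product of the first and the last feature, which fall in different blocks under a contiguous partition with block size q. Because each factor is independent of every other feature and has zero mean, the conditional mean of y given any single block's inputs is zero, so a model that beats a constant predictor must combine information across blocks.

```python
import numpy as np

def cross_block_dataset(n, d, q, seed=0):
    # Hypothetical stress test: y depends only on the product of the first feature
    # (block 0) and the last feature (last block) under a contiguous partition.
    if (d - 1) // q == 0:
        raise ValueError("need d > q so the two features fall in different blocks")
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    y = X[:, 0] * X[:, d - 1]
    return X, y

X, y = cross_block_dataset(n=20000, d=24, q=3)
# Sample correlation of y with each individual feature (all should be near zero)
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
print(corr.max())
```

Sweeping the block size q and the distance between the two interacting features would show whether accuracy degrades as the interaction spans more blocks.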

Figures

Figures reproduced from arXiv: 2602.16623 by Chen-Yu Liu, Howard Su, Huan-Hsin Tseng, Kuan-Cheng Chen, Samuel Yen-Chi Chen.

Figure 1. Overview of Scalable VQC Architectures. Each block $x^{(b)}$ is independently processed by a $q$-qubit variational quantum circuit using the map $f_\Theta$ defined in Section 2.1. Thus, the input layer performs $h^{(1,b)} = f_{\Theta^{(0)}_b}(x^{(b)}),\ b = 1, \dots, B$ (7), where $\Theta^{(0)}_b$ denotes the trainable parameters associated with block $b$ in the input layer. The output of the input layer is obtained by concatenating the $B$ qua…
Figure 3. Schematic plot of 9t3t1. 2.3.3. Output layer. The output layer transforms the final hidden representation into the model prediction. As in the preceding layers, this transformation is implemented entirely by VQCs; no classical affine mappings or activation functions are introduced. Let $H^{(L)}$ denote the output of the last hidden layer. Depending on the desired output dimension, we consider two cases: Dimens…
Figure 4. Gradient Dynamics on Concrete (SingleVQC 8). The grid displays gradient variance over training epochs. Rows correspond to Stacking Layers L ∈ {1, 3, 5, 7, 9} and Columns to Internal Depth K ∈ {1, 3, 5, 7, 9}. Row 1 (L = 1) represents the monolithic Type 1 baseline, which immediately succumbs to barren plateaus (vanishing variance). Rows 2–5 (Type 2) show that simply stacking monolithic blocks fails to stab…
Figure 5. Gradient Dynamics on Concrete (Type 3: 8t3t1). Grid layout: Rows L ∈ {1..9}, Columns K ∈ {1..9}. Unlike the SingleVQC baseline, this modular fully-connected architecture begins to show signs of trainability. While extremely shallow blocks (K = 1, Column 1) still face barren plateaus, deeper configurations (K ≥ 3) combined with stacking (L ≥ 3) exhibit non-zero, healthy gradient variance.
Figure 6. Gradient Dynamics on Concrete (Type 4: 16t4t1). Grid layout: Rows L ∈ {1..9}, Columns K ∈ {1..9}. With feature expansion and slightly larger blocks (q = 4), the “healthy” regime expands. Note that the variance in the lower-right quadrant (high L, high K) stabilizes significantly compared to Types 1 and 2, avoiding exponential decay.
Figure 7. Gradient Dynamics on Concrete (Type 4: 32t11t4t1). Grid layout: Rows L ∈ {1..9}, Columns K ∈ {1..9}. This architecture, which achieves parity with classical gradient boosting, displays robust training dynamics. The plots for K ≥ 3 show sustained gradient variance throughout the training process, confirming that the high test accuracy is supported by a trainable optimization landscape.
Figure 8. Gradient Dynamics on Concrete (Type 4: 40t14t5t1). Grid layout: Rows L ∈ {1..9}, Columns K ∈ {1..9}. Even with the highest parameter count and depth, the modular structure prevents barren plateaus. The gradients in the deep regimes (L ≥ 5, K ≥ 5) remain healthy and oscillatory, validating the scalability of the FC-VQCs.
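The figures track gradient variance over training epochs. A common, simpler diagnostic for barren plateaus, swapped in here rather than taken from the paper, is the variance of one gradient component across random initializations, computed exactly with the parameter-shift rule. The sketch simulates a single monolithic block with a hypothetical ansatz (K layers of RY rotations and a CNOT ring) and the cost ⟨Z_0⟩; it illustrates the measurement and does not reproduce Figures 4 to 8.

```python
import numpy as np

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, wire):
    # Apply a 2x2 gate to one qubit axis of a (2,)*q state tensor
    return np.moveaxis(np.tensordot(gate, state, axes=([1], [wire])), 0, wire)

def apply_cnot(state, control, target):
    # Flip the target qubit on the slice where the control qubit is |1>
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    axis = target if target < control else target - 1
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=axis).copy()
    return state

def cost(theta):
    # theta: (K, q) RY angles; K layers of RY + CNOT ring (assumed ansatz); cost = <Z_0>
    q = theta.shape[1]
    state = np.zeros((2,) * q)
    state[(0,) * q] = 1.0
    for layer in theta:
        for i in range(q):
            state = apply_1q(state, ry(layer[i]), i)
        if q > 1:
            for i in range(q):
                state = apply_cnot(state, i, (i + 1) % q)
    p0 = (np.abs(state[0]) ** 2).sum()  # probability that qubit 0 is |0>
    return 2 * p0 - 1

def param_shift_grad(theta, k, i):
    # Parameter-shift rule, exact for RY gates: [C(t + pi/2) - C(t - pi/2)] / 2
    plus, minus = theta.copy(), theta.copy()
    plus[k, i] += np.pi / 2
    minus[k, i] -= np.pi / 2
    return (cost(plus) - cost(minus)) / 2

def grad_variance(q, K, samples=200, seed=0):
    # Variance of dC/dtheta[0, 0] over uniformly random initial angles
    rng = np.random.default_rng(seed)
    grads = [param_shift_grad(rng.uniform(0, 2 * np.pi, size=(K, q)), 0, 0)
             for _ in range(samples)]
    return float(np.var(grads))

for K in (1, 3, 5):
    print(K, grad_variance(q=4, K=K))
```

Sweeping q and K in `grad_variance` yields a grid loosely analogous to the K axis of the figures; the stacking axis L would require composing several such blocks.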
Original abstract

Variational Quantum Circuits (VQC) are promising models for quantum machine learning, but standard monolithic architectures face an expressivity–trainability dilemma: small circuits can be under-parameterized, while larger circuits are difficult to simulate and optimize. We propose Multi-Layer Fully-Connected Variational Quantum Circuits (FC-VQC), a modular framework that decomposes high-dimensional inputs into fixed-size local VQC blocks connected by deterministic block-mixing rules. This design keeps each quantum computation local while allowing the number of trainable quantum parameters to scale linearly with input dimension. We evaluate FC-VQC across tabular regression, tabular classification, and spatio-temporal BSDE/PDE approximation. Across the evaluated tasks, FC-VQC improves over monolithic VQC baselines and achieves competitive or improved performance relative to structure-matched deep neural network (DNN) baselines, while using substantially fewer trainable parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Multi-Layer Fully-Connected Variational Quantum Circuits (FC-VQC), a modular architecture that decomposes high-dimensional inputs into fixed-size local VQC blocks linked by deterministic block-mixing rules. This yields linear scaling of trainable quantum parameters with input dimension while targeting improved performance over monolithic VQCs and competitiveness with structure-matched DNNs on tabular regression, classification, and spatio-temporal BSDE/PDE tasks.

Significance. If the empirical gains hold under rigorous controls, the work would demonstrate a practical route to scaling VQCs without exponential parameter growth or barren-plateau issues, offering a hybrid quantum-classical model with substantially lower parameter counts than matched DNNs. The linear-scaling claim and explicit comparison to DNN baselines are the strongest potential contributions, provided the mixing rules demonstrably propagate cross-block correlations.

major comments (3)
  1. §3.2 (Block-Mixing Rules): The deterministic mixing (fixed permutations, summations, or tensor contractions) is presented as sufficient to recover data-dependent cross-block correlations without extra trainable parameters. No proof or ablation is given showing that this static wiring preserves expressivity for tasks with non-local dependencies (e.g., BSDE/PDE); if the mixing is data-independent, the overall circuit may reduce to disconnected local blocks, undermining both the performance claim over monolithic VQCs and the linear-scaling advantage.
  2. §4 (Experimental Setup): The abstract and results claim competitive or superior performance versus DNN baselines with far fewer parameters, yet no full specification of baseline architectures, hyperparameter matching, error bars, data exclusion criteria, or statistical tests is provided. Without these, the central empirical claim cannot be evaluated and the “substantially fewer trainable parameters” comparison remains unverified.
  3. Tables 2 and 3 (Performance Metrics): Reported improvements lack standard deviations across runs and any analysis of variance; for the BSDE/PDE tasks the reported gains could be within noise, weakening the claim that FC-VQC is competitive with DNNs while its trainable-parameter count grows only linearly.
minor comments (2)
  1. Eq. (7): Notation for the mixing operator is introduced without an explicit definition of its action on the quantum state vector; a short matrix or circuit-diagram expansion would clarify the claim of locality.
  2. Introduction: The manuscript cites prior VQC scaling work but omits direct comparison to recent tensor-network or hybrid quantum-classical approaches that also target linear parameter scaling.
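The first major comment turns on whether data-independent wiring can still connect blocks. A simple check is to track which input blocks can influence each hidden block as fixed mixing steps are applied. The sketch compares three hypothetical wiring patterns: no mixing, a sliding window of radius r, and all-to-all. Only the identity wiring keeps blocks disconnected; with a sliding window, block b reaches input blocks within distance m·r after m mixing steps (clipped at the boundaries). This tracks reachability only, which is necessary but not sufficient for representing cross-block correlations, so it does not replace the requested ablation.

```python
def propagate(sources, num_mixes):
    # sources[b]: set of previous-layer blocks that feed block b (fixed wiring).
    # Returns, for each block, the set of input blocks that can influence it.
    deps = [{b} for b in range(len(sources))]  # input layer: block b reads input block b
    for _ in range(num_mixes):
        deps = [set().union(*(deps[s] for s in sources[b])) for b in range(len(sources))]
    return deps

def no_mixing(B):
    # Identity wiring: each block only ever sees its own predecessor
    return [{b} for b in range(B)]

def sliding_window(B, r):
    # Block b reads blocks b - r .. b + r (clipped to valid indices)
    return [set(range(max(0, b - r), min(B, b + r + 1))) for b in range(B)]

def all_to_all(B):
    # Every block reads every previous block
    return [set(range(B)) for _ in range(B)]

B = 8
print(propagate(no_mixing(B), 4)[3])                  # {3}
print(sorted(propagate(sliding_window(B, 1), 2)[3]))  # [1, 2, 3, 4, 5]
print(len(propagate(all_to_all(B), 1)[0]))            # 8
```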

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, providing clarifications and committing to revisions that strengthen the empirical and theoretical support for FC-VQC.

Point-by-point responses
  1. Referee: §3.2 (Block-Mixing Rules): The deterministic mixing (fixed permutations, summations, or tensor contractions) is presented as sufficient to recover data-dependent cross-block correlations without extra trainable parameters. No proof or ablation is given showing that this static wiring preserves expressivity for tasks with non-local dependencies (e.g., BSDE/PDE); if the mixing is data-independent, the overall circuit may reduce to disconnected local blocks, undermining both the performance claim over monolithic VQCs and the linear-scaling advantage.

    Authors: The mixing rules are data-independent by design, but they operate across successive layers on quantum states that already encode data-dependent information from prior blocks. This layered propagation enables cross-block correlations to accumulate, analogous to how fixed-weight connections in classical fully-connected networks still permit rich feature interactions. We agree that the manuscript would benefit from stronger evidence and will add (i) an ablation comparing FC-VQC variants with and without inter-block mixing on the BSDE/PDE tasks and (ii) a concise theoretical argument showing that the multi-layer composition preserves the ability to represent non-local functions. These additions will demonstrate that the architecture does not collapse to disconnected local blocks. revision: yes

  2. Referee: §4 (Experimental Setup): The abstract and results claim competitive or superior performance versus DNN baselines with far fewer parameters, yet no full specification of baseline architectures, hyperparameter matching, error bars, data exclusion criteria, or statistical tests is provided. Without these, the central empirical claim cannot be evaluated and the “substantially fewer trainable parameters” comparison remains unverified.

    Authors: We accept that §4 currently lacks sufficient detail for independent verification. In the revised manuscript we will expand the experimental setup to report: (a) exact DNN architectures (layer widths, depths, activations, and initialization), (b) the hyperparameter search protocol and final values used for both FC-VQC and DNNs, (c) the number of independent runs and how error bars were computed, (d) data preprocessing, splitting, and any exclusion criteria, and (e) the statistical tests applied to compare methods. These additions will make the parameter-efficiency and performance claims fully reproducible. revision: yes

  3. Referee: Tables 2 and 3 (Performance Metrics): Reported improvements lack standard deviations across runs and any analysis of variance; for the BSDE/PDE tasks the reported gains could be within noise, weakening the claim that FC-VQC is competitive with DNNs while its trainable-parameter count grows only linearly.

    Authors: We agree that standard deviations and variance analysis are essential. We will revise Tables 2 and 3 to include standard deviations computed over the multiple independent runs already performed. We will also add a short discussion of observed variance, with particular attention to the BSDE/PDE tasks, and will qualify performance claims where differences fall within statistical noise. If appropriate, we will report p-values or confidence intervals to support the competitiveness statement. revision: yes
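The statistical reporting promised in these responses can be sketched in a few lines. This is a generic NumPy sketch, not the authors' protocol: it reports the mean and sample standard deviation over independent runs, plus a percentile-bootstrap 95% confidence interval for the difference in mean score between two models. The per-seed scores are placeholders invented for illustration and are not results from the paper.

```python
import numpy as np

def summarize(scores):
    # Mean and sample standard deviation (ddof=1) over independent runs
    s = np.asarray(scores, dtype=float)
    return s.mean(), s.std(ddof=1)

def bootstrap_diff_ci(a, b, n_boot=10000, alpha=0.05, seed=0):
    # Percentile bootstrap CI for mean(a) - mean(b), resampling runs with replacement
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diffs = (rng.choice(a, size=(n_boot, a.size)).mean(axis=1)
             - rng.choice(b, size=(n_boot, b.size)).mean(axis=1))
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Placeholder per-seed scores for illustration only (not results from the paper)
model_a = [0.84, 0.86, 0.85, 0.83, 0.87]
model_b = [0.83, 0.85, 0.82, 0.86, 0.84]
print(summarize(model_a), summarize(model_b))
lo, hi = bootstrap_diff_ci(model_a, model_b)
print(lo, hi, "overlaps zero" if lo <= 0 <= hi else "excludes zero")
```

If the interval contains zero, the observed gap is within run-to-run variation at that level. With only a handful of seeds the bootstrap interval is itself noisy, so the number of runs should be reported alongside it.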

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical evaluation

Full rationale

The paper introduces FC-VQC as a modular architecture with deterministic block-mixing and reports performance gains via direct experimental comparisons against monolithic VQCs and DNN baselines. No derivation chain is presented that reduces a claimed result to its own inputs by construction, self-citation, or fitted-parameter renaming. Linear parameter scaling follows directly from the architectural design, and the expressivity and performance claims are validated empirically rather than derived from equations that presuppose the target outcome. This is the standard non-circular case for an empirical proposal paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The architecture rests on standard variational quantum circuit assumptions and quantum mechanics; the abstract introduces no new free parameters, axioms beyond domain standards, or invented entities.

axioms (1)
  • [standard math] Standard assumptions underlying variational quantum circuits and quantum state evolution
    The framework extends existing VQC models without stating new foundational axioms.

pith-pipeline@v0.9.0 · 5459 in / 1213 out tokens · 49554 ms · 2026-05-15T21:09:05.305482+00:00 · methodology
