pith. machine review for the scientific record.

arxiv: 2604.20701 · v1 · submitted 2026-04-22 · 🪐 quant-ph

Recognition: unknown

Divide-and-Conquer Neural Network Surrogates for Quantum Sampling: Accelerating Markov Chain Monte Carlo in Large-Scale Constrained Optimization Problems

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:42 UTC · model grok-4.3

classification 🪐 quant-ph
keywords Markov Chain Monte Carlo · Quantum Approximate Optimization Algorithm · Neural Network Surrogates · Constrained Sampling · Fixed Hamming Weight · Ising Models · NISQ Devices · Divide-and-Conquer

The pith

Neural network surrogates trained on QAOA subgraph samples accelerate MCMC mixing by factors of roughly 20 over nearest-neighbor pair flips in large fixed-Hamming-weight problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a divide-and-conquer framework that breaks large Ising graphs into subgraphs, generates quantum samples from each using QAOA with an XY mixer, and trains neural networks conditioned on local Hamming weight to supply proposal moves for MCMC. These proposals preserve the global fixed-Hamming-weight constraint while producing faster autocorrelation decay than classical pair-flip methods. A reader would care because many real-world sampling tasks, from Boltzmann sampling on regular graphs to feature selection in machine learning, are slowed by poor mixing under constraints; the reported speedups that grow with system size suggest a practical route to better performance on NISQ hardware. The MNIST experiment with 784 variables shows both quicker energy drop and a 2 percent accuracy lift, illustrating direct utility.
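
To make the constraint mechanism concrete: if a proposal resamples a single block of variables at that block's current local Hamming weight, the global weight is untouched regardless of what distribution the surrogate has learned. A minimal sketch, with a uniform sampler standing in for the trained surrogate (the paper's weight-conditioned scheme is richer than this stand-in and may also redistribute weight across blocks):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_block(block_size, local_weight):
    """Stand-in for the weight-conditioned surrogate: a uniform draw
    over block configurations with exactly `local_weight` ones."""
    config = np.zeros(block_size, dtype=int)
    ones = rng.choice(block_size, size=local_weight, replace=False)
    config[ones] = 1
    return config

def propose(state, blocks):
    """Resample one randomly chosen block at its current local Hamming
    weight; all other blocks are untouched, so the global weight of
    `state` is preserved by construction."""
    new_state = state.copy()
    block = blocks[rng.integers(len(blocks))]
    k_local = int(state[block].sum())
    new_state[block] = sample_block(len(block), k_local)
    return new_state

# Toy instance: N = 12 spins in 3 blocks of 4, global Hamming weight 6.
blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
state = np.array([1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0])
assert propose(state, blocks).sum() == state.sum()  # constraint intact
```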

Core claim

Dividing the interaction graph, sampling subproblems with QAOA using XY mixers, and training weight-conditioned neural network surrogates to generate constraint-preserving proposals yields MCMC that mixes faster as N grows, with average improvements in the autocorrelation decay-rate constant of about 20.3× over nearest-neighbor pair flips and 7.6× over non-nearest-neighbor flips on 3-regular graphs, plus faster energy convergence and 2.03 percent higher classification accuracy on an MNIST feature-mask task with N = 784.
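
For readers outside the MCMC literature, the quantity behind these speedup figures can be made concrete: fit the autocorrelation ρ(l) of a scalar observable (e.g., the energy trace) to exp(-c·l) and compare the decay constants c of the two chains. A minimal numpy sketch follows; the paper's exact estimator is not reproduced here, so treat this as an illustration of the metric rather than the authors' code.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation rho(l) of a scalar MCMC trace."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = x.var()
    return np.array([np.mean(x[: len(x) - l] * x[l:]) / var
                     for l in range(1, max_lag + 1)])

def decay_rate(x, max_lag=50):
    """Fit rho(l) ~ exp(-c * l) by linear regression on log rho(l)."""
    rho = autocorrelation(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    keep = rho > 0                      # drop lags past the noise floor
    slope, _ = np.polyfit(lags[keep], np.log(rho[keep]), 1)
    return -slope                       # decay-rate constant c

# Speedup as reported: ratio of decay constants of the two chains, e.g.
# decay_rate(energies_surrogate) / decay_rate(energies_pairflip).
```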

What carries the argument

The divide-and-conquer neural network surrogate that conditions proposal distributions on local Hamming weight after QAOA sampling of subgraphs.
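
The conditioning itself can be enforced exactly at sampling time. Below is a minimal, generic sketch of a weight-conditioned autoregressive sampler: per-bit probabilities are clamped whenever the remaining weight budget forces a bit, so the output has exactly k ones. The paper's conditional MADE learns these per-bit probabilities from QAOA subgraph samples; the logits here are placeholders, not the trained network.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_fixed_weight(logits, k):
    """Sample bits x_1..x_n autoregressively with exactly k ones.
    `logits[i]` stands in for the network output at position i (in a
    conditional MADE it would depend on x_1..x_{i-1} and on k)."""
    n = len(logits)
    x = np.zeros(n, dtype=int)
    remaining = k
    for i in range(n):
        slots_left = n - i
        if remaining == 0:
            p1 = 0.0                     # no ones left to place
        elif remaining == slots_left:
            p1 = 1.0                     # must place ones everywhere
        else:
            p1 = 1.0 / (1.0 + np.exp(-logits[i]))
        x[i] = rng.random() < p1
        remaining -= x[i]
    return x

x = sample_fixed_weight(rng.normal(size=16), k=8)
assert x.sum() == 8                      # Hamming weight enforced exactly
```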

If this is right

  • Mixing acceleration grows with increasing system size N on 3-regular graphs.
  • The approach outperforms both nearest-neighbor and non-nearest-neighbor classical pair-flip proposals.
  • Energy convergence is faster on the N=784 MNIST feature mask problem.
  • Final classification accuracy improves by 2.03 percent over the compared classical baseline.
  • The framework supports scalable MCMC on NISQ devices for large constrained Ising problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subgraph-plus-surrogate pattern could be tested on hardware-generated QAOA samples to measure how device noise affects surrogate fidelity at scale.
  • Conditioning networks on additional local observables beyond Hamming weight might allow the method to handle other global constraints such as parity or energy bounds.
  • Because speedup increases with N, the technique may become relatively more valuable precisely when classical MCMC becomes intractable.
  • Integration with other quantum-enhanced MCMC proposals could be explored to combine divide-and-conquer efficiency with different proposal mechanisms.

Load-bearing premise

Neural network surrogates trained only on subgraph QAOA samples can generate proposal distributions that preserve the global fixed Hamming weight constraint and deliver mixing gains that continue to scale with system size without bias or overfitting.

What would settle it

Running the method on 3-regular graphs with N several times larger than tested, then checking whether the measured autocorrelation decay rate keeps improving at the reported factor or saturates, and verifying on small instances that the sampled distribution remains unbiased against exact enumeration.
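
The second half of that test is cheap to prototype. The sketch below, a stand-in rather than the authors' code, enumerates all fixed-weight states of a small Ising instance (0/1 variables for simplicity), runs a constraint-preserving Metropolis chain with Kawasaki pair flips substituting for the surrogate proposal, and reports the total variation distance to the exact Boltzmann distribution. The instance size, couplings, and chain length are hypothetical choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N, K, beta = 8, 4, 1.0
J = np.triu(rng.normal(size=(N, N)), 1)           # random couplings

def energy(x):
    return -(x @ J @ x)                            # Ising energy on {0,1}^N

states = [np.array(s) for s in itertools.product([0, 1], repeat=N)
          if sum(s) == K]
boltz = np.exp([-beta * energy(s) for s in states])
boltz /= boltz.sum()

# Metropolis chain with weight-preserving pair flips (burn-in omitted).
counts = {tuple(s): 0 for s in states}
x = states[0].copy()
for _ in range(200_000):
    i = rng.choice(np.flatnonzero(x == 1))         # a site to clear
    j = rng.choice(np.flatnonzero(x == 0))         # a site to set
    y = x.copy()
    y[i], y[j] = 0, 1                              # symmetric pair flip
    if rng.random() < np.exp(-beta * (energy(y) - energy(x))):
        x = y
    counts[tuple(x)] += 1

emp = np.array([counts[tuple(s)] for s in states], dtype=float)
emp /= emp.sum()
print("total variation distance:", 0.5 * np.abs(emp - boltz).sum())
```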

Figures

Figures reproduced from arXiv: 2604.20701 by Keisuke Fujii, Yuichiro Nakano, Yuya Kawamata.

Figure 1. A schematic diagram of the proposed method in this study. The method comprises four main components and proceeds …

Figure 2. A conceptual diagram of the conditional MADE.

Figure 3. An example of the MCMC results for N = 128 and QAOA size |B| = 16. The horizontal axis represents the lag l, and the vertical axis shows the autocorrelation ρ(l). The blue solid line corresponds to our proposed method, while the red and green solid lines represent global and local Kawasaki dynamics, respectively. The autocorrelation of our method decays faster, indicating faster MCMC mixing. …

Figure 4. The dependence on the system size …

Figure 6. MCMC result for MNIST feature selection. …

Figure 7. Autocorrelation …

Figure 9. MCMC results for MNIST feature selection. …
Original abstract

Sampling problems are promising candidates for demonstrating quantum advantage, and one approach known as quantum-enhanced Markov chain Monte Carlo [Layden, D. et al., Nature 619, 282-287 (2023)] uses quantum samples as a proposal distribution to accelerate convergence to a target distribution. On the other hand, many practical problems are large-scale and constrained, making it difficult to construct efficient proposal distributions in classical methods and slowing down MCMC mixing. In this work, we propose a divide-and-conquer neural network surrogate framework for quantum sampling to accelerate MCMC under fixed Hamming weight constraints. Our method divides the interaction graph for an Ising problem into subgraphs, generates samples using QAOA for those subproblems with an XY mixer, and trains neural network surrogates conditioned on the Hamming weight to provide proposal distributions for each subset while preserving the constraint. In numerical experiments of Boltzmann sampling on 3-regular graphs, our method consistently accelerated mixing as the system size $N$ increased, with average improvements in the autocorrelation decay rate constant by speedup factors of about $20.3$ and $7.6$ over classical pair-flip methods based on nearest-neighbor and non-nearest-neighbor exchanges, respectively. We also applied the method to an MNIST feature mask optimization problem with $N=784$, obtaining faster energy convergence and a $2.03\%$ higher classification accuracy. These results show that our method enables efficient and scalable MCMC and can outperform classical methods for practical applications on NISQ devices.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated author's rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a divide-and-conquer neural network surrogate framework for quantum-enhanced MCMC sampling of Ising problems under fixed Hamming weight constraints. The interaction graph is partitioned into subgraphs; QAOA with an XY mixer generates samples on subproblems; neural networks conditioned on Hamming weight are trained as surrogates to produce proposal distributions for the global MCMC chain. Numerical experiments on 3-regular graphs report that the method accelerates mixing with average autocorrelation decay speedups of 20.3 and 7.6 relative to nearest-neighbor and non-nearest-neighbor classical pair-flip baselines, with the advantage growing with system size N; an MNIST feature-mask optimization with N=784 is also shown to converge faster and yield 2.03% higher classification accuracy.

Significance. If the numerical claims hold, the work offers a scalable route to combine NISQ quantum sampling with classical MCMC for large constrained optimization tasks. The divide-and-conquer structure directly addresses the difficulty of constructing valid proposals under global constraints, and the reported scaling of speedup with N together with the MNIST demonstration constitute concrete evidence of practical utility beyond small-system toy models. The explicit conditioning mechanism and use of subgraph QAOA are strengths that could be extended to other constrained sampling problems.

major comments (2)
  1. Abstract and Numerical Experiments section: the reported average speedup factors (20.3 and 7.6) and the 2.03% accuracy gain are presented without error bars, the number of independent graph realizations or MCMC runs, or any statistical test, which is required to substantiate the central claim that acceleration is consistent and increases with N.
  2. Methods / Numerical Experiments: the description of how subgraph proposals are combined while exactly preserving the global Hamming weight (via NN conditioning) lacks an explicit normalization step, pseudocode, or verification that the resulting proposal distribution remains unbiased with respect to the target Boltzmann measure; this is load-bearing for the validity of the MCMC chain.
minor comments (2)
  1. The abstract and introduction should define QAOA on first use and briefly state the neural-network architecture (layer widths, activations, training loss) to improve readability.
  2. Figure captions for the autocorrelation plots should include the precise definition of the decay-rate constant and the range of N values tested.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. The comments identify opportunities to strengthen the statistical presentation and the clarity of the algorithmic details. We address each point below and will incorporate the suggested revisions.

Point-by-point responses
  1. Referee: Abstract and Numerical Experiments section: the reported average speedup factors (20.3 and 7.6) and the 2.03% accuracy gain are presented without error bars, the number of independent graph realizations or MCMC runs, or any statistical test, which is required to substantiate the central claim that acceleration is consistent and increases with N.

    Authors: We agree that error bars, the number of independent realizations, and a statistical test are necessary to support the scaling claims. In the revised manuscript we will report the speedup factors with standard errors computed over the independent graph realizations and MCMC trajectories used in the experiments. We will explicitly state the number of independent 3-regular graphs generated for each system size N and the number of MCMC runs per graph. A statistical test (e.g., linear regression on log-speedup versus log-N with p-value) will be added to quantify the growth of the advantage with N. The MNIST accuracy improvement will likewise be reported with its standard deviation across repeated training runs. revision: yes

  2. Referee: Methods / Numerical Experiments: the description of how subgraph proposals are combined while exactly preserving the global Hamming weight (via NN conditioning) lacks an explicit normalization step, pseudocode, or verification that the resulting proposal distribution remains unbiased with respect to the target Boltzmann measure; this is load-bearing for the validity of the MCMC chain.

    Authors: We appreciate the referee's emphasis on this foundational aspect. The current text explains that the neural-network surrogates are conditioned on the global Hamming weight and that subgraph QAOA samples are used to build the proposal, but we concur that an explicit normalization procedure, pseudocode, and a direct verification of unbiasedness would improve rigor. In the revision we will insert a new subsection (or appendix) containing: (i) the precise normalization formula that ensures the proposal probabilities sum to one while respecting the global weight constraint, (ii) a compact pseudocode listing the full proposal-generation and acceptance steps, and (iii) a short verification argument (or small-scale numerical check) showing that the resulting proposal satisfies the necessary conditions for the Metropolis-Hastings chain to remain unbiased with respect to the target Boltzmann distribution. revision: yes
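
For reference, the acceptance step such a pseudocode listing would contain is the standard Metropolis-Hastings correction for an asymmetric proposal, which keeps the chain unbiased whatever the surrogate has learned, provided the proposal probabilities are tractable (autoregressive models such as MADE expose them exactly). A hedged sketch, with hypothetical `surrogate_sample` and `surrogate_logprob` interfaces:

```python
import numpy as np

rng = np.random.default_rng(3)

def mh_step(x, energy, surrogate_sample, surrogate_logprob, beta=1.0):
    """One Metropolis-Hastings step. Detailed balance w.r.t. the
    Boltzmann target holds for any proposal q(x'|x) > 0 once the
    ratio q(x|x')/q(x'|x) is included, regardless of how well the
    surrogate approximates the target."""
    x_new = surrogate_sample(x)                  # constraint-preserving draw
    log_q_fwd = surrogate_logprob(x_new, given=x)
    log_q_rev = surrogate_logprob(x, given=x_new)
    log_accept = (-beta * (energy(x_new) - energy(x))
                  + log_q_rev - log_q_fwd)
    if np.log(rng.random()) < log_accept:
        return x_new
    return x
```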

Circularity Check

0 steps flagged

No significant circularity; claims rest on direct numerical benchmarks

Full rationale

The paper presents an algorithmic framework (divide-and-conquer QAOA + conditioned NN surrogates) whose central performance claims—speedup factors of ~20.3 and ~7.6 in autocorrelation decay, plus MNIST accuracy gain—are obtained from explicit Monte Carlo simulations on 3-regular graphs and N=784 instances. These quantities are measured against independent classical pair-flip baselines rather than being algebraically forced by any fitted parameter or self-referential definition. No equation or theorem in the manuscript reduces a reported prediction to its own training inputs by construction, and external citations (e.g., Layden et al.) supply the motivating quantum-sampling idea without carrying the load of the reported speedups. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The framework relies on standard quantum and machine-learning components plus several implementation choices whose values are not fixed by first principles.

free parameters (3)
  • QAOA circuit parameters and depth
    Optimized per subgraph to generate proposal samples; values chosen or fitted for each subproblem.
  • Neural-network architecture and training hyperparameters
    Layer sizes, learning rates, and conditioning mechanisms are selected to match QAOA output distributions.
  • Subgraph partitioning scheme
    How the interaction graph is divided into subgraphs is a design choice that affects surrogate quality.
axioms (2)
  • domain assumption QAOA with XY mixer produces useful samples for constrained subproblems that can be learned by a neural network.
    Invoked when the method trains surrogates on QAOA output to replace quantum calls.
  • domain assumption Conditioning the neural network on Hamming weight preserves the global constraint when subproblem proposals are combined.
    Central to the divide-and-conquer claim that the full-system constraint remains satisfied.

pith-pipeline@v0.9.0 · 5588 in / 1581 out tokens · 50648 ms · 2026-05-10T00:42:15.610999+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

42 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1] Internal anchor (quoted from the paper): "Unlike the uniform superposition obtained by applying Hadamard gates to all qubits, we prepare an initial state biased so that the Hamming weight is concentrated around k_B = 2 with small-angle single-qubit rotation gates, and then start the optimization from that neighborhood. As a baseline, we use simulated annealing with Kawasaki pair-flip updates, which preserves the total Hamming weight…"
  2. [2] P. W. Shor, in Proceedings of the 35th Annual Symposium on Foundations of Computer Science (IEEE, 1994), pp. 124–134.
  3. [3] L. K. Grover, Physical Review Letters 79, 325 (1997).
  4. [4] D. S. Abrams and S. Lloyd, Physical Review Letters 83, 5162 (1999).
  5. [5] A. W. Harrow, A. Hassidim, and S. Lloyd, Physical Review Letters 103, 150502 (2009).
  6. [6] J. Preskill, Quantum 2, 79 (2018).
  7. [7] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., Nature 574, 505 (2019).
  8. [8] H.-S. Zhong, H. Wang, Y.-H. Deng, M.-C. Chen, L.-C. Peng, Y.-H. Luo, J. Qin, D. Wu, X. Ding, Y. Hu, et al., Science 370, 1460 (2020).
  9. [9] L. S. Madsen, F. Laudenbach, M. F. Askarani, F. Rortais, T. Vincent, J. F. Bulmer, F. M. Miatto, L. Neuhaus, L. G. Helt, M. J. Collins, et al., Nature 606, 75 (2022).
  10. [10] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., Nature Reviews Physics 3, 625 (2021).
  11. [11] E. Farhi, J. Goldstone, and S. Gutmann, A Quantum Approximate Optimization Algorithm, arXiv preprint arXiv:1411.4028 (2014).
  12. [12] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Physical Review A 98, 032309 (2018).
  13. [13] E. Farhi and H. Neven, Classification with Quantum Neural Networks on Near Term Processors, arXiv preprint arXiv:1802.06002 (2018).
  14. [14] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature Communications 5, 4213 (2014).
  15. [15] J. F. Gonthier, M. D. Radin, C. Buda, E. J. Doskocil, C. M. Abuan, and J. Romero, Physical Review Research 4, 033154 (2022).
  16. [16] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9, 4812 (2018).
  17. [17] D. Hangleiter and J. Eisert, Reviews of Modern Physics 95, 035001 (2023).
  18. [18] K. Kanno, M. Kohda, R. Imai, S. Koh, K. Mitarai, W. Mizukami, and Y. O. Nakagawa, arXiv preprint arXiv:2302.11320 (2023).
  19. [19] J. Robledo-Moreno, M. Motta, H. Haas, A. Javadi-Abhari, P. Jurcevic, W. Kirby, S. Martiel, K. Sharma, S. Sharma, T. Shirakawa, et al., Science Advances 11, eadu9991 (2025).
  20. [20] D. Layden, G. Mazzola, R. V. Mishmash, M. Motta, P. Wocjan, J.-S. Kim, and S. Sheldon, Nature 619, 282 (2023).
  21. [21] Y. Nakano, K. N. Okada, and K. Fujii, PRX Quantum 7, 010338 (2026).
  22. [22] Y. Nakano, H. Hakoshima, K. Mitarai, and K. Fujii, Physical Review Research 6, 033105 (2024).
  23. [23] P. C. Lotshaw, G. Siopsis, J. Ostrowski, R. Herrman, R. Alam, S. Powers, and T. S. Humble, Physical Review A 108, 042411 (2023).
  24. [24] P. Díez-Valle, D. Porras, and J. J. García-Ripoll, Physical Review Letters 130, 050601 (2023).
  25. [25] M. Germain, K. Gregor, I. Murray, and H. Larochelle, in International Conference on Machine Learning (PMLR, 2015).
  26. [26] B. Uria, M.-A. Côté, K. Gregor, I. Murray, and H. Larochelle, Journal of Machine Learning Research 17, 1 (2016).
  27. [27] H. Larochelle and I. Murray, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (JMLR Workshop and Conference Proceedings, 2011), pp. 29–37.
  28. [28] T. Neuhaus and J. S. Hager, Journal of Statistical Physics 113, 47 (2003).
  29. [29] A. Nußbaumer, E. Bittner, and W. Janke, Progress of Theoretical Physics Supplement 184, 400 (2010).
  30. [30] M. A. Nau, L. A. Nutricati, B. Camino, P. A. Warburton, and A. K. Maier, Scientific Reports 15, 28937 (2025).
  31. [31] S. Mücke, R. Heese, S. Müller, M. Wolter, and N. Piatkowski, Quantum Machine Intelligence 5, 11 (2023).
  32. [32] P. C. Hohenberg and B. I. Halperin, Reviews of Modern Physics 49, 435 (1977).
  33. [33] A. Kuchukova, M. Pappik, W. Perkins, and C. Yap, Random Structures & Algorithms 67, e70038 (2025).
  34. [34] S. Hadfield, Z. Wang, B. O'Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas, Algorithms 12, 34 (2019).
  35. [35] Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel, Physical Review A 101, 012320 (2020).
  36. [36] G. Papamakarios, T. Pavlakou, and I. Murray, Advances in Neural Information Processing Systems 30 (2017).
  37. [37] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, The Journal of Chemical Physics 21, 1087 (1953).
  38. [38] W. K. Hastings, Biometrika 57, 97 (1970).
  39. [39] K. Kawasaki, Physical Review 145, 224 (1966).
  40. [40] K. Hukushima and K. Nemoto, Journal of the Physical Society of Japan 65, 1604 (1996).
  41. [41] H. G. Katzgraber, M. Körner, and A. P. Young, Physical Review B 73, 224432 (2006).
  42. [42] H. Peng, F. Long, and C. Ding, IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1226 (2005).