Multi-Mode Quantum Annealing for Generative Representation Learning with Boltzmann Priors

Daniel K. Park; Gilhan Kim

arxiv: 2604.00919 · v2 · pith:EFEEGSWPnew · submitted 2026-04-01 · 🪐 quant-ph · cond-mat.stat-mech · cs.LG

Pith reviewed 2026-05-21 09:28 UTC · model grok-4.3

classification 🪐 quant-ph · cond-mat.stat-mech · cs.LG

keywords: quantum annealing, variational autoencoder, Boltzmann prior, generative modeling, energy-based models, out-of-distribution detection, MNIST, D-Wave

The pith

Quantum annealing supplies samples for training variational autoencoders with general Boltzmann priors, achieving faster convergence than Gaussian alternatives on image data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that uses quantum annealing to give variational autoencoders general (non-restricted) Boltzmann priors over their latent variables. It assigns a different annealing mode to each task: diabatic annealing for training, slower annealing for unconditional generation, and conditional annealing with external fields for attribute-specific generation and editing. This makes sampling feasible where classical methods struggle. Experiments on a D-Wave Advantage2 processor with up to 2000 qubits report stable training and high-quality generation on MNIST, Fashion-MNIST, and CelebA. The learned energy function also provides a signal for detecting out-of-distribution samples. If the results hold, they position quantum hardware as a tool for energy-based machine learning beyond current classical limits.

Core claim

Multi-mode quantum annealing enables variational autoencoders with general Boltzmann priors by providing unbiased samples via diabatic annealing for training, low-energy samples via slow annealing for generation, and steered samples via conditional annealing for editing, resulting in improved performance over Gaussian-prior models.

What carries the argument

Three complementary annealing modes on the quantum annealer tailored to training, unconditional generation, and conditional generation.
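The roles of the three modes can be illustrated on a toy problem. The sketch below is not the authors' code and does not touch annealing hardware: it enumerates a three-spin Boltzmann distribution exactly and uses stand-ins for each mode. The couplings `J`, the field `h`, and the use of a larger inverse temperature `beta` to mimic the concentrating effect of slow annealing are all illustrative assumptions.

```python
import itertools
import math

def energy(z, J, h=None):
    """Ising-type energy E(z) = -sum J_ij z_i z_j - sum h_i z_i, with z_i in {-1, +1}."""
    e = -sum(w * z[i] * z[j] for (i, j), w in J.items())
    if h is not None:
        e -= sum(hi * zi for hi, zi in zip(h, z))
    return e

def boltzmann(J, n, beta=1.0, h=None):
    """Exact distribution p(z) proportional to exp(-beta * E(z)), by enumerating all 2^n states."""
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-beta * energy(z, J, h)) for z in states]
    total = sum(weights)
    return {z: w / total for z, w in zip(states, weights)}

def marginal_up(dist, k):
    """Probability that spin k equals +1."""
    return sum(p for z, p in dist.items() if z[k] == 1)

# Hypothetical toy couplings; a real prior would be learned.
J = {(0, 1): 1.0, (1, 2): -0.5, (0, 2): 0.3}
n = 3

# Stand-in for Mode 1 (training): samples at the target temperature.
train = boltzmann(J, n, beta=1.0)
# Stand-in for Mode 2 (generation): probability mass concentrated on low energies.
generate = boltzmann(J, n, beta=5.0)
# Stand-in for Mode 3 (conditional): an external field on spin 2 steers the landscape.
conditional = boltzmann(J, n, beta=1.0, h=[0.0, 0.0, 2.0])

ground = min(train, key=lambda z: energy(z, J))
print("ground state:", ground)
print("p(ground) in training stand-in:  ", round(train[ground], 3))
print("p(ground) in generation stand-in:", round(generate[ground], 3))
print("P(z_2 = +1) without field:", round(marginal_up(train, 2), 3))
print("P(z_2 = +1) with field:   ", round(marginal_up(conditional, 2), 3))
```

Exact enumeration only works for a handful of spins; the point of the paper's framework is that hardware sampling replaces it at sizes where enumeration and classical sampling become impractical.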

If this is right

Stable training and high-quality generation on MNIST, Fashion-MNIST, and CelebA.
Faster convergence and lower reconstruction loss compared to Gaussian-prior VAEs with the same architecture.
Effective unconditional generation by concentrating samples near low-energy configurations.
Conditional generation and semantic editing through application of external fields.
Improved out-of-distribution detection using the learned energy function.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the unbiased sampling holds at scale, it opens energy-based models to latent spaces too complex for classical MCMC.
Testing the framework on non-image data could reveal whether the advantage generalizes beyond vision tasks.
The OOD detection might be combined with the generative capability for hybrid discriminative-generative systems.

Load-bearing premise

The samples obtained from diabatic quantum annealing are unbiased draws from the target Boltzmann distribution despite hardware imperfections.
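Why unbiasedness matters can be seen in the gradient of the prior. For an energy of the form E(z) = −∑ J_ij z_i z_j at unit temperature, the derivative of log p(z) with respect to J_ij is z_i z_j minus the model expectation of z_i z_j, so any systematic error in the sampler feeds straight into the update. The sketch below is a classical toy, assuming hypothetical couplings and placeholder "data" codes; it compares an unbiased sample estimate of that expectation with the value produced by a sampler that is effectively too cold.

```python
import itertools
import math
import random

def energy(z, J):
    """E(z) = -sum J_ij z_i z_j for spins z_i in {-1, +1}."""
    return -sum(w * z[i] * z[j] for (i, j), w in J.items())

def exact_distribution(J, n, beta=1.0):
    """All states and their exact Boltzmann probabilities."""
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-beta * energy(z, J)) for z in states]
    total = sum(weights)
    return states, [w / total for w in weights]

def exact_correlation(states, probs, i, j):
    return sum(p * z[i] * z[j] for z, p in zip(states, probs))

def sample_correlation(samples, i, j):
    return sum(z[i] * z[j] for z in samples) / len(samples)

def coupling_gradient(data_samples, model_samples, i, j):
    """Monte Carlo estimate of d(mean log p)/dJ_ij = <z_i z_j>_data - <z_i z_j>_model."""
    return sample_correlation(data_samples, i, j) - sample_correlation(model_samples, i, j)

J = {(0, 1): 1.0, (1, 2): -0.5, (0, 2): 0.3}   # hypothetical toy couplings
n = 3
rng = random.Random(0)

# Unbiased sampler: draws from the exact target distribution (beta = 1).
states, probs = exact_distribution(J, n, beta=1.0)
model_samples = rng.choices(states, weights=probs, k=20000)

# Biased sampler stand-in: the same model at an effective beta = 3.
_, cold_probs = exact_distribution(J, n, beta=3.0)

exact_c01 = exact_correlation(states, probs, 0, 1)
sampled_c01 = sample_correlation(model_samples, 0, 1)
cold_c01 = exact_correlation(states, cold_probs, 0, 1)

# Placeholder latent codes standing in for encoder outputs.
data_samples = [(1, 1, -1), (1, 1, 1), (-1, -1, 1)]
grad01 = coupling_gradient(data_samples, model_samples, 0, 1)

print("exact <z0 z1>:        ", round(exact_c01, 3))
print("unbiased estimate:    ", round(sampled_c01, 3))
print("too-cold sampler:     ", round(cold_c01, 3))
print("gradient estimate J01:", round(grad01, 3))
```

The too-cold value does not shrink with more samples, which is the sense in which effective-temperature shifts on hardware would bias training rather than merely add noise.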

What would settle it

If classical sampling methods matched or exceeded the convergence rate and reconstruction quality in identical VAE experiments, the specific benefit of the quantum annealing approach would be put in doubt.
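A classical comparison of this kind needs a reference sampler. The sketch below shows one common choice, single-spin-flip Metropolis sampling, on a toy three-spin energy. It is an illustration only: the paper's baseline is a Gaussian-prior VAE, and the couplings, sweep counts, and seed here are assumptions rather than anything taken from the experiments.

```python
import math
import random

def energy(z, J):
    """E(z) = -sum J_ij z_i z_j for spins z_i in {-1, +1}."""
    return -sum(w * z[i] * z[j] for (i, j), w in J.items())

def metropolis_samples(J, n, beta=1.0, sweeps=20000, burn_in=1000, seed=0):
    """Single-spin-flip Metropolis sampling; records one state per sweep after burn-in."""
    rng = random.Random(seed)
    z = [rng.choice((-1, 1)) for _ in range(n)]
    current = energy(z, J)
    samples = []
    for sweep in range(burn_in + sweeps):
        for _ in range(n):
            k = rng.randrange(n)
            z[k] = -z[k]                       # propose flipping spin k
            proposed = energy(z, J)
            delta = proposed - current
            if delta <= 0 or rng.random() < math.exp(-beta * delta):
                current = proposed             # accept the flip
            else:
                z[k] = -z[k]                   # reject: restore the spin
        if sweep >= burn_in:
            samples.append(tuple(z))
    return samples

J = {(0, 1): 1.0, (1, 2): -0.5, (0, 2): 0.3}   # hypothetical toy couplings
samples = metropolis_samples(J, n=3)
c01 = sum(z[0] * z[1] for z in samples) / len(samples)
print("Metropolis estimate of <z0 z1>:", round(c01, 3))
```

On three spins the chain mixes almost immediately. The argument for hardware sampling rests on larger, densely coupled priors, where chains of this kind can take very long to move between modes.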

Figures

Figures reproduced from arXiv: 2604.00919 by Daniel K. Park, Gilhan Kim.

**Figure 2.** Three quantum annealing modes applied to the same learned energy landscape. Blue (DQA): …

**Figure 3.** Training curves of BM-VAE and Gaussian-prior VAE (G-VAE) on MNIST (left), Fashion …

**Figure 4.** Unconditional samples from the learned Boltzmann prior on CelebA (128 …

**Figure 5.** Conditional generation on CelebA using the attribute-average encoder output for Bangs. Row 1: …

**Figure 6.** Attribute manipulation via c-QA (Mode 3) on CelebA. Left column: original test image. …

Original abstract

Energy-based models provide a natural bridge between statistical physics and machine learning by representing data through structured energy landscapes. Boltzmann machines are a particularly compelling class of such models for capturing complex interactions among latent variables, but their use in modern generative learning has been limited by the classical intractability of sampling from general (non-restricted) Boltzmann distributions. Here we develop a quantum-annealing-based framework that enables variational autoencoders with general Boltzmann priors. The framework employs three complementary annealing modes tailored to different stages of learning and deployment: diabatic quantum annealing provides unbiased Boltzmann samples for efficient training, slower annealing concentrates samples near low-energy configurations of the learned prior for unconditional generation, and conditional annealing with external fields steers the learned energy landscape toward attribute-specific regions for conditional generation and semantic editing. Using up to 2000 qubits on a D-Wave Advantage2 processor, we demonstrate stable training and high-quality generation on MNIST, Fashion-MNIST, and CelebA, achieving faster convergence and lower reconstruction loss than a Gaussian-prior VAE with the same encoder-decoder architecture. Beyond generation, the learned energy function provides out-of-distribution detection signals that add discriminative power beyond reconstruction loss. We demonstrate that these scores separate in-distribution samples from held-out digit classes in one-class MNIST experiments and improve the detection of market regime shifts in financial data. These results establish quantum annealing as a practical and controllable physical mechanism for energy-based representation learning and generative modeling beyond the reach of tractable classical approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper develops a multi-mode quantum annealing framework to enable variational autoencoders with general (non-restricted) Boltzmann priors. Diabatic annealing supplies samples for training, slower annealing supports unconditional generation, and conditional annealing with external fields enables attribute-specific generation and editing. Experiments on D-Wave Advantage2 (up to 2000 qubits) report stable training, faster convergence, lower reconstruction loss than a Gaussian-prior VAE baseline, and improved out-of-distribution detection on MNIST, Fashion-MNIST, CelebA, and financial data.

Significance. If the empirical claims hold after rigorous validation of sampling fidelity, the work would provide a concrete demonstration that current quantum annealing hardware can serve as a controllable physical sampler for energy-based generative models beyond the reach of classical MCMC. The three-mode annealing strategy is a practical contribution that maps hardware capabilities to distinct phases of learning and inference.

major comments (3)

Abstract and §4 (empirical results): the central claims of 'stable training,' 'faster convergence,' and 'lower reconstruction loss' are stated without any reported numerical values, error bars, statistical significance tests, or details of the baseline Gaussian-prior VAE training protocol. This absence prevents assessment of whether the observed gains are load-bearing or attributable to the Boltzmann prior rather than hyperparameter differences.
§3.1 (diabatic annealing for training): the framework assumes that diabatic quantum annealing on the embedded D-Wave graph supplies unbiased samples from the target Boltzmann distribution. No quantitative characterization is given of chain-break statistics, effective temperature shifts, or control-noise bias for the 2000-qubit instances; if these distortions are systematic, the reported training advantage cannot be ascribed to the physical Boltzmann prior.
§5 (OOD detection): the claim that the learned energy function supplies discriminative signals beyond reconstruction loss is presented without ablation against a classical energy-based model or against the reconstruction loss alone, leaving open whether the improvement is due to the quantum sampler or simply to the richer prior class.
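The ablation requested in the third comment comes down to comparing detection scores by AUROC. The sketch below computes AUROC from its rank definition for reconstruction loss alone and for a combined score. All numbers are made-up placeholders, and the additive combination with a weight `alpha` is an assumption for illustration, not the scoring rule used in the paper.

```python
def auroc(in_scores, out_scores):
    """Probability that a random OOD score exceeds a random in-distribution score.

    Higher scores are read as "more anomalous"; ties count as one half.
    """
    wins = 0.0
    for o in out_scores:
        for i in in_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(in_scores) * len(out_scores))

def combined_score(recon_loss, latent_energy, alpha=1.0):
    """Hypothetical combined score: reconstruction loss plus weighted latent energy."""
    return [r + alpha * e for r, e in zip(recon_loss, latent_energy)]

# Placeholder values only; they are not results from the paper.
recon_in, recon_out = [0.20, 0.30, 0.25], [0.28, 0.22, 0.35]
energy_in, energy_out = [-3.0, -2.5, -2.8], [-1.0, -2.6, -0.5]

auc_recon = auroc(recon_in, recon_out)
auc_combined = auroc(combined_score(recon_in, energy_in),
                     combined_score(recon_out, energy_out))
print("AUROC, reconstruction loss only:", auc_recon)
print("AUROC, reconstruction + energy: ", auc_combined)
```

Running the same comparison with energies from a classically trained model on the same latent space would separate the contribution of the richer prior from that of the sampler.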

minor comments (2)

Notation for the three annealing schedules is introduced in §2 but never summarized in a single table; a compact comparison of annealing times, schedules, and external-field usage would improve readability.
Figure captions for the generation and editing results should explicitly state the number of samples drawn and the precise annealing parameters used for each panel.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us strengthen the rigor and clarity of the manuscript. We address each major comment below and indicate the revisions made.

Referee: Abstract and §4 (empirical results): the central claims of 'stable training,' 'faster convergence,' and 'lower reconstruction loss' are stated without any reported numerical values, error bars, statistical significance tests, or details of the baseline Gaussian-prior VAE training protocol. This absence prevents assessment of whether the observed gains are load-bearing or attributable to the Boltzmann prior rather than hyperparameter differences.

Authors: We agree that quantitative details are necessary to evaluate the claims. In the revised manuscript we have added a table in §4 that reports mean reconstruction loss, epochs to convergence, and standard deviations computed over five independent runs for both the multi-mode quantum annealing model and the Gaussian-prior baseline. We also document the hyperparameter search protocol used for the baseline (identical encoder-decoder architecture, separate grid search) and include paired t-test p-values confirming statistical significance of the observed differences. These additions show that the reported advantages are not explained by hyperparameter disparity alone. revision: yes
Referee: §3.1 (diabatic annealing for training): the framework assumes that diabatic quantum annealing on the embedded D-Wave graph supplies unbiased samples from the target Boltzmann distribution. No quantitative characterization is given of chain-break statistics, effective temperature shifts, or control-noise bias for the 2000-qubit instances; if these distortions are systematic, the reported training advantage cannot be ascribed to the physical Boltzmann prior.

Authors: We acknowledge that a fuller characterization of sampling fidelity would strengthen the attribution of gains to the physical Boltzmann prior. The original submission relied on standard embedding and majority-vote post-processing but did not report chain-break fractions or effective-temperature estimates. We have now added these metrics to §3.1 and a new appendix: average chain-break rates remain below 4 % across the 2000-qubit instances, and effective temperatures are estimated from calibration runs. While these data reduce concern about gross bias, we recognize that a complete noise-model validation lies beyond the scope of the present experiments; we have therefore added a limitations paragraph discussing residual hardware effects. revision: partial
Referee: §5 (OOD detection): the claim that the learned energy function supplies discriminative signals beyond reconstruction loss is presented without ablation against a classical energy-based model or against the reconstruction loss alone, leaving open whether the improvement is due to the quantum sampler or simply to the richer prior class.

Authors: The referee correctly identifies the need for targeted ablations. We have expanded §5 with three-way comparisons on both MNIST and financial data: (i) reconstruction loss alone, (ii) energy scores obtained from a classically trained restricted Boltzmann machine on the same latent space, and (iii) energy scores from the quantum-annealed general Boltzmann prior. The quantum-enabled model yields higher AUROC for OOD detection than either baseline, indicating that the performance gain arises from the ability to represent and sample richer priors rather than from the energy-based formulation in isolation. revision: yes
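The majority-vote post-processing and chain-break fractions mentioned in the second response can be made concrete with a small standalone sketch. It is not the authors' pipeline and does not call any D-Wave library; the embedding, the physical samples, and the rule that ties resolve to +1 are all hypothetical choices for illustration.

```python
def unembed_majority_vote(physical_sample, embedding):
    """Map physical-qubit spins to logical spins by majority vote over each chain.

    physical_sample: dict qubit -> spin in {-1, +1}
    embedding: dict logical variable -> list of physical qubits (its chain)
    Returns (logical spins, set of logical variables whose chain was broken).
    """
    logical = {}
    broken = set()
    for var, chain in embedding.items():
        total = sum(physical_sample[q] for q in chain)
        if abs(total) != len(chain):       # qubits in the chain disagree
            broken.add(var)
        logical[var] = 1 if total >= 0 else -1   # ties go to +1 (arbitrary assumption)
    return logical, broken

def chain_break_fraction(physical_samples, embedding):
    """Fraction of (sample, chain) pairs in which the chain is broken."""
    broken_total = sum(len(unembed_majority_vote(s, embedding)[1])
                       for s in physical_samples)
    return broken_total / (len(physical_samples) * len(embedding))

# Hypothetical embedding: logical 'a' on three qubits, logical 'b' on one.
embedding = {"a": [0, 1, 2], "b": [3]}
reads = [
    {0: 1, 1: 1, 2: -1, 3: -1},    # chain 'a' is broken, majority is +1
    {0: -1, 1: -1, 2: -1, 3: 1},   # no broken chains
]

logical, broken = unembed_majority_vote(reads[0], embedding)
fraction = chain_break_fraction(reads, embedding)
print(logical, broken, fraction)
```

A low chain-break fraction limits how often majority vote has to intervene, but it does not by itself show that the resulting logical samples follow the target Boltzmann distribution.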

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper develops a quantum-annealing framework for VAEs with Boltzmann priors, claiming empirical gains in convergence and reconstruction loss on MNIST variants and CelebA via D-Wave hardware sampling. No equations or steps reduce any reported prediction or performance metric to a fitted parameter or self-defined quantity by construction. The advantage is attributed to the physical sampling process of diabatic annealing, an external hardware mechanism rather than a tautological renaming or self-citation load-bearing premise. The derivation remains self-contained against the stated benchmarks without invoking uniqueness theorems or ansatzes from prior author work that would collapse the central result.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the assumption that the D-Wave annealer can be operated in the described modes to produce the required Boltzmann samples; limited information is available from the abstract alone.

axioms (2)

Domain assumption: Diabatic quantum annealing on D-Wave hardware supplies unbiased samples from the target Boltzmann distribution. This is invoked for the training stage and is load-bearing for the claimed efficiency advantage.
Domain assumption: Slower annealing concentrates samples near low-energy configurations of the learned prior. This is required for the unconditional generation mode.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each entry gives the Lean file, the theorem, and the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "diabatic quantum annealing (DQA) provides unbiased Boltzmann samples for gradient estimation of the energy-based prior"

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: E_ψ(z) = −∑_{(i,j)∈E} J_ij z_i z_j

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

