arXiv: 2512.07808 · v2 · submitted 2025-12-08 · quant-ph · cs.LG

LUNA: LUT-Based Neural Architecture for Fast and Low-Cost Qubit Readout

M. A. Farooq, G. Di Guglielmo, A. Rajagopala, N. Tran, V. A. Chhabria, A. Arora

Pith reviewed 2026-05-17 00:14 UTC · model grok-4.3


keywords: qubit readout, neural network, LUT, LogicNets, quantum error correction, superconducting qubits, hardware accelerator, differential evolution


The pith

LUNA pairs simple integrators with LUT-synthesized neural networks to cut qubit readout area by up to 10.95 times and latency by 30 percent at near-full fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LUNA, an accelerator designed for reading out superconducting qubits by turning their analog signals into classical 0 or 1 states. It first applies cheap integrators to shrink the signal dimensions without much hardware, then routes the results through LogicNets—neural networks turned directly into lookup-table logic—for rapid classification. Differential evolution tunes the design choices to balance speed, size, and accuracy. The result targets the bottleneck of high-resource, high-latency readout hardware that currently limits fast quantum error correction.
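As a rough illustration of the integrator stage described above, the sketch below sums a readout trace over a few contiguous windows so that a long I/Q record becomes a short feature vector. The window layout, the function names `integrate_windows` and `preprocess`, and the sample values are all assumptions made for illustration; the paper's actual integrator configuration is not given in this review.

```python
from typing import List, Sequence


def integrate_windows(trace: Sequence[float], num_windows: int) -> List[float]:
    """Sum the samples in each of `num_windows` contiguous windows (boxcar integration)."""
    n = len(trace)
    if num_windows <= 0 or num_windows > n:
        raise ValueError("num_windows must be between 1 and len(trace)")
    # Evenly spread window edges so every sample is used exactly once.
    edges = [(i * n) // num_windows for i in range(num_windows + 1)]
    return [sum(trace[edges[i]:edges[i + 1]]) for i in range(num_windows)]


def preprocess(i_trace: Sequence[float], q_trace: Sequence[float], num_windows: int) -> List[float]:
    """Reduce an I/Q trace pair to 2 * num_windows features for a downstream classifier."""
    return integrate_windows(i_trace, num_windows) + integrate_windows(q_trace, num_windows)


# Hypothetical 8-sample trace, reduced to 2 windows per quadrature.
i_trace = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
q_trace = [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]
features = preprocess(i_trace, q_trace, num_windows=2)
print(features)  # [4.0, 8.0, 2.0, -2.0]
```

In hardware, each window would map to an accumulator rather than a multiplier, which is one plausible reading of why this stage is described as cheap.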

Core claim

LUNA achieves up to a 10.95x reduction in area and 30% lower latency compared to prior DNN-based readout methods, with little to no loss in classification fidelity, by pairing low-cost integrators for preprocessing with LUT-synthesized LogicNets for classification.
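To put the two headline figures on a common footing, the short calculation below converts a reduction factor and a percentage improvement into fractions of the baseline. The helper names are illustrative, and both inputs are the paper's "up to" values, so the outputs are best-case figures.

```python
def area_fraction(reduction_factor: float) -> float:
    """Fraction of the baseline area that remains after an N-fold reduction."""
    return 1.0 / reduction_factor


def latency_fraction(percent_lower: float) -> float:
    """Fraction of the baseline latency that remains after a P% reduction."""
    return 1.0 - percent_lower / 100.0


remaining_area = area_fraction(10.95)
remaining_latency = latency_fraction(30.0)
print(round(remaining_area, 4))                # 0.0913
print(round(100 * (1 - remaining_area), 1))    # 90.9
print(round(remaining_latency, 2))             # 0.7
```

Read this way, the best reported design point uses roughly 9% of the baseline area and 70% of the baseline latency.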

What carries the argument

Integrator-based dimensionality reduction combined with LogicNets (DNNs synthesized into LUT logic), tuned by differential evolution search.
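The phrase "DNNs synthesized into LUT logic" can be made concrete with a toy example: when a neuron has few inputs and each input takes few discrete values, its whole input-output behavior can be enumerated once and stored as a truth table. The sketch below does this for a single neuron with 1-bit inputs and a 1-bit output. The weights, bias, and function names are invented for illustration, and this is not the LogicNets toolflow or the authors' network.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def quantized_neuron(inputs: Sequence[int], weights: Sequence[float], bias: float) -> int:
    """Toy neuron: 1-bit inputs, real-valued weights, 1-bit output via a threshold at zero."""
    acc = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if acc >= 0 else 0


def neuron_to_lut(weights: Sequence[float], bias: float) -> Dict[Tuple[int, ...], int]:
    """Enumerate all 2**fan_in input patterns; the table then replaces the arithmetic."""
    fan_in = len(weights)
    return {
        bits: quantized_neuron(bits, weights, bias)
        for bits in product((0, 1), repeat=fan_in)
    }


# Hypothetical 3-input neuron (values chosen only for illustration).
weights = [1.0, -2.0, 0.5]
bias = -0.25
lut = neuron_to_lut(weights, bias)

# Inference is now a single table lookup with no multiplies or adds.
print(len(lut))        # 8
print(lut[(1, 0, 1)])  # 1
print(lut[(0, 1, 0)])  # 0
```

The table size grows as 2 to the power of the total number of input bits, which is why this style of network depends on low fan-in and coarse quantization.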

If this is right

Readout hardware can be placed inside tight quantum error-correction feedback loops because inference finishes faster.
Large-scale quantum processors need far fewer FPGA or ASIC resources for the readout stage.
The same integrator-plus-LogicNet pattern can be reused for other real-time quantum measurement tasks.
Automated design search makes it practical to retarget the architecture to new qubit technologies or noise environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the fidelity holds on real chips, readout no longer needs to be the dominant consumer of control electronics in multi-qubit systems.
The approach could be stacked with cryogenic control electronics to reduce overall power and wiring overhead.
Similar LUT-based classifiers might accelerate other quantum tasks such as state discrimination in quantum sensing.

Load-bearing premise

That integrator preprocessing plus LUT-synthesized neural networks can keep classification accuracy high across the full range of real qubit states and noise found in superconducting devices.

What would settle it

Run the LUNA hardware design on actual superconducting qubit chips, measure readout fidelity over varied noise levels and qubit states, and compare the error rates directly against a full-precision DNN baseline.

Figures

Figures reproduced from arXiv: 2512.07808 by A. Arora, A. Rajagopala, G. Di Guglielmo, M. A. Farooq, N. Tran, V. A. Chhabria.

**Figure 2.** LUNA co-design flow: (1) enumerate, (2) prune, (3) …

**Figure 3.** A high level overview of the LUNA architecture. Values shown are demonstrative; taken from our …

**Figure 4.** Best cost trajectory across generations for each tar…

Original abstract

Qubit readout is a critical operation in quantum computing systems, which maps the analog response of qubits into discrete classical states. Deep neural networks (DNNs) have recently emerged as a promising solution to improve readout accuracy. Prior hardware implementations of DNN-based readout are resource-intensive and suffer from high inference latency, limiting their practical use in low-latency decoding and quantum error correction (QEC) loops. This paper proposes LUNA, a fast and efficient superconducting qubit readout accelerator that combines low-cost integrator-based preprocessing with Look-Up Table (LUT) based neural networks for classification. The architecture uses simple integrators for dimensionality reduction with minimal hardware overhead, and employs LogicNets (DNNs synthesized into LUT logic) to drastically reduce resource usage while enabling ultra-low-latency inference. We integrate this with a differential evolution based exploration and optimization framework to identify high-quality design points. Our results show up to a 10.95x reduction in area and 30% lower latency with little to no loss in fidelity compared to the state-of-the-art. LUNA enables scalable, low-footprint, and high-speed qubit readout, supporting the development of larger and more reliable quantum computing systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LUNA pairs simple integrators with LUT-synthesized LogicNets to cut readout area and latency, but the fidelity numbers rest on limited noise coverage that needs more checks.


LUNA pairs low-cost integrators for preprocessing with LogicNet LUT synthesis to deliver fast, small-footprint qubit readout. The headline numbers are a 10.95x area reduction and 30% latency improvement with minimal fidelity hit, which matters for keeping readout from bottlenecking error-correction cycles.

The new piece is the concrete integration of minimal integrator reduction with LUT-mapped neural nets, plus the differential-evolution search to tune the design. Prior work on DNN readout was heavier on resources; this targets the hardware constraints directly and shows how to shrink the implementation while keeping inference quick. That is useful engineering progress for superconducting qubit systems.

The results look plausible for the target domain. The architecture avoids heavy DSP blocks by leaning on simple integration and then mapping the classifier to pure LUT logic, which explains the area and speed gains. The optimization loop helps explore trade-offs without exhaustive search.

The main soft spot is around fidelity preservation. The claim of little to no loss assumes the preprocessing and trained net generalize across the noise and state distributions seen in real devices. If the traces used for training and testing miss things like 1/f noise tails or crosstalk, the reported accuracy could be optimistic. The paper would be stronger with an ablation that measures information loss from the integrator alone and tests on held-out hardware-calibrated data. Without that, the generalization is not fully demonstrated.

This work is for hardware designers and quantum control engineers who need low-resource readout blocks. Someone implementing FPGA-based decoders or looking for ASIC-friendly solutions would find the architecture and numbers directly applicable.

It deserves peer review. The idea is grounded, the gains are quantified, and the gaps are fixable with additional experiments rather than fundamental flaws.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes LUNA, a hardware accelerator for superconducting qubit readout that combines low-cost integrator-based preprocessing for dimensionality reduction with LUT-synthesized LogicNets for classification. A differential-evolution optimization framework is used to explore design points, with the central claim being up to 10.95× area reduction and 30% lower latency versus state-of-the-art DNN readout implementations while incurring little to no fidelity loss.

Significance. If the reported gains are substantiated with detailed, reproducible hardware measurements and fidelity validation on representative noise distributions, the work would offer a concrete path toward resource-efficient, low-latency readout suitable for quantum error-correction loops and larger-scale processors. The LogicNet synthesis approach for ultra-low-latency inference is a clear technical strength.

major comments (2)

[§4 and abstract] §4 (Results) and abstract: the headline claims of 10.95× area reduction and 30% latency reduction are stated without accompanying tables or figures that report absolute resource counts (LUTs, flip-flops, DSPs, BRAM), target device, or direct side-by-side numbers against the cited state-of-the-art baselines. This prevents verification of the magnitude of the improvement.
[§3.2 and §5] §3.2 (Optimization) and §5 (Evaluation): the fidelity-preservation claim rests on the untested assumption that integrator preprocessing plus LogicNet classification generalizes across the full distribution of qubit states and real-device noise (1/f noise, crosstalk, state-preparation errors). No ablation isolating integrator information loss or results on held-out hardware-calibrated traces are provided, leaving the central accuracy claim unsupported.

minor comments (2)

[Abstract] Abstract: replace the qualitative phrase 'little to no loss in fidelity' with a quantitative statement (e.g., 'fidelity within 0.3% of baseline') and cite the exact fidelity metric used.
[Figure 3] Figure 3 (architecture overview): add explicit labels for the integrator output width and the LogicNet input format to clarify the data path.
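The first minor comment asks for a quantitative fidelity statement. One way such a number could be produced is sketched below, assuming the commonly used two-state assignment fidelity F = 1 - [P(1|0) + P(0|1)] / 2. The metric the paper actually uses is not stated in the material reviewed here, so both the definition and the sample labels are assumptions.

```python
from typing import Sequence


def assignment_fidelity(prepared: Sequence[int], assigned: Sequence[int]) -> float:
    """Two-state assignment fidelity: 1 - (P(1|0) + P(0|1)) / 2."""
    if len(prepared) != len(assigned):
        raise ValueError("prepared and assigned must have the same length")
    n0 = sum(1 for p in prepared if p == 0)
    n1 = sum(1 for p in prepared if p == 1)
    if n0 == 0 or n1 == 0:
        raise ValueError("need at least one shot prepared in each state")
    # Misassignment rates, conditioned on the prepared state.
    p_1_given_0 = sum(1 for p, a in zip(prepared, assigned) if p == 0 and a == 1) / n0
    p_0_given_1 = sum(1 for p, a in zip(prepared, assigned) if p == 1 and a == 0) / n1
    return 1.0 - 0.5 * (p_1_given_0 + p_0_given_1)


# Hypothetical labels: a baseline classifier and a candidate with one extra error.
prepared = [0, 0, 0, 0, 1, 1, 1, 1]
baseline = [0, 0, 0, 0, 1, 1, 1, 1]
candidate = [0, 0, 0, 1, 1, 1, 1, 1]

f_base = assignment_fidelity(prepared, baseline)
f_cand = assignment_fidelity(prepared, candidate)
print(f_base, f_cand, f_base - f_cand)  # 1.0 0.875 0.125
```

Reporting the gap between the two values, together with the number of shots, would turn "little to no loss" into a statement that can be checked.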

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and indicate where we will revise the paper to improve clarity and support for our claims.


Referee: [§4 and abstract] §4 (Results) and abstract: the headline claims of 10.95× area reduction and 30% latency reduction are stated without accompanying tables or figures that report absolute resource counts (LUTs, flip-flops, DSPs, BRAM), target device, or direct side-by-side numbers against the cited state-of-the-art baselines. This prevents verification of the magnitude of the improvement.

Authors: We agree that absolute resource counts and direct comparisons are necessary for full verification. In the revised manuscript we will add a table in §4 reporting absolute LUT, flip-flop, DSP, and BRAM utilization on the target FPGA device used for synthesis, together with side-by-side numbers against the cited DNN baselines. This will make the reported 10.95× area and 30% latency improvements directly verifiable.

revision: yes

Referee: [§3.2 and §5] §3.2 (Optimization) and §5 (Evaluation): the fidelity-preservation claim rests on the untested assumption that integrator preprocessing plus LogicNet classification generalizes across the full distribution of qubit states and real-device noise (1/f noise, crosstalk, state-preparation errors). No ablation isolating integrator information loss or results on held-out hardware-calibrated traces are provided, leaving the central accuracy claim unsupported.

Authors: Our differential-evolution framework in §3.2 already explores a range of qubit-state distributions and synthetic noise models. We acknowledge that explicit ablations and broader noise coverage would strengthen the fidelity claim. In the revised §5 we will add an ablation isolating integrator information loss and additional results under 1/f noise, crosstalk, and state-preparation error models. Full hardware-calibrated traces from physical devices lie outside the current simulation-based evaluation; we will note this limitation and outline future experimental validation.

revision: partial

Circularity Check

0 steps flagged

No circularity; empirical optimization results are self-contained


The paper proposes an architecture combining integrator preprocessing with LUT-synthesized LogicNets, then applies differential-evolution search to identify design points and reports measured area, latency, and fidelity outcomes relative to prior work. No derivation step reduces by construction to its own inputs: the optimization framework treats the neural-net parameters and integrator settings as searchable variables whose performance is evaluated externally rather than being redefined as the prediction. No self-citation chain, uniqueness theorem, or ansatz smuggling is invoked to force the headline claims. The central results remain falsifiable empirical measurements on the chosen benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities. The differential-evolution optimizer almost certainly contains hyperparameters that function as free parameters, but none are named or quantified.
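For readers unfamiliar with where such hyperparameters enter, the sketch below implements the classic DE/rand/1/bin scheme of Storn and Price on a toy two-variable cost. The population size `pop_size`, differential weight `F`, crossover rate `CR`, and generation count are the usual free parameters. The toy cost is only a stand-in for an area, latency, and fidelity objective, and nothing here reflects the authors' actual search framework or settings.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    cost: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,      # hyperparameter: population size
    F: float = 0.8,          # hyperparameter: differential weight
    CR: float = 0.9,         # hyperparameter: crossover rate
    generations: int = 200,  # hyperparameter: search budget
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimal DE/rand/1/bin minimizer over a box-bounded continuous space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    lo, hi = bounds[j]
                    value = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(max(value, lo), hi))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            trial_score = cost(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_cost(x: Sequence[float]) -> float:
    """Stand-in objective with its minimum at (1, -2); not a hardware cost model."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_cost = differential_evolution(toy_cost, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_cost)
```

A design search over discrete choices such as bit widths or layer sizes would additionally need a rounding or encoding step, which this continuous sketch leaves out.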



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "simple integrators for dimensionality reduction ... LogicNets (DNNs synthesized into LUT logic) ... differential evolution based exploration"

IndisputableMonolith/Foundation/DimensionForcing.lean, alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "up to a 10.95x reduction in area and 30% lower latency"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
