pith. machine review for the scientific record.

arxiv: 2604.20013 · v1 · submitted 2026-04-21 · 🪐 quant-ph

Recognition: unknown

Assessing System Capabilities and Bottlenecks of an Early Fault-Tolerant Bicycle Architecture

Pith reviewed 2026-05-10 01:55 UTC · model grok-4.3

classification 🪐 quant-ph
keywords modular quantum computing · fault-tolerant quantum computing · bivariate bicycle codes · magic state factory · compiler optimization · non-Clifford gates · circuit failure probability · quantum compilation

The pith

Synthesizing non-Clifford rotations at the magic state factory lowers estimated circuit failure probability by a factor of nine on average across non-Clifford benchmarks in modular fault-tolerant systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates bottlenecks in early modular fault-tolerant quantum computers that use bivariate bicycle codes. It finds that inter-module communication required by non-Clifford operations is the primary constraint limiting performance. The authors introduce a compilation pipeline with three optimizations, the most impactful being synthesis of arbitrary-angle rotations directly at the factory. This change reduces average estimated failure probability by a factor of 9.0 across non-Clifford benchmarks drawn from over forty categories. The gains remain stable when instruction costs, logical processing unit counts, and factory counts are varied.

Core claim

In modular architectures based on bivariate bicycle codes, inter-module communication induced by non-Clifford operations forms the dominant bottleneck. Synthesizing arbitrary-angle rotations at the magic state factory, combined with transvection-based Clifford deferral and targeted Clifford insertion, mitigates this bottleneck and yields a ninefold average reduction in estimated circuit failure probability for non-Clifford benchmarks while also shortening compile time and circuit duration.

What carries the argument

The syn@fac optimization, which performs synthesis of arbitrary-angle rotations at the magic state factory to avoid costly inter-module communication for non-Clifford operations.
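
As a rough illustration of why the placement of synthesis matters, the sketch below counts inter-module transfers under two assumed policies. It assumes (this is not stated in the summary above) that under the baseline each T gate of a synthesized rotation costs one inter-module transfer to the logical processing unit, while under syn@fac the factory synthesizes locally and only one transfer per rotation crosses modules. The function names and the T counts are hypothetical.

```python
def transfers_syn_at_lpu(t_counts):
    """Assumed baseline: every T gate of every synthesized rotation crosses modules."""
    return sum(t_counts)


def transfers_syn_at_fac(t_counts):
    """Assumed syn@fac: one cross-module transfer per rotation; synthesis stays at the factory."""
    return len(t_counts)


# Hypothetical T counts for three synthesized rotations (illustrative only).
example_t_counts = [40, 55, 62]
baseline = transfers_syn_at_lpu(example_t_counts)
at_factory = transfers_syn_at_fac(example_t_counts)
print(f"inter-module transfers: syn@LPU={baseline}, syn@fac={at_factory}")
```

The ratio in this toy count says nothing about the paper's factor of nine, which comes from its own instruction-cost model.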

If this is right

  • Transvection-based Clifford deferral reduces Clifford deferral compile time by 77 percent.
  • Clifford insertion shortens end-to-end circuit duration by 11.5 percent on average for MQTBench circuits.
  • The ninefold failure-probability reduction and other gains hold across sweeps of instruction cost ratios, LPU counts, and factory counts.
  • The optimizations extend evaluation to more than forty benchmark categories from quantum algorithms and Hamiltonian simulations.
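
The circuit-duration figure in the list above is a critical-path quantity. The following is a minimal sketch of that metric, assuming operations form a dependency DAG with a fixed duration per operation. It computes only the longest-path duration and is not the paper's Clifford insertion procedure; the node names and durations are made up.

```python
from graphlib import TopologicalSorter


def critical_path_duration(durations, deps):
    """Longest-path duration of a DAG.

    durations: mapping node -> duration of that operation
    deps:      mapping node -> set of predecessor nodes that must finish first
    """
    graph = {node: deps.get(node, ()) for node in durations}
    finish = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in deps.get(node, ())), default=0.0)
        finish[node] = start + durations[node]
    return max(finish.values(), default=0.0)


# Hypothetical four-operation circuit: c waits for a and b, d waits for c.
durations = {"a": 3, "b": 2, "c": 4, "d": 1}
deps = {"c": {"a", "b"}, "d": {"c"}}
print(critical_path_duration(durations, deps))
```

Any optimization that shortens an operation on the longest path (here `a`, `c`, or `d`) shortens the whole circuit, while shortening `b` alone changes nothing.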

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hardware designers may need to prioritize faster or larger magic state factories over further reductions in inter-module latency.
  • The same synthesis approach could be tested on other modular code families to check whether the factor-of-nine gain generalizes beyond bivariate bicycle codes.
  • Combining syn@fac with dynamic factory allocation policies might yield additional reductions in critical-path duration for very large circuits.

Load-bearing premise

The chosen model of instruction costs and inter-module communication latencies, together with bivariate bicycle codes as the error-correcting substrate, captures the dominant constraints of real early modular fault-tolerant hardware.

What would settle it

Implementing the syn@fac pipeline on physical hardware or a detailed simulator with the paper's assumed cost model and measuring whether the average failure probability reduction across the same non-Clifford benchmarks falls near or below a factor of nine.

Figures

Figures reproduced from arXiv: 2604.20013 by Ben Foxman, Gian-Luca R. Anselmetti, Kun Liu, Yongshan Ding.

Figure 1. The reference modular bicycle architecture, based ...
Figure 2. Compilation pipeline for the studied systems. We adopt the BB instruction set and the PBC-to-BB transformation ...
Figure 3. Lowering single-module and multi-module PBC operations to BB instructions. Two components highlighted in the ...
Figure 4. (a) Synthesize arbitrary-angle rotation by gate ...
Figure 5. Synthesize an arbitrary-angle rotation ...
Figure 6. A segment is a consecutive sequence of in-module ...
Figure 7. Visualization of the adopted compilation optimizations. (a-b) A PBC circuit, in form of a DAG, can be viewed as ...
Figure 8. Overall performance comparison between baseline ...
Figure 9. Ablation: syn@fac vs the baseline (syn@LPU) for the reference system. Breakdown of failure probability and ...
Figure 10. Cost-ratio sensitivity of syn@LPU versus syn@fac ...
Figure 12. Factory count sensitivity around the ref ...
Figure 13. Ablation: only Clifford insertion is applied vs the ...
Original abstract

Early modular fault tolerant quantum computers remain constrained by costly inter-module communication and limited magic state factory service. Understanding such bottlenecks and investigating compiler optimizations most close the gap between algorithm requirements and hardware capabilities is a concrete and practically urgent systems problem. We study the modular architectures based on Bivariate Bicycle codes and identify the dominant bottleneck: inter-module communication induced by non-Clifford operations. We build a compilation pipeline to fill the missing parts of prior works and propose compiler optimizations: synthesizing arbitrary-angle rotations at the factory (syn@fac), transvection based Clifford deferral, and Clifford insertion for critical path duration reduction. We extend the evaluation scope of the prior work to 40+ benchmark categories drawn from PennyLane and MQTBench, including quantum algorithms and Hamiltonian simulations with varying sizes. Under the present instruction cost, syn@fac reduces estimated circuit failure probability by a factor of 9.0 on average across non-Clifford benchmarks. The robustness persists across sweeps of instruction cost ratios, LPU count, and factory count. Besides, transvection reduces Clifford deferral compile time by 77.04%, while Clifford insertion reduces end-to-end circuit duration by 11.54% on average on MQTBench, with smaller gains on Hamiltonian simulations. We hope this work inspires the studies on compiler optimizations for early modular FTQC systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper assesses bottlenecks in early modular fault-tolerant quantum computing architectures based on bivariate bicycle codes, identifying inter-module communication from non-Clifford operations as the dominant constraint. It introduces a compilation pipeline and three optimizations—synthesizing arbitrary-angle rotations at the factory (syn@fac), transvection-based Clifford deferral, and Clifford insertion for critical-path reduction—then evaluates them on 40+ benchmarks from PennyLane and MQTBench. Under the authors' present instruction-cost model, syn@fac yields an average 9× reduction in estimated circuit failure probability across non-Clifford benchmarks, with the benefit persisting across sweeps of instruction-cost ratios, LPU counts, and factory counts; the other optimizations cut compile time by 77% and circuit duration by 11.5% on average.

Significance. If the modeling assumptions hold, the work supplies concrete, actionable compiler techniques for mitigating communication bottlenecks in early modular FTQC and demonstrates their impact across a broad benchmark suite with parameter sweeps. The extension beyond prior limited evaluations and the explicit robustness checks are strengths that could guide hardware-aware compilation research.

major comments (2)
  1. [Abstract and evaluation sections describing the instruction-cost model and failure-probability estimation] The headline quantitative claim (factor-of-9 failure-probability reduction under the present instruction cost) is load-bearing for the paper's central thesis yet rests on an instruction-cost and latency model whose parameters are chosen internally and swept rather than derived from first-principles hardware measurements or external calibration. Without an explicit derivation or sensitivity analysis tied to physical device data, the reported benefit remains conditional on modeling assumptions that may not match real early-modular hardware (e.g., different communication overheads or additional error sources).
  2. [Abstract and the sections presenting the quantitative results and failure-probability model] The failure-probability model itself is stated without error bars, explicit derivation steps, or discussion of benchmark-selection criteria and possible post-hoc exclusions. This makes it difficult to assess the statistical reliability of the 9× average and the cross-benchmark robustness claims.
minor comments (2)
  1. [Abstract] The abstract contains a minor grammatical issue: 'investigating compiler optimizations most close the gap' should read 'investigating compiler optimizations that best close the gap' or similar for clarity.
  2. [Abstract and methods] Notation for the 'present instruction cost' operating point and the exact definition of the failure-probability estimator should be introduced earlier and used consistently to aid readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the modeling assumptions and evaluation details in our manuscript. We address each major comment below and have revised the paper to improve clarity and transparency where possible.

Point-by-point responses
  1. Referee: The headline quantitative claim (factor-of-9 failure-probability reduction under the present instruction cost) is load-bearing for the paper's central thesis yet rests on an instruction-cost and latency model whose parameters are chosen internally and swept rather than derived from first-principles hardware measurements or external calibration. Without an explicit derivation or sensitivity analysis tied to physical device data, the reported benefit remains conditional on modeling assumptions that may not match real early-modular hardware (e.g., different communication overheads or additional error sources).

    Authors: We agree that the instruction-cost model uses parameterized values chosen to represent plausible overheads in modular bivariate bicycle code architectures, rather than direct first-principles measurements from physical hardware. Such early fault-tolerant modular systems are not yet experimentally available, so the model is necessarily based on literature-derived estimates for inter-module communication and factory latencies. The manuscript already includes extensive sensitivity sweeps over instruction-cost ratios, LPU counts, and factory counts (Sections 5.3–5.5), which demonstrate that the syn@fac benefit persists across wide ranges. In the revision we have added a new subsection (3.2) that explicitly derives the baseline parameter choices from prior modular FTQC proposals, discusses potential deviations due to unmodeled error sources, and states the conditional nature of the quantitative claims more prominently in the abstract and conclusion. revision: partial

  2. Referee: The failure-probability model itself is stated without error bars, explicit derivation steps, or discussion of benchmark-selection criteria and possible post-hoc exclusions. This makes it difficult to assess the statistical reliability of the 9× average and the cross-benchmark robustness claims.

    Authors: The failure-probability model is a standard multiplicative approximation (detailed in Section 4.1) in which circuit failure probability is estimated from the number and cost of non-Clifford operations; we have now moved the full step-by-step derivation to a new Appendix B. Because the estimator is deterministic given the input parameters, statistical error bars are not applicable; instead, we report the full per-benchmark distribution of improvements (Figure 5 and supplementary tables) so readers can evaluate variability directly. Benchmark selection criteria are stated in Section 5.1: all circuits from PennyLane and MQTBench compilable within our resource limits were included, with no post-hoc exclusions. We have added an explicit paragraph in Section 5 confirming these criteria and the absence of exclusions. revision: yes
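
To make the phrase "multiplicative approximation" concrete, here is a minimal sketch of one common form of such an estimator. It assumes that every operation of kind k fails independently with probability p_k, so the circuit succeeds with probability equal to the product of (1 - p_k) over all operations. This reading, the operation names, and the numeric rates are assumptions for illustration; the paper's actual estimator may differ.

```python
import math


def circuit_failure_probability(op_counts, op_failure_probs):
    """Estimate 1 - prod_k (1 - p_k) ** n_k, evaluated in log space for stability.

    op_counts:        mapping operation kind -> number of occurrences n_k
    op_failure_probs: mapping operation kind -> per-operation failure probability p_k
    """
    log_success = 0.0
    for kind, count in op_counts.items():
        log_success += count * math.log1p(-op_failure_probs[kind])
    # 1 - exp(log_success), written with expm1 to keep precision for tiny values.
    return -math.expm1(log_success)


# Hypothetical counts and rates, not taken from the paper.
counts = {"in_module": 1000, "inter_module": 50}
rates = {"in_module": 1e-6, "inter_module": 1e-4}
print(circuit_failure_probability(counts, rates))
```

Because the function is deterministic in its inputs, rerunning it gives the same number, which matches the rebuttal's point that statistical error bars do not apply to the estimator itself.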

Circularity Check

0 steps flagged

No significant circularity; results are computed outputs from an explicit parameterized model.

full rationale

The paper reports simulation-based estimates of circuit failure probability under a stated instruction-cost and latency model for bivariate bicycle codes, with the factor-of-9 reduction obtained by applying the syn@fac optimization inside that model. The central claims are not forced by construction from the inputs, nor do they rely on self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations whose validity is presupposed. Parameter sweeps are performed over the same modeling assumptions, but this does not create circularity; the work is an assessment study whose outputs are independent of the inputs once the model is fixed. No quoted derivation step reduces to its own premises.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on a domain-specific cost model for inter-module communication and on the choice of bivariate bicycle codes as the error-correcting substrate; no new physical entities are postulated.

free parameters (2)
  • instruction cost ratio
    Used to obtain the factor-of-9 result; the paper states robustness under sweeps but the headline number is reported at one operating point.
  • LPU count and factory count
    Architectural parameters that are swept but whose specific values affect the reported circuit durations and failure probabilities.
axioms (2)
  • domain assumption Bivariate bicycle codes are the error-correcting substrate for the modular architecture under study.
    Invoked as the basis for identifying inter-module communication as the dominant bottleneck.
  • domain assumption Non-Clifford operations dominate inter-module communication cost.
    Stated as the identified bottleneck that the compiler optimizations target.

pith-pipeline@v0.9.0 · 5545 in / 1498 out tokens · 48618 ms · 2026-05-10T01:55:53.295366+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. INJEQT: Improved Magic-State Injection Protocol for Fault-Tolerant Quantum Extractor Architectures

    quant-ph 2026-04 unverdicted novelty 6.0

    INJEQT reduces synthillation error by up to 22x, wall-clock time by 13x, and space-time cost by 7.2x in extractor FTQC architectures via auxiliary Rz synthesis and pre-fetching.

Reference graph

Works this paper leans on

55 extracted references · 40 canonical work pages · cited by 1 Pith paper · 4 internal anchors

  1. BB Codes. BB codes can be described using cyclic shift matrices. Denote $S_\ell$ as the cyclic shift matrix of dimension $\ell \times \ell$, where each row $i$ of the matrix has a 1 at column $i + 1$ modulo $\ell$. Define $x = S_\ell$ and $y = S_m$. Polynomials in $x$ and $y$ represent two-dimensional shifts, e.g., $x^p y^q = S_\ell^p S_m^q$. A BB code picks two polynomials $A = A_1 + A_2 + A_3$ and $B = B_1 + B_2 + B_3$ built from mo...

  2. "measure $P$ and, if the outcome is 1, apply a Pauli correction $Q$ that anticommutes with $P$; if the outcome is 0, do nothing"

    Detailed PBC-to-BB Lowering Gadgets. This subsection gives the full gadget-level derivation behind the condensed lowering story in Sec. II C. We now explain the circuit-level transformation from Fig. 2(c) to Fig. 2(d) using Fig. 3. Circuit language. Fig. 3 uses four basic symbols: unitary gates, Clifford rotations, Pauli measurements, and projections, as sho...

  3. Detailed In-Module Measurement Synthesis. This subsection details the inherited in-module synthesis machinery summarized in Sec. II D. For gross code, extractors with practical cost have been developed [1], and they expose a small generating set of Pauli measurements $M = \langle X_0, X_6, Z_0, Z_6 \rangle$. BB codes support fault-tolerant shift automorphism unitaries $A$, which...

  4. Clifford Insertion. Recall that in Sec. II C, we introduced the modular PBC compilation flow used in this paper. To optimize the duration of a PBC, we first build a directed acyclic graph (DAG) by representing each operation as a node and its temporal dependencies as directed edges. This is shown in Fig. 2(d). Notice that we omit the measurements on pivo...

  5. Multi-Window Selection. For a segment, if we use only one $C$ to conjugate all operations, the longer the segment is, the harder it becomes to reduce the overall cost of the segment. Here we propose an algorithm to select multiple windows to maximize the reduction for the segment. An intuitive metaphor for this algorithm is buying and selling stocks with trans...

  6. T. J. Yoder, E. Schoute, P. Rall, E. Pritchett, J. M. Gambetta, A. W. Cross, M. Carroll, and M. E. Beverland, Tour de gross: A modular quantum computer based on bivariate bicycle codes (2025), arXiv:2506.03094 [quant-ph]

  7. S. Sethi, S. Khan, M. Poster, A. Anand, and J. M. Baker, Optimizing Logical Mappings for Quantum Low-Density Parity Check Codes (2026), arXiv:2603.17167 [quant-ph]

  8. S. Bravyi, A. W. Cross, J. M. Gambetta, D. Maslov, P. Rall, and T. J. Yoder, High-threshold and low-overhead fault-tolerant quantum memory, Nature 627, 778 (2024)

  9. Q. Xu, J. P. B. Ataides, C. A. Pattison, N. Raveendran, D. Bluvstein, J. Wurtz, B. Vasic, M. D. Lukin, L. Jiang, and H. Zhou, Constant-Overhead Fault-Tolerant Quantum Computation with Reconfigurable Atom Arrays (2023), arXiv:2308.08648 [quant-ph]

  10. Z. Du, S. Kan, S. Stein, Z. Liang, A. Li, and Y. Mao, Hardware-aware Compilation for Chip-to-Chip Coupler-Connected Modular Quantum Systems (2025), arXiv:2505.09036 [quant-ph]

  11. M. J. Jeng, N. V. Maruszewski, C. Selna, M. Gavrincea, K. N. Smith, and N. Hardavellas, Modular Compilation for Quantum Chiplet Architectures (2025), arXiv:2501.08478 [quant-ph]

  12. S. Sang, L. Hour, and Y. Han, Toward Scalable Quantum Compilation for Modular Architecture: Qubit Mapping and Reuse via Deep Reinforcement Learning (2025), arXiv:2506.09323 [quant-ph]

  13. A. Cross, Z. He, P. Rall, and T. Yoder, Improved QLDPC Surgery: Logical Measurements and Bridging Codes (2024), arXiv:2407.18393 [quant-ph]

  14. D. J. Williamson and T. J. Yoder, Low-overhead fault-tolerant quantum computation by gauging logical operators (2024)

  15. E. Swaroop, T. Jochym-O’Connor, and T. J. Yoder, Universal adapters between quantum LDPC codes (2024), arXiv:2410.03628 [quant-ph]

  16. C. Gidney, How to factor 2048 bit RSA integers with less than a million noisy qubits (2025), arXiv:2505.15917 [quant-ph]

  17. C. Gidney, N. Shutty, and C. Jones, Magic state cultivation: growing T states as cheap as CNOT gates (2024)

  18. E. T. Campbell and J. O’Gorman, An efficient magic state approach to small angle rotations, Quantum Science and Technology 1, 015007 (2016), arXiv:1603.04230 [quant-ph]

  19. V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. AkashNarayanan, A. Asadi, J. M. Arrazola, U. Azad, S. Banning, C. Blank, T. R. Bromley, B. A. Cordier, J. Ceroni, A. Delgado, O. D. Matteo, A. Dusko, T. Garg, D. Guala, A. Hayes, R. Hill, A. Ijaz, T. Isacsson, D. Ittah, S. Jahangiri, P. Jain, E. Jia..., PennyLane: Automatic differentiation of hybrid quantum-classical computations

  20. N. Quetschlich, L. Burgholzer, and R. Wille, MQT Bench: Benchmarking Software and Design Automation Tools for Quantum Computing, Quantum 7, 1062 (2023), arXiv:2204.13719 [quant-ph]

  21. A. Krishna and D. Poulin, Fault-tolerant gates on hypergraph product codes, Physical Review X 11, 011023 (2021), arXiv:1909.07424 [quant-ph]

  22. P. Panteleev and G. Kalachev, Degenerate Quantum LDPC Codes With Good Finite Length Performance, Quantum 5, 585 (2021), arXiv:1904.02703 [quant-ph]

  23. P. Panteleev and G. Kalachev, Quantum LDPC Codes with Almost Linear Minimum Distance, IEEE Transactions on Information Theory 68, 213 (2022), arXiv:2012.04068 [quant-ph]

  24. J.-P. Tillich and G. Zemor, Quantum LDPC codes with positive rate and minimum distance proportional to n^{1/2}, IEEE Transactions on Information Theory 60, 1193 (2014), arXiv:0903.0566 [quant-ph]

  25. A. A. Kovalev and L. P. Pryadko, Quantum Kronecker sum-product low-density parity-check codes with finite rate, Physical Review A 88, 012311 (2013)

  26. D. Litinski, A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery, Quantum 3, 128 (2019), arXiv:1808.02892 [cond-mat, physics:quant-ph]

  27. T. H. Haug, T. Hillmann, A. F. Kockum, and R. V. Laer, Lattice surgery with Bell measurements: Modular fault-tolerant quantum computation at low entanglement cost (2025), arXiv:2510.13541 [quant-ph]

  28. J. Kim, D. Min, J. Cho, H. Jeong, I. Byun, J. Choi, J. Hong, and J. Kim, A Fault-Tolerant Million Qubit-Scale Distributed Quantum Computer, in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ACM, La Jolla CA USA, 2024) pp. 1–19

  29. S. F. Lin, J. Viszlai, K. N. Smith, G. S. Ravi, C. Yuan, F. T. Chong, and B. J. Brown, Codesign of quantum error-correcting codes and modular chiplets in the presence of defects, in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ACM, La Jolla CA USA, 2024) pp. 216–231

  30. S. Singh, F. Gu, S. de Bone, E. Villaseñor, D. Elkouss, and J. Borregaard, Modular Architectures and Entanglement Schemes for Error-Corrected Distributed Quantum Computation (2024), arXiv:2408.02837 [quant-ph]

  31. X. Wu, Y. J. Joshi, H. Yan, G. Andersson, A. Anferov, C. R. Conner, B. Karimi, A. M. King, S. Li, H. L. Malc, J. M. Miller, H. Mishra, H. Qiao, M. Ryu, S. Xing, J. Shi, and A. N. Cleland, Mitigating cosmic ray-like correlated events with a modular quantum processor (2025), arXiv:2505.15919 [quant-ph]

  32. Q. Xu, A. Seif, H. Yan, N. Mannucci, B. O. Sane, R. Van Meter, A. N. Cleland, and L. Jiang, Distributed quantum error correction for chip-level catastrophic errors, Physical Review Letters 129, 240502 (2022), arXiv:2203.16488 [quant-ph]

  33. L. S. Herzog, L. Berent, A. Kubica, and R. Wille, Lattice Surgery Compilation Beyond the Surface Code (2025), arXiv:2504.10591 [quant-ph]

  34. Y. Hirano and K. Fujii, Locality-aware Pauli-based computation for local magic state preparation (2025), arXiv:2504.12091 [quant-ph]

  35. S. Kan, Z. Du, C. Liu, M. Wang, Y. Ding, A. Li, Y. Mao, and S. Stein, SPARO: Surface-code Pauli-based Architectural Resource Optimization for Fault-tolerant Quantum Computing (2025), arXiv:2504.21854 [quant-ph]

  36. T. Kobori, Y. Suzuki, Y. Ueno, T. Tanimoto, S. Todo, and Y. Tokunaga, LSQCA: Resource-Efficient Load/Store Architecture for Limited-Scale Fault-Tolerant Quantum Computing (2024), arXiv:2412.20486 [quant-ph]

  37. M. Wang, C. Liu, S. Stein, Y. Ding, P. Das, P. J. Nair, and A. Li, Optimizing FTQC Programs through QEC Transpiler and Architecture Codesign (2024), arXiv:2412.15434 [quant-ph]

  38. M. Wang, C. Liu, S. Garner, S. Stein, Y. Ding, P. J. Nair, and A. Li, Tableau-Based Framework for Efficient Logical Quantum Compilation (2025), arXiv:2509.02721 [quant-ph]

  39. R. Mengoni, W. Nadalin, M. Rennela, J. Rotureau, T. Darras, J. Laurat, E. Diamanti, and I. Lavdas, Efficient Gate Reordering for Distributed Quantum Compiling in Data Centers (2025), arXiv:2507.01090 [quant-ph]

  40. A. Wu, H. Zhang, G. Li, A. Shabani, Y. Xie, and Y. Ding, AutoComm: A Framework for Enabling Efficient Communication in Distributed Quantum Programs (2022), arXiv:2207.11674 [quant-ph]

  41. A. Wu, Y. Ding, and A. Li, CollComm: Enabling Efficient Collective Quantum Communication Based on EPR buffering (2022), arXiv:2208.06724 [quant-ph]

  42. A. Wu, Y. Ding, and A. Li, QuComm: Optimizing Collective Communication for Distributed Quantum Computing, in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO ’23 (Association for Computing Machinery, New York, NY, USA, 2023) pp. 479–493

  43. N. J. Ross and P. Selinger, Optimal ancilla-free Clifford+T approximation of z-rotations (2016), arXiv:1403.2975 [quant-ph]

  44. P. Selinger and N. J. Ross, pygridsynth: Python version gridsynth program computes approximations of Z-rotations over the Clifford+T gate set (2018)

  45. N. Rengaswamy, R. Calderbank, S. Kadhe, and H. D. Pfister, Synthesis of Logical Clifford Operators via Symplectic Geometry, in 2018 IEEE International Symposium on Information Theory (ISIT) (IEEE Press, Vail, CO, USA, 2018) pp. 791–795, arXiv:1803.06987 [cs]

  46. Z. Chen, J. O. Weinberg, and N. Rengaswamy, Fault Tolerant Quantum Simulation via Symplectic Transvections (2025), arXiv:2504.11444 [quant-ph]

  47. J. Ruh and S. Devitt, Quantum Circuit Optimisation and MBQC Scheduling with a Pauli Tracking Library (2024), arXiv:2405.03970 [quant-ph]

  48. J. Dehaene and B. D. Moor, The Clifford group, stabilizer states, and linear and quadratic operations over GF(2), Physical Review A 68, 042318 (2003), arXiv:quant-ph/0304125

  49. R. Koenig and J. A. Smolin, How to efficiently select an arbitrary Clifford group element, Journal of Mathematical Physics 55, 122202 (2014), arXiv:1406.2170 [quant-ph]

  50. S. Stein, S. Xu, A. W. Cross, T. J. Yoder, A. Javadi-Abhari, C. Liu, K. Liu, Z. Zhou, C. Guinn, Y. Ding, Y. Ding, and A. Li, Architectures for Heterogeneous Quantum Error Correction Codes (2024), arXiv:2411.03202 [quant-ph]

  51. K. Sahay, P.-K. Tsai, K. Chang, Q. Su, T. B. Smith, S. Singh, and S. Puri, Fold-transversal surface code cultivation (2025), arXiv:2509.05212 [quant-ph]

  52. S. Bravyi and A. Kitaev, Fermionic quantum computation, Annals of Physics 298, 210 (2002), arXiv:quant-ph/0003137

  53. M. Suzuki, General theory of fractal path integrals with applications to many-body theories and statistical physics, Journal of Mathematical Physics 32, 400 (1991)

  54. F. C. R. Peres and E. F. Galvão, Quantum circuit compilation and hybrid computation using Pauli-based computation, Quantum 7, 1126 (2023)

  55. C. Chamberland and E. T. Campbell, A circuit-level protocol and analysis for twist-based lattice surgery, Physical Review Research 4, 023090 (2022), arXiv:2201.05678 [quant-ph]