Optimizing Parallel Execution of Commuting Pauli Product Rotations

Devika Nambisan; Jonathan Mark Baker; Sayam Sethi

arxiv: 2605.23738 · v1 · pith:H5ZIQELInew · submitted 2026-05-22 · 🪐 quant-ph

Optimizing Parallel Execution of Commuting Pauli Product Rotations

Sayam Sethi , Devika Nambisan , Jonathan Mark Baker This is my paper

Pith reviewed 2026-05-25 04:04 UTC · model grok-4.3

classification 🪐 quant-ph

keywords fault-tolerant quantum computationPauli product rotationscircuit depth optimizationcommuting operatorssurface codeport constraintsheuristic schedulingQASMBench

0 comments

The pith

Two heuristics reduce hardware-limited depth of commuting Pauli product rotation circuits by 10-20% on average.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that per-qubit port limits on fault-tolerant hardware force many mutually commuting Pauli product rotations to be split across time steps, increasing circuit depth. Clique reshuffling permutes the rotations inside each commuting set so that new groups fit inside the port budget, while generator restructuring rewrites each group as an equivalent set that exerts less port pressure per qubit. When the two steps are combined on circuits compiled from QASMBench, the resulting schedules show 10-20% average depth savings and up to 50% in favorable cases. These savings grow as the port budget per qubit increases and level off near twenty ports. A reader would care because shallower depth directly shortens the time to solution on any hardware whose access points are the main bottleneck.

Core claim

By combining clique reshuffling, which re-forms port-constrained groups through permutation of commuting products, and generator restructuring, which produces equivalent generating sets with lower per-qubit port usage, the hardware-limited depth of Pauli product rotation circuits can be reduced by 10-20% on average and up to 50% on QASMBench benchmarks relative to a non-reordering baseline; the gains scale with port budget and saturate near twenty ports per qubit.

What carries the argument

clique reshuffling and generator restructuring applied to commuting Pauli product rotation groups to reduce the number of groups that must be serialized by port limits

If this is right

Average depth reduction of 10-20% with peaks at 50% on the tested benchmarks.
Observed gains increase as the per-qubit port budget grows and level off near 20 ports.
The heuristics remain useful even as hardware provides modestly more access points per qubit.
The method applies directly to any set of commuting Pauli product rotations extracted from a compiled circuit.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same reordering and rewriting steps could be tried on other quantum error-correcting codes that impose similar per-qubit access limits.
Embedding the heuristics inside existing compilers might produce further depth gains when combined with gate synthesis or layout choices.
Measuring the overhead of the rewriting step itself on very large circuits would show whether the reported savings persist beyond the current benchmark sizes.

Load-bearing premise

The two heuristics can be applied to arbitrary commuting groups extracted from compiled circuits without creating compensating overheads that erase the reported depth savings.

What would settle it

Take any QASMBench circuit compiled to Pauli product rotations, apply the two heuristics, schedule the resulting groups under the stated port constraints, and compare the final depth against the same circuit scheduled without the heuristics; if depth does not decrease, the central claim is false.

Figures

Figures reproduced from arXiv: 2605.23738 by Devika Nambisan, Jonathan Mark Baker, Sayam Sethi.

**Figure 1.** Figure 1: Overview of this work. (a) Pauli Product Measurements (PPMs) with disjoint supports, i.e., acting on disjoint qubits can [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Compiling an input Clifford + Rz/T circuit into [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 4.** Figure 4: Algorithm sensitivity to different circuit parameters Depth reduction percentage of different strategies relative to the baseline across parameter sweeps. The sweep independently varies the density (left), number of qubits (center), and input depth (right) of randomly generated PPMs. 1 3 5 9 24 Number of Passes 0 5 10 15 20 25 Depth Reduction (%) Depth Reduction Distribution vs Number of Passes Optimizatio… view at source ↗

**Figure 5.** Figure 5: Sensitivity to the number of reshuffling passes. (where k1 = 1) and Gi = [Pki , . . . , Pki+1−1] with PlPm = PmPl for each l, m ∈ Gi . Ideally, these groups are maximal so that k is minimal. When unconstrained by hardware, this should produce a circuit of minimal depth since we need to perform exactly k many steps of parallel measurements. These groups can be formed in a straight-forward greedy fashion. Be… view at source ↗

**Figure 6.** Figure 6: Depth reduction across benchmarks. Percentage reduction in circuit depth achieved by the three optimization methods against the baseline approach. D. Restructuring the Groups To further minimize the total number of groups in the hardware-restricted case, we need to minimize the frequency of each non-identity characters in the strings. To do so, we first recognize that the products in a mutually commuting g… view at source ↗

**Figure 7.** Figure 7: Optimization performance under varying hardware ports. Depth reduction (top row) and weight reduction (bottom row) versus the baseline as functions of the available hardware ports (Max X/Z Ports). Error bars denote the min/max range. replacement. This severely restricts the search space. However, choosing the optimal pairs of Paulis to multiply still requires searching over all possible orderings within th… view at source ↗

read the original abstract

Fault-Tolerant Quantum Computation (FTQC) permits parallel execution of mutually commuting Pauli Product Rotations (PPRs), but per-qubit access point/port limits (e.g. two X and two Z edges on the surface code) force commuting groups that exceed the budget to be split, inflating circuit depth. We propose two heuristics for reducing this hardware-limited depth: 1. clique reshuffling, which permutes commuting products and re-forms port-constrained groups, and 2. generator restructuring, which rewrites each group as an equivalent generating set with reduced per-qubit port pressure. On QASMBench circuits compiled to PPRs, we combine the two heuristics and observe an average hardware-limited depth reduction of $10-20\%$ over a non-reordering baseline, with up to $50\%$ reduction. These observed gains scale with the per-qubit port budget and saturate near $20$ ports, suggesting these heuristics remain relevant as hardware exposes more access points.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The two heuristics cut hardware-limited PPR depth by 10-20% on QASMBench but the abstract leaves the Clifford overhead from restructuring unaddressed.

read the letter

The main takeaway is that clique reshuffling plus generator restructuring together reduce the number of sequential port-constrained PPR layers by 10-20% on average, with peaks near 50%, and that the gains are larger at tighter port budgets before saturating around 20 ports. The paper is new in spelling out this particular pair of heuristics for the port-limit setting in FTQC. It does well by evaluating on the external QASMBench suite rather than self-generated instances and by reporting how performance changes with port count, which is the kind of scaling data that matters for hardware planning. The central claim holds up at the level of the reported numbers. The soft spot is the generator-restructuring step. Rewriting a group to a lower-port generating set usually requires additional Clifford gates to adjust bases or conjugates, yet the abstract gives no sign that those gates are folded into the depth metric or shown to run in parallel without adding layers. If the overhead is comparable to the claimed savings, the net improvement shrinks or disappears. The results are stated at a high level with no pseudocode or implementation notes, so it is impossible to judge robustness from the abstract alone. This paper is for people who compile circuits for surface-code or similar hardware with limited per-qubit access points. A reader working on practical scheduling under realistic constraints would get usable numbers and scaling behavior from it. It deserves peer review so the overhead accounting and the concrete heuristics can be checked in detail.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes two heuristics—clique reshuffling, which permutes commuting Pauli products to re-form port-constrained groups, and generator restructuring, which rewrites each group as an equivalent generating set with reduced per-qubit port pressure—for reducing hardware-limited depth in fault-tolerant circuits of commuting Pauli Product Rotations (PPRs). On QASMBench circuits compiled to PPRs, the combined heuristics are reported to yield an average 10-20% reduction in the number of sequential port-constrained PPR groups (up to 50%), with gains increasing with per-qubit port budget and saturating near 20 ports.

Significance. If the depth reductions are net of any Clifford overhead and the heuristics apply without introducing compensating costs, the results would be relevant for compilation targeting hardware with limited per-qubit access points (e.g., surface-code edge ports). The evaluation on an external benchmark suite supports reproducibility, though the absence of pseudocode or tabulated per-circuit data in the provided description limits immediate verification of the scaling claims.

major comments (1)

Abstract: The hardware-limited depth metric is defined strictly as the number of sequential port-constrained PPR groups. Generator restructuring produces an equivalent generating set with lower port pressure, but realizing this set on the surface code requires additional Clifford gates to conjugate Paulis or change measurement bases. The manuscript provides no indication that these gates are folded into the depth count or shown to be schedulable without increasing the total number of layers; if their cost is comparable to the reported 10-20% savings, the net improvement is eliminated.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We respond to the major comment below.

read point-by-point responses

Referee: Abstract: The hardware-limited depth metric is defined strictly as the number of sequential port-constrained PPR groups. Generator restructuring produces an equivalent generating set with lower port pressure, but realizing this set on the surface code requires additional Clifford gates to conjugate Paulis or change measurement bases. The manuscript provides no indication that these gates are folded into the depth count or shown to be schedulable without increasing the total number of layers; if their cost is comparable to the reported 10-20% savings, the net improvement is eliminated.

Authors: We agree that the manuscript defines hardware-limited depth strictly as the count of sequential port-constrained PPR groups and does not analyze or include any Clifford overhead arising from generator restructuring. The paper does not demonstrate that basis-change or conjugation operations can be scheduled without adding layers. We will revise the manuscript to explicitly address this point, either by showing that the required Cliffords can be absorbed into existing layers in the surface-code compilation pipeline or by quantifying their contribution to total depth. This revision will clarify whether the reported 10-20% reductions remain net positive. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation on external benchmarks is self-contained

full rationale

The paper introduces two heuristics (clique reshuffling and generator restructuring) for reducing hardware-limited depth of commuting PPR groups and reports measured reductions on QASMBench circuits compiled to PPRs. These results are obtained by applying the heuristics to an external benchmark suite and counting sequential port-constrained groups; the reported 10-20% average (up to 50%) reductions are direct experimental outcomes rather than quantities defined or fitted inside the paper. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the abstract or described experimental setup. The derivation chain consists of heuristic definitions followed by independent measurement on outside data, satisfying the criteria for a self-contained empirical claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, background axioms, or newly postulated entities.

pith-pipeline@v0.9.0 · 5701 in / 1200 out tokens · 27813 ms · 2026-05-25T04:04:24.036038+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 7 internal anchors

[1]

Surface code compilation via edge-disjoint paths,

M. Beverland, V . Kliuchnikov, and E. Schoute, “Surface code compilation via edge-disjoint paths,”PRX Quantum, vol. 3, no. 2, p. 020342, May 2022, arXiv:2110.11493 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2110.11493

work page arXiv 2022
[2]

Parallel Logical Measurements via Quantum Code Surgery

A. Cowtan, Z. He, D. J. Williamson, and T. J. Yoder, “Parallel Logical Measurements via Quantum Code Surgery,” Jan. 2026, arXiv:2503.05003 [quant-ph]. [Online]. Available: http://arxiv.org/abs/ 2503.05003

work page internal anchor Pith review Pith/arXiv arXiv 2026
[3]

Restrictions on Transversal Encoded Quantum Gate Sets

B. Eastin and E. Knill, “Restrictions on Transversal Encoded Quantum Gate Sets,”Physical Review Letters, vol. 102, no. 11, p. 110502, Mar. 2009, arXiv:0811.4262 [quant-ph]. [Online]. Available: http://arxiv.org/abs/0811.4262

work page internal anchor Pith review Pith/arXiv arXiv 2009
[4]

Time-optimal quantum computation

A. G. Fowler, “Time-optimal quantum computation,” Feb. 2013, arXiv:1210.4626 [quant-ph]. [Online]. Available: http://arxiv.org/abs/ 1210.4626

work page internal anchor Pith review Pith/arXiv arXiv 2013
[5]

Surface codes: Towards practical large-scale quantum computation

A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, “Surface codes: Towards practical large-scale quantum computation,” Physical Review A - Atomic, Molecular , and Optical Physics, vol. 86, no. 3, Aug. 2012. [Online]. Available: http://arxiv.org/abs/1208.0928

work page internal anchor Pith review Pith/arXiv arXiv 2012
[6]

Surface code quantum computing by lattice surgery,

D. Horsman, A. G. Fowler, S. Devitt, and R. Van Meter, “Surface code quantum computing by lattice surgery,”New Journal of Physics, vol. 14, no. 12, p. 123011, Dec. 2012, arXiv:1111.4022 [quant-ph]. [Online]. Available: http://arxiv.org/abs/1111.4022

work page arXiv 2012
[7]

Fault-tolerant logical measurements via homological measurement,

B. Ide, M. G. Gowda, P. J. Nadkarni, and G. Dauphinais, “Fault-tolerant logical measurements via homological measurement,”Physical Review X, vol. 15, no. 2, p. 021088, Jun. 2025, arXiv:2410.02753 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2410.02753

work page arXiv 2025
[8]

Cyclone: Designing Efficient and Highly Parallel QCCD Architectural Codesigns for Fault Tolerant Quantum Memory,

S. Khan, A. Anand, K. R. Brown, and J. M. Baker, “Cyclone: Designing Efficient and Highly Parallel QCCD Architectural Codesigns for Fault Tolerant Quantum Memory,” Nov. 2025, arXiv:2511.15910 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2511.15910

work page arXiv 2025
[9]

QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation,

A. Li, S. Stein, S. Krishnamoorthy, and J. Ang, “QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation,” May 2022, arXiv:2005.13018 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2005.13018

work page arXiv 2022
[10]

Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices

G. Li, Y . Ding, and Y . Xie, “Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices,” May 2019, arXiv:1809.02573 [cs]. [Online]. Available: http://arxiv.org/abs/1809.02573

work page internal anchor Pith review Pith/arXiv arXiv 2019
[11]

A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery

D. Litinski, “A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery,”Quantum, vol. 3, Aug. 2018. [Online]. Available: http://arxiv.org/abs/1808.02892

work page internal anchor Pith review Pith/arXiv arXiv 2018
[12]

Universal adapters between quantum LDPC codes,

E. Swaroop, T. Jochym-O’Connor, and T. J. Yoder, “Universal adapters between quantum LDPC codes,” Oct. 2024. [Online]. Available: https://arxiv.org/abs/2410.03628v4

work page arXiv 2024
[13]

Matching Generalized-Bicycle Codes to Neutral Atoms for Low-Overhead Fault-Tolerance,

J. Viszlai, W. Yang, S. F. Lin, J. Liu, N. Nottingham, J. M. Baker, and F. T. Chong, “Matching Generalized-Bicycle Codes to Neutral Atoms for Low-Overhead Fault-Tolerance,” in2025 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 01, Aug. 2025, pp. 688–699. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/11250335

work page arXiv 2025
[14]

Low-overhead fault-tolerant quantum computation by gauging logical operators

D. J. Williamson and T. J. Yoder, “Low-overhead fault-tolerant quantum computation by gauging logical operators,”Nature Physics, vol. 22, no. 4, pp. 598–603, Apr. 2026, arXiv:2410.02213 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2410.02213

work page internal anchor Pith review Pith/arXiv arXiv 2026
[15]

Harnessing the Power of Long-Range Entan- glement for Clifford Circuit Synthesis,

W. Yang and P. Rall, “Harnessing the Power of Long-Range Entan- glement for Clifford Circuit Synthesis,” Jun. 2023, arXiv:2302.06537 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2302.06537

work page arXiv 2023

[1] [1]

Surface code compilation via edge-disjoint paths,

M. Beverland, V . Kliuchnikov, and E. Schoute, “Surface code compilation via edge-disjoint paths,”PRX Quantum, vol. 3, no. 2, p. 020342, May 2022, arXiv:2110.11493 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2110.11493

work page arXiv 2022

[2] [2]

Parallel Logical Measurements via Quantum Code Surgery

A. Cowtan, Z. He, D. J. Williamson, and T. J. Yoder, “Parallel Logical Measurements via Quantum Code Surgery,” Jan. 2026, arXiv:2503.05003 [quant-ph]. [Online]. Available: http://arxiv.org/abs/ 2503.05003

work page internal anchor Pith review Pith/arXiv arXiv 2026

[3] [3]

Restrictions on Transversal Encoded Quantum Gate Sets

B. Eastin and E. Knill, “Restrictions on Transversal Encoded Quantum Gate Sets,”Physical Review Letters, vol. 102, no. 11, p. 110502, Mar. 2009, arXiv:0811.4262 [quant-ph]. [Online]. Available: http://arxiv.org/abs/0811.4262

work page internal anchor Pith review Pith/arXiv arXiv 2009

[4] [4]

Time-optimal quantum computation

A. G. Fowler, “Time-optimal quantum computation,” Feb. 2013, arXiv:1210.4626 [quant-ph]. [Online]. Available: http://arxiv.org/abs/ 1210.4626

work page internal anchor Pith review Pith/arXiv arXiv 2013

[5] [5]

Surface codes: Towards practical large-scale quantum computation

A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, “Surface codes: Towards practical large-scale quantum computation,” Physical Review A - Atomic, Molecular , and Optical Physics, vol. 86, no. 3, Aug. 2012. [Online]. Available: http://arxiv.org/abs/1208.0928

work page internal anchor Pith review Pith/arXiv arXiv 2012

[6] [6]

Surface code quantum computing by lattice surgery,

D. Horsman, A. G. Fowler, S. Devitt, and R. Van Meter, “Surface code quantum computing by lattice surgery,”New Journal of Physics, vol. 14, no. 12, p. 123011, Dec. 2012, arXiv:1111.4022 [quant-ph]. [Online]. Available: http://arxiv.org/abs/1111.4022

work page arXiv 2012

[7] [7]

Fault-tolerant logical measurements via homological measurement,

B. Ide, M. G. Gowda, P. J. Nadkarni, and G. Dauphinais, “Fault-tolerant logical measurements via homological measurement,”Physical Review X, vol. 15, no. 2, p. 021088, Jun. 2025, arXiv:2410.02753 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2410.02753

work page arXiv 2025

[8] [8]

Cyclone: Designing Efficient and Highly Parallel QCCD Architectural Codesigns for Fault Tolerant Quantum Memory,

S. Khan, A. Anand, K. R. Brown, and J. M. Baker, “Cyclone: Designing Efficient and Highly Parallel QCCD Architectural Codesigns for Fault Tolerant Quantum Memory,” Nov. 2025, arXiv:2511.15910 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2511.15910

work page arXiv 2025

[9] [9]

QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation,

A. Li, S. Stein, S. Krishnamoorthy, and J. Ang, “QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation,” May 2022, arXiv:2005.13018 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2005.13018

work page arXiv 2022

[10] [10]

Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices

G. Li, Y . Ding, and Y . Xie, “Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices,” May 2019, arXiv:1809.02573 [cs]. [Online]. Available: http://arxiv.org/abs/1809.02573

work page internal anchor Pith review Pith/arXiv arXiv 2019

[11] [11]

A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery

D. Litinski, “A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery,”Quantum, vol. 3, Aug. 2018. [Online]. Available: http://arxiv.org/abs/1808.02892

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

Universal adapters between quantum LDPC codes,

E. Swaroop, T. Jochym-O’Connor, and T. J. Yoder, “Universal adapters between quantum LDPC codes,” Oct. 2024. [Online]. Available: https://arxiv.org/abs/2410.03628v4

work page arXiv 2024

[13] [13]

Matching Generalized-Bicycle Codes to Neutral Atoms for Low-Overhead Fault-Tolerance,

J. Viszlai, W. Yang, S. F. Lin, J. Liu, N. Nottingham, J. M. Baker, and F. T. Chong, “Matching Generalized-Bicycle Codes to Neutral Atoms for Low-Overhead Fault-Tolerance,” in2025 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 01, Aug. 2025, pp. 688–699. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/11250335

work page arXiv 2025

[14] [14]

Low-overhead fault-tolerant quantum computation by gauging logical operators

D. J. Williamson and T. J. Yoder, “Low-overhead fault-tolerant quantum computation by gauging logical operators,”Nature Physics, vol. 22, no. 4, pp. 598–603, Apr. 2026, arXiv:2410.02213 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2410.02213

work page internal anchor Pith review Pith/arXiv arXiv 2026

[15] [15]

Harnessing the Power of Long-Range Entan- glement for Clifford Circuit Synthesis,

W. Yang and P. Rall, “Harnessing the Power of Long-Range Entan- glement for Clifford Circuit Synthesis,” Jun. 2023, arXiv:2302.06537 [quant-ph]. [Online]. Available: http://arxiv.org/abs/2302.06537

work page arXiv 2023