arXiv: 2604.11599 · v1 · submitted 2026-04-13 · quant-ph · cs.ET · cs.PF

Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDA-Q: Performance and Expressiveness Advantages

Vinooth Kulkarni, Jaehyun Lee, Adam Hutchings, Anas Albahri, Jai Nana, Shuai Xu, Vipin Chaudhary

Pith reviewed 2026-05-10 15:45 UTC · model grok-4.3

classification: quant-ph · cs.ET · cs.PF

keywords: OpenQASM 3.0 · dynamic quantum circuits · transpilation · CUDA-Q · mid-circuit measurement · classical feedforward · NISQ algorithms · control flow

The pith

A transpilation pipeline converts OpenQASM 3.0 dynamic circuits into optimized CUDA-Q C++ kernels by mapping control structures directly to host-language flow.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a transpilation method that turns OpenQASM 3.0 programs containing mid-circuit measurements, conditionals, and bounded loops into CUDA-Q kernels. Dynamic circuits matter for near-term tasks like error mitigation and adaptive phase estimation because they allow classical feedback during execution. The method avoids expanding branches into static copies, which keeps circuits shorter and faster. It also improves readability by using C++ control flow instead of duplicated quantum operations. Tests on conditional resets, multi-bit predicates, and VQE-style circuits confirm these gains in depth, speed, and clarity.

Core claim

Mapping OpenQASM 3.0 conditionals, loops, and predicates to CUDA-Q host-language control flow together with native mid-circuit measurement produces kernels that reduce circuit depth by avoiding branch duplication, improve execution efficiency through low-latency classical feedback, and increase code readability through direct structural correspondence.

What carries the argument

The transpilation pipeline that directly translates OpenQASM 3.0 classical control structures (conditionals, bounded loops, multi-bit predicates) to C++ control flow inside CUDA-Q kernels while preserving mid-circuit measurement semantics without static expansion.
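As a rough illustration of what "mapping control structures directly to host-language flow" can look like, the sketch below walks a toy syntax tree and prints CUDA-Q-style C++ text. It is not the authors' pipeline (which, per Figure 1, goes through pyqasm and MLIR); the node classes `Gate`, `Measure`, `If` and the helpers `emit` and `emit_kernel` are hypothetical names invented for this example. Only the emitted identifiers (`__qpu__`, `cudaq::qvector`, `h`, `x`, `mz`) are meant to follow CUDA-Q's C++ kernel syntax.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical toy AST; a real frontend would produce far richer nodes.
@dataclass
class Gate:
    name: str   # single-qubit gate name, e.g. "h" or "x"
    qubit: int


@dataclass
class Measure:
    qubit: int
    target: str  # classical variable that receives the outcome


@dataclass
class If:
    condition: str       # classical variable tested for truth
    body: List["Node"]   # statements executed when the condition holds


Node = Union[Gate, Measure, If]


def emit(nodes, indent=1):
    """Emit one line of C++-style text per statement, recursing into branches."""
    pad = "  " * indent
    lines = []
    for node in nodes:
        if isinstance(node, Gate):
            lines.append(f"{pad}{node.name}(q[{node.qubit}]);")
        elif isinstance(node, Measure):
            lines.append(f"{pad}auto {node.target} = mz(q[{node.qubit}]);")
        elif isinstance(node, If):
            # The branch body is emitted once, under a host-language `if`.
            lines.append(f"{pad}if ({node.condition}) {{")
            lines.extend(emit(node.body, indent + 1))
            lines.append(f"{pad}}}")
    return lines


def emit_kernel(name, num_qubits, nodes):
    """Wrap the emitted statements in a kernel signature and qubit allocation."""
    header = [f"__qpu__ void {name}() {{", f"  cudaq::qvector q({num_qubits});"]
    return "\n".join(header + emit(nodes) + ["}"])


# Conditional reset: H, mid-circuit measurement, X only if the outcome was 1.
program = [Gate("h", 0), Measure(0, "c0"), If("c0", [Gate("x", 0)])]
kernel_text = emit_kernel("conditional_reset", 1, program)
print(kernel_text)
```

The point of the sketch is structural: the conditional in the source tree becomes a single `if` in the emitted text, so the `x` gate appears exactly once rather than once per outcome path.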

If this is right

Circuit depth drops because branches are not duplicated into separate static paths.
Execution speeds up from low-latency classical feedback without framework round-trips.
Code becomes more readable by keeping OpenQASM 3.0 control structures as native C++ constructs.
The approach supports portable OpenQASM 3.0 specifications while delivering CUDA-Q performance for NISQ dynamic algorithms.
Parameterized circuits with runtime optimization become directly executable without additional unrolling.
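The branch-duplication point can be made concrete with a toy cost model. The sketch below assumes, purely for illustration, that "static expansion" means emitting one straight-line copy of the program per possible string of measurement outcomes, and it counts emitted branch-body operations rather than hardware depth. Both function names are hypothetical, and the model is not the baseline or the metric used in the paper.

```python
def static_expansion_ops(branch_body_sizes):
    """Branch-body ops emitted when each outcome string gets its own copy.

    branch_body_sizes[i] is the number of gates guarded by the i-th
    measure-and-branch step (toy model, one binary outcome per step).
    """
    n = len(branch_body_sizes)
    total = 0
    for outcomes in range(2 ** n):          # one path per outcome string
        for i, size in enumerate(branch_body_sizes):
            if (outcomes >> i) & 1:           # branch i is taken on this path
                total += size
    return total


def direct_mapping_ops(branch_body_sizes):
    """Branch-body ops emitted when each body appears once under an `if`."""
    return sum(branch_body_sizes)


sizes = [1, 1, 1]  # three sequential feedforward steps, one gate each
print(static_expansion_ops(sizes), direct_mapping_ops(sizes))
```

Under this model the expanded form grows with the number of outcome paths while the direct form grows only with the number of branches; whether real baselines behave this way is exactly what quantitative benchmarks would need to show.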

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same direct-mapping idea could reduce overhead when moving dynamic patterns between other quantum frameworks that currently force full unrolling.
Hybrid quantum-classical loops in variational algorithms might run with fewer synchronization points once control flow lives in the host language.
Extending the pipeline to additional feedforward patterns beyond the tested set would be a direct next step for broader NISQ use.

Load-bearing premise

The semantic mapping from OpenQASM 3.0 control structures to CUDA-Q host-language control flow preserves exact circuit behavior and performance for every possible dynamic pattern.

What would settle it

A concrete test case in which the transpiled CUDA-Q kernel yields a different final measurement distribution or uses more gates than the original OpenQASM 3.0 circuit for a multi-bit predicate or sequential feedforward pattern.
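One simple way to frame such a test is to compare the two measurement histograms with a distance measure. The sketch below uses total variation distance; the function name and the shot counts are made up for illustration and are not data from the paper. Choosing a pass/fail threshold would also require accounting for shot noise, which this sketch does not do.

```python
def total_variation_distance(counts_a, counts_b):
    """Half the L1 distance between two empirical outcome distributions.

    Each argument maps a bitstring outcome to its observed shot count.
    """
    shots_a = sum(counts_a.values())
    shots_b = sum(counts_b.values())
    outcomes = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a.get(o, 0) / shots_a - counts_b.get(o, 0) / shots_b)
        for o in outcomes
    )


# Placeholder histograms (hypothetical numbers, 1000 shots each).
reference = {"00": 510, "11": 490}
transpiled = {"00": 495, "11": 505}
distance = total_variation_distance(reference, transpiled)
print(distance)
```

A distance of 0 means the two histograms agree exactly, and 1 means they share no outcomes; a transpiled kernel that consistently sits well above the shot-noise floor for some pattern would be the kind of counterexample described above.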

Figures

Figures reproduced from arXiv: 2604.11599 by Adam Hutchings, Anas Albahri, Jaehyun Lee, Jai Nana, Shuai Xu, Vinooth Kulkarni, Vipin Chaudhary.

**Figure 1.** Architecture of the Proposed Transpilation Framework. (a) User OpenQASM 3.0 programs are parsed by the pyqasm frontend, transpiled into QIR-compliant CUDA-Q kernels, and executed by CUDA-Q. (b) QASM is lowered from AST to MLIR (classical-control + Quake dialects), then translated to LLVM IR/QIR and emitted as executable kernels. (c) CUDA-Q executes on qpp (CPU state-vector), cuQuantum (GPU state-vector), a…

**Figure 2.** Schematic of the Conditional Reset Protocol. A qubit is prepared in |+⟩ using a Hadamard gate, measured mid-circuit, and classically controlled: if the outcome is ‘1’, a Pauli-X gate resets the state to |0⟩.
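The conditional reset protocol in Figure 2 is small enough to simulate by hand. The sketch below is a plain-Python, single-qubit simulation written for this summary; it does not use CUDA-Q or the authors' code, and the function name `conditional_reset_shot` is hypothetical. It is meant only to show the expected behavior: the mid-circuit outcome is random, but the final measurement should always read 0.

```python
import math
import random


def conditional_reset_shot(rng):
    """Run one shot of H, mid-circuit measure, conditional X, final measure."""
    # Real amplitudes (a0, a1) of |0> and |1>; start in |0>.
    a0, a1 = 1.0, 0.0

    # Hadamard gate: |0> becomes |+>.
    s = 1 / math.sqrt(2)
    a0, a1 = s * (a0 + a1), s * (a0 - a1)

    # Mid-circuit measurement: sample an outcome, then collapse the state.
    outcome = 1 if rng.random() < a1 * a1 else 0
    a0, a1 = (0.0, 1.0) if outcome else (1.0, 0.0)

    # Classical feedforward: apply Pauli-X only when the outcome was 1.
    if outcome:
        a0, a1 = a1, a0

    # Final measurement of the (now reset) qubit.
    final = 1 if rng.random() < a1 * a1 else 0
    return outcome, final


rng = random.Random(7)  # fixed seed so repeated runs match
shots = [conditional_reset_shot(rng) for _ in range(1000)]
mid_ones = sum(mid for mid, _ in shots)
final_ones = sum(final for _, final in shots)
print(mid_ones, final_ones)
```

The `if outcome:` line plays the role that a host-language conditional would play in a transpiled kernel: the correction gate is applied at run time based on the measured bit, with no second copy of the circuit.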

Original abstract

Dynamic quantum circuits with mid-circuit measurement and classical feedforward are essential for near-term algorithms such as error mitigation, adaptive phase estimation, and Variational Quantum Eigensolvers (VQE), yet transpiling these programs across frameworks remains challenging due to inconsistent support for control flow and measurement semantics. We present a transpilation pipeline that converts OpenQASM 3.0 programs with classical control structures (conditionals and bounded loops) into optimized CUDA-Q C++ kernels, leveraging CUDA-Q's native mid-circuit measurement and host-language control flow to translate dynamic patterns without static circuit expansion. Our open-source framework is validated on comprehensive test suites derived from IBM Quantum's classical feedforward guide, including conditional reset, if-else branching, multi-bit predicates, and sequential feedforward, and on VQE-style parameterized circuits with runtime parameter optimization. Experiments show that the resulting CUDA-Q kernels reduce circuit depth by avoiding branch duplication, improve execution efficiency via low-latency classical feedback, and enhance code readability by directly mapping OpenQASM 3.0 control structures to C++ control flow, thereby bridging OpenQASM 3.0's portable circuit specification with CUDA-Q's performance-oriented execution model for NISQ-era applications requiring dynamic circuit capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a working transpiler that maps OpenQASM 3.0 dynamic circuits straight into CUDA-Q C++ kernels without static branch expansion, but the performance and correctness claims rest on high-level description rather than shown data.

The main thing to know is that the authors built a transpiler taking OpenQASM 3.0 programs that use mid-circuit measurements, if-else, bounded loops, and multi-bit predicates and turning them into CUDA-Q kernels that use native host-language control flow. This avoids duplicating circuit branches for every possible outcome, which should keep depth lower on NISQ hardware that supports feedforward. They also claim better execution speed from low-latency classical feedback and easier-to-read code because the structure stays close to the original OpenQASM. The code is open source and they ran it on IBM-derived test cases plus some VQE-style circuits with runtime parameter updates. That is the concrete contribution: a practical bridge between two frameworks that handle dynamic circuits differently. It is useful engineering work for anyone who needs to move code across these tools without losing the dynamic parts.

The soft spot is the missing evidence. The abstract states that the kernels reduce depth and improve efficiency, yet it gives no numbers, no comparison baselines, and no error analysis. Without those, it is difficult to judge whether the gains are real or just expected from the approach. The stress-test point about semantic preservation is reasonable to raise; if nested or data-dependent branches were not exhaustively checked for collapse statistics or added host-device latency, the correctness claim stays unproven in the summary.

A reader working on quantum software stacks or NISQ algorithms that rely on classical feedback would find this paper worth reading for the tool itself. It is not a broad methodological advance, but it solves a narrow interoperability problem that comes up in practice. The work shows clear thinking about the mapping and honest engagement with existing test suites, so it deserves a serious referee. I would send it to peer review and ask for quantitative benchmarks plus a short section on how they verified equivalence on the more complex patterns.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a transpilation pipeline that converts OpenQASM 3.0 programs containing dynamic elements—mid-circuit measurements, conditionals, bounded loops, and multi-bit predicates—into CUDA-Q C++ kernels. It exploits CUDA-Q's native mid-circuit measurement support and host-language control flow to avoid static unrolling of branches. The pipeline is validated on IBM Quantum-derived test suites (conditional reset, if-else, multi-bit predicates, sequential feedforward) and VQE-style parameterized circuits, with claims of reduced circuit depth, improved execution efficiency from low-latency classical feedback, and enhanced code readability.

Significance. If the semantic mapping preserves exact quantum behavior and the performance gains are reproducible, the work would provide a practical bridge between the portable OpenQASM 3.0 standard and CUDA-Q's execution model, aiding NISQ applications that rely on dynamic circuits such as error mitigation and adaptive algorithms. The open-source framework is a clear strength that supports reproducibility.

major comments (3)

[Abstract] Abstract: The central claims that the resulting CUDA-Q kernels 'reduce circuit depth by avoiding branch duplication' and 'improve execution efficiency via low-latency classical feedback' are stated without any quantitative metrics, tables, figures, depth comparisons, timing data, or baseline transpiler results. This is load-bearing because the abstract's primary contribution rests on these performance and expressiveness advantages.
[Validation and Experiments] Validation section: No formal semantics, equivalence proof, or concrete verification procedure (e.g., matching measurement statistics or collapse probabilities) is provided for the mapping of OpenQASM 3.0 conditionals, bounded loops, and multi-bit predicates to CUDA-Q host control flow, especially for nested or data-dependent branches interleaved with mid-circuit measurements. This directly affects the correctness claim for all tested patterns.
[Experiments] Experiments: The manuscript reports no measurements of host-device synchronization costs, classical feedback latency, or overhead introduced by the transpilation for sequential feedforward patterns. Without these data, the asserted efficiency gains over static expansion cannot be evaluated.

minor comments (2)

[Abstract] Abstract: Consider including one or two key quantitative results (e.g., average depth reduction or latency improvement) to make the performance claims concrete for readers.
[Code availability] Code and reproducibility: Ensure the open-source repository link includes clear build instructions, test-suite reproduction steps, and example kernels so that the validation claims can be independently checked.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment point by point below, with clear indications of planned revisions to the manuscript.

Referee: [Abstract] Abstract: The central claims that the resulting CUDA-Q kernels 'reduce circuit depth by avoiding branch duplication' and 'improve execution efficiency via low-latency classical feedback' are stated without any quantitative metrics, tables, figures, depth comparisons, timing data, or baseline transpiler results. This is load-bearing because the abstract's primary contribution rests on these performance and expressiveness advantages.

Authors: We agree that the abstract would be strengthened by referencing specific quantitative support for the claims. The Experiments section contains depth comparisons, execution results, and figures for the test suites. In the revised manuscript we will update the abstract to briefly cite key observed outcomes (depth reductions and efficiency indicators) and point to the relevant figures and tables. revision: yes
Referee: [Validation and Experiments] Validation section: No formal semantics, equivalence proof, or concrete verification procedure (e.g., matching measurement statistics or collapse probabilities) is provided for the mapping of OpenQASM 3.0 conditionals, bounded loops, and multi-bit predicates to CUDA-Q host control flow, especially for nested or data-dependent branches interleaved with mid-circuit measurements. This directly affects the correctness claim for all tested patterns.

Authors: The current validation consists of empirical execution of the IBM Quantum-derived test suites on simulators, comparing measurement statistics and output distributions between the original OpenQASM 3.0 circuits and the transpiled CUDA-Q kernels. We have expanded the Validation section to describe this concrete verification procedure in detail, including coverage of nested and data-dependent branches. A formal semantic equivalence proof is outside the scope of the present practical transpilation work. revision: partial
Referee: [Experiments] Experiments: The manuscript reports no measurements of host-device synchronization costs, classical feedback latency, or overhead introduced by the transpilation for sequential feedforward patterns. Without these data, the asserted efficiency gains over static expansion cannot be evaluated.

Authors: Our experiments focus on circuit-depth reduction achieved by direct control-flow translation rather than static unrolling. We will add a discussion of the architectural reasons for expected low-latency feedback in CUDA-Q and clarify that the efficiency claims are tied to the measured depth savings. Platform-specific synchronization and latency benchmarks are hardware-dependent and were not performed; we note this limitation explicitly in the revised text. revision: partial

Standing simulated objections (not resolved)

Absence of a formal semantics or equivalence proof for the transpilation mapping of dynamic constructs.
Direct quantitative measurements of host-device synchronization costs and classical feedback latency for the sequential feedforward patterns.

Circularity Check

0 steps flagged

No circularity: implementation and benchmarking paper with no derivations

The paper describes a transpilation pipeline from OpenQASM 3.0 to CUDA-Q kernels, validated experimentally on IBM feedforward suites and VQE-style circuits. No equations, parameters, or mathematical derivations are present in the abstract or described content. Performance claims (depth reduction, efficiency, readability) rest on direct implementation and measurement rather than any chain that reduces to fitted inputs or self-citations by construction. The contribution is an engineering mapping and benchmark, self-contained against external test suites without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the engineering assumption that OpenQASM 3.0 control-flow semantics can be faithfully and efficiently realized in CUDA-Q's host-language model. No numerical free parameters, physical constants, or new postulated entities are introduced.

axioms (1)

Domain assumption: OpenQASM 3.0 conditional and loop constructs have well-defined semantics that can be directly mapped to CUDA-Q C++ control flow without loss of observable behavior.
The pipeline's correctness depends on this equivalence holding for all supported patterns including multi-bit predicates and sequential feedforward.
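Multi-bit predicates are one place where this assumption is easy to get subtly wrong, because the comparison depends on how measured bits are packed into an integer. The sketch below shows one packing convention, treating index 0 as the least significant bit. That ordering is an assumption made for this example and should be checked against the OpenQASM 3.0 specification and the transpiler's actual behavior; the function names are hypothetical.

```python
def bits_to_int(bits):
    """Pack measured bits into an integer, with bits[0] as least significant.

    ASSUMPTION: little-endian bit order; verify against the language spec.
    """
    value = 0
    for index, bit in enumerate(bits):
        value |= (bit & 1) << index
    return value


def predicate_matches(bits, expected):
    """Evaluate a register-equals-integer predicate such as `c == 3`."""
    return bits_to_int(bits) == expected


# With c[0] = 1, c[1] = 1, c[2] = 0 the register reads as 3 under this convention.
print(bits_to_int([1, 1, 0]), predicate_matches([1, 1, 0], 3))
```

If the source and target frameworks disagree on bit order, a predicate such as `c == 1` on a two-bit register would fire on a different measurement outcome, which is the sort of silent mismatch an equivalence check on multi-bit predicates should catch.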

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage:
transpiler that parses OpenQASM 3.0 source code—including complex gate modifiers, classical feedback, and control flow—and generates optimized both C++ and Python-based CUDA-Q kernels

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage:
mapping OpenQASM 3.0 IfStatement nodes directly to CUDA-Q’s c_if construct

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
