Quantum Data Loading for Carleman Linearized Systems: Application to the Lattice-Boltzmann Equation

Abeynaya Gnanasekaran; Amit Surana; Daniel Gunlycke; Reuben Demirdjian; Thomas Hogancamp

arxiv: 2605.00302 · v4 · pith:JPXPUA5Snew · submitted 2026-05-01 · 🪐 quant-ph

Reuben Demirdjian, Thomas Hogancamp, Abeynaya Gnanasekaran, Amit Surana, Daniel Gunlycke

Pith reviewed 2026-05-21 00:27 UTC · model grok-4.3

classification 🪐 quant-ph

keywords: quantum computing · Carleman linearization · lattice Boltzmann equation · linear combination of unitaries · quantum simulation · fluid dynamics · nonlinear systems · matrix decomposition

The pith

A matrix decomposition turns arbitrary operators into linear combinations of unitaries whose term count stays independent of grid size for Carleman-linearized systems like the lattice Boltzmann equation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a strategy that breaks any square matrix into a linear combination of non-unitaries, then embeds each piece inside a unitary to produce a linear combination of unitaries with exactly the same number of terms. This construction supplies a general framework for autonomous dynamical systems that have been Carleman-linearized around a polynomial nonlinearity. When applied to the three-dimensional lattice Boltzmann equation, the resulting linear combination of unitaries contains a number of terms that grows only with the square of the truncation order and the square of the number of discrete velocities, remaining constant even as the number of spatial or temporal grid points increases. A reader would care because removing dependence on discretization points removes one major barrier to using quantum computers for high-resolution simulations of fluids or similar nonlinear processes.

Core claim

The authors introduce a decomposition that converts an arbitrary square matrix into a linear combination of non-unitaries and then embeds each non-unitary term inside a unitary, yielding a linear combination of unitaries with an equal number of terms. They show that this decomposition produces a generalized linear combination of unitaries framework for any Carleman-linearized autonomous dynamical system possessing a polynomial nonlinearity. For the three-dimensional Carleman-linearized lattice Boltzmann equation the number of terms scales as O(alpha squared Q squared), where alpha is the truncation order and Q is the number of discrete velocities, and this count is completely independent of

What carries the argument

The decomposition of an arbitrary square matrix into a linear combination of non-unitaries that each embed into unitaries to form a linear combination of unitaries with the same term count.
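
The embedding step can be illustrated numerically. The sketch below does not reproduce the paper's embedding circuits; it assumes the textbook Halmos-type unitary dilation, which places any contraction in the top-left block of a unitary of twice the size (one ancilla qubit). The helper names `psd_sqrt` and `dilate`, and the two toy non-unitary terms, are hypothetical choices made for illustration.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a Hermitian positive semidefinite matrix via eigh."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # clip tiny negative round-off
    return (v * np.sqrt(w)) @ v.conj().T

def dilate(a):
    """Embed a contraction `a` (spectral norm <= 1) as the top-left block
    of a unitary of twice the size. Assumed textbook dilation, not the
    construction used in the paper."""
    eye = np.eye(a.shape[0])
    top_right = psd_sqrt(eye - a @ a.conj().T)
    bottom_left = psd_sqrt(eye - a.conj().T @ a)
    return np.block([[a, top_right], [bottom_left, -a.conj().T]])

# Two toy non-unitary terms (hypothetical): a nilpotent matrix and a projector.
terms = [np.array([[0.0, 1.0], [0.0, 0.0]]),
         np.array([[1.0, 0.0], [0.0, 0.0]])]
coeffs = [0.7, 0.3]

# LCNU: the target matrix as a linear combination of non-unitaries.
target = sum(c * l for c, l in zip(coeffs, terms))

# LCU: one unitary per non-unitary term, so the term count is unchanged.
unitaries = [dilate(l) for l in terms]
lcu = sum(c * u for c, u in zip(coeffs, unitaries))

print(len(terms), len(unitaries))          # same number of terms
print(np.allclose(lcu[:2, :2], target))    # top-left block reproduces the target
```

Each term is embedded on its own, so the number of unitaries equals the number of non-unitary terms by construction; the cost of the extra ancilla is what the paper's circuit constructions address.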

If this is right

The linear combination of unitaries for the three-dimensional Carleman-linearized lattice Boltzmann equation has a term count that scales as O(alpha squared Q squared).
The term count remains independent of both the number of temporal and spatial discretization points.
Combined with the PREP and SELECT block-encoding oracles, the T-gate cost scales as O(α³ Q² (log₂ n)²), where n is the total number of spatial grid points.
When paired with the variational quantum linear solver, the approach requires exactly N_s² (log₂(2 n_t n^α) + 1) circuits per iteration, where N_s is the term count and n_t is the number of time steps; the worst-case T-gate cost among those circuits scales as O(α (log₂ Qn)²).
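
As a rough worked example of the variational count, the sketch below evaluates the expression quoted in the abstract, N_s² (log₂(2 n_t n^α) + 1). The abstract gives only the scaling of N_s, not its constants, so the value `n_s = 10` and the grid sizes are hypothetical inputs chosen so that the logarithm is an integer.

```python
import math

def vqls_circuits_per_iteration(n_s, n_t, n, alpha):
    """Evaluate N_s^2 * (log2(2 * n_t * n**alpha) + 1) from the abstract.

    n_s   : number of LCNU/LCU terms (hypothetical value supplied by caller)
    n_t   : number of time steps
    n     : total number of spatial grid points
    alpha : Carleman truncation order
    """
    return n_s**2 * (math.log2(2 * n_t * n**alpha) + 1)

# Hypothetical inputs: 10 terms, 16 time steps, truncation order 3.
coarse = vqls_circuits_per_iteration(n_s=10, n_t=16, n=64, alpha=3)
fine = vqls_circuits_per_iteration(n_s=10, n_t=16, n=128, alpha=3)

print(coarse)  # → 2400.0
print(fine)    # → 2700.0
```

With N_s held fixed, doubling the number of grid points adds only α to the logarithmic factor, which is why the grid independence of N_s matters for this count.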

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition approach could be applied to other nonlinear partial differential equations that admit Carleman linearization, such as reaction-diffusion or Navier-Stokes systems.
If the term count truly stays independent of grid size, quantum resource estimates for fluid simulations would improve markedly when moving to finer meshes.
Small-scale numerical tests of the decomposition on toy matrices drawn from the lattice Boltzmann system could directly verify the claimed independence from discretization.

Load-bearing premise

An arbitrary square matrix can be decomposed into a linear combination of non-unitaries with a term count that does not grow with the number of spatial grid points or time steps.

What would settle it

Apply the decomposition to the system matrix for two different spatial grid resolutions and check whether the number of terms in the resulting linear combination of unitaries remains unchanged.
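
A test of this kind can be sketched on a toy operator. The example below does not use the Carleman-linearized LBE matrix; it assumes a one-dimensional periodic stencil S + Sᵀ − 2I built from a cyclic shift S, which has three structured terms at every grid size, and compares that with a brute-force Pauli-string expansion of the same matrix. All function names are hypothetical.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def periodic_laplacian(n):
    """Toy stencil S + S^T - 2I on n periodic sites (S is the cyclic shift)."""
    shift = np.roll(np.eye(n), 1, axis=0)
    return shift + shift.T - 2 * np.eye(n)

def pauli_term_count(matrix, tol=1e-12):
    """Count nonzero coefficients in the Pauli-string expansion of `matrix`."""
    dim = matrix.shape[0]
    k = dim.bit_length() - 1  # number of qubits, assuming dim is a power of two
    count = 0
    for combo in itertools.product(PAULIS, repeat=k):
        string = combo[0]
        for p in combo[1:]:
            string = np.kron(string, p)
        coeff = np.trace(string.conj().T @ matrix) / dim
        if abs(coeff) > tol:
            count += 1
    return count

# In the shift basis the stencil always has three terms: S, S^T and I.
STRUCTURED_TERMS = 3

pauli_counts = {n: pauli_term_count(periodic_laplacian(n)) for n in (4, 8, 16)}
print(STRUCTURED_TERMS, pauli_counts)  # → 3 {4: 3, 8: 6, 16: 12}
```

The structured count stays fixed while the Pauli count grows with the grid, which is the qualitative behavior the paper claims for its LCNU; repeating the comparison on the actual LBE system matrix at two resolutions would be the real check.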

Figures

Figures reproduced from arXiv: 2605.00302 by Abeynaya Gnanasekaran, Amit Surana, Daniel Gunlycke, Reuben Demirdjian, Thomas Hogancamp.

**Figure 4.1.** Embeddings for the two nontrivial $L^{(e)}_{1}$ terms of (62): the embedded $L_{1,2}$ term (left) and the embedded $L_{1,3}$ term (right). The $|a\rangle$ wire is a single ancillary qubit required to embed each term into a unitary operation. A control operation or a single-qubit gate on a multi-qubit register should be interpreted as the respective operation being applied to every wire within that register. The vertical dashed…

**Figure 4.2.** Embedding for $L^{\mathrm{lin},1}_{\lambda}$ as defined in (65b) using $\lambda = (2, 1, 0, x, +1, m)$ for any $m \in \{1, \ldots, N_{E_\eta}\}$, which is the most expensive circuit among any $\lambda \in \Lambda_1$. The $\sigma_{f(\eta,m)}$ block is the $m$th term in the Pauli decomposition of the x-component of (52), and is therefore a tensor product of $\log Q$ Pauli gates. The $S^{n}_{+1}$ block is the incrementer circuit from (24). Then, by combining (25) with the nonlinear component…

**Figure 4.3.** Embedding for $L^{\mathrm{lin},2}_{\lambda}$ as defined in (68b) using $\lambda = (2, 1, 0, m)$ for any $m \in \{1, \ldots, N_R\}$, which is the most expensive circuit among any $\lambda \in \Lambda_2$. The $\sigma_{g(m)}$ gate is the $m$th term in the Pauli decomposition used in (55), and is therefore a tensor product of $\log Q$ Pauli gates. The $V_R$ and $W_R$ circuits come from the SVD in (55) and depend on the specific lattice structure used. where $\tilde{I}_i$ is defined in Secti…

**Figure 4.4.** Embedding for $L^{\mathrm{nlin}}_{\lambda}$ as defined in (71b) using $\lambda = (3, 2, \alpha-2, \alpha-3, q, m)$ for any $m \in \{1, \ldots, N_{\Gamma_q}\}$ and $q \in \{1, \ldots, Q\}$, which is the most expensive circuit among any $\lambda \in \Lambda_3$. The circuits for $P_3$, $B_{3,q}$ and the commutation matrix $K(a,b)$ for integers $a$ and $b$ are provided in Appendices E.1, E.4 and E.5, respectively. The $\sigma_{h(q,m)}$ gate is the $m$th term in the Pauli decomposition used in (59) and is the…

**Figure 5.1.** T gate count to encode the Carleman linearized LBE matrix using the PREP and SELECT…

**Figure 5.2.** The number of circuits (NoC) and maximum T cost per circuit to encode the Carleman linearized…

Original abstract

Nonlinear ordinary and partial differential equations are ubiquitous in science and engineering, yet finding their solutions is often computationally intractable for classical hardware. To determine if quantum computers can offer a practical advantage, one critical challenge that must be solved is determining how to efficiently load exponentially sized matrices onto quantum hardware. In this article, we introduce an alternative linear combination of unitaries (LCU) strategy which relies on an intermediate linear combination of non-unitaries (LCNU) and a systematic embedding procedure. One advantage of this LCU strategy is that it maintains the exact number of terms as in the LCNU. Therefore, this approach offers a data loading framework for matrices that lack an efficient decomposition using the standard LCU alone. Using this approach, we construct a generalized LCNU framework for any Carleman linearized autonomous dynamical system having a polynomial nonlinearity. To demonstrate the effectiveness of our approach, we construct an LCNU for the 3D Carleman linearized lattice Boltzmann equation (LBE). Here, we find that the number of terms in the decomposition scales like $N_s\sim\mathcal{O}(\alpha^2Q^2)$, where $\alpha$ is the Carleman truncation order and $Q$ is the number of discrete velocities. Importantly, $N_s$ is independent of the number of spatial and temporal discretization points. We then perform a resource estimation of our LCNU's T gate cost when combined with the (1) PREP and SELECT block encoding oracles, and (2) variational quantum linear solver. In the former, the T cost scales like $\mathcal{O}(\alpha^3Q^2(\log_2n)^2)$, where $n$ is the total number of spatial grid points. The latter requires exactly $N_s^2(\log_2 (2n_tn^\alpha)+1)$ circuits per iteration for $n_t$ time steps, with a worst case T gate cost of $\mathcal{O}(\alpha (\log_2Qn)^2)$ among them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a matrix-to-LCNU decomposition that feeds into an LCU with matching term count, then applies it to Carleman-linearized 3D LBE to claim O(α² Q²) terms independent of grid size n.

The main point is a claimed decomposition that turns any square matrix into a linear combination of non-unitaries, each embedded in a unitary, so the resulting LCU has exactly the same number of terms. They generalize this to Carleman-truncated polynomial autonomous systems and then specialize to the 3D lattice Boltzmann equation, where the term count stays O(α² Q²) with no dependence on spatial grid points or time steps. They also give T-gate cost estimates for both PREP/SELECT block encoding and the variational quantum linear solver.

That grid independence would matter for anyone trying to push quantum fluid simulations to higher resolution without the usual blow-up in operator complexity. The concrete scaling and cost numbers are the parts that feel most usable right now.

The soft spot is the general decomposition step. The abstract presents it as working for arbitrary matrices first, then says the LBE scaling follows. For a generic matrix whose dimension scales as n^α after Carleman expansion, a standard decomposition into non-unitaries would normally pick up factors that grow with dimension unless the construction uses special structure in the LBE streaming and collision operators. If the paper only demonstrates the cancellation inside the LBE case and does not show a dimension-independent general method, the headline independence claim rests more on the specific application than on the advertised general strategy. The stress-test concern stands until the explicit construction is checked.

This is for people working on quantum algorithms for PDEs and fluid dynamics who already know LCU and Carleman methods. A reader looking for new scaling tricks in that area would get something concrete to test. It deserves peer review because the potential payoff on grid scaling is worth the time to verify the derivation, even if the proofs need more detail.

Referee Report

2 major / 2 minor

Summary. The paper introduces a strategy to decompose an arbitrary square matrix into a linear combination of non-unitaries (LCNU), each embeddable in a unitary, yielding an LCU with equal term count. This is generalized to any Carleman-linearized autonomous dynamical system with polynomial nonlinearity and applied to construct an LCU for the 3D Carleman-linearized lattice Boltzmann equation (LBE), with term count Ns scaling as O(α² Q²) independent of spatial grid points n and time steps nt. T-gate cost estimates are given for PREP/SELECT block encodings and the variational quantum linear solver.

Significance. If the central decomposition holds with the claimed scaling, the work would provide a route to quantum simulation of fluid dynamics via LBE that avoids exponential growth in operator terms with discretization size, a potentially important step toward practical quantum advantage in computational fluid dynamics. The explicit T-cost scaling for both fault-tolerant and variational settings adds concrete utility if the underlying LCNU construction is rigorous.

major comments (2)

[Abstract] Abstract (paragraph beginning 'Herein, we introduce a strategy...'): The general LCNU decomposition for an arbitrary square matrix is presented as achieving term count equal to the final LCU without additional dimension-dependent factors. For the Carleman-linearized LBE the operator dimension scales as n^α; a generic decomposition into non-unitaries would normally produce term count growing with this dimension unless the construction explicitly cancels the n dependence via the tensor-product structure of the LBE streaming and collision operators. The manuscript must supply the explicit construction and proof that this cancellation occurs for arbitrary matrices before the independence claim can be accepted.
[LBE application section] Section on application to 3D LBE (following the generalized framework): The reported Ns ∼ O(α² Q²) independent of n and nt is load-bearing for the headline result. Without a concrete decomposition of the Carleman-expanded LBE operator that demonstrates how the general LCNU avoids introducing factors proportional to n^α or nt, the scaling cannot be verified and may rest on an unstated structural cancellation specific to LBE rather than the claimed generality.

minor comments (2)

[T-gate cost estimates] The T-gate cost expressions (O(α³ Q² (log₂ n)²) for PREP/SELECT and O(α (log₂ Q n)²) worst-case for VQLS) would benefit from an explicit breakdown of the PREP and SELECT oracle implementations and how the LCNU terms enter the circuit depth.
Notation for the Carleman truncation order α and discrete velocity count Q should be defined at first use with a brief reminder of their relation to the underlying LBE discretization.
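
For readers who want the PREP/SELECT structure mentioned in the first minor comment made concrete, the sketch below checks the standard LCU block-encoding identity numerically: for A = Σ c_i U_i with positive c_i, (⟨prep| ⊗ I) SELECT (|prep⟩ ⊗ I) = A/λ with λ = Σ c_i. It is a generic illustration with hypothetical helper names and a toy two-term operator, not the oracle implementation from the paper, and it models PREP only through the state it prepares from |0⟩.

```python
import numpy as np

def select_matrix(unitaries):
    """SELECT = sum_i |i><i| (x) U_i, written as a block-diagonal matrix."""
    m, d = len(unitaries), unitaries[0].shape[0]
    sel = np.zeros((m * d, m * d), dtype=complex)
    for i, u in enumerate(unitaries):
        sel[i * d:(i + 1) * d, i * d:(i + 1) * d] = u
    return sel

def block_encoded(coeffs, unitaries):
    """Return (<prep| (x) I) SELECT (|prep> (x) I) and the 1-norm lambda.

    Coefficients are assumed positive; any phases are taken to be
    absorbed into the unitaries.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    lam = coeffs.sum()
    prep = np.sqrt(coeffs / lam)  # the state PREP|0>
    d = unitaries[0].shape[0]
    iso = np.kron(prep.reshape(-1, 1), np.eye(d))  # |prep> (x) I
    return iso.conj().T @ select_matrix(unitaries) @ iso, lam

# Toy two-term LCU (hypothetical): A = 0.5 X + 0.25 Z.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
coeffs = [0.5, 0.25]
target = coeffs[0] * X + coeffs[1] * Z

block, lam = block_encoded(coeffs, [X, Z])
print(np.allclose(lam * block, target))  # the encoded block is A / lambda
```

The number of blocks in SELECT equals the number of LCU terms, which is how the term count N_s enters the oracle cost.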

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each of the major comments below and indicate the revisions we will make to the manuscript.

Referee: The general LCNU decomposition for an arbitrary square matrix is presented as achieving term count equal to the final LCU without additional dimension-dependent factors. For the Carleman-linearized LBE the operator dimension scales as n^α; a generic decomposition into non-unitaries would normally produce term count growing with this dimension unless the construction explicitly cancels the n dependence via the tensor-product structure of the LBE streaming and collision operators. The manuscript must supply the explicit construction and proof that this cancellation occurs for arbitrary matrices before the independence claim can be accepted.

Authors: We agree that an explicit construction is essential to substantiate the scaling claim. Our LCNU approach begins with a decomposition of the original system operators, which for the LBE are highly structured (local collision and streaming terms with tensor-product form across grid points). The Carleman linearization preserves this structure in a block-wise manner, allowing the LCNU to be applied to the base operators and extended via Kronecker products without introducing additional factors of n or nt. The generalized framework in Section 2 details this for any polynomial system, and Section 3 applies it to LBE showing the O(α² Q²) scaling. To address the concern, we will add an appendix with the full proof of the term count independence, including the explicit LCNU terms for the LBE operator.

Revision: yes

Referee: The reported Ns ∼ O(α² Q²) independent of n and nt is load-bearing for the headline result. Without a concrete decomposition of the Carleman-expanded LBE operator that demonstrates how the general LCNU avoids introducing factors proportional to n^α or nt, the scaling cannot be verified and may rest on an unstated structural cancellation specific to LBE rather than the claimed generality.

Authors: The concrete decomposition is provided in the LBE application section, where we expand the Carleman-linearized operator and identify the distinct non-unitary components arising from the velocity discretizations and polynomial terms up to order α. Because the spatial grid enters only through identical local operators tensored across sites, the number of unique terms does not grow with n. Similarly, the time discretization does not affect the operator itself in the linear system formulation. We acknowledge that the presentation could be more explicit, and we will revise the section to include a table or diagram listing the LCNU terms and their embeddings, thereby verifying the claimed scaling directly from the construction rather than relying on the general framework alone.

Revision: yes

Circularity Check

0 steps flagged

No circularity: derivation presents explicit algorithmic construction independent of target scaling.

The paper introduces a decomposition strategy for arbitrary square matrices into LCNU (with equal-term LCU embedding) as a new contribution, then applies the resulting generalized LCU framework to Carleman-linearized polynomial systems and specifically to the 3D LBE operator. The claimed Ns ~ O(α² Q²) scaling independent of n and nt is stated as following directly from the operator structure under this framework. No equation or section reduces a prediction to a fitted parameter, self-citation chain, or definitional renaming; the central result is an explicit construction rather than a re-expression of inputs. This is the normal case of a self-contained algorithmic paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard quantum-computing primitives (block encoding, LCU, variational linear solvers) and the mathematical validity of Carleman linearization; no new physical entities or fitted parameters are introduced in the abstract.

axioms (2)

domain assumption: Carleman linearization of a polynomial nonlinearity yields a finite-dimensional linear system after truncation at order α. Invoked when the authors apply the framework to the LBE.
ad hoc to paper: Any square matrix admits a decomposition into a linear combination of non-unitaries that can each be embedded in a unitary with equal term count. The core technical step stated in the opening sentence of the abstract.
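
The first axiom can be made concrete on a scalar toy problem. The sketch below is not the LBE; it assumes the ODE dx/dt = a x + b x² and the Carleman variables y_k = x^k, for which dy_k/dt = k a y_k + k b y_{k+1}. Dropping y_{α+1} gives an α × α linear system whose first component approximates x(t). The parameter values and function names are hypothetical, and classical RK4 is used only as a convenient integrator.

```python
import numpy as np

def carleman_matrix(a, b, order):
    """Truncated Carleman matrix for dx/dt = a*x + b*x**2 with y_k = x**k."""
    mat = np.zeros((order, order))
    for k in range(1, order + 1):
        mat[k - 1, k - 1] = k * a      # d(x^k)/dt contains k*a*x^k
        if k < order:
            mat[k - 1, k] = k * b      # and k*b*x^(k+1), dropped at k = order
    return mat

def integrate_linear(mat, y0, t_end, steps=1000):
    """Classical RK4 for the linear system dy/dt = mat @ y."""
    y, dt = y0.astype(float), t_end / steps
    for _ in range(steps):
        k1 = mat @ y
        k2 = mat @ (y + 0.5 * dt * k1)
        k3 = mat @ (y + 0.5 * dt * k2)
        k4 = mat @ (y + dt * k3)
        y = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

def exact_solution(a, b, x0, t):
    """Closed-form solution of the Bernoulli equation dx/dt = a*x + b*x**2."""
    e = np.exp(a * t)
    return a * x0 * e / (a + b * x0 * (1.0 - e))

def carleman_error(a, b, x0, t_end, order):
    """Absolute error of the truncated Carleman approximation at t_end."""
    y0 = np.array([x0 ** k for k in range(1, order + 1)])
    approx = integrate_linear(carleman_matrix(a, b, order), y0, t_end)[0]
    return abs(approx - exact_solution(a, b, x0, t_end))

# Hypothetical parameters with weak nonlinearity (|b*x0/a| = 0.1).
errors = {order: carleman_error(-1.0, 0.5, 0.2, 1.0, order) for order in (1, 2, 4)}
print(errors)
```

For these parameters the error shrinks as the truncation order grows, while the linear system grows in size; for a system with n grid points the analogous dimension scales as n^α, which is the matrix that the data-loading question concerns.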

pith-pipeline@v0.9.0 · 5859 in / 1477 out tokens · 35831 ms · 2026-05-21T00:27:15.395609+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear


We construct a generalized LCU framework for any Carleman linearized autonomous dynamical system with a polynomial nonlinearity... Ns ∼ O(α² Q²) ... completely independent of both the number of temporal and spatial discretization points.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


Decomposition of the F1 Matrix ... Decomposition of the F2 and F3 Matrices ... Explicit Circuit Constructions

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.