arXiv: 2602.08606 · v3 · submitted 2026-02-09 · math.OC · cs.LG · math.AP · math.PR


Constructive conditional normalizing flows

Borjan Geshkovski, Domènec Ruiz-Balet


Pith reviewed 2026-05-16 05:45 UTC · model grok-4.3


keywords: conditional normalizing flows, continuity equation, diffeomorphism, Lagrange interpolant, polar decomposition, shear flows, neural velocity fields, pushforward measure


The pith

A polar-like decomposition of the Lagrange interpolant yields explicit neural flows that approximate any diffeomorphism and its pushforward measure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to build a velocity field for the continuity equation as a perceptron neural network with piecewise constant weights so that the resulting flow approximates both a given diffeomorphism φ and the pushed-forward measure φ#μ at the same time. The construction begins with the Lagrange interpolant of φ and splits it into a compressible part given by the gradient of a convex function, which is realized exactly, and an incompressible part that is approximated by permutations and then realized exactly by shear flows. For smoother maps such as the Knöthe-Rosenblatt rearrangement, a probabilistic alternative keeps the number of weight discontinuities from growing with dimension. The approach supplies a concrete recipe for conditional sampling rather than relying on general existence theorems for neural flows.

Core claim

Given a probability measure μ and a diffeomorphism φ, the flow of a continuity equation whose velocity field is a perceptron neural network with piecewise constant weights simultaneously approximates both φ and φ#μ. The explicit construction rests on a polar-like decomposition of the Lagrange interpolant of φ into a compressible component realized by the gradient of a convex function and an incompressible component realized through shear flows after permutation approximation.
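
For orientation, the setting of the core claim can be written out as below. This is a sketch under assumptions: the symbols $v$, $w$, $a$, $b$, $\sigma$, $\Phi_t$ and $T$ are notation introduced here, and the single-neuron form of the velocity field is one common reading of "perceptron with piecewise constant weights", not a formula quoted from the paper.

```latex
\begin{aligned}
&\partial_t \mu(t) + \nabla\cdot\big(v(\cdot,t)\,\mu(t)\big) = 0, \qquad \mu(0) = \mu,\\
&v(x,t) = w(t)\,\sigma\big(a(t)\cdot x + b(t)\big), \qquad
  w(t), a(t) \in \mathbb{R}^d,\ b(t) \in \mathbb{R}\ \text{piecewise constant in } t,\\
&\dot{\Phi}_t(x) = v\big(\Phi_t(x),t\big),\quad \Phi_0 = \mathrm{id}
  \ \Longrightarrow\ \mu(t) = (\Phi_t)_{\#}\mu .
\end{aligned}
```

In this notation the claim is that, at some final time $T$, the flow map satisfies $\Phi_T \approx \phi$ while $\mu(T) \approx \phi_{\#}\mu$.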

What carries the argument

Polar-like decomposition of the Lagrange interpolant of φ into a compressible gradient-of-convex-function component realized exactly and an incompressible component implemented by shear flows after permutation approximation, together forming the neural velocity field.
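
As a finite-dimensional analogue of this splitting, consider a single affine piece $x \mapsto Ax$. The classical matrix polar decomposition writes $A = PU$ with $P$ symmetric positive definite (the gradient of the convex function $x \mapsto \tfrac12 x^\top P x$) and $U$ orthogonal (volume-preserving). The sketch below computes it from the SVD; the matrix `A` and the function name `polar_decompose` are illustrative assumptions, and this is not claimed to be the paper's piecewise construction.

```python
import numpy as np

def polar_decompose(A):
    """Left polar decomposition A = P @ U computed from the SVD.

    P is symmetric positive definite when A is invertible, so x -> P @ x is
    the gradient of the convex function x -> 0.5 * x @ P @ x.
    U is orthogonal, hence volume-preserving.
    """
    W, s, Vt = np.linalg.svd(A)
    P = W @ np.diag(s) @ W.T
    U = W @ Vt
    return P, U

# Hypothetical affine piece (illustrative values only).
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])
P, U = polar_decompose(A)
```

Here `P` carries all of the volume change (`det P == abs(det A)` up to rounding), while `U` carries none.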

Load-bearing premise

The incompressible component after permutation approximation can be realized exactly through shear flows of the continuity equation and the overall velocity field remains a perceptron neural network with piecewise constant weights.
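
To make the premise concrete, here is a minimal sketch of one shear flow in the plane. The velocity field points along $e_1$ with a speed that depends only on $x_2$, so it is divergence-free, and its time-1 map is the identity below a strip and a translation above it. The ramp profile, the parameter values and all function names are assumptions made for illustration. The paper's Lemma 2.5 (as quoted in the Figure 6 caption) asks for a smooth profile in the strip, whereas this sketch uses a Lipschitz ramp built from two ReLUs.

```python
import numpy as np

def ramp(s, a, b):
    """Clamp (s - a) / (b - a) to [0, 1]; with u = (s - a) / (b - a) this is relu(u) - relu(u - 1)."""
    return np.clip((s - a) / (b - a), 0.0, 1.0)

def shear_velocity(x, tau, a, b):
    """v(x) = tau * ramp(x_2) * e_1, which has zero divergence."""
    v = np.zeros_like(x)
    v[..., 0] = tau * ramp(x[..., 1], a, b)
    return v

def flow(x0, tau, a, b, steps=200):
    """Explicit Euler integration of x' = v(x) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / steps
    for _ in range(steps):
        x = x + dt * shear_velocity(x, tau, a, b)
    return x

tau, a, b = 0.7, 0.0, 0.5           # hypothetical parameters
pts = np.array([[0.2, -0.3],        # below the strip: stays put
                [0.2, 0.25],        # inside the strip: partial shift
                [0.2, 0.9]])        # above the strip: shifted by tau
out = flow(pts, tau, a, b)
```

Because $x_2$ is constant along trajectories, the time-1 map is $x \mapsto x + \tau\,\mathrm{ramp}(x_2)\,e_1$, whose Jacobian is upper triangular with unit diagonal wherever it exists, so the map preserves Lebesgue measure.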

What would settle it

A direct computation or low-dimensional simulation for an incompressible rotation showing that the constructed velocity field fails to preserve the measure or forces the number of weight discontinuities to increase with dimension.
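
One low-dimensional check of this kind starts from the classical fact that a planar rotation factors exactly into three shears, each with determinant 1. The sketch below verifies that factorization numerically. It illustrates why shears are a natural building block for incompressible maps, but it is not the paper's route, which goes through a permutation approximation. The function name and the angle are assumptions.

```python
import numpy as np

def rotation_as_shears(theta):
    """Factor a rotation by theta (not an odd multiple of pi) into three shears."""
    t = np.tan(theta / 2.0)
    shear_x = np.array([[1.0, -t], [0.0, 1.0]])             # x1 += -t * x2
    shear_y = np.array([[1.0, 0.0], [np.sin(theta), 1.0]])  # x2 += sin(theta) * x1
    return shear_x, shear_y, shear_x

theta = 0.6                                                  # hypothetical angle
factors = rotation_as_shears(theta)
composed = factors[0] @ factors[1] @ factors[2]
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
```

Each linear shear $x \mapsto x + c\,x_2\,e_1$ is the time-1 map of the divergence-free field $v(x) = c\,x_2\,e_1$, so the composition preserves area by construction.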

Figures

Figures reproduced from arXiv: 2602.08606 by Borjan Geshkovski, Domènec Ruiz-Balet.

**Figure 1.** Let $G = S_3$, where $(1, 2, 3)$ is the identity. Take the symmetric generating set $S = \{(2, 1, 3), (1, 3, 2)\}$ (the transpositions $(12)$ and $(23)$ in cycle notation), so $S^{-1} = S$. The Cayley graph $\mathrm{Cay}(G, S)$ has vertex set $G$ and an (undirected) edge between $g_1, g_2$ iff $g_2 = s g_1$ for some $s \in S$. We color the edge blue if $g_2 = (1, 3, 2) g_1$ and green if $g_2 = (2, 1, 3) g_1$. The resulting graph is a 2-regular connected grap…

**Figure 2.** Theorem 1.1 allows us to approximate, in particular, the optimal transport map $\phi$ between $\mu$ and a target absolutely continuous measure $\nu$ by an approximate map $\phi_\varepsilon$. However, even if the densities $\phi_{\#}\mu$ and $(\phi_\varepsilon)_{\#}\mu$ are close in TV and the maps are close, the trajectory $t \mapsto \mu^{\varepsilon}(t)$ given by (1.3) is not close to $((1 - t)\,\mathrm{id} + t\phi)_{\#}\mu$ for all times $t$. The recent work [KPS25] also focuses on the approximate controllabil…

**Figure 3.** A triangulation and its image by a piecewise affine map $\phi_{\mathrm{L}}$. Proposition 2.3. Let $\Omega \subset \mathbb{R}^d$ be a bounded orthotope. Let $\phi_{\mathrm{L}} : \Omega \to \phi_{\mathrm{L}}(\Omega) \subset \mathbb{R}^d$ denote the Lagrange interpolation of a $C^1$-diffeomorphism $\phi : \Omega \to \phi(\Omega) \subset \mathbb{R}^d$…

**Figure 4.** The decomposition of $\phi_{\mathrm{L}}$ per Proposition 2.3. Proof. Since $\phi_{\mathrm{L}}$ is piecewise affine on $\Omega$ and $\Omega$ is bounded, we may write $\Omega = \bigcup_{j=1}^{n} \triangle_j$, where $\{\triangle_j\}_{j=1}^{n}$ is a finite triangulation of $\Omega$ into non-overlapping $d$-simplices (we ignore boundaries, which are null sets). On each simplex $\triangle_j$, the map $\phi_{\mathrm{L}}$ is affine: $\phi_{\mathrm{L}}(x) = A_j x + b_j$ for $x \in \triangle_j$, and since $\phi_{\mathrm{L}}$ is a homeomorphism onto its image, each $A_j$ is invertible and $|\phi_{\mathrm{L}}(\triangle_j$…

**Figure 5.** Lemma 2.8: the measure preserving map $m$ swaps the colored squares and leaves the whites invariant. Proof. It suffices to construct a divergence-free flow that swaps two adjacent hypercubes; arbitrary pairs can then be swapped via a finite composition of adjacent transpositions along a grid path. Since $h, \delta$ are fixed, we write $\square_i := \square_i^{h,\delta}$. Without loss of generality we treat the case $j = i + h e_1$. We will b…

**Figure 6.** An illustration of Claim 1. Proof of Claim 1. We employ Lemma 2.5: for a unit vector $e_\ell$, thresholds $a < b$ with $b - a \geqslant \delta$, and a vector $\tau e_r$ with $r \neq \ell$, there exists a divergence-free flow whose time-1 map equals the translation $x \mapsto x + \tau e_r$ on $\{x_\ell \geqslant b\}$, equals the identity on $\{x_\ell \leqslant a\}$, and is measure-preserving (and smooth) in the strip $a < x_\ell < b$. Define $\phi$ as the composition $\phi = (\phi_{d,2} \circ \phi_{d,1}) \circ (\phi_{d-1,2} \circ$…

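**Figure 7.** An illustration of Claim 2. Proof of Claim 2. Set $\tau := h$. Using again Lemma 2.5, define $\psi := \varphi_4 \circ \varphi_3 \circ \varphi_2 \circ \varphi_1$, where each $\varphi_\ell$ is the time-1 map of a divergence-free vector field and is the identity on $\{x_2 \leqslant i_2 - \delta\}$. Concretely: $\varphi_1(x) = \begin{cases} x + \tau e_1, & x_2 \geqslant i_2, \\ x, & x_2 \leqslant i_2 - \delta, \end{cases}$ (2.4); $\varphi_2(x) = \begin{cases} x + h e_2, & x_2 \geqslant i_2 \text{ and } x_1 \geqslant i_1 + 2h, \\ x, & x_2 \leqslant i_2 - \delta \text{ or } x_1 \leqslant i_1 + 2h - \delta, \end{cases}$ (2.5); $\varphi_3(x) = x - 2h e_1$ for $x_2 \geqslant i_2 + h$, and $\varphi_3(x) = x$ for $x_2 \leqslant$…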
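
The Figure 5 caption notes that arbitrary swaps reduce to finite compositions of adjacent transpositions, and Figure 1 uses the adjacent transpositions as generators of $S_3$. A minimal sketch of that reduction, assuming a one-dimensional arrangement of cells rather than the paper's grid of hypercubes in $\mathbb{R}^d$, is bubble sort: it records the adjacent swaps that sort a permutation, and replaying them in reverse rebuilds the permutation from the identity. Function names and the example permutation are hypothetical.

```python
def adjacent_swaps(perm):
    """Return the adjacent position swaps (i, i + 1) that sort `perm` (bubble sort)."""
    p = list(perm)
    swaps = []
    for end in range(len(p) - 1, 0, -1):
        for i in range(end):
            if p[i] > p[i + 1]:
                p[i], p[i + 1] = p[i + 1], p[i]
                swaps.append((i, i + 1))
    return swaps

def apply_swaps(n, swaps):
    """Apply position swaps, in order, to the identity arrangement 0..n-1."""
    p = list(range(n))
    for i, j in swaps:
        p[i], p[j] = p[j], p[i]
    return p

perm = [2, 0, 3, 1]                      # hypothetical permutation of four cells
swaps = adjacent_swaps(perm)
rebuilt = apply_swaps(len(perm), list(reversed(swaps)))
```

Bubble sort uses exactly as many adjacent swaps as the permutation has inversions, which is the minimum possible; in the setting of the caption, each such swap would be realized by a divergence-free flow exchanging two neighboring cubes.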

Original abstract

Motivated by applications in conditional sampling, given a probability measure $\mu$ and a diffeomorphism $\phi$, we consider the problem of simultaneously approximating $\phi$ and the pushforward $\phi_{\#}\mu$ by means of the flow of a continuity equation whose velocity field is a perceptron neural network with piecewise constant weights. We provide an explicit construction based on a polar-like decomposition of the Lagrange interpolant of $\phi$. The latter involves a compressible component, given by the gradient of a particular convex function, which can be realized exactly, and an incompressible component, which, after approximating via permutations, can be implemented through shear flows intrinsic to the continuity equation. For more regular maps $\phi$, such as the Knöthe-Rosenblatt rearrangement, we provide an alternative, probabilistic construction inspired by the Maurey empirical method, in which the number of discontinuities in the weights doesn't scale inversely with the ambient dimension.
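
The Maurey empirical method mentioned in the abstract rests on a dimension-free sampling estimate: a convex combination of vectors of norm at most 1 is approximated by the average of $N$ i.i.d. samples with root-mean-square error at most $1/\sqrt{N}$, whatever the ambient dimension. The sketch below illustrates only that estimate. How the paper turns it into a count of weight discontinuities is not shown here, and all sizes and names are assumptions.

```python
import numpy as np

def maurey_average(atoms, weights, n_samples, rng):
    """Average n_samples atoms drawn i.i.d. with probabilities `weights`."""
    idx = rng.choice(len(atoms), size=n_samples, p=weights)
    return atoms[idx].mean(axis=0)

rng = np.random.default_rng(0)
dim, n_atoms, n_samples = 200, 500, 10_000     # hypothetical sizes

atoms = rng.normal(size=(n_atoms, dim))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)   # unit-norm atoms
weights = rng.random(n_atoms)
weights /= weights.sum()                                  # convex weights

target = weights @ atoms                        # exact convex combination
approx = maurey_average(atoms, weights, n_samples, rng)
error = np.linalg.norm(approx - target)
rms_bound = 1.0 / np.sqrt(n_samples)            # bounds the RMS error, not every draw
```

The bound `rms_bound` does not involve `dim`, so reaching tolerance $\varepsilon$ takes on the order of $1/\varepsilon^2$ samples in any dimension.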

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Explicit constructions for conditional flows via polar-like decomposition and shear flows are the main new element, but missing proofs leave the exact realizability claims unverified.


The main takeaway is that the paper supplies explicit constructions for approximating a diffeomorphism and its pushforward measure using the flow of a continuity equation whose velocity is a perceptron network with piecewise constant weights. It does this through a polar-like decomposition of the Lagrange interpolant of the map, splitting it into a compressible gradient-of-convex part realized exactly and an incompressible part handled by permutation approximation plus shear flows. For smoother maps such as the Knöthe-Rosenblatt rearrangement, it adds a probabilistic construction that keeps the number of weight discontinuities from scaling badly with dimension.

These are framed as direct, non-trained alternatives at the overlap of optimal transport, PDEs, and neural flows. That framing and the dimension-robust alternative are the clearest additions relative to standard training-based normalizing flows. The work is clear about the problem setup and connects the pieces without circular definitions or fitted parameters.

The soft spots sit in the verification. The abstract states the constructions but gives no proofs, error bounds, or step-by-step checks, so the claim that the approximated incompressible component stays exactly realizable by shear flows while preserving the perceptron structure with fixed piecewise constant weights rests on unshown details. The stress-test concern about the permutation step potentially breaking that structure looks plausible without further restrictions or derivations. If the full paper supplies those missing pieces, the gap shrinks; otherwise it remains the central limitation.

This is aimed at readers working on constructive methods in generative modeling and measure transport rather than purely empirical training. A theorist interested in explicit neural ODEs or OT-based sampling would get usable ideas even if the details need tightening.

I would send it to peer review so the proofs and any implementation checks can be examined properly.

Referee Report

3 major / 2 minor

Summary. The manuscript develops explicit constructions for simultaneously approximating a diffeomorphism φ and the pushforward measure φ#μ via the flow of a continuity equation whose velocity field is realized by a perceptron neural network with piecewise-constant weights. The primary construction decomposes the Lagrange interpolant of φ in a polar-like manner into a compressible part (gradient of a convex function, realized exactly) and an incompressible part (approximated by permutations and realized via shear flows intrinsic to the continuity equation). An alternative probabilistic construction, inspired by the Maurey empirical method, is given for regular maps such as the Knöthe-Rosenblatt rearrangement, with the property that the number of weight discontinuities does not scale inversely with ambient dimension.

Significance. If the constructions are shown to preserve the required neural-network structure and deliver the claimed exact realizability, the results would supply a concrete, non-variational route to conditional normalizing flows with controlled approximation properties. The explicit decomposition and the dimension-independent discontinuity scaling constitute clear technical strengths for applications in high-dimensional sampling and optimal transport.

major comments (3)

[explicit construction / polar-like decomposition] Abstract and the section presenting the explicit construction: the claim that the incompressible component, after permutation approximation, 'can be implemented through shear flows' while the overall velocity field remains a perceptron neural network with piecewise-constant weights is asserted without a supporting argument or verification. Shear flows typically introduce time-dependent or spatially varying structures; it is not shown that these can be absorbed into the fixed piecewise-constant-weight perceptron form without additional restrictions on the permutation step.
[alternative probabilistic construction] Abstract and the section on the alternative construction: the statement that 'the number of discontinuities in the weights doesn't scale inversely with the ambient dimension' is given without an explicit bound, theorem, or scaling analysis. A precise estimate relating the number of discontinuities to dimension and approximation tolerance is required to substantiate the claimed advantage over the primary construction.
[main constructions] Throughout the constructions: no error bounds, convergence rates, or verification steps are supplied for the simultaneous approximation of φ and φ#μ. The central claim of exact realizability of the components therefore rests on unshown details that are load-bearing for any quantitative guarantee.

minor comments (2)

[introduction / construction setup] The interpolation points and polynomial degree used for the Lagrange interpolant of φ are not specified in the abstract or early sections; this should be stated explicitly when the decomposition is introduced.
[preliminaries] Notation for the perceptron network (activation functions, layer widths, and the precise meaning of 'piecewise constant weights') should be fixed once at the beginning to avoid ambiguity when the shear-flow realization is described.
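
On the first minor comment: the Figure 3 caption describes $\phi_{\mathrm{L}}$ as piecewise affine on a triangulation, which suggests degree-one Lagrange interpolation at the vertices of each simplex. Under that assumption, the affine piece on one simplex is fixed by the values of $\phi$ at its $d + 1$ vertices, as in the sketch below. The map `phi`, the triangle and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def affine_interpolant(vertices, values):
    """Return (A, b) with A @ v + b equal to the given value at each simplex vertex.

    vertices, values: arrays of shape (d + 1, d).
    """
    d = vertices.shape[1]
    # Homogeneous coordinates: each row [v, 1] maps to the value at v.
    V = np.hstack([vertices, np.ones((d + 1, 1))])   # (d + 1, d + 1)
    sol = np.linalg.solve(V, values)                  # (d + 1, d)
    A = sol[:d].T
    b = sol[d]
    return A, b

def phi(x):
    """Hypothetical smooth map used only for illustration."""
    return np.stack([x[..., 0] + 0.2 * np.sin(x[..., 1]),
                     x[..., 1] + 0.1 * x[..., 0] ** 2], axis=-1)

triangle = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
A, b = affine_interpolant(triangle, phi(triangle))
```

The system is solvable exactly when the simplex is non-degenerate, and repeating this on every simplex of a triangulation yields a continuous piecewise affine map.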

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate planned revisions to strengthen the arguments and add missing details.


Referee: [explicit construction / polar-like decomposition] Abstract and the section presenting the explicit construction: the claim that the incompressible component, after permutation approximation, 'can be implemented through shear flows' while the overall velocity field remains a perceptron neural network with piecewise-constant weights is asserted without a supporting argument or verification. Shear flows typically introduce time-dependent or spatially varying structures; it is not shown that these can be absorbed into the fixed piecewise-constant-weight perceptron form without additional restrictions on the permutation step.

Authors: We acknowledge that a detailed verification is missing. The shear flows arising from the permutation approximation are piecewise constant in both space and time, allowing representation as a perceptron with weights that are constant on each time interval. In the revision we will add an explicit lemma constructing the corresponding perceptron weights and showing that the overall velocity field remains within the required class without further restrictions on the permutation step.
Revision: yes
Referee: [alternative probabilistic construction] Abstract and the section on the alternative construction: the statement that 'the number of discontinuities in the weights doesn't scale inversely with the ambient dimension' is given without an explicit bound, theorem, or scaling analysis. A precise estimate relating the number of discontinuities to dimension and approximation tolerance is required to substantiate the claimed advantage over the primary construction.

Authors: The referee is correct that no explicit bound appears. The Maurey-type probabilistic construction yields a number of discontinuities bounded by a quantity depending only on the approximation tolerance and independent of dimension. We will insert a new theorem providing the precise estimate (O(1/ε²) discontinuities for tolerance ε) and confirming the dimension-independent scaling.
Revision: yes
Referee: [main constructions] Throughout the constructions: no error bounds, convergence rates, or verification steps are supplied for the simultaneous approximation of φ and φ#μ. The central claim of exact realizability of the components therefore rests on unshown details that are load-bearing for any quantitative guarantee.

Authors: The constructions realize the compressible part exactly and the incompressible part exactly once the permutation is fixed; the only approximation error therefore originates from the permutation step. We will add a proposition that supplies explicit error bounds on both φ and φ#μ in terms of the permutation error, together with verification steps confirming that the continuity-equation flow preserves the push-forward property under the constructed velocity field.
Revision: yes

Circularity Check

0 steps flagged

Explicit construction via polar-like decomposition is self-contained with no reduction to inputs


The paper's central claim is an explicit construction of a velocity field for the continuity equation that approximates both a diffeomorphism φ and its pushforward, obtained by decomposing the Lagrange interpolant of φ into a compressible gradient-of-convex-function term (realized exactly) and an incompressible term (approximated by permutations then realized via shear flows). This decomposition and realization are presented as direct mathematical steps without any fitted parameters being relabeled as predictions, without self-definitional loops, and without load-bearing reliance on self-citations whose validity would need to be assumed. The alternative probabilistic construction for regular maps is likewise independent. No equation or step reduces by construction to the target result itself, so the derivation chain remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The constructions rest on standard results from analysis and probability theory with no new free parameters or invented entities introduced in the abstract.

axioms (2)

standard math: Existence and properties of diffeomorphisms and their Lagrange interpolants. Invoked to enable the polar-like decomposition of φ.
domain assumption: Well-posedness of continuity equations and pushforward measures under the given velocity fields. Required for the flow to approximate both φ and φ#μ.



Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 1 internal anchor

[1] [BHK+25] Ricardo Baptista, Bamdad Hosseini, Nikola Kovachki, Youssef Marzouk, and Amir Sagiv. An approximation theory framework for measure-transport sampling algorithms. Mathematics of Computation, 94(354):1863–1909,

[2] [BHNZ25] Ricardo Baptista, Franca Hoffmann, Minh Van Hoang Nguyen, and Benjamin Zhang. Knothe–Rosenblatt maps via soft-constrained optimal transport. arXiv preprint arXiv:2511.04579,

[3] [BPB+24] Ricardo Baptista, Aram-Alexandre Pooladian, Michael Brennan, Youssef Marzouk, and Jonathan Niles-Weed. Conditional simulation via entropic optimal transport: Toward non-parametric estimation of conditional Brenier maps. arXiv preprint arXiv:2411.07154,

[4] [CSW24] Cristina Cipriani, Alessandro Scagliotti, and Tobias Wöhrer. A minimax optimal control approach for robust neural ODEs. In 2024 European Control Conference (ECC), pages 58–64. IEEE,

[5] [DD25] Samuel Daudin and François Delarue. Genericity of Polyak-Lojasiewicz inequalities for entropic mean-field neural ODEs. arXiv preprint arXiv:2507.08486,

[6] [EGPZ20] Carlos Esteve, Borjan Geshkovski, Dario Pighin, and Enrique Zuazua. Large-time asymptotics in deep learning. arXiv preprint arXiv:2008.02491,

[7] [GCB+18] Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367,

[8] [GRRB24] Borjan Geshkovski, Philippe Rigollet, and Domènec Ruiz-Balet. Measure-to-measure interpolation using transformers. arXiv preprint arXiv:2411.04551,

[9] [HHK26] Michael Hintermüller, Michael Hinze, and Denis Korolev. Layerwise goal-oriented adaptivity for neural ODEs: an optimal control perspective. arXiv preprint arXiv:2601.07397,

[10] [KPS25] Bettina Kazandjian, Eugenio Pozzoli, and Mario Sigalotti. Orbits and attainable Hamiltonian diffeomorphisms of mechanical Liouville equations. arXiv preprint arXiv:2509.24960,

[11] [RSPM25] Maximilian Ramgraber, Daniel Sharp, Mathieu Le Provost, and Youssef Marzouk. A friendly introduction to triangular transport. arXiv preprint arXiv:2503.21673,

[12] [SZ24] Stefan Schiffer and Martina Zizza. On incompressible flows in discrete networks and Shnirelman’s inequality. arXiv:2410.01576,

[13] [Ziz24] Martina Zizza. An alternative approach to Shnirelman’s inequality. arXiv:2407.09377,