Reparameterization through Coverings and Topological Weight Priors

Maxim Beketov; Pavel Snopov

arxiv: 2604.23804 · v1 · submitted 2026-04-26 · 💻 cs.LG

Pith reviewed 2026-05-08 06:21 UTC · model grok-4.3

keywords: variational autoencoders, reparameterization trick, covering maps, topological priors, Klein bottle, Bayesian neural networks, latent manifolds, ELBO tractability

The pith

Covering maps let the reparameterization trick work on latent spaces whose topology is not a Lie group, such as the Klein bottle, while keeping the KL term in the VAE objective tractable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to extend the reparameterization trick used in variational autoencoders so that the latent space can sit on manifolds with non-trivial topology. The extension works by lifting the sampling process through a covering map from a simpler manifold where a standard reparameterization technique is already known. When the covering satisfies a measure-preservation condition, an inequality bounds the KL divergence between the pushed-forward densities, rendering the KL term of the evidence lower bound analytically computable. The authors construct and train a concrete instance called KleinVAE whose latent space has Klein-bottle topology and successfully models an artificial dataset. They further indicate that the same construction can supply topologically informed priors for the weights of Bayesian neural networks, particularly convolutional vision models.
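The lifting step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the torus is parameterized by angles in [0, 2π)², that the cover distribution is a wrapped normal, and that the two-sheet covering identifies (t1, t2) with (t1 + π, −t2). The function names `sample_torus`, `cover_torus_to_klein` and `sample_klein` are hypothetical.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def sample_torus(mu, sigma, rng):
    """Reparameterized draw on the torus: wrap mu + sigma * eps onto [0, 2*pi)^2."""
    # The noise eps does not depend on (mu, sigma), which is the point of the trick.
    eps = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    return tuple((m + s * e) % TWO_PI for m, s, e in zip(mu, sigma, eps))

def cover_torus_to_klein(x):
    """Assumed two-sheet covering T^2 -> K with (t1, t2) ~ (t1 + pi, -t2).

    The fundamental region for K is taken to be t1 in [0, pi), t2 in [0, 2*pi).
    """
    t1, t2 = x[0] % TWO_PI, x[1] % TWO_PI
    if t1 >= math.pi:
        t1, t2 = t1 - math.pi, (-t2) % TWO_PI
    return (t1, t2)

def sample_klein(mu, sigma, rng):
    """Sample on the cover, then push the sample down to the base."""
    return cover_torus_to_klein(sample_torus(mu, sigma, rng))
```

Both pre-images of a point on the base map to the same representative, so the pushed-forward sample depends on the encoder outputs `mu` and `sigma` only through operations that are differentiable almost everywhere.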

Core claim

What carries the argument

A covering map equipped with a measure-preservation property that produces a usable inequality on the KL divergence of the push-forward densities on the base manifold.
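One inequality of this kind can be written down under explicit assumptions. The sketch below assumes a finite-sheeted covering φ: M → N that is locally measure-preserving with respect to reference measures μ on M and ν on N, and densities q, p on M. Whether this matches the paper's exact condition and statement is an assumption, since this page does not reproduce them.

```latex
% Pushforward density on the base: sum over the sheets above y
(\varphi_{\#} q)(y) = \sum_{x \in \varphi^{-1}(y)} q(x), \qquad y \in N .

% Log-sum inequality applied fibrewise, then integrated against \nu
\mathrm{KL}\!\left(\varphi_{\#} q \,\|\, \varphi_{\#} p\right)
  = \int_N \Big(\sum_{x \in \varphi^{-1}(y)} q(x)\Big)
    \log \frac{\sum_{x \in \varphi^{-1}(y)} q(x)}{\sum_{x \in \varphi^{-1}(y)} p(x)} \, d\nu(y)
  \le \int_N \sum_{x \in \varphi^{-1}(y)} q(x) \log \frac{q(x)}{p(x)} \, d\nu(y)
  = \mathrm{KL}\!\left(q \,\|\, p\right).
```

Read this way, replacing the KL term on the base by a closed-form KL on the cover can only lower the objective, so the result is still a lower bound on the evidence.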

If this is right

VAEs become constructible on base manifolds whose topology is covered by spaces that already admit reparameterization, including manifolds that are not Lie groups.
The evidence lower bound remains analytically tractable for such non-trivial topologies once the measure-preservation condition is met.
A working Klein-bottle latent-space VAE can be trained on artificial data and used as a generative model.
The same construction supplies topology-aware weight priors for Bayesian learning, with possible relevance to convolutional networks.
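To make the weight-prior idea concrete, here is a small sketch of drawing 3×3 filters from a Klein-bottle-parameterized family. The paper's function F (6) is not reproduced on this page, so the code uses a stand-in: a linear-plus-quadratic family known from the image-patch Klein bottle literature, chosen only because it is invariant under (θ1, θ2) → (θ1 + π, −θ2). The names `klein_filter` and `sample_filter` are hypothetical.

```python
import math
import random

GRID = (-1.0, 0.0, 1.0)  # pixel coordinates of a 3x3 grid

def klein_filter(theta1, theta2):
    """3x3 discretization of a stand-in Klein-bottle filter family (not the paper's F)."""
    c, s = math.cos(theta1), math.sin(theta1)
    rows = []
    for y in GRID:
        row = []
        for x in GRID:
            u = c * x + s * y  # coordinate along the direction theta1
            row.append(math.sin(theta2) * u + math.cos(theta2) * (2.0 * u * u - 1.0))
        rows.append(row)
    return rows

def sample_filter(rng):
    """Draw parameters uniformly from the assumed fundamental region [0, pi) x [0, 2*pi)."""
    return klein_filter(rng.uniform(0.0, math.pi), rng.uniform(0.0, 2.0 * math.pi))
```

Because the family takes the same value at (θ1, θ2) and (θ1 + π, −θ2), sampling θ1 from [0, π) already covers every distinct filter once, which is the sense in which the parameter space has Klein-bottle topology.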

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same covering technique may be applied to other manifolds whose topology is known to appear in data or parameter spaces, even when no global Lie-group structure exists.
It could be tested whether the resulting priors yield better-calibrated uncertainty or improved generalization than isotropic Gaussian priors in vision tasks.
The inequality derived from measure preservation might be tightened or replaced by an equality under additional symmetry assumptions on the covering.

Load-bearing premise

The covering map must satisfy a specific measure-preservation property that lets the KL divergence of the pushed-forward densities be bounded from above.

What would settle it

An explicit covering map for which the claimed KL inequality between push-forward densities fails to hold, or a KleinVAE whose ELBO cannot be evaluated or optimized because the inequality does not deliver a tractable upper bound.

Figures

Figures reproduced from arXiv: 2604.23804 by Maxim Beketov, Pavel Snopov.

**Figure 1.** Two-sheet covering, see (2), f_{2→1} : T² → K of the Klein bottle K by a torus T². Each y ∈ K has 2 pre-images x1, x2 ∈ T².

**Figure 2.** Left: Gabor-Klein filters = discretizations of function F (6), with parameters θ1, θ2, on a 3×3 grid. The upper half, θ1 ≤ π, is the fundamental region of F in the parameter space Θ, with Klein bottle topology. This is also an illustration of a 2-sheeted covering of K by a 2-torus: the whole square doubly covers the top half. Right: Persistence diagrams of 500 filters sampled from U[Θ]: left is PH over Z2; right is over Z…

**Figure 3.** Klein-Circles dataset. Left: samples of original data: circles on the Klein bottle with centers at different positions. Right: reconstructions of corresponding images with KleinVAE. White = intensity equals 1, black = zero. We generate small images (30×30 pixels) of a circle of radius 0.3 on a unit square. Boundary points of the square are identified in such a way that it has topology of K. With circle c…

**Figure 4.** Persistence diagrams of 500 images decoded by KleinVAE from images with uniformly…

**Figure 5.** Training dynamics and topological metrics for all models. (a) Evidence Lower Bound, (b)…

**Figure 6.** (a-b) Total bottleneck distance over Z2 and Z3.
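The Klein-Circles construction described in the Figure 3 caption could be sketched as follows. The caption is truncated, so several details here are assumptions: the gluing of the unit square is taken to be (x, y) ~ (x + 1, y) and (x, y) ~ (1 − x, y + 1), each "circle" is rendered as a filled disc, and pixel values are sampled at pixel centers. The names `klein_distance` and `klein_circle_image` are hypothetical.

```python
import math

def klein_distance(p, c):
    """Flat distance on the Klein bottle between p and c, both in the unit square.

    Assumed gluing: (x, y) ~ (x + 1, y) and (x, y) ~ (1 - x, y + 1).
    Only the nearest images of c are checked, which is enough for radii below 0.5.
    """
    cx, cy = c
    images = []
    for m in (-1, 0, 1):
        images.append((cx + m, cy))                 # horizontal translates
        for k in (-1, 1):
            images.append((1.0 - cx + m, cy + k))   # vertical neighbours, mirrored in x
    return min(math.hypot(p[0] - ix, p[1] - iy) for ix, iy in images)

def klein_circle_image(center, radius=0.3, size=30):
    """size x size image: 1.0 inside the disc around center, 0.0 elsewhere."""
    img = []
    for j in range(size):
        y = (j + 0.5) / size
        img.append([
            1.0 if klein_distance(((i + 0.5) / size, y), center) <= radius else 0.0
            for i in range(size)
        ])
    return img
```

A disc that crosses the top edge reappears at the bottom edge mirrored in x, which is the visual signature that distinguishes this dataset from the same construction on a torus.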

Original abstract

We generalise the reparameterization trick applied in variational autoencoders (VAEs) letting these have latent spaces of non-trivial topology - i.e. that of base manifolds covered with other ones, on which some technique for RT is available. That is possible since covering maps are measurable - moreover, in case of particular measure preservation property holding for the covering, one can establish an inequality on KL-divergence between pushforward (PF) densities on the base latent manifold, making the KL-term of VAE's ELBO analytically tractable, despite the topological non-triviality of the supporting latent manifold. Our development follows a route close but somewhat alternative to reparameterization on Lie groups, the latest proposal for which is to reparameterize PFs of normal densities from the Lie algebra - "through" the exponential map, seen by us as sometimes a particular case of what we propose to call reparameterization through a covering. Covering maps need not be global diffeomorphisms (although Lie-exp maps, in general, need not either, but, to date only smooth ones were considered in this context, to the best of our knowledge), which makes many non-trivial topologies tamable to our proposed technique, that we detail on a particular such example. We demonstrate the working of our approach by constructing a VAE with the latent space of Klein bottle (not a Lie group) topology, which we call KleinVAE, successfully learning an appropriate artificial dataset. We discuss potential applicability of such topology-informed generative models as weight priors in Bayesian learning, particularly for convolutional vision models, where said manifold was peculiarly shown to have some relevance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper generalizes the reparameterization trick to non-Lie-group manifolds via coverings, with a Klein-bottle VAE example, but the KL tractability claim rests on an unshown measure-preservation condition.

The main point is that they lift reparameterization to a covering space so VAEs can handle latent manifolds like the Klein bottle, then use an inequality to keep the KL term tractable. This is positioned as a broader alternative to Lie-group exponential maps, and they demonstrate it with a KleinVAE that trains on synthetic data. They also float the idea of using such models as topological weight priors for convolutional nets. That framing and the non-group example are the genuinely new pieces. The rest follows standard VAE objectives and measurable properties of coverings.

The paper does a clean job stating the construction and showing that the approach can at least run on an artificial dataset without obvious collapse. The Klein-bottle case is a reasonable test since it is not a Lie group, so it shows the method is not limited to the cases already covered in the literature.

The soft spot is the central inequality. It requires a specific measure-preservation property on the covering, yet the text only asserts the property rather than deriving the inequality or verifying it for the actual map used on the Klein bottle. Without that step, the claim that the KL term becomes analytically tractable does not fully land. The experiments stay qualitative with no metrics, baselines, or checks on bound tightness, so it is difficult to gauge practical gain.

This is for people already working on manifold-valued latents or geometric VAEs. A reader looking for new ways to handle non-trivial topology could extract the covering idea if the math is filled in. It deserves a serious referee because the direction is distinct and the example is concrete, even though the current version is thin on the key derivation and validation. I would send it for review and ask for the explicit inequality plus quantitative results on the bound and on real data.

Referee Report

3 major / 2 minor

Summary. The paper proposes generalizing the reparameterization trick for VAEs to latent spaces with non-trivial topology (e.g., Klein bottle) via covering maps from manifolds where reparameterization is available. Under a specific (unnamed) measure-preservation property of the covering, an inequality relating KL divergences of pushforward densities is claimed to make the KL term in the VAE ELBO analytically tractable. The method is illustrated by constructing KleinVAE, which is shown to learn an artificial dataset, and potential use as topological weight priors for Bayesian learning (e.g., in convolutional models) is discussed.

Significance. If the measure-preservation condition can be rigorously stated, verified for standard coverings such as that of the Klein bottle, and the KL inequality derived without circularity, the approach would offer a route to VAEs on manifolds outside Lie groups, extending reparameterization techniques and enabling topology-informed priors. The synthetic demonstration provides initial evidence of feasibility but does not yet establish broader utility.

major comments (3)

[Abstract / KL-inequality section] Abstract and the section introducing the KL inequality: the central tractability claim rests on an inequality for KL divergences between pushforward densities that holds only under a 'particular measure preservation property' of the covering map. This property is neither explicitly defined nor shown to hold for the identification map used to construct the Klein bottle (e.g., from R^2 or the torus), so the reduction of the ELBO KL term to a tractable form remains unverified.
[KleinVAE experiments] Experimental demonstration (KleinVAE section): only qualitative success on an artificial dataset is reported. No quantitative metrics (e.g., ELBO values, reconstruction error, or comparison against a standard VAE), no ablation on the covering map, and no direct check that the asserted measure-preservation property is satisfied in the implemented model, leaving the practical tractability of the KL term unconfirmed.
[Relation to Lie groups] § on relation to Lie-group reparameterization: the claim that the exponential map is 'sometimes a particular case' of reparameterization through a covering is asserted without a precise statement of when the covering map coincides with the exp map or when the measure-preservation condition reduces to the usual Lie-algebra case, weakening the positioning relative to prior work.

minor comments (2)

[Method] Notation for pushforward densities and covering maps should be introduced with explicit definitions and a small diagram of the Klein-bottle covering to aid readability.
[Abstract] The abstract states the construction but supplies neither the explicit inequality derivation nor quantitative experimental results; moving a concise derivation or pseudocode to the main text would strengthen the presentation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major point below with the strongest honest defense possible, indicating where revisions will be made to improve clarity, rigor, and validation without misrepresenting the original claims.

Referee: [Abstract / KL-inequality section] Abstract and the section introducing the KL inequality: the central tractability claim rests on an inequality for KL divergences between pushforward densities that holds only under a 'particular measure preservation property' of the covering map. This property is neither explicitly defined nor shown to hold for the identification map used to construct the Klein bottle (e.g., from R^2 or the torus), so the reduction of the ELBO KL term to a tractable form remains unverified.

Authors: We acknowledge that while the abstract and relevant section introduce the measure preservation property as a sufficient condition for the KL inequality to hold, a fully explicit definition and verification for the Klein bottle covering were not provided. In the revised manuscript we will add a precise definition: the covering map φ: M → N preserves measures in the sense that for suitable test functions f the integral equality ∫_M (f ∘ φ) dμ = ∫_N f dν holds, where μ and ν are the reference measures on the covering and base manifolds respectively. We will then prove that this property is satisfied by the standard identification map from the torus (or R^2 with periodic identifications) to the Klein bottle via direct computation of the pushforward densities and Jacobian factors. This will rigorously establish the applicability of the KL inequality and confirm tractability of the ELBO term. revision: yes
Referee: [KleinVAE experiments] Experimental demonstration (KleinVAE section): only qualitative success on an artificial dataset is reported. No quantitative metrics (e.g., ELBO values, reconstruction error, or comparison against a standard VAE), no ablation on the covering map, and no direct check that the asserted measure-preservation property is satisfied in the implemented model, leaving the practical tractability of the KL term unconfirmed.

Authors: We agree that the current experimental presentation is limited to qualitative visualization. In the revision we will augment the KleinVAE section with quantitative results: reported ELBO values and mean reconstruction errors on the artificial dataset, direct numerical comparison against a standard Euclidean VAE baseline trained on identical data, and an ablation varying the covering construction (e.g., different identification periods). We will also include an explicit numerical check of the measure-preservation property by Monte-Carlo approximation of the relevant integrals over the latent space in the implemented model, thereby confirming practical tractability of the KL term. revision: yes
Referee: [Relation to Lie groups] § on relation to Lie-group reparameterization: the claim that the exponential map is 'sometimes a particular case' of reparameterization through a covering is asserted without a precise statement of when the covering map coincides with the exp map or when the measure-preservation condition reduces to the usual Lie-algebra case, weakening the positioning relative to prior work.

Authors: We will strengthen the related-work discussion by adding a precise characterization: the exponential map can be used as the covering in our construction exactly when it is itself a covering map of the group (which occurs for certain compact Lie groups, e.g. tori), and in those cases the measure-preservation property holds with equality, recovering the standard KL computation on the Lie algebra. For the remaining cases the inequality version of our result still applies. This explicit reduction clarifies how our framework generalizes the Lie-group approach while encompassing additional topologies such as the Klein bottle. revision: yes
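The Monte-Carlo check of measure preservation mentioned in the second response could look roughly like this. It is a sketch on a stand-in covering, not a check of the implemented KleinVAE: it assumes the two-sheet identification (t1, t2) ~ (t1 + π, −t2), takes μ and ν to be the uniform measures normalized to total mass one, and uses an arbitrary test function `f`. With unnormalized area measures the two sides would differ by the number of sheets.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def cover(t1, t2):
    """Assumed two-sheet covering of the Klein bottle by the torus."""
    if t1 >= math.pi:
        return t1 - math.pi, (-t2) % TWO_PI
    return t1, t2

def f(t1, t2):
    """Arbitrary test function on the fundamental region [0, pi) x [0, 2*pi)."""
    return math.sin(t1) + math.cos(t2) ** 2

def mc_both_sides(n, rng):
    """Estimate the integral of f∘φ over the cover and of f over the base."""
    lhs = 0.0
    rhs = 0.0
    for _ in range(n):
        # Left side: uniform on the torus, pushed through the covering.
        lhs += f(*cover(rng.uniform(0.0, TWO_PI), rng.uniform(0.0, TWO_PI)))
        # Right side: uniform directly on the fundamental region of the base.
        rhs += f(rng.uniform(0.0, math.pi), rng.uniform(0.0, TWO_PI))
    return lhs / n, rhs / n
```

For this test function the exact value of the right-hand side is 2/π + 1/2, so both estimates can also be compared against a closed form rather than only against each other.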

Circularity Check

0 steps flagged

No circularity: derivation rests on external properties of covering maps and standard VAE ELBO without reduction to fitted inputs or self-referential definitions

The paper generalizes the reparameterization trick by invoking that covering maps are measurable and that, conditional on a particular measure preservation property, an inequality relating KL divergences of pushforward densities holds, thereby keeping the VAE ELBO's KL term tractable on non-trivial manifolds such as the Klein bottle. This step is presented as following from the assumed property of the covering rather than being internally derived or fitted; the KleinVAE construction and synthetic-data demonstration serve as an application and empirical check, not a statistical fit renamed as prediction. No self-citations appear load-bearing, no uniqueness theorems are imported from the authors' prior work, and no ansatz is smuggled via citation. The chain therefore remains self-contained against external facts about measurable coverings and the standard variational objective.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on one background property of covering maps (measurability) and one domain-specific assumption needed to obtain the KL inequality; no free parameters or new entities are introduced in the abstract.

axioms (2)

standard math: Covering maps are measurable.
Invoked to guarantee that pushforward densities exist on the base manifold.
domain assumption: A particular measure preservation property holds for the covering.
Required to establish the KL-divergence inequality that renders the ELBO tractable.

pith-pipeline@v0.9.0 · 5596 in / 1442 out tokens · 56657 ms · 2026-05-08T06:21:55.451245+00:00 · methodology
