Multi-Component VAE with Gaussian Markov Random Field

Fouad Oubari; Mathilde Mougeot; Mohamed El-Baha; Raphael Meunier; Rodrigue Décatoire

arxiv: 2507.12165 · v3 · submitted 2025-07-16 · 💻 cs.LG

Pith reviewed 2026-05-19 04:36 UTC · model grok-4.3

classification 💻 cs.LG

keywords: multi-component VAE · Gaussian Markov Random Field · cross-component dependencies · structural coherence · generative modeling · variational autoencoder · Copula dataset · BIKED dataset

The pith

Embedding Gaussian Markov Random Fields into both prior and posterior of a multi-component VAE explicitly models dependencies between data components.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses generative modeling for datasets consisting of multiple interacting components, such as industrial assemblies or multi-modal images. Standard multi-component VAEs rely on simple aggregation that often fails to preserve structural relationships across components. The proposed approach embeds Gaussian Markov Random Fields into the prior and posterior distributions to capture those cross-component dependencies directly. This change yields stronger performance on a synthetic Copula dataset built to test intricate relationships, competitive results on PolyMNIST, and notably better structural coherence on the real-world BIKED bicycle dataset. The work argues the resulting model is especially useful for applications that require realistic joint generation of interdependent parts.

Core claim

We introduce the Gaussian Markov Random Field Multi-Component Variational AutoEncoder, a novel generative framework that embeds Gaussian Markov Random Fields into both prior and posterior distributions to explicitly model cross-component relationships, enabling richer representation and faithful reproduction of complex interactions.

What carries the argument

The Gaussian Markov Random Field Multi-Component Variational AutoEncoder (GMRF MCVAE), which integrates a Gaussian Markov Random Field structure into the variational prior and posterior to represent dependencies among data components.

If this is right

State-of-the-art results on a synthetic Copula dataset constructed to test complex component relationships.
Competitive performance on the PolyMNIST multi-component benchmark.
Significantly improved structural coherence when generating samples from the real-world BIKED dataset.
Particular suitability for practical tasks that demand consistent modeling of multi-component coherence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same GMRF embedding idea could be tested inside other generative architectures such as diffusion models for multi-part objects.
Application to domains with known physical constraints, like molecular assemblies, would reveal whether the Markov assumption aligns with domain structure.
If successful, the method might reduce reliance on separate post-processing steps that enforce consistency after generation.
Extending the GMRF to handle time-varying or hierarchical component relations could address sequential or nested multi-component data.

Load-bearing premise

That placing a Gaussian Markov Random Field structure inside the prior and posterior is sufficient to capture the full range of cross-component dependencies encountered in target domains.

What would settle it

A new multi-component dataset containing dependency patterns that violate the Gaussian Markov assumptions, such as strong non-Gaussian or non-Markovian interactions, on which the GMRF MCVAE shows no measurable gain in coherence over standard aggregation-based multi-component VAEs.

Figures

Figures reproduced from arXiv: 2507.12165 by Fouad Oubari, Mathilde Mougeot, Mohamed El-Baha, Raphael Meunier, Rodrigue Décatoire.

**Figure 1.** A general MRF-based Multi-Component VAE: each component is assigned its own encoder-decoder pair, where the encoder learns unary potentials

**Figure 2.** Qualitative results for the unconditional generations on the Copula dataset. Each subplot visualizes joint distributions for each pair of coordinates

**Figure 3.** PolyMNIST conditional generations. Each block corresponds to a model. In each column, the first image corresponds to the condition, followed by

**Figure 4.** Conditional generation on BIKED. The first column shows the conditioning components, while each subsequent column presents the remaining

**Figure 5.** Qualitative analysis of unconditional generations using the Copula dataset. Each subplot displays the marginal distributions for each coordinate:

**Figure 6.** Qualitative results of unconditional generations from the Copula dataset across three training iterations of the MVAE. Each subplot shows joint
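
The conditional generations in Figures 3 and 4 depend on conditioning a joint latent distribution on the observed components. When that joint is Gaussian, as a GMRF implies, the conditional is again Gaussian, a point the paper's Appendix A also makes. The sketch below illustrates the standard identity in information form; it is not the authors' implementation. The scalar latents, the chain-shaped precision values, and the helper names `solve`, `cholesky`, `condition`, and `sample` are all assumptions made for illustration.

```python
import math
import random

def solve(a, b):
    """Solve a x = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def condition(eta, precision, free, observed, z_observed):
    """Mean and precision of z[free] given z[observed] = z_observed.

    For p(z) proportional to exp(eta^T z - 0.5 z^T Lambda z), the conditional has
    precision Lambda_ff and potential eta_f - Lambda_fo z_o.
    """
    prec_ff = [[precision[i][j] for j in free] for i in free]
    eta_f = [eta[i] - sum(precision[i][j] * z for j, z in zip(observed, z_observed))
             for i in free]
    return solve(prec_ff, eta_f), prec_ff

def sample(mean, precision, rng):
    """Draw z ~ N(mean, precision^{-1}) by solving L^T x = eps, where precision = L L^T."""
    low = cholesky(precision)
    n = len(mean)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(low[j][i] * x[j] for j in range(i + 1, n))
        x[i] = (eps[i] - tail) / low[i][i]
    return [m + v for m, v in zip(mean, x)]

# Illustrative values only: three scalar latents on a chain z1 - z2 - z3.
eta = [0.0, 0.0, 0.0]
precision = [[2.0, -1.0, 0.0],
             [-1.0, 2.0, -1.0],
             [0.0, -1.0, 2.0]]

# Condition on the first component and generate the remaining two.
mean, cond_precision = condition(eta, precision, free=[1, 2], observed=[0], z_observed=[1.5])
rng = random.Random(0)
draw = sample(mean, cond_precision, rng)
print("conditional mean:", mean)
print("one conditional draw:", draw)
```

Working with the precision matrix means conditioning only needs the sub-block for the free components, which is one reason the information form is convenient when some components are given and the rest must be generated.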

Original abstract

Multi-component datasets with intricate dependencies, like industrial assemblies or multi-modal imaging, challenge current generative modeling techniques. Existing Multi-component Variational AutoEncoders typically rely on simplified aggregation strategies, neglecting critical nuances and consequently compromising structural coherence across generated components. To explicitly address this gap, we introduce the Gaussian Markov Random Field Multi-Component Variational AutoEncoder, a novel generative framework embedding Gaussian Markov Random Fields into both prior and posterior distributions. This design choice explicitly models cross-component relationships, enabling richer representation and faithful reproduction of complex interactions. Empirically, our GMRF MCVAE achieves state-of-the-art performance on a synthetic Copula dataset specifically constructed to evaluate intricate component relationships, demonstrates competitive results on the PolyMNIST benchmark, and significantly enhances structural coherence on the real-world BIKED dataset. Our results indicate that the GMRF MCVAE is especially suited for practical applications demanding robust and realistic modeling of multi-component coherence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GMRF embedding in both prior and posterior of an MCVAE is a reasonable structural addition for cross-component modeling, but the abstract's lack of equations and controls leaves the performance claims hard to assess.

The main point is that this paper puts a Gaussian Markov Random Field into both the prior and posterior of a multi-component VAE to capture dependencies between parts more directly than simple aggregation methods do. That placement is presented as the distinguishing move over earlier work on MCVAEs. They test the idea on a synthetic Copula dataset built for complex relationships, report competitive numbers on PolyMNIST, and show gains in structural coherence on the BIKED assembly dataset. Those choices line up with the stated goal of handling industrial or multi-modal data where component interactions matter. The approach builds on standard VAE and GMRF tools without obvious circularity in the framing. The results suggest the method can improve coherence in practice, which is the kind of incremental progress that matters for applied generative modeling.

The soft spots sit mostly in the missing technical layer. No equations, training details, metric definitions, or ablations appear in the abstract, so the state-of-the-art claim and the size of the coherence improvement cannot be checked yet. The weakest assumption in the work is that the GMRF structure by itself will be enough for the intricate dependencies in the target domains. A Gaussian joint can miss non-Gaussian or higher-order dependencies, so unless the paper includes checks against copula mixtures or nonlinear alternatives, that assumption stays the weakest link.

This paper is aimed at people working on VAEs for structured multi-part data, especially in engineering or imaging settings. Readers who care about graphical-model extensions to generative methods would get the most from the experiments once the implementation is shown. It deserves a serious referee because the core design choice is clear, the application areas are relevant, and the empirical direction is falsifiable even if revisions will be needed for the details and controls.

I would send it to review and ask for the derivations, ablations, and any non-Gaussian diagnostics.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Gaussian Markov Random Field Multi-Component Variational AutoEncoder (GMRF MCVAE), which embeds Gaussian Markov Random Fields into both the prior and posterior distributions of a multi-component VAE framework. This is proposed to explicitly model cross-component relationships and dependencies in complex multi-component datasets such as industrial assemblies. The authors report state-of-the-art results on a custom synthetic Copula dataset, competitive performance on the PolyMNIST benchmark, and significantly improved structural coherence on the real-world BIKED dataset.

Significance. If the central modeling choice and empirical results hold under scrutiny, the GMRF MCVAE would represent a targeted extension of existing multi-component VAEs by incorporating structured conditional independence via precision matrices, potentially aiding applications that require faithful reproduction of component interactions. The construction of a synthetic Copula dataset specifically to probe intricate dependencies is a methodological strength that supports falsifiable evaluation.

major comments (2)

§3 (model description): The claim that embedding GMRF structure into both prior and posterior is sufficient to capture intricate cross-component dependencies is load-bearing for the central contribution, yet the Gaussian joint assumption (via precision matrix) only enforces second-order linear conditional independencies and provides no explicit handling or ablation for higher-order, nonlinear, or heavy-tailed dependencies present in the Copula dataset and BIKED assemblies.
§4 (experiments): The state-of-the-art and coherence claims rest on reported metrics without accompanying ablation studies isolating the GMRF contribution, statistical significance tests, or comparisons against non-Gaussian alternatives (e.g., copula or mixture extensions), undermining verification of the design choice's necessity.
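
The first major comment turns on what a precision matrix can and cannot encode. The following is a small illustration of the standard Gaussian fact behind it, not code from the paper: a zero off-diagonal entry in the precision matrix means two variables are conditionally independent given the rest, even though their marginal covariance is nonzero. The three scalar latents, the chain-shaped values, and the helper names `invert` and `partial_correlation` are assumptions for the example.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlation(precision, i, j):
    """Correlation of z_i and z_j given all other variables, read off the precision matrix."""
    return -precision[i][j] / (precision[i][i] * precision[j][j]) ** 0.5

# Illustrative chain z1 - z2 - z3: there is no direct edge between z1 and z3.
precision = [[2.0, -1.0, 0.0],
             [-1.0, 2.0, -1.0],
             [0.0, -1.0, 2.0]]
covariance = invert(precision)

print("cov(z1, z3):", covariance[0][2])                              # nonzero: marginally dependent
print("partial corr(z1, z3):", partial_correlation(precision, 0, 2)) # zero: independent given z2
print("partial corr(z1, z2):", partial_correlation(precision, 0, 1)) # nonzero: direct edge
```

The sparsity pattern therefore describes pairwise linear structure only. Dependencies that show up in higher moments or heavy tails are not visible in the precision matrix at all, which is the gap the referee is pointing at.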

minor comments (2)

Abstract: The phrasing 'state-of-the-art performance' and 'significantly enhances structural coherence' would benefit from explicit metric names and baseline references even at this high level.
§3 (notation): The distinction between the GMRF precision matrix in the prior versus the posterior should be clarified with explicit equations to avoid ambiguity in how cross-component relationships are parameterized.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. The comments highlight important aspects of the modeling assumptions and empirical validation that we will address in the revision. We respond to each major comment below.

Referee: §3 (model description): The claim that embedding GMRF structure into both prior and posterior is sufficient to capture intricate cross-component dependencies is load-bearing for the central contribution, yet the Gaussian joint assumption (via precision matrix) only enforces second-order linear conditional independencies and provides no explicit handling or ablation for higher-order, nonlinear, or heavy-tailed dependencies present in the Copula dataset and BIKED assemblies.

Authors: We acknowledge that the GMRF formulation relies on a Gaussian joint distribution, which captures conditional independencies through the precision matrix and is therefore limited to second-order linear relationships. While the nonlinear mappings in the VAE encoders and decoders can indirectly accommodate some higher-order effects, this is not an explicit mechanism. The design prioritizes computational tractability and interpretability of sparse cross-component dependencies, which aligns with the needs of assembly-like datasets. In the revised manuscript we will add an explicit limitations paragraph in §3 discussing the Gaussian assumption and outlining potential non-Gaussian extensions.

Revision: partial

Referee: §4 (experiments): The state-of-the-art and coherence claims rest on reported metrics without accompanying ablation studies isolating the GMRF contribution, statistical significance tests, or comparisons against non-Gaussian alternatives (e.g., copula or mixture extensions), undermining verification of the design choice's necessity.

Authors: We agree that stronger isolation of the GMRF contribution would improve verifiability. In the revision we will add ablation experiments that remove the GMRF structure from both prior and posterior while keeping the multi-component VAE backbone fixed. We will also report statistical significance across multiple random seeds and include, where computationally feasible, comparisons against a copula-augmented baseline on the synthetic dataset.

Revision: yes

Circularity Check

0 steps flagged

No circularity: framework extends standard VAE/GMRF literature without self-referential reductions

The manuscript presents the GMRF MCVAE as a novel embedding of Gaussian Markov Random Fields into VAE prior and posterior distributions to model cross-component dependencies. No equations, derivations, or fitted-parameter predictions appear in the provided abstract or reader summary that reduce any claimed result to its own inputs by construction. The approach is explicitly positioned as building on existing VAE and GMRF literature rather than deriving uniqueness or sufficiency from self-citations or ansatzes. Empirical claims rest on performance against external benchmarks (synthetic Copula, PolyMNIST, BIKED), which are independent of the modeling choice itself. This satisfies the criteria for a self-contained derivation with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review; the central addition is the assumption that GMRF graphs can represent the relevant cross-component dependencies. No free parameters or new entities are explicitly introduced in the provided text.

axioms (2)

standard math: Standard variational autoencoder assumptions on latent variable distributions and evidence lower bound optimization. Implicit foundation of any VAE-based method.
domain assumption: Gaussian Markov Random Field structure is an appropriate model for the cross-component dependencies in the datasets considered. Core modeling choice stated in the abstract to address the identified gap.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: we approximate both the prior and posterior using a Gaussian Markov Random Field... $p_{\mathrm{GMRF}}(z) \propto \exp\left(\eta^\top z - \tfrac{1}{2} z^\top \Lambda z\right)$
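
The quoted density is written in information (canonical) form. For readers who prefer moments, completing the square recovers the usual mean and covariance. This is a standard identity that holds when Λ is symmetric positive definite; the symbols μ and Σ are introduced here for illustration and do not appear in the quoted passage.

```latex
\begin{aligned}
\eta^\top z - \tfrac{1}{2}\, z^\top \Lambda z
  &= -\tfrac{1}{2}\,(z - \Lambda^{-1}\eta)^\top \Lambda\,(z - \Lambda^{-1}\eta)
     + \tfrac{1}{2}\,\eta^\top \Lambda^{-1} \eta, \\
p_{\mathrm{GMRF}}(z)
  &= \mathcal{N}(z;\ \mu,\ \Sigma),
  \qquad \mu = \Lambda^{-1}\eta, \quad \Sigma = \Lambda^{-1}.
\end{aligned}
```

The second term on the first line does not depend on z, so it is absorbed into the normalizing constant.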

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 5 internal anchors

[1] D. Puhr-Westerheide, C. C. Cyran, J. Sargsyan-Bergmann, A. Todica, F.-J. Gildehaus, W. G. Kunz, R. Stahl, C. Spitzweg, J. Ricke, and P. M. Kazmierczak, “The added diagnostic value of complementary gadoxetic acid-enhanced MRI to 18F-DOPA-PET/CT for liver staging in medullary thyroid carcinoma,” Cancer Imaging, vol. 19, pp. 1–10, 2019.

[2] A. Rahimi, A. Khalil, A. Faisal, and K. W. Lai, “CT-MRI dual information registration for the diagnosis of liver cancer: A pilot study using point-based registration,” Current Medical Imaging, vol. 18, no. 1, pp. 61–66, 2022.

[3] Q. Xie, W. Han, X. Zhang, Y. Lai, M. Peng, A. Lopez-Lira, and J. Huang, “PIXIU: A comprehensive benchmark, instruction dataset and large language model for finance,” Advances in Neural Information Processing Systems, vol. 36, 2024.

[4] S. I. Lee and S. J. Yoo, “Multimodal deep learning for finance: integrating and forecasting international stock markets,” The Journal of Supercomputing, vol. 76, pp. 8294–8312, 2020.

[5] A. Cobb, A. Roy, D. Elenius, F. Heim, B. Swenson, S. Whittington, J. Walker, T. Bapty, J. Hite, K. Ramani et al., “AircraftVerse: a large-scale multimodal dataset of aerial vehicle designs,” Advances in Neural Information Processing Systems, vol. 36, pp. 44524–44543, 2023.

[6] F. Oubari, R. Meunier, R. Décatoire, and M. Mougeot, “A meta-VAE for multi-component industrial systems generation,” in Intelligent Computing, K. Arai, Ed. Cham: Springer Nature Switzerland, 2024, pp. 234–251.

[7] M. Wu and N. Goodman, “Multimodal generative models for scalable weakly-supervised learning,” Advances in Neural Information Processing Systems, vol. 31, 2018.

[8] Y. Shi, B. Paige, P. Torr et al., “Variational mixture-of-experts autoencoders for multi-modal deep generative models,” Advances in Neural Information Processing Systems, vol. 32, 2019.

[9] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[10] A. Vahdat and J. Kautz, “NVAE: A deep hierarchical variational autoencoder,” Advances in Neural Information Processing Systems, vol. 33, pp. 19667–19679, 2020.

[11] S. D. Mbacke, F. Clerc, and P. Germain, “Statistical guarantees for variational autoencoders using PAC-Bayesian theory,” Advances in Neural Information Processing Systems, vol. 36, 2024.

[12] M. Suzuki, K. Nakayama, and Y. Matsuo, “Joint multimodal learning with deep generative models,” arXiv preprint arXiv:1611.01891, 2016.

[13] R. Vedantam, I. Fischer, J. Huang, and K. Murphy, “Generative models of visually grounded imagination,” arXiv preprint arXiv:1705.10762, 2017.

[14] E. Palumbo, I. Daunhawer, and J. E. Vogt, “MMVAE+: Enhancing the generative quality of multimodal VAEs without compromises,” in The Eleventh International Conference on Learning Representations. OpenReview, 2023.

[15] T. M. Sutter, I. Daunhawer, and J. E. Vogt, “Generalized multimodal ELBO,” arXiv preprint arXiv:2105.02470, 2021.

[16] T. Sutter, I. Daunhawer, and J. Vogt, “Multimodal generative learning utilizing Jensen-Shannon-divergence,” Advances in Neural Information Processing Systems, vol. 33, pp. 6100–6110, 2020.

[17] M. J. Wainwright, M. I. Jordan et al., “Graphical models, exponential families, and variational inference,” Foundations and Trends® in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.

[18] K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

[19] R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications. American Mathematical Society, 1980, vol. 1.

[20] M. A. Carreira-Perpinan and G. Hinton, “On contrastive divergence learning,” in International Workshop on Artificial Intelligence and Statistics. PMLR, 2005, pp. 33–40.

[21] M. Vuffray, S. Misra, and A. Lokhov, “Efficient learning of discrete graphical models,” Advances in Neural Information Processing Systems, vol. 33, pp. 13575–13585, 2020.

[22] F. Bach and M. Jordan, “Learning graphical models with Mercer kernels,” Advances in Neural Information Processing Systems, vol. 15, 2002.

[23] K. M. Tan, P. London, K. Mohan, S.-I. Lee, M. Fazel, and D. Witten, “Learning graphical models with hubs,” arXiv preprint arXiv:1402.7349, 2014.

[24] M. Welling and C. Sutton, “Learning in Markov random fields with contrastive free energies,” in International Workshop on Artificial Intelligence and Statistics. PMLR, 2005, pp. 397–404.

[25] P. Perez et al., Markov Random Fields and Images. IRISA, 1998, vol. 469.

[26] N. Komodakis and G. Tziritas, “Image completion using efficient belief propagation via priority scheduling and dynamic pruning,” IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2649–2661, 2007.

[27] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected CRFs with Gaussian edge potentials,” Advances in Neural Information Processing Systems, vol. 24, 2011.

[28] M. G. Bello, “A combined Markov random field and wave-packet transform-based approach for image segmentation,” IEEE Transactions on Image Processing, vol. 3, no. 6, pp. 834–846, 1994.

[29] M. J. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta, “Composing graphical models with neural networks for structured representations and fast inference,” Advances in Neural Information Processing Systems, vol. 29, 2016.

[30] A. H. Khoshaman and M. Amin, “GumBolt: Extending Gumbel trick to Boltzmann priors,” Advances in Neural Information Processing Systems, vol. 31, 2018.

[31] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.

[32] D. P. Kingma, M. Welling et al., “An introduction to variational autoencoders,” Foundations and Trends® in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019.

[33] D. MacKenzie and T. Spears, “‘The formula that killed Wall Street’: The Gaussian copula and modelling practices in investment banking,” Social Studies of Science, vol. 44, no. 3, pp. 393–417, 2014.

[34] P. Tedesco, A. Lenkoski, H. C. Bloomfield, and J. Sillmann, “Gaussian copula modeling of extreme cold and weak-wind events over Europe conditioned on winter weather regimes,” Environmental Research Letters, vol. 18, no. 3, p. 034008, 2023.

[35] L. Regenwetter, B. Curry, and F. Ahmed, “BIKED: A dataset for computational bicycle design with machine learning benchmarks,” Journal of Mechanical Design, vol. 144, no. 3, p. 031706, 2022.

[36] C. Villani, “The Wasserstein distances,” Optimal Transport: Old and New, pp. 93–111, 2009.

[37] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” Advances in Neural Information Processing Systems, vol. 30, 2017.

[38] J. Ho and T. Salimans, “Classifier-free diffusion guidance,” arXiv preprint arXiv:2207.12598, 2022.

[39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[40] K. B. Petersen, M. S. Pedersen et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, no. 15, p. 510, 2008.

[41] D. G. Feingold and R. S. Varga, “Block diagonally dominant matrices and generalizations of the Gerschgorin circle theorem,” 1962.

[42] I. Higgins, L. Matthey, A. Pal, C. P. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner, “beta-VAE: Learning basic visual concepts with a constrained variational framework,” ICLR (Poster), vol. 3, 2017.

Appendix A. Conditional Sampling: The conditional distribution of a normally distributed random variable given another is also normally ...

[43] Preliminaries: This section demonstrates how our architecture can generate full diagonal block covariance matrices while ensuring symmetric positive definiteness. Let $M$ be a complex matrix, and let $E$ and $F$ be two vector spaces equipped with norms $\|\cdot\|_E$ and $\|\cdot\|_F$, respectively, such that for all $x \in E$, $Mx \in F$. The standard operator norm of $M$ is def...

[44] Proof of Theorem 1: To prove Theorem 1, we show that $\Sigma$ is positive definite

[45] Nonsingularity: From Theorem 2, the block strict diagonal dominance of $\Sigma$ guarantees that it is nonsingular

[46] Positivity: Let $\lambda$ be an eigenvalue of $\Sigma$. Theorem 3 ensures that there exists at least one $i \in \{1, \dots, n\}$ such that $\left\|(\Sigma_{i,i} - \lambda I)^{-1}\right\|^{-1} \le \sum_{k=1,\, k \ne i}^{n} \|\Sigma_{i,k}\|$. Since we consider in our theorem that $\|\cdot\|$ is the spectral norm, we have $\left\|(\Sigma_{i,i} - \lambda I)^{-1}\right\| = \sup_{j \in \{1, \dots, d\}} \frac{1}{|\sigma^{i}_{j} - \lambda|}$, where $(\sigma^{i}_{j})_j$ are the eigenvalues of $\Sigma_{i,i}$. Let $k \in \{1, \dots, d\}$ ...

[47] PolyMNIST & BIKED Experiments: We employ consistent encoder/decoder architectures across all baseline models, using publicly available implementations for MVAE, MMVAE, and MoPoE-VAE from [16], and for MMVAE+ from [14]. We use similar encoder/decoder architectures in both experiments, with different parameters reported in Table C2. Our GMRF MCVAE...

[48] Copula Dataset Experiment: Table IV presents the architecture details for both the encoders and decoders used in the Copula experiment. To maintain consistency in latent capacities across different models, the GMRF MCVAE was configured with a latent dimension of 2 (yielding a total capacity of 44), while all other models used a latent dimension of 3 ...

Truncated expression extracted with entry [48]: $\cdots \sum_{i<j} \psi^{p}_{i,j}(z_i, z_j) - \psi^{q}_{i,j}(z_i, z_j) \Big] - \mathbb{E}_{q_\phi(z \mid X)} \cdots$
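
The appendix fragments in entries [43] to [46] argue that a symmetric block matrix Σ is positive definite when it is block strictly diagonally dominant, in the sense of Feingold and Varga [41]. The sketch below checks that condition numerically and confirms the conclusion with a Cholesky factorization. It is an illustration under stated assumptions, not the paper's construction: the condition is taken to be λ_min(Σ_ii) > Σ_{k≠i} ‖Σ_ik‖ in the spectral norm, the blocks are fixed at size 2×2 so eigenvalues have a closed form, and the numeric values and all function names are made up for the example.

```python
import math

def sym_eigenvalues_2x2(m):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix, in closed form."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    center = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return center - radius, center + radius

def spectral_norm_2x2(m):
    """Largest singular value of a 2x2 matrix: sqrt of the top eigenvalue of m^T m."""
    gram = [[sum(m[k][i] * m[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return math.sqrt(max(sym_eigenvalues_2x2(gram)[1], 0.0))

def transpose(m):
    return [list(col) for col in zip(*m)]

def is_block_strictly_diagonally_dominant(blocks):
    """Check lambda_min(S_ii) > sum over k != i of ||S_ik||_2 for every block row i.

    For a symmetric positive definite diagonal block, lambda_min(S_ii) equals
    1 / ||S_ii^{-1}||_2, the quantity used in the Feingold-Varga condition.
    """
    n = len(blocks)
    for i in range(n):
        smallest = sym_eigenvalues_2x2(blocks[i][i])[0]
        off_diagonal = sum(spectral_norm_2x2(blocks[i][k]) for k in range(n) if k != i)
        if smallest <= off_diagonal:
            return False
    return True

def assemble(blocks):
    """Flatten an n x n grid of 2x2 blocks into a (2n) x (2n) matrix."""
    n = len(blocks)
    return [[blocks[i][k][r][c] for k in range(n) for c in range(2)]
            for i in range(n) for r in range(2)]

def is_positive_definite(a):
    """Try a Cholesky factorization; a non-positive pivot means a is not SPD."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

# Illustrative values: three components, each with a 2-dimensional latent.
diag = [[2.0, 0.3], [0.3, 1.5]]
off = [[0.2, 0.1], [0.0, 0.3]]
n = 3
blocks = [[diag if i == k else (off if i < k else transpose(off)) for k in range(n)]
          for i in range(n)]

dominant = is_block_strictly_diagonally_dominant(blocks)
positive = is_positive_definite(assemble(blocks))
print("block strictly diagonally dominant:", dominant)
print("positive definite:", positive)
```

Dominance is a sufficient condition only: a matrix can fail the check and still be positive definite, so the test is useful as a guarantee by construction rather than as a diagnostic.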

[1] [1]

The added diagnostic value of complementary gadoxetic acid-enhanced mri to 18 f-dopa-pet/ct for liver staging in medullary thyroid carcinoma,

D. Puhr-Westerheide, C. C. Cyran, J. Sargsyan-Bergmann, A. Todica, F.-J. Gildehaus, W. G. Kunz, R. Stahl, C. Spitzweg, J. Ricke, and P. M. Kazmierczak, “The added diagnostic value of complementary gadoxetic acid-enhanced mri to 18 f-dopa-pet/ct for liver staging in medullary thyroid carcinoma,” Cancer Imaging, vol. 19, pp. 1–10, 2019

work page 2019

[2] [2]

Ct-mri dual information registration for the diagnosis of liver cancer: A pilot study using point- based registration,

A. Rahimi, A. Khalil, A. Faisal, and K. W. Lai, “Ct-mri dual information registration for the diagnosis of liver cancer: A pilot study using point- based registration,” Current medical imaging, vol. 18, no. 1, pp. 61–66, 2022

work page 2022

[3] [3]

Pixiu: A comprehensive benchmark, instruction dataset and large language model for finance,

Q. Xie, W. Han, X. Zhang, Y . Lai, M. Peng, A. Lopez-Lira, and J. Huang, “Pixiu: A comprehensive benchmark, instruction dataset and large language model for finance,” Advances in Neural Information Processing Systems, vol. 36, 2024

work page 2024

[4] S. I. Lee and S. J. Yoo, “Multimodal deep learning for finance: integrating and forecasting international stock markets,” The Journal of Supercomputing, vol. 76, pp. 8294–8312, 2020.

[5] A. Cobb, A. Roy, D. Elenius, F. Heim, B. Swenson, S. Whittington, J. Walker, T. Bapty, J. Hite, K. Ramani et al., “Aircraftverse: a large-scale multimodal dataset of aerial vehicle designs,” Advances in Neural Information Processing Systems, vol. 36, pp. 44 524–44 543, 2023.

[6] F. Oubari, R. Meunier, R. Décatoire, and M. Mougeot, “A meta-vae for multi-component industrial systems generation,” in Intelligent Computing, K. Arai, Ed. Cham: Springer Nature Switzerland, 2024, pp. 234–251.

[7] M. Wu and N. Goodman, “Multimodal generative models for scalable weakly-supervised learning,” Advances in neural information processing systems, vol. 31, 2018.

[8] Y. Shi, B. Paige, P. Torr et al., “Variational mixture-of-experts autoencoders for multi-modal deep generative models,” Advances in neural information processing systems, vol. 32, 2019.

[9] D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques. MIT press, 2009.

[10] A. Vahdat and J. Kautz, “Nvae: A deep hierarchical variational autoencoder,” Advances in neural information processing systems, vol. 33, pp. 19 667–19 679, 2020.

[11] S. D. Mbacke, F. Clerc, and P. Germain, “Statistical guarantees for variational autoencoders using pac-bayesian theory,” Advances in Neural Information Processing Systems, vol. 36, 2024.

[12] M. Suzuki, K. Nakayama, and Y. Matsuo, “Joint multimodal learning with deep generative models,” arXiv preprint arXiv:1611.01891, 2016.

[13] R. Vedantam, I. Fischer, J. Huang, and K. Murphy, “Generative models of visually grounded imagination,” arXiv preprint arXiv:1705.10762, 2017.

[14] E. Palumbo, I. Daunhawer, and J. E. Vogt, “Mmvae+: Enhancing the generative quality of multimodal vaes without compromises,” in The Eleventh International Conference on Learning Representations. OpenReview, 2023.

[15] T. M. Sutter, I. Daunhawer, and J. E. Vogt, “Generalized multimodal elbo,” arXiv preprint arXiv:2105.02470, 2021.

[16] T. Sutter, I. Daunhawer, and J. Vogt, “Multimodal generative learning utilizing jensen-shannon-divergence,” Advances in neural information processing systems, vol. 33, pp. 6100–6110, 2020.

[17] M. J. Wainwright, M. I. Jordan et al., “Graphical models, exponential families, and variational inference,” Foundations and Trends® in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.

[18] K. P. Murphy, Machine learning: a probabilistic perspective. MIT press, 2012.

[19] R. Kindermann and J. L. Snell, Markov random fields and their applications. American Mathematical Society, 1980, vol. 1.

[20] M. A. Carreira-Perpinan and G. Hinton, “On contrastive divergence learning,” in International workshop on artificial intelligence and statistics. PMLR, 2005, pp. 33–40.

[21] M. Vuffray, S. Misra, and A. Lokhov, “Efficient learning of discrete graphical models,” Advances in Neural Information Processing Systems, vol. 33, pp. 13 575–13 585, 2020.

[22] F. Bach and M. Jordan, “Learning graphical models with mercer kernels,” Advances in Neural Information Processing Systems, vol. 15, 2002.

[23] K. M. Tan, P. London, K. Mohan, S.-I. Lee, M. Fazel, and D. Witten, “Learning graphical models with hubs,” arXiv preprint arXiv:1402.7349, 2014.

[24] M. Welling and C. Sutton, “Learning in markov random fields with contrastive free energies,” in International Workshop on Artificial Intelligence and Statistics. PMLR, 2005, pp. 397–404.

[25] P. Perez et al., Markov random fields and images. IRISA, 1998, vol. 469.

[26] N. Komodakis and G. Tziritas, “Image completion using efficient belief propagation via priority scheduling and dynamic pruning,” IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2649–2661, 2007.

[27] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected crfs with gaussian edge potentials,” Advances in neural information processing systems, vol. 24, 2011.

[28] M. G. Bello, “A combined markov random field and wave-packet transform-based approach for image segmentation,” IEEE transactions on image processing, vol. 3, no. 6, pp. 834–846, 1994.

[29] M. J. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta, “Composing graphical models with neural networks for structured representations and fast inference,” Advances in neural information processing systems, vol. 29, 2016.

[30] A. H. Khoshaman and M. Amin, “Gumbolt: Extending gumbel trick to boltzmann priors,” Advances in Neural Information Processing Systems, vol. 31, 2018.

[31] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.

[32] D. P. Kingma, M. Welling et al., “An introduction to variational autoencoders,” Foundations and Trends® in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019.

[33] D. MacKenzie and T. Spears, “‘the formula that killed wall street’: The gaussian copula and modelling practices in investment banking,” Social Studies of Science, vol. 44, no. 3, pp. 393–417, 2014.

[34] P. Tedesco, A. Lenkoski, H. C. Bloomfield, and J. Sillmann, “Gaussian copula modeling of extreme cold and weak-wind events over europe conditioned on winter weather regimes,” Environmental Research Letters, vol. 18, no. 3, p. 034008, 2023.

[35] L. Regenwetter, B. Curry, and F. Ahmed, “Biked: A dataset for computational bicycle design with machine learning benchmarks,” Journal of Mechanical Design, vol. 144, no. 3, p. 031706, 2022.

[36] C. Villani, “The wasserstein distances,” Optimal Transport: Old and New, pp. 93–111, 2009.

[37] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” Advances in neural information processing systems, vol. 30, 2017.

[38] J. Ho and T. Salimans, “Classifier-free diffusion guidance,” arXiv preprint arXiv:2207.12598, 2022.

[39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.

[40] K. B. Petersen, M. S. Pedersen et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, no. 15, p. 510, 2008.

[41] D. G. Feingold and R. S. Varga, “Block diagonally dominant matrices and generalizations of the gerschgorin circle theorem,” 1962.

[42] I. Higgins, L. Matthey, A. Pal, C. P. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner, “beta-vae: Learning basic visual concepts with a constrained variational framework,” ICLR (Poster), vol. 3, 2017.

APPENDIX

A. Conditional Sampling

The conditional distribution of a normally distributed random variable given another is also normally ...
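As an illustration of the Gaussian conditioning property that Appendix A refers to, the following is a minimal NumPy sketch of the standard identity $\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ and $\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$. The function name `conditional_gaussian`, the toy mean and covariance, and the index convention are assumptions made for this example; they are not taken from the paper and this is not necessarily how the authors implement conditional sampling.

```python
import numpy as np


def conditional_gaussian(mu, sigma, idx_obs, x_obs):
    """Mean and covariance of the unobserved coordinates given the observed ones.

    Hypothetical helper: applies the textbook Gaussian conditioning formulas.
    """
    idx_obs = np.asarray(idx_obs)
    idx_free = np.setdiff1d(np.arange(mu.shape[0]), idx_obs)
    s_ff = sigma[np.ix_(idx_free, idx_free)]
    s_fo = sigma[np.ix_(idx_free, idx_obs)]
    s_oo = sigma[np.ix_(idx_obs, idx_obs)]
    # gain = S_fo @ inv(S_oo), computed with a linear solve instead of an explicit inverse.
    gain = np.linalg.solve(s_oo, s_fo.T).T
    mu_cond = mu[idx_free] + gain @ (x_obs - mu[idx_obs])
    sigma_cond = s_ff - gain @ s_fo.T
    return mu_cond, sigma_cond


# Toy two-dimensional example (values chosen only for illustration).
mu = np.array([0.0, 1.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Condition the first coordinate on observing the second coordinate at 2.0.
mu_cond, sigma_cond = conditional_gaussian(mu, sigma, idx_obs=[1], x_obs=np.array([2.0]))

# Draw conditional samples from the resulting Gaussian.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu_cond, sigma_cond, size=5)
```

In this toy case the conditional mean is 0.8 and the conditional variance is 1.36. In a multi-component setting, the same formulas would apply with the observed indices covering the latent blocks of the known components.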

Preliminaries: This section demonstrates how our architecture can generate full diagonal block covariance matrices while ensuring symmetric positive definiteness. Let $M$ be a complex matrix, and let $E$ and $F$ be two vector spaces equipped with norms $\|\cdot\|_E$ and $\|\cdot\|_F$, respectively, such that for all $x \in E$, $Mx \in F$. The standard operator norm of $M$ is def...

Proof of Theorem 1: To prove Theorem 1, we show that $\Sigma$ is positive definite.

Nonsingularity: From Theorem 2, the strict block diagonal dominance of $\Sigma$ guarantees that it is nonsingular.

Positivity: Let $\lambda$ be an eigenvalue of $\Sigma$. Theorem 3 ensures that there exists at least one $i \in \{1, \dots, n\}$ such that

$$\left\|(\Sigma_{i,i} - \lambda I)^{-1}\right\|^{-1} \le \sum_{\substack{k=1 \\ k \neq i}}^{n} \|\Sigma_{i,k}\|.$$

Since $\|\cdot\|$ is taken to be the spectral norm in our theorem, we have

$$\left\|(\Sigma_{i,i} - \lambda I)^{-1}\right\| = \sup_{j \in \{1, \dots, d\}} \frac{1}{\left|\sigma^i_j - \lambda\right|},$$

where $(\sigma^i_j)_j$ are the eigenvalues of $\Sigma_{i,i}$. Let $k \in \{1, \dots, d\}$ ...
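The role of block strict diagonal dominance in the argument above can be checked numerically. The sketch below assumes the Feingold and Varga form of the condition with the spectral norm, $\|\Sigma_{i,i}^{-1}\|^{-1} > \sum_{k \neq i} \|\Sigma_{i,k}\|$, and builds a symmetric test matrix whose diagonal blocks are positive definite. The helper names `is_block_sdd` and `build_example`, the block sizes, and the scaling constants are all hypothetical choices for illustration; the exact statement of Theorem 1 is not reproduced here, so this is a sanity check under stated assumptions rather than the authors' construction.

```python
import numpy as np


def is_block_sdd(sigma, n, d):
    """Check ||inv(S_ii)||^-1 > sum_{k != i} ||S_ik|| for each block row (spectral norm)."""
    for i in range(n):
        rows = slice(i * d, (i + 1) * d)
        # For an invertible block A, ||inv(A)||_2^-1 is the smallest singular value of A.
        lhs = np.linalg.svd(sigma[rows, rows], compute_uv=False).min()
        rhs = sum(
            np.linalg.norm(sigma[rows, k * d:(k + 1) * d], ord=2)
            for k in range(n) if k != i
        )
        if lhs <= rhs:
            return False
    return True


def build_example(n, d, off_norm, seed=0):
    """Symmetric matrix with positive definite diagonal blocks and scaled off-diagonal blocks."""
    rng = np.random.default_rng(seed)
    sigma = np.zeros((n * d, n * d))
    for i in range(n):
        a = rng.normal(size=(d, d))
        # a @ a.T + 3 I is symmetric with all eigenvalues at least 3.
        sigma[i * d:(i + 1) * d, i * d:(i + 1) * d] = a @ a.T + 3.0 * np.eye(d)
        for k in range(i + 1, n):
            b = rng.normal(size=(d, d))
            b *= off_norm / np.linalg.norm(b, ord=2)  # rescale to spectral norm off_norm
            sigma[i * d:(i + 1) * d, k * d:(k + 1) * d] = b
            sigma[k * d:(k + 1) * d, i * d:(i + 1) * d] = b.T
    return sigma


# Three blocks of size two; each block row has off-diagonal norms summing to 1.0 < 3.
sigma = build_example(n=3, d=2, off_norm=0.5)
dominant = is_block_sdd(sigma, n=3, d=2)
min_eig = np.linalg.eigvalsh(sigma).min()
```

With these settings the dominance check passes and the smallest eigenvalue of the assembled matrix is strictly positive, which is the behaviour the positivity step is meant to establish. Increasing `off_norm` far enough makes the check fail, and positive definiteness is then no longer guaranteed by this argument.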
