GraphPL: Leveraging GNN for Efficient and Robust Modalities Imputation in Patchwork Learning

Fei Wang; Jianhua Zhu; Liangcai Gao; Tengfei Ma; Xingjian Hu; Zuoyu Yan

arxiv: 2604.25352 · v1 · submitted 2026-04-28 · 💻 cs.LG · cs.AI

Xingjian Hu, Zuoyu Yan, Jianhua Zhu, Liangcai Gao, Fei Wang, Tengfei Ma

Pith reviewed 2026-05-07 16:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords graph neural networks · modality imputation · patchwork learning · distributed multi-modal learning · unsupervised imputation · electronic health records

The pith

GraphPL builds graphs from all observed modalities across clients and uses graph neural networks to impute missing ones unsupervised, remaining robust to noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In distributed multi-modal settings, different clients often hold different subsets of data modalities, so the task is to recover the missing ones without supervision. Prior approaches typically discard some of the available observations and therefore under-use the collective information. GraphPL instead represents the observed modalities as nodes in a graph and applies message passing to generate imputations that draw on every available signal. The method is also shown to tolerate noisy inputs without collapsing. When the claim holds, it produces imputations that improve downstream tasks on both synthetic benchmarks and real fragmented electronic health records.
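The patchwork setting described above can be made concrete with a small sketch. It assumes each client drops each modality independently with a fixed probability, which matches the experiment description quoted in the reference graph ("randomly dropping some modalities on each client independently"). The function names, the modality names, the drop probability, and the rule that every client keeps at least one modality are illustrative assumptions, not details taken from the paper.

```python
import random


def make_patchwork(num_clients, modalities, drop_prob, seed=0):
    """Return {client_id: list of modalities that client observes}.

    Assumption (not from the paper): each modality is dropped independently
    with probability `drop_prob`, and a client that would end up with nothing
    keeps one randomly chosen modality.
    """
    rng = random.Random(seed)
    patchwork = {}
    for client in range(num_clients):
        kept = [m for m in modalities if rng.random() >= drop_prob]
        if not kept:
            kept = [rng.choice(modalities)]
        patchwork[client] = kept
    return patchwork


def missing_modalities(patchwork, modalities):
    """For each client, list the modalities that would need to be imputed."""
    return {c: [m for m in modalities if m not in obs]
            for c, obs in patchwork.items()}


# Hypothetical modality names, chosen only for illustration.
MODALITIES = ["image", "text", "labs"]
patchwork = make_patchwork(num_clients=4, modalities=MODALITIES,
                           drop_prob=0.4, seed=0)
missing = missing_modalities(patchwork, MODALITIES)
```

The `missing` dictionary is the imputation target: for every client, the modalities it never observed.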

Core claim

GraphPL combines graph neural networks with patchwork learning to flexibly integrate all observed modalities and remains robust with noisy inputs. It constructs a graph whose nodes are the observed modality instances across clients, then uses GNN layers to propagate information and produce unsupervised imputations for the missing modalities. On benchmark datasets this yields state-of-the-art imputation accuracy; on a real-world distributed electronic health record collection the imputed modalities yield stronger features for downstream disease prediction than earlier methods.

What carries the argument

Graph neural network message passing on a graph whose nodes encode the observed modalities from all clients, allowing every available signal to contribute to each imputation.
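A minimal sketch of the idea, under loud assumptions: the abstract does not specify the graph construction or the GNN layer, so this uses a hand-written edge list and a parameter-free mean aggregation as a stand-in for learned GNN layers. The function name `impute_by_message_passing` and the toy embeddings are hypothetical.

```python
def impute_by_message_passing(features, edges, num_layers=2):
    """Fill in missing node vectors by averaging neighbour vectors.

    features: {node: list of floats, or None if the modality is missing}
    edges:    list of undirected (u, v) pairs
    Assumption: mean aggregation with no learned weights, used here only as a
    stand-in for the GNN layers of the actual method.
    """
    neighbors = {n: set() for n in features}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    observed = {n for n, x in features.items() if x is not None}
    state = dict(features)
    for _ in range(num_layers):
        new_state = dict(state)
        for node in features:
            if node in observed:
                continue  # observed nodes keep their own vector
            msgs = [state[m] for m in neighbors[node] if state[m] is not None]
            if msgs:
                # Coordinate-wise mean over all available neighbour vectors.
                new_state[node] = [sum(col) / len(col) for col in zip(*msgs)]
        state = new_state
    return state


# One sample with two observed modality embeddings and one missing modality.
features = {"image": [1.0, 3.0], "text": [3.0, 5.0], "labs": None}
edges = [("image", "text"), ("image", "labs"), ("text", "labs")]
imputed = impute_by_message_passing(features, edges)
```

Every observed neighbour contributes to the imputed `labs` vector, which is the property the paper contrasts with methods that use only a subset of the observed modalities. A missing node with no path to any observed node stays `None`, which is the connectivity failure mode a real graph construction has to avoid.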

If this is right

Every observed modality contributes to the imputation rather than only a selected subset.
Performance stays high even when some client inputs contain noise.
Imputation quality reaches state-of-the-art levels on standard multi-modal benchmarks.
The resulting features support strong performance on real downstream tasks such as disease prediction from distributed health records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph construction could be applied to other heterogeneous data sources where full modality alignment is impossible.
Because only modality representations rather than raw records are exchanged, the approach may reduce the need for direct data sharing across clients.
One could test whether increasing the number of GNN layers or changing the graph construction rule further improves imputation when the number of modalities grows.

Load-bearing premise

A graph built from the observed modalities across clients contains enough structure for GNN message passing to recover accurate imputations even when some inputs are noisy.
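A short worked equation shows when aggregation would help under noise. The assumptions are ours, not the paper's: a missing node receives k messages, each equal to a shared signal s plus independent zero-mean noise with variance σ², and the layer takes a plain mean.

```latex
\hat{h} = \frac{1}{k}\sum_{j=1}^{k}\bigl(s + \varepsilon_j\bigr),
\qquad
\mathbb{E}[\hat{h}] = s,
\qquad
\operatorname{Var}(\hat{h})
  = \frac{1}{k^{2}}\sum_{j=1}^{k}\operatorname{Var}(\varepsilon_j)
  = \frac{\sigma^{2}}{k}.
```

Under these assumptions, noise shrinks as more observed modalities contribute. The premise is at risk in the cases this derivation excludes: correlated noise across neighbours, biased inputs, or a node with very few observed neighbours.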

What would settle it

On a controlled benchmark where modalities are artificially masked and noise is injected, the downstream task accuracy after GraphPL imputation is no higher than the accuracy obtained by the strongest baseline imputation method.
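The shape of such a test can be sketched in a few lines. This is a toy harness, not the paper's protocol: it scores imputation error (mean squared error) as a proxy for downstream accuracy, uses synthetic data in which every modality is a copy of one latent vector, and compares two hypothetical imputers, `use_all` (averages every visible modality) and `use_first` (relies on a single one).

```python
import random


def add_noise(vec, sigma, rng):
    """Add independent Gaussian noise to each coordinate."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def evaluate(imputer, samples, target, sigma, seed=0):
    """Mask `target`, corrupt the remaining modalities, return the mean MSE."""
    rng = random.Random(seed)
    errors = []
    for sample in samples:
        visible = {m: add_noise(v, sigma, rng)
                   for m, v in sample.items() if m != target}
        errors.append(mse(imputer(visible), sample[target]))
    return sum(errors) / len(errors)


def use_all(visible):
    """Hypothetical candidate: average every visible modality."""
    return [sum(col) / len(col) for col in zip(*visible.values())]


def use_first(visible):
    """Hypothetical baseline: rely on a single visible modality."""
    return list(visible[sorted(visible)[0]])


# Toy data (assumption): all four modalities share the same latent vector.
data_rng = random.Random(1)
samples = []
for _ in range(200):
    z = [data_rng.uniform(-1.0, 1.0) for _ in range(4)]
    samples.append({"a": z, "b": z, "c": z, "d": z})

err_all = evaluate(use_all, samples, target="d", sigma=0.5)
err_first = evaluate(use_first, samples, target="d", sigma=0.5)
claim_survives = err_all < err_first  # the falsifying outcome is the reverse
```

Both imputers see the same noisy inputs because `evaluate` reseeds its generator, so the comparison isolates the imputer. In a real study the two functions would be replaced by GraphPL and the strongest baseline, and the score by downstream task accuracy.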

Original abstract

Current research on distributed multi-modal learning typically assumes that clients can access complete information across all modalities, which may not hold in practice. In this paper, we explore patchwork learning, in which the modalities available to different clients vary, and the objective is to impute the missing modalities for each client in an unsupervised manner. Existing methods are shown not to fully utilize the modality information as they tend to rely on only a subset of the observed modalities. To address this issue, we propose GraphPL, which combines graph neural networks with patchwork learning to flexibly integrate all observed modalities and remains robust with noisy inputs. Experimental results show that GraphPL achieves SOTA performance on benchmark datasets. Our results on real-world distributed electronic health record dataset show GraphPL learns strong downstream features and enables tasks like disease prediction via superior modality imputation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GraphPL tries to use GNNs for unsupervised modality imputation in patchwork learning but supplies no experimental details to support its SOTA or robustness claims.

This paper introduces GraphPL, which applies graph neural networks to impute missing modalities in patchwork learning for distributed multi-modal scenarios. It claims state-of-the-art results on benchmarks and strong downstream performance on electronic health records. The approach stands out for trying to integrate all available modalities via GNNs instead of subsets, while aiming for robustness to noise in an unsupervised setting. This matches a practical need where clients have varying modality access. It does well in identifying the limitation of prior methods and proposing a graph-based solution to leverage cross-client structure.

The main concerns are the lack of any experimental details, such as graph construction method, model architecture, loss functions, comparison baselines, or numerical results with error bars. Without these, it's impossible to verify the SOTA performance or whether the GNN message passing actually produces accurate imputations when inputs are noisy. The stress-test point about potential lack of connectivity or noise propagation seems relevant given the absence of supporting evidence in the abstract. If the full paper has solid ablations and reproducible results, that would address this; as presented, the claims are hard to evaluate.

This is relevant for researchers in multi-modal machine learning and federated settings, especially those handling incomplete data in real-world applications like healthcare. A reader focused on imputation methods might gain ideas from it, provided the experiments hold up under scrutiny.

I think it deserves peer review to allow experts to check the technical soundness and experimental rigor, since the core idea has potential for practical impact.

Referee Report

2 major / 2 minor

Summary. The paper introduces GraphPL for patchwork learning in distributed multi-modal settings where clients have varying available modalities. It proposes constructing a graph from observed modalities across clients and applying GNN message passing for unsupervised imputation of missing modalities, claiming this flexibly integrates all observed data, remains robust to noisy inputs, outperforms prior methods, and yields strong downstream features on a real-world distributed EHR dataset for tasks such as disease prediction.

Significance. If the empirical claims hold with proper validation, the work could advance handling of incomplete multi-modal data in federated or distributed scenarios, especially privacy-sensitive domains like healthcare. The GNN-based integration of all observed modalities rather than subsets is a conceptually appealing direction, though its practical impact hinges on whether the graph structure reliably supports accurate imputation under noise without additional supervision.

major comments (2)

The central claim of SOTA performance and robustness to noise rests on GNN message passing over a graph built from observed modalities, yet the manuscript provides no details on graph construction (e.g., similarity metric, thresholding), the precise GNN architecture, or the unsupervised loss that supervises imputation of missing parts. This leaves the weakest assumption—that the resulting graph encodes sufficient cross-client correlations to recover values even when inputs are noisy—unexamined and potentially vulnerable to noise propagation during aggregation.
Experimental results are asserted to show SOTA on benchmarks and strong downstream performance on EHR data, but the provided text supplies no tables, figures, baselines, ablation studies, error bars, or statistical tests. Without these, it is impossible to assess whether the reported gains are load-bearing or attributable to the GNN component versus other design choices.

minor comments (2)

Notation for modalities, clients, and graph edges should be introduced with explicit definitions early in the method section to avoid ambiguity when describing the imputation process.
The abstract and introduction would benefit from a concise comparison table or bullet list highlighting how GraphPL differs from prior patchwork or multi-modal imputation baselines in its use of the full observed modality set.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which identifies important areas for improving the clarity and empirical rigor of our manuscript. We address each major comment below and will incorporate the suggested changes in the revised version.

Referee: The central claim of SOTA performance and robustness to noise rests on GNN message passing over a graph built from observed modalities, yet the manuscript provides no details on graph construction (e.g., similarity metric, thresholding), the precise GNN architecture, or the unsupervised loss that supervises imputation of missing parts. This leaves the weakest assumption—that the resulting graph encodes sufficient cross-client correlations to recover values even when inputs are noisy—unexamined and potentially vulnerable to noise propagation during aggregation.

Authors: We agree that the submitted manuscript did not include sufficient implementation details on these elements. In the revision we will add a dedicated subsection describing the graph construction procedure (including the similarity metric and thresholding approach), the exact GNN architecture and hyperparameters, and the form of the unsupervised loss. We will also include additional experiments that directly test robustness under controlled noise injection and provide analysis of how message passing leverages cross-client correlations, thereby examining the assumption more thoroughly.

Revision: yes

Referee: Experimental results are asserted to show SOTA on benchmarks and strong downstream performance on EHR data, but the provided text supplies no tables, figures, baselines, ablation studies, error bars, or statistical tests. Without these, it is impossible to assess whether the reported gains are load-bearing or attributable to the GNN component versus other design choices.

Authors: We acknowledge that the experimental presentation in the reviewed text was incomplete. The full manuscript contains comparative tables, figures, baseline methods, ablation studies isolating the GNN component, error bars from multiple runs, and statistical tests. To address the concern, we will expand the experimental section in the revision to prominently feature all of these elements, add further ablations on graph construction choices, and ensure the results clearly attribute gains to the proposed approach.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; proposal is empirically driven without self-referential derivations

The paper proposes GraphPL as a GNN-based extension to patchwork learning for unsupervised multi-modal imputation across clients with varying available modalities. No equations, derivations, or parameter-fitting steps appear in the provided abstract or description that could reduce any prediction to its own inputs by construction. The method is presented as a flexible integration of all observed modalities with claimed robustness to noise, justified directly by SOTA benchmark results and downstream performance on EHR data rather than by any self-citation chain, uniqueness theorem, or ansatz smuggled from prior work. The graph construction from observed modalities is an input to the GNN, not a fitted output renamed as a prediction, leaving the central claim externally falsifiable via experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no equations, experimental sections, or method details are present from which free parameters, axioms, or invented entities can be extracted.

Reference graph

Works this paper leans on

27 extracted references · 6 canonical work pages · 3 internal anchors

[1]

GraphPL: Leveraging GNN for Efficient and Robust Modalities Imputation in Patchwork Learning

INTRODUCTION Multi-modal learning [3] extracting rich information from various modalities. Diverse modalities offer complementary analytical advantages via varied content, structure, and expression [4]. It typically uses paired multi-modal data, with inter-modal correspondences per sample, helping models learn cross-modal associations for more compreh...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

The pipeline of GraphPL is illustrated in Fig

GRAPHPL GraphPL is designed for the distributed multi-modal patchwork learning scenario, where clients have incomplete and diverse visible modalities, and data sharing is prohibited by privacy constraints. The pipeline of GraphPL is illustrated in Fig. 3. We implement an individual VAE [14] for each modality. During the forward pass, the input modali...
[3]

EXPERIMENTS 3.1. Experiment Settings We evaluate GraphPL and baselines (MVAE [15], MMVAE [13], MoPoE [16], CLAP [2]) on non-distributed benchmarks (PolyMNIST, MST, Quad-CelebA [16]) and real-world Electronic Health Record (EHR) dataset eICU [1]. The patchwork is constructed by randomly dropping some modalities on each client independently. Trainin...
[4]

Furthermore, compared to existing methods, GraphPL is better at balancing the use of information from different modalities and alleviates the modality collapse issue

CONCLUSIONS In this paper, we explore the use of GNNs for multi-modal feature fusion in patchwork learning, achieving an effective and efficient patchwork learning method across various benchmark datasets. Furthermore, compared to existing methods, GraphPL is better at balancing the use of information from different modalities and alleviates the modal...
[5]

ACKNOWLEDGEMENT The work of Xingjian Hu, Jianhua Zhu and Liangcai Gao is supported by the projects of Beijing Nova Interdisciplinary Program (20240484647) and National Natural Science Foundation of China (No. 62376012), which is also a research achievement of State Key Laboratory of Multimedia Information Processing, National Engineering Research Cent...
[6]

The eicu collaborative research database, a freely available multi-center database for critical care research,

Tom J Pollard, Alistair EW Johnson, Jesse D Raffa, Leo A Celi, Roger G Mark, and Omar Badawi, “The eicu collaborative research database, a freely available multi-center database for critical care research,” Scientific data, vol. 5, no. 1, pp. 1–13, 2018

2018
[7]

CLAP: Collaborative adaptation for patchwork learning,

Sen Cui, Abudukelimu Wuerkaixi, Weishen Pan, Jian Liang, Lei Fang, Changshui Zhang, and Fei Wang, “CLAP: Collaborative adaptation for patchwork learning,” in ICLR, 2024

2024
[8]

Multimodal deep learning,

Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, Andrew Y Ng, et al., “Multimodal deep learning,” in ICML, 2011, vol. 11, pp. 689–696

2011
[9]

Multimodal machine learning: A survey and taxonomy,

Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency, “Multimodal machine learning: A survey and taxonomy,” TPAMI, vol. 41, no. 2, pp. 423–443, 2018

2018
[10]

Fedmsplit: Correlation-adaptive federated multi-task learning across multimodal split networks,

Jiayi Chen and Aidong Zhang, “Fedmsplit: Correlation-adaptive federated multi-task learning across multimodal split networks,” in KDD, 2022, pp. 87–96

2022
[11]

Cross-modal federated human activity recognition via modality-agnostic and modality-specific representation learning,

Xiaoshan Yang, Baochen Xiong, Yi Huang, and Changsheng Xu, “Cross-modal federated human activity recognition via modality-agnostic and modality-specific representation learning,” in AAAI, 2022, vol. 36, pp. 3063–3071

2022
[12]

Patchwork learning: A paradigm towards integrative analysis across diverse biomedical data sources,

Suraj Rajendran, Weishen Pan, Mert R Sabuncu, Yong Chen, Jiayu Zhou, and Fei Wang, “Patchwork learning: A paradigm towards integrative analysis across diverse biomedical data sources,” arXiv preprint arXiv:2305.06217, 2023

work page arXiv 2023
[13]

Kame: Knowledge-based attention model for diagnosis prediction in healthcare,

Fenglong Ma, Quanzeng You, Houping Xiao, Radha Chitta, Jing Zhou, and Jing Gao, “Kame: Knowledge-based attention model for diagnosis prediction in healthcare,” in CIKM, 2018, pp. 743–752

2018
[14]

Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks,

Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, and Jing Gao, “Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks,” in KDD, 2017, pp. 1903–1911

2017
[15]

Mitigating modality collapse in multimodal vaes via impartial optimization,

Adrián Javaloy, Maryam Meghdadi, and Isabel Valera, “Mitigating modality collapse in multimodal vaes via impartial optimization,” in ICML. PMLR, 2022, pp. 9938–9964

2022
[16]

Multimodal patient representation learning with missing modalities and labels,

Zhenbang Wu, Anant Dadu, Nicholas Tustison, Brian Avants, Mike Nalls, Jimeng Sun, and Faraz Faghri, “Multimodal patient representation learning with missing modalities and labels,” in ICLR, 2024

2024
[17]

Training products of experts by minimizing contrastive divergence,

Geoffrey E Hinton, “Training products of experts by minimizing contrastive divergence,” Neural computation, vol. 14, no. 8, pp. 1771–1800, 2002

2002
[18]

Variational mixture-of-experts autoencoders for multi-modal deep generative models,

Yuge Shi, Brooks Paige, Philip Torr, et al., “Variational mixture-of-experts autoencoders for multi-modal deep generative models,” NeurIPS, vol. 32, 2019

2019
[19]

Auto-Encoding Variational Bayes

Diederik P Kingma, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review arXiv 2013
[20]

Multimodal generative models for scalable weakly-supervised learning,

Mike Wu and Noah Goodman, “Multimodal generative models for scalable weakly-supervised learning,” NeurIPS, vol. 31, 2018

2018
[21]

Generalized multimodal elbo,

Thomas M Sutter, Imant Daunhawer, and Julia E Vogt, “Generalized multimodal elbo,” arXiv preprint arXiv:2105.02470, 2021

work page arXiv 2021
[22]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N Kipf and Max Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016

work page internal anchor Pith review arXiv 2016
[23]

Shufflenet: An extremely efficient convolutional neural network for mobile devices,

Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” in CVPR, 2018, pp. 6848–6856

2018
[24]

beta-vae: Learning basic visual concepts with a constrained variational framework,

Irina Higgins, Loic Matthey, Arka Pal, Christopher P Burgess, Xavier Glorot, Matthew M Botvinick, Shakir Mohamed, and Alexander Lerchner, “beta-vae: Learning basic visual concepts with a constrained variational framework,” ICLR, vol. 3, 2017

2017
[25]

Communication-efficient learning of deep networks from decentralized data,

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics. PMLR, 2017, pp. 1273–1282

2017
[26]

Unite and conquer: Plug & play multi-modal synthesis using diffusion models,

Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, and Vishal M Patel, “Unite and conquer: Plug & play multi-modal synthesis using diffusion models,” in CVPR, 2023, pp. 6070–6079

2023
[27]

General facial representation learning in a visual-linguistic manner,

Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, and Fang Wen, “General facial representation learning in a visual-linguistic manner,” arXiv preprint arXiv:2112.03109, 2021

work page arXiv 2021
