Multimodal Graph Negative Learning

Guang Zeng; Guoren Wang; Hongchao Qin; Rong-Hua Li; Xunkai Li; Xu Wang; Zhengyu Wu

arxiv: 2606.12863 · v2 · pith:5XKZZUQ4new · submitted 2026-06-11 · 💻 cs.LG

Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang

Pith reviewed 2026-06-27 07:34 UTC · model grok-4.3

classification 💻 cs.LG

keywords multimodal attributed graphs · negative learning · node classification · branch semantic imbalance · graph neural networks · reliability arbitration · cross-branch guidance

The pith

GraphMNL replaces forced imitation across multimodal graph branches with negative learning on unlikely classes to limit bias spread.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multimodal attributed graphs combine topology with text and image attributes, each acting as a separate branch for node representations. These branches often differ in reliability from node to node, creating semantic imbalance. Standard approaches force inferior branches to match a dominant prediction, which can copy errors when the dominant branch is itself biased. GraphMNL instead applies negative learning to mark classes a node is unlikely to belong to, using graph-aware checks to decide when such guidance is stable. Supervised losses handle the true target while negative learning only suppresses alternatives, preserving useful branch-specific semantics.

Core claim

GraphMNL builds a branch library, identifies dominant and inferior branches via graph-aware reliability arbitration, gates unstable transfer, and applies target-preserving negative learning over non-target classes so that supervised losses learn the correct class while Negative Learning suppresses unlikely alternatives when branch agreement is unreliable.
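To make "target-preserving negative learning over non-target classes" concrete, here is a minimal sketch in plain Python. It uses the common negative-learning objective, -log(1 - p_c) on each excluded class c, in the style of the NLNL reference the paper cites. The paper's actual loss is not given in the abstract, so the function names, the top-k rule for choosing unlikely classes, and the averaging are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unlikely_classes(guide_logits, k):
    """Indices of the k classes the guiding branch finds least probable.

    Hypothetical selection rule; the paper does not specify one here.
    """
    probs = softmax(guide_logits)
    return sorted(range(len(probs)), key=lambda c: probs[c])[:k]


def negative_learning_loss(student_logits, excluded, target=None):
    """Mean of -log(1 - p_c) over excluded classes.

    The target class is always dropped from the excluded set, so this
    term can only push probability away from non-target classes.
    """
    probs = softmax(student_logits)
    kept = [c for c in excluded if c != target]
    if not kept:
        return 0.0
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(max(1.0 - probs[c], eps)) for c in kept) / len(kept)
```

In this sketch the supervised cross-entropy on the true label would be computed separately and added to the term above, which is one way to read the decoupling the claim describes.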

What carries the argument

Graph-aware reliability arbitration that selects when to apply target-preserving negative learning on non-target classes instead of positive imitation.

If this is right

Supervised losses remain focused on the true target class even when branches disagree.
Original semantics in any branch are less likely to be overwritten by a misleading dominant signal.
Performance gains appear on datasets where branch quality varies strongly per node.
The separation of target supervision from cross-branch guidance reduces error propagation across modalities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same negative-learning decoupling could apply to other multi-view settings where view reliability varies by instance.
If arbitration proves robust, similar gating might improve ensemble methods that currently rely on agreement.
Scalability tests on larger multimodal graphs would show whether the branch library and arbitration add significant overhead.

Load-bearing premise

The reliability arbitration step can correctly label which branches are dominant versus inferior for each node without adding new bias.
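The abstract gives no formula for the arbitration, so the following is only one hypothetical shape it could take, written to show where bias might enter. It scores each branch at a node as its top-class confidence multiplied by the fraction of neighbors whose prediction from the same branch agrees, then opens the gate for an inferior branch only when the dominant branch leads by a margin `tau`. The score, the margin rule, and every name below are assumptions, not the authors' method.

```python
def neighbor_agreement(node, pred, adj):
    """Fraction of a node's neighbors sharing its predicted class."""
    nbrs = adj.get(node, [])
    if not nbrs:
        return 0.0
    return sum(pred[n] == pred[node] for n in nbrs) / len(nbrs)


def arbitrate(node, branch_probs, adj, tau=0.2):
    """Pick a dominant branch for `node` and gate guidance to the others.

    branch_probs: {branch_name: {node_id: [class probabilities]}}
    adj:          {node_id: [neighbor ids]}
    tau:          hypothetical margin required before guidance is passed
    """
    scores = {}
    for name, probs in branch_probs.items():
        # Hard prediction of this branch for every node.
        pred = {n: max(range(len(p)), key=p.__getitem__) for n, p in probs.items()}
        scores[name] = max(probs[node]) * neighbor_agreement(node, pred, adj)
    ranked = sorted(scores, key=scores.get, reverse=True)
    dominant, inferior = ranked[0], ranked[1:]
    gate_open = {b: scores[dominant] - scores[b] >= tau for b in inferior}
    return dominant, gate_open, scores
```

A score of this kind rewards a branch for agreeing with its own neighborhood, so if a modality is biased in the same way across a neighborhood, the heuristic would rate it as reliable. That is the failure mode the premise above has to rule out.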

What would settle it

Replace the arbitration scores with random branch selection on the same Grocery and Reddit M datasets and measure whether accuracy and F1 fall below the imitation baselines.
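The control described here amounts to swapping one selection rule while holding everything else fixed. A minimal sketch of that switch, assuming per-node reliability scores are already available as a dictionary (the data layout and the function name `pick_guides` are hypothetical):

```python
import random


def pick_guides(scores, mode, rng):
    """Choose the guiding branch for every node.

    scores: {node_id: {branch_name: reliability score}}
    mode:   "arbitrated" uses the highest score; "random" ignores scores
    rng:    a random.Random instance, so the control is reproducible
    """
    guides = {}
    for node, per_branch in scores.items():
        branches = sorted(per_branch)  # fixed order keeps seeds comparable
        if mode == "arbitrated":
            guides[node] = max(branches, key=per_branch.get)
        elif mode == "random":
            guides[node] = rng.choice(branches)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return guides
```

Training once with each mode on identical splits and seeds, then comparing both against the imitation baseline, would show how much of the gain depends on the arbitration scores rather than on negative learning alone.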

Figures

Figures reproduced from arXiv: 2606.12863 by Guang Zeng, Guoren Wang, Hongchao Qin, Rong-Hua Li, Xunkai Li, Xu Wang, Zhengyu Wu.

**Figure 1.** Overview of GraphMNL. The framework contrasts positive alignment with negative learning, builds a modality and graph branch library, selects …

**Figure 2.** Synthetic accuracy trends across perturbation protocols.

**Figure 3.** Hyperparameter sensitivity of GraphMNL to exclusion temperature …

Original abstract

Multimodal attributed graphs (MAGs) integrate graph topology with heterogeneous modality attributes, such as text and images, thereby enabling richer modeling of complex relational systems. However, such expressiveness also makes learning on MAGs depend on multiple semantic sources, including structural topology, textual and visual attributes, each of which can be regarded as a branch for node representation. Node-level branch semantic imbalance arises when these branches differ across nodes in semantic informativeness and reliability: a branch that provides discriminative semantics for one node may mislead another due to bias in modality quality or structural context. Existing methods often mitigate such heterogeneity through cross-branch agreement or alignment, implicitly treating the dominant prediction as reliable supervision. When the dominant branch is biased, forced imitation may propagate its bias to other branches and suppress original semantics that are useful for classification. We propose GraphMNL, a graph-aware multimodal negative learning framework that addresses this issue by using Negative Learning as cross-branch guidance. Instead of forcing inferior branches to imitate a teacher prediction, the model teaches them which classes a node is unlikely to belong to. GraphMNL builds a branch library, identifies dominant and inferior branches via graph-aware reliability arbitration, gates unstable transfer, and applies target-preserving negative learning over non-target classes. This design decouples target supervision from branch guidance so that supervised losses learn the correct class, while Negative Learning suppresses unlikely alternatives when branch agreement is unreliable. Through the comprehensive experimental evaluation, GraphMNL achieves the best performance on Grocery datasets with 72.47% accuracy and 76.60 F1 score on Reddit M datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

GraphMNL's negative learning for cross-branch guidance in multimodal graphs is a reasonable shift from alignment, but the arbitration step that makes the decoupling work is not shown in enough detail to judge.

The main thing here is that the paper replaces forced imitation across branches with negative learning that tells inferior branches what classes to avoid. It adds a branch library and graph-aware reliability arbitration to decide when to apply that guidance, plus target-preserving supervision so the main loss still learns the right class. That combination is the actual novelty compared with standard cross-branch alignment.

The motivation is clear and the problem it targets shows up in real multimodal attributed graphs for recommendation and social data. The reported numbers on the Grocery and Reddit M sets are the best in their tables, which is at least consistent with the claim.

The soft spot is the arbitration itself. The abstract describes it as using graph topology plus modality attributes to label dominant versus inferior branches per node, then gating the negative learning. No equation, algorithm sketch, or ablation appears in the provided text, so there is no way to check whether the step actually separates signal from noise or just adds another source of error. If neighborhoods correlate with modality bias, the gating could either pass the wrong signal or suppress useful original features. The performance figures also come without error bars, baseline details, or data-split rules, which makes the gains hard to interpret.

This is for people already working on multimodal graph models in recsys or social networks who want to try negative learning as an alternative to alignment. A reader outside that niche will not get much. The work shows clear thinking about the bias-propagation issue and honest engagement with prior alignment methods, so it deserves a serious referee to see whether the full experiments and ablations close the gap on the arbitration claim.

Referee Report

2 major / 1 minor

Summary. The paper proposes GraphMNL, a graph-aware multimodal negative learning framework for multimodal attributed graphs (MAGs). It addresses node-level branch semantic imbalance by building a branch library, using graph-aware reliability arbitration to identify dominant versus inferior branches, gating unstable transfer, and applying target-preserving negative learning over non-target classes. This decouples supervised target loss from cross-branch negative guidance instead of forcing imitation. The manuscript reports state-of-the-art results, including 72.47% accuracy on Grocery datasets and 76.60 F1 on Reddit M datasets.

Significance. If the arbitration mechanism correctly identifies reliable branches from topology and modalities without systematic bias, the approach could provide a useful alternative to agreement-based multimodal fusion in graphs, reducing bias propagation while preserving original semantics. The reported gains on the cited datasets would indicate practical value for heterogeneous modality settings if supported by ablations and reproducibility details.

major comments (2)

[Abstract] The central decoupling claim rests on graph-aware reliability arbitration to identify dominant/inferior branches per node, yet no equation, algorithm, pseudocode, or threshold definition is supplied for computing reliability scores from graph topology plus modality attributes. This is load-bearing, as erroneous arbitration would either propagate the wrong branch's bias or suppress useful semantics via the subsequent gating and target-preserving negative learning.
[Abstract and Experiments section] Performance numbers (72.47% accuracy, 76.60 F1) are stated without reference to specific baselines, error bars, statistical tests, data splits, or exclusion rules. This prevents verification of the superiority claim and leaves the practical impact of the arbitration and negative-learning components unassessed.

minor comments (1)

[Abstract] The term 'branch library' is introduced without a brief definition or cross-reference, which may confuse readers unfamiliar with the framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will make the requested revisions to improve methodological transparency and experimental reporting.

Referee: [Abstract] The central decoupling claim rests on graph-aware reliability arbitration to identify dominant/inferior branches per node, yet no equation, algorithm, pseudocode, or threshold definition is supplied for computing reliability scores from graph topology plus modality attributes. This is load-bearing, as erroneous arbitration would either propagate the wrong branch's bias or suppress useful semantics via the subsequent gating and target-preserving negative learning.

Authors: We agree that the abstract provides only a high-level description and does not include the explicit equations, algorithm, or threshold definitions for the graph-aware reliability arbitration. The full manuscript describes the framework conceptually but lacks the concrete implementation details needed for verification. We will add a new subsection (or algorithm box) in the revised manuscript that supplies the missing equations for reliability score computation from topology and modality attributes, the arbitration logic, gating thresholds, and pseudocode. This will directly support the decoupling claim.

Revision: yes

Referee: [Abstract and Experiments section] Performance numbers (72.47% accuracy, 76.60 F1) are stated without reference to specific baselines, error bars, statistical tests, data splits, or exclusion rules. This prevents verification of the superiority claim and leaves the practical impact of the arbitration and negative-learning components unassessed.

Authors: The cited numbers are summary results from the Experiments section. We acknowledge that the current presentation omits the requested verification details. In the revision we will (i) add a footnote or parenthetical reference in the abstract to the exact baselines, (ii) expand the Experiments section with error bars (standard deviations over runs), statistical tests, explicit data-split descriptions, and exclusion criteria, and (iii) include additional ablation tables isolating the arbitration and negative-learning components. These changes will allow readers to assess the practical impact.

Revision: yes
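For the error bars promised in point (ii), the reporting step itself is small. A sketch using Python's standard `statistics` module, where the helper names are hypothetical and any numbers passed in are per-seed scores the reader supplies, not results from the paper:

```python
import statistics


def summarize(runs):
    """Return (mean, sample standard deviation) over per-seed scores."""
    mean = statistics.mean(runs)
    # stdev needs at least two points; report 0.0 for a single run.
    std = statistics.stdev(runs) if len(runs) > 1 else 0.0
    return mean, std


def format_result(name, runs):
    """Render one table cell as 'name: mean ± std (n=runs)'."""
    mean, std = summarize(runs)
    return f"{name}: {mean:.2f} ± {std:.2f} (n={len(runs)})"
```

Reporting the number of runs next to each figure makes it possible to judge whether a gap between two methods exceeds the seed-to-seed spread.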

Circularity Check

0 steps flagged

No equations or self-citations reduce claims to inputs by construction

The provided abstract and framework description introduce GraphMNL as a conceptual architecture involving branch library construction, graph-aware reliability arbitration, gating, and target-preserving negative learning, but contain no mathematical derivations, equations, or parameter-fitting steps that could reduce outputs to inputs by construction. No self-citations are invoked as load-bearing premises, and the decoupling claim is presented at the level of design rationale rather than a closed-form derivation. This leaves the proposal self-contained as an empirical framework without detectable circularity in any derivation chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The framework rests on several introduced components whose correctness is not independently evidenced in the abstract; no free parameters are explicitly named but implicit thresholds for branch identification are required.

free parameters (1)

reliability arbitration thresholds
Used to classify branches as dominant or inferior; values not specified in abstract.

axioms (1)

domain assumption: Graph topology supplies reliable context for determining branch informativeness
Invoked when building graph-aware reliability arbitration.

invented entities (2)

branch library (no independent evidence)
purpose: Stores and manages per-branch node representations for arbitration and negative learning
New component introduced to support the framework

target-preserving negative learning (no independent evidence)
purpose: Applies negative supervision only over non-target classes while preserving target supervision
Core mechanism to decouple target loss from branch guidance

pith-pipeline@v0.9.1-grok · 5824 in / 1294 out tokens · 16103 ms · 2026-06-27T07:34:43.360596+00:00 · methodology

[1] [1]

Neural col- laborative filtering,

X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural col- laborative filtering,” inProceedings of the 26th International Conference on World Wide Web, 2017

2017

[2] [2]

Bridging language and items for retrieval and recommendation: Benchmarking llms as semantic encoders,

Y . Hou, J. Li, Z. He, X. Fu, A. Yan, X. Chen, and J. McAuley, “Bridging language and items for retrieval and recommendation: Benchmarking llms as semantic encoders,”arXiv preprint arXiv:2403.03952, 2024

Pith/arXiv arXiv 2024

[3] [3]

Async learned user embeddings for ads delivery optimization,

M. Tang, M. Liu, H. Li, J. Yang, C. Wei, B. Li, D. Li, R. Xu, Y . Xu, Z. Zhang, X. Wang, L. Liu, Y . Xie, C. Liu, L. Fawaz, L. Li, H. Wang, B. Zhu, and S. Reddy, “Async learned user embeddings for ads delivery optimization,” 2024. [Online]. Available: https://arxiv.org/abs/2406.05898

arXiv 2024

[4] [4]

Mmgcn: Multi-modal graph convolution network for personalized recommenda- tion of micro-video,

Y . Wei, X. Wang, L. Nie, X. He, R. Hong, and T.-S. Chua, “Mmgcn: Multi-modal graph convolution network for personalized recommenda- tion of micro-video,” inProceedings of the 27th ACM International Conference on Multimedia, 2019

2019

[5] [5]

Mgat: Multimodal graph attention network for recommendation,

Z. Tao, Y . Wei, X. Wang, X. He, X. Huang, and T.-S. Chua, “Mgat: Multimodal graph attention network for recommendation,” inInforma- tion Processing and Management, 2020

2020

[6] [6]

Multimodal heterogeneous graph attention network,

Xiangen, M. Jia, Y . Jiang, F. Dong, H. Zhu, X. Lin, H. Yu, and Chen, “Multimodal heterogeneous graph attention network,”Neural Computing and Applications, vol. 35, no. 4, pp. 3357–3372, 2023

2023

[7] [7]

Lgmrec: Local and global graph learning for multimodal recommendation,

Z. Guo, J. Li, G. Li, C. Wang, S. Shi, and B. Ruan, “Lgmrec: Local and global graph learning for multimodal recommendation,” inProceedings of the AAAI Conference on Artificial Intelligence, 2024, pp. 8454–8462

2024

[8] [8]

Unigraph2: Learning a unified embedding space to bind multimodal graphs,

Y . He, Y . Sui, X. He, Y . Liu, Y . Sun, and B. Hooi, “Unigraph2: Learning a unified embedding space to bind multimodal graphs,”arXiv preprint arXiv:2502.00806, 2025

arXiv 2025

[9] [9]

Disentangling ho- mophily and heterophily in multimodal graph clustering,

Z. Guo, Z. Shen, X. Xie, L. Wen, and Z. Kang, “Disentangling ho- mophily and heterophily in multimodal graph clustering,”arXiv preprint arXiv:2507.15253, 2025

arXiv 2025

[10] [10]

Cross-contrastive clustering for multimodal attributed graphs with dual graph filtering,

H. Zheng, R. Yang, H. Wang, and J. Xu, “Cross-contrastive clustering for multimodal attributed graphs with dual graph filtering,”arXiv preprint arXiv:2511.20030, 2025

arXiv 2025

[11] [11]

Graph4mm: Weaving multi- modal learning with structural information,

X. Ning, D. Fu, T. Wei, W. Xu, and J. He, “Graph4mm: Weaving multi- modal learning with structural information,” inInternational Conference on Machine Learning, 2025

2025

[12] [12]

Birds of a feather: Homophily in social networks,

M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: Homophily in social networks,”Annual Review of SocioLoGy, vol. 27, no. 1, pp. 415–444, 2001

2001

[13] [13]

Mixing patterns in networks,

M. E. J. Newman, “Mixing patterns in networks,”Physical Review E, vol. 67, no. 2, pp. 26 126–26 126, 2003

2003

[14] [14]

Is homophily a necessity for graph neural networks?

Y . Ma, X. Liu, N. Shah, and J. Tang, “Is homophily a necessity for graph neural networks?”International Conference on Learning Representations, ICLR, 2021

2021

[15] [15]

Revisiting heterophily for graph neural networks,

S. Luan, C. Hua, Q. Lu, J. Zhu, M. Zhao, S. Zhang, X.-W. Chang, and D. Precup, “Revisiting heterophily for graph neural networks,”Advances in neural information processing systems, NeurIPS, 2022

2022

[16] [16]

Beyond homophily in graph neural networks: Current limitations and effective designs,

J. Zhu, Y . Yan, L. Zhao, M. Heimann, L. Akoglu, and D. Koutra, “Beyond homophily in graph neural networks: Current limitations and effective designs,”Advances in Neural Information Processing Systems, NeurIPS, 2020

2020

[17] [17]

Large scale learning on non-homophilous graphs: New benchmarks and strong simple methods,

D. Lim, F. Hohne, X. Li, S. L. Huang, V . Gupta, O. Bhalerao, and S. N. Lim, “Large scale learning on non-homophilous graphs: New benchmarks and strong simple methods,” inarXiv, 2021

2021

[18] [18]

Multimodal negative learning,

B. Gong, X. Gao, P. Zhu, Q. Hu, and B. Cao, “Multimodal negative learning,”arXiv preprint arXiv:2510.20877, 2025

arXiv 2025

[19] [19]

Smil: Multimodal learning with severely missing modality,

M. Ma, J. Ren, L. Zhao, S. Tulyakov, C. Wu, and X. Peng, “Smil: Multimodal learning with severely missing modality,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 3, 2021, pp. 2302–2310

2021

[20] [20]

Are multi- modal transformers robust to missing modality?

M. Ma, J. Ren, L. Zhao, D. Testuggine, and X. Peng, “Are multi- modal transformers robust to missing modality?” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 18 177–18 186

2022

[21] [21]

Deep multimodal learning with missing modality: A survey,

Renjie, H. Wu, H.-T. Wang, G. Chen, and Carneiro, “Deep multimodal learning with missing modality: A survey,” https://arxiv.org/abs/2409.07825, 2024

Pith/arXiv arXiv 2024

[22] [22]

When graph meets multimodal: Benchmarking and meditating on multimodal attributed graphs learning,

H. Yan, C. Li, J. Yin, Z. Yu, W. Han, M. Li, Z. Zeng, H. Sun, and S. Wang, “When graph meets multimodal: Benchmarking and meditating on multimodal attributed graphs learning,” 2025. [Online]. Available: https://arxiv.org/abs/2410.09132

arXiv 2025

[23] [23]

Mo- saic of modalities: A comprehensive benchmark for multimodal graph learning,

J. Zhu, Y . Zhou, S. Qian, Z. He, T. Zhao, N. Shah, and D. Koutra, “Mo- saic of modalities: A comprehensive benchmark for multimodal graph learning,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 14 215–14 224

2025

[24]

Mlaga: Multimodal large language and graph assistant,

D. Fan, Y. Fang, J. Liu, D. Difallah, and Q. Tan, “Mlaga: Multimodal large language and graph assistant,” arXiv preprint arXiv:2506.02568, 2025

Pith/arXiv arXiv 2025

[25]

GRAPHGPT-O: Synergistic multimodal comprehension and generation on graphs,

Y. Fang, B. Jin, J. Shen, S. Ding, Q. Tan, and J. Han, “GRAPHGPT-O: Synergistic multimodal comprehension and generation on graphs,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, pp. 19 467–19 476

2025

[26]

Ntsformer: A self-teaching graph transformer for multimodal isolated cold-start node classification,

J. Hu, Y. He, Y. Li, B. Hooi, and B. He, “Ntsformer: A self-teaching graph transformer for multimodal isolated cold-start node classification,” arXiv preprint arXiv:2507.04870, 2025

arXiv 2025

[27]

Multi-modal learning with missing modality via shared-specific feature modelling,

H. Wang, Y. Chen, C. Ma, J. Avery, L. Hull, and G. Carneiro, “Multi-modal learning with missing modality via shared-specific feature modelling,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 15 878–15 887

2023

[28]

Mmanet: Margin-aware distillation and modality-aware regularization for incomplete multimodal learning,

S. Wei, C. Luo, and Y. Luo, “Mmanet: Margin-aware distillation and modality-aware regularization for incomplete multimodal learning,” Proceedings of the AAAI Conference on Artificial Intelligence, 2023

2023

[29]

Leveraging foundation models for multi-modal federated learning with incomplete modality,

L. Che, J. Wang, X. Liu, and F. Ma, “Leveraging foundation models for multi-modal federated learning with incomplete modality,” Machine Learning and Knowledge Discovery in Databases. Research Track, pp. 401–417, 2024

2024

[30]

Fedmm: Federated multimodal learning with modality heterogeneity in computational pathology,

Y. Peng, J. Bian, and J. Xu, “Fedmm: Federated multimodal learning with modality heterogeneity in computational pathology,” ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1696–1700, 2024

2024

[31]

Learning from complementary labels,

T. Ishida, G. Niu, W. Hu, and M. Sugiyama, “Learning from complementary labels,” in Advances in Neural Information Processing Systems (NeurIPS), 2017

2017

[32]

Learning from multiple complementary labels,

L. Feng, T. Kaneko, B. Han, G. Niu, B. An, and M. Sugiyama, “Learning from multiple complementary labels,” in International Conference on Machine Learning (ICML), 2018

2018

[33]

NLNL: Negative learning for noisy labels,

Y. Kim, J. Yim, J. Yun, and J. Kim, “NLNL: Negative learning for noisy labels,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2019

2019

[34]

Openmag: A comprehensive benchmark for multimodal-attributed graph,

C. Wan, X. Li, Y. Zuo, H. Deng, S. Li, B. Fan, H. Qin, R. Li, and G. Wang, “Openmag: A comprehensive benchmark for multimodal-attributed graph,” arXiv preprint arXiv:2602.05576, 2026

arXiv 2026

[35]

Pitfalls of graph neural network evaluation,

O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann, “Pitfalls of graph neural network evaluation,” arXiv preprint arXiv:1811.05868, 2018

Pith/arXiv arXiv 2018

[36]

Open graph benchmark: Datasets for machine learning on graphs,

W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec, “Open graph benchmark: Datasets for machine learning on graphs,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 22 118–22 133

2020

[37]

Graphsaint: Graph sampling based inductive learning method,

H. Zeng, H. Zhou, A. Srivastava, R. Kannan, and V. Prasanna, “Graphsaint: Graph sampling based inductive learning method,” in International Conference on Learning Representations (ICLR), 2020

2020

[38]

Roberta: A robustly optimized bert pretraining approach,

Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019

Pith/arXiv arXiv 2019

[39]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” in Proceedings of the 38th International Conference on Machine Learning (ICML), ser. PMLR, vol. 139, 2021, pp. 8748–8763

2021