pith. machine review for the scientific record.

arxiv: 2605.00670 · v1 · submitted 2026-05-01 · 💻 cs.IR · cs.SI

Recognition: unknown

Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion

Authors on Pith no claims yet

Pith reviewed 2026-05-09 18:35 UTC · model grok-4.3

classification 💻 cs.IR cs.SI
keywords multimodal recommendation · modality completion · graph retrieval · subgraph selection · graph transformer · missing features · recommendation systems

The pith

Retrieving relevant subgraphs and jointly encoding them with a graph transformer allows better completion of missing modalities in multimodal recommendation systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to solve modality incompleteness in recommendation graphs by going beyond simple neighbor-based inference. It introduces a framework that retrieves semantically relevant subgraphs from the whole graph for each query node with missing features. These subgraphs are then jointly processed with the query node using a graph transformer under global attention. A sparse-routing codebook helps regularize the embeddings. Experiments show consistent outperformance over existing methods, suggesting that richer contextual information improves robustness.
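The joint-encoding step can be sketched in a few lines. This is a hedged illustration of global attention over the query node and retrieved subgraph, not the paper's implementation: the placeholder token, token construction, and dimensions are all assumptions made for the example.

```python
# Hypothetical sketch of the joint-encoding step: the query node (its
# missing feature replaced by a placeholder) attends over itself plus all
# retrieved subgraph nodes, and the attended query token is read out as
# the reconstructed modality feature. Names and shapes are illustrative.
import math

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    # convex combination of value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def complete_missing_feature(placeholder, subgraph_feats):
    # Tokens = query placeholder + retrieved subgraph node features;
    # global attention lets the query draw on every retrieved node.
    tokens = [placeholder] + subgraph_feats
    return attend(placeholder, tokens, tokens)

reconstructed = complete_missing_feature(
    placeholder=[0.0, 0.0],
    subgraph_feats=[[1.0, 0.0], [0.0, 1.0]],
)
```

With a zero placeholder the attention weights are uniform, so the reconstruction is simply the mean of all tokens; a trained model would learn sharper weights.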

Core claim

GRE-MC selects modality-aware subgraphs to provide richer context, then uses a graph transformer to jointly encode the query node and subgraph for completing missing features, regularized by a learnable sparse-routing codebook.

What carries the argument

The modality-aware subgraph retrieval mechanism that selects semantically relevant subgraphs, combined with a graph transformer for joint global attention encoding and a learnable sparse-routing codebook for regularization.
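As an illustration of what such a retrieval mechanism might look like, here is a minimal cosine-similarity top-k sketch; the scoring function, embedding choices, and parameter names are assumptions for this example, not the paper's formulation.

```python
# Illustrative sketch: score every candidate node by cosine similarity
# to the query's available modality embedding, then keep the top-k as
# seeds of the retrieved subgraph.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_topk(query_emb, candidate_embs, k):
    # Indices of the k most similar candidates, highest score first.
    ranked = sorted(range(len(candidate_embs)),
                    key=lambda i: cosine(query_emb, candidate_embs[i]),
                    reverse=True)
    return ranked[:k]

seeds = retrieve_topk([1.0, 0.0],
                      [[0.9, 0.1], [-1.0, 0.0], [0.7, 0.7]],
                      k=2)
```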

If this is right

  • Multimodal recommendation systems become more reliable when handling incomplete data from various sources.
  • Performance improves on standard benchmarks by capturing non-local semantic cues.
  • The joint encoding allows better integration of retrieved context for feature reconstruction.
  • Regularization via codebook leads to more compact and robust latent representations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar retrieval-enhanced completion could be applied to other graph learning tasks with missing node features, such as node classification in knowledge graphs.
  • Testing the approach on datasets with varying degrees of modality missingness would reveal the conditions under which subgraph retrieval provides the most benefit.
  • Integrating this with other completion techniques might yield hybrid systems that are even more resilient to data incompleteness.

Load-bearing premise

That semantically relevant context in the graph contains valuable cues that are non-trivial to capture through simple neighborhood aggregation, and that the modality-aware subgraph retrieval mechanism can reliably select such context for any query node with missing features.

What would settle it

An experiment on a multimodal recommendation dataset in which GRE-MC with subgraph retrieval performs no better than an otherwise identical baseline that completes features from direct neighbors only. Such a result would indicate that the additional retrieved context provides no unique value.

Figures

Figures reproduced from arXiv: 2605.00670 by Bingsheng He, Bryan Hooi, Jiaxin Jiang, Jun Hu, Yuan Li.

Figure 2: Motivation for graph retrieval–enhanced modality …
Figure 3: Comparison of subgraph relevance between …
Figure 4: Overview of GRE-MC for graph retrieval-enhanced modality completion. The framework consists of two modules: (1) …
Figure 7: Impact of different graph retrieval methods.
Figure 8: Performance under different missing rates.
Figure 6: Impact of the number of anchors and codebook size.
Figure 9: Relevance comparison between neighbor and re…
Figure 10: Impact of codebook soft usage weight λ_usage.
Original abstract

Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constraints, which substantially degrade model performance and reliability. One effective solution to address this issue is modality completion, which reconstructs missing features to provide modality-complete graphs for downstream tasks. Given a query node with missing multimodal features, existing modality completion methods typically infer information from the node itself or its neighbors to reconstruct the missing modality. However, these methods may overlook semantically relevant context in the graph, which contains valuable cues that are non-trivial to capture through simple methods like neighborhood aggregation. In this work, we propose GRE-MC, a Graph Retrieval-Enhanced Modality Completion framework, to overcome these limitations. By introducing a modality-aware subgraph retrieval mechanism, GRE-MC selects semantically relevant subgraphs from the entire graph, providing richer contextual information for completing missing modalities. Subsequently, a graph transformer jointly encodes the query node and the retrieved subgraph via global attention to complete the missing features, while a learnable sparse-routing codebook regularizes latent embeddings into compact bases for improved robustness. Extensive experiments on multimodal recommendation benchmarks demonstrate that GRE-MC consistently outperforms state-of-the-art methods, validating the effectiveness of subgraph retrieval and joint-encoding graph transformer for robust modality completion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes GRE-MC, a Graph Retrieval-Enhanced Modality Completion framework for multimodal recommendation under modality incompleteness. It introduces a modality-aware subgraph retrieval mechanism to select semantically relevant subgraphs from the full graph (beyond local neighbors), a graph transformer that jointly encodes the query node and retrieved subgraph via global attention for feature completion, and a learnable sparse-routing codebook to regularize latent embeddings into compact bases. The central claim is that this yields robust modality completion and consistent outperformance over state-of-the-art methods on multimodal recommendation benchmarks.

Significance. If the results hold, the work provides a practical extension of retrieval-augmented methods to modality completion in graphs, potentially addressing limitations of neighborhood aggregation in capturing distant but semantically useful context. The sparse-routing codebook is a positive addition for robustness. However, significance is tempered by the absence of parameter-free derivations, machine-checked proofs, or falsifiable predictions; the contribution is primarily empirical and incremental over existing graph retrieval and transformer ideas.

major comments (2)
  1. [Section 3.1] Modality-aware subgraph retrieval: the mechanism is presented as reliably selecting useful subgraphs for any query node with missing features, yet the description does not specify a modality-robust similarity measure or an explicit fallback when the query node's modality (used for retrieval scoring) is absent. This directly engages the stress-test concern and risks the selected subgraphs being no better than local neighbors or random, undermining the core motivation that retrieval supplies non-trivial cues beyond simple aggregation.
  2. [Section 4] Experiments: the reported consistent outperformance lacks sufficient detail on missing-data simulation protocols (e.g., per-modality missing rates, random vs. structured missingness), exact baseline re-implementations, hyperparameter ranges for the subgraph retrieval and codebook size, and statistical significance testing across runs. Without these, the central empirical claim cannot be fully assessed for reproducibility or robustness.
minor comments (2)
  1. Notation for the sparse-routing codebook (e.g., size and routing parameters) is introduced without a clear table or equation cross-reference, making it hard to map to the free parameters listed in the axiom ledger.
  2. [Abstract] The abstract and introduction would benefit from one additional sentence clarifying the exact modalities (vision/text) and graph construction details used in the recommendation benchmarks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We address the major concerns point by point below and plan to incorporate revisions to improve clarity and reproducibility.

Point-by-point responses
  1. Referee: [Section 3.1] Modality-aware subgraph retrieval: the mechanism is presented as reliably selecting useful subgraphs for any query node with missing features, yet the description does not specify a modality-robust similarity measure or an explicit fallback when the query node's modality (used for retrieval scoring) is absent. This directly engages the stress-test concern and risks the selected subgraphs being no better than local neighbors or random, undermining the core motivation that retrieval supplies non-trivial cues beyond simple aggregation.

    Authors: We thank the referee for highlighting this important aspect of the retrieval mechanism. The modality-aware subgraph retrieval in Section 3.1 computes similarity using available modalities of the query node via projected embeddings and cosine similarity. To handle cases where the query node has no available modalities, we will revise the section to include an explicit fallback mechanism using structural graph features for retrieval. We will also provide the precise formulation of the similarity measure to demonstrate its robustness. This revision will clarify how non-trivial cues are obtained beyond local neighbors. revision: yes
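The fallback the authors describe could look like the following minimal sketch; the field names `modal_emb` and `struct_emb` are hypothetical, chosen for illustration, and do not reflect the paper's notation.

```python
# Hedged sketch of the proposed fallback: score retrieval with the
# query's available modality embedding when one exists, otherwise fall
# back to structural graph features. All names are illustrative.
def retrieval_key(node):
    if node.get("modal_emb") is not None:
        return node["modal_emb"]   # modality-aware scoring
    return node["struct_emb"]      # structural fallback for fully
                                   # modality-missing query nodes

q_complete = {"modal_emb": [0.2, 0.8], "struct_emb": [3.0, 1.0]}
q_missing  = {"modal_emb": None,       "struct_emb": [3.0, 1.0]}
```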

  2. Referee: [Section 4] Experiments: the reported consistent outperformance lacks sufficient detail on missing-data simulation protocols (e.g., per-modality missing rates, random vs. structured missingness), exact baseline re-implementations, hyperparameter ranges for the subgraph retrieval and codebook size, and statistical significance testing across runs. Without these, the central empirical claim cannot be fully assessed for reproducibility or robustness.

    Authors: We agree that additional experimental details are essential for assessing reproducibility. In the revised manuscript, we will expand Section 4 with: detailed missing-data simulation protocols including specific per-modality missing rates and both random and structured missingness; information on how baselines were re-implemented; the ranges and chosen values for hyperparameters such as subgraph retrieval size and codebook size; and results from statistical significance tests (e.g., t-tests) with standard deviations over multiple runs. These additions will support the empirical claims more robustly. revision: yes
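The promised significance testing could be as simple as a paired t-statistic over per-run metric scores; the numbers below are illustrative placeholders, not results from the paper.

```python
# Minimal paired t-test over per-run scores (e.g., Recall@20 across
# five seeds) for a method vs. a baseline. Illustrative numbers only.
import math
import statistics

def paired_t(xs, ys):
    """Paired t-statistic; compare against t critical value, df = n-1."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation
    return mean / (sd / math.sqrt(n))

gre_mc   = [0.231, 0.228, 0.234, 0.230, 0.229]  # hypothetical runs
baseline = [0.221, 0.219, 0.224, 0.220, 0.218]
t_stat = paired_t(gre_mc, baseline)
```

A t-statistic beyond the critical value for the chosen significance level (about 2.78 at α = 0.05 with 4 degrees of freedom) would support the claimed improvement.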

Circularity Check

0 steps flagged

No circularity: new framework components are independently specified

full rationale

The paper presents GRE-MC as a composite framework consisting of modality-aware subgraph retrieval, a joint-encoding graph transformer, and a learnable sparse-routing codebook. These are introduced as novel mechanisms to address modality incompleteness, with no equations shown that define one component in terms of another or that rename a fitted parameter as a prediction. The abstract and described contributions contain no self-citation chains that bear the central claim, no uniqueness theorems imported from prior author work, and no ansatzes smuggled via citation. The derivation chain therefore remains self-contained; performance claims rest on experimental validation rather than tautological reduction to inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The central claim rests on standard graph ML assumptions plus two new mechanisms introduced without independent evidence beyond the reported experiments.

free parameters (2)
  • subgraph retrieval parameters
    Size and selection criteria for retrieved subgraphs are chosen to balance context and noise.
  • sparse-routing codebook size
    Number of bases in the learnable codebook is a hyperparameter fitted during training.
axioms (2)
  • domain assumption: Graphs contain semantically relevant substructures beyond immediate neighbors that can be retrieved for modality completion.
    Invoked in the description of the modality-aware subgraph retrieval mechanism.
  • domain assumption: Global attention in the graph transformer can effectively integrate the query node and retrieved subgraph for feature reconstruction.
    Stated as the joint-encoding step.
invented entities (2)
  • modality-aware subgraph retrieval mechanism · no independent evidence
    purpose: Selects semantically relevant subgraphs from the entire graph to provide richer context for missing-modality reconstruction.
    New component introduced to overcome limitations of neighborhood aggregation.
  • learnable sparse-routing codebook · no independent evidence
    purpose: Regularizes latent embeddings into compact bases for improved robustness.
    New regularization component added to the framework.
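One toy reading of a top-1 sparse-routing codebook with a load-balancing penalty is sketched below; the routing rule, similarity measure, and penalty form are assumptions for illustration, not the paper's equations.

```python
# Toy top-1 sparse-routing codebook: each latent is routed to its most
# similar codebook base, and the usage histogram feeds a simple
# load-balancing penalty that discourages codebook collapse.
def route(latents, codebook):
    assignments, usage = [], [0] * len(codebook)
    for z in latents:
        # dot-product similarity to every codebook base
        sims = [sum(a * b for a, b in zip(z, c)) for c in codebook]
        j = max(range(len(codebook)), key=sims.__getitem__)
        assignments.append(j)
        usage[j] += 1
    total = sum(usage)
    usage_dist = [u / total for u in usage]
    # squared deviation from uniform usage as a load-balance proxy
    uniform = 1.0 / len(codebook)
    load_penalty = sum((p - uniform) ** 2 for p in usage_dist)
    return assignments, load_penalty

assignments, penalty = route(
    latents=[[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]],
    codebook=[[1.0, 0.0], [0.0, 1.0]],
)
```

A penalty of zero would mean every base is used equally often; training would minimize this term alongside the completion loss.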

pith-pipeline@v0.9.0 · 5556 in / 1475 out tokens · 20016 ms · 2026-05-09T18:35:57.302220+00:00 · methodology

