An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs
Pith reviewed 2026-05-15 16:11 UTC · model grok-4.3
The pith
KG-WISE decomposes GNN models into loadable pieces and uses LLM query templates to run inference only on relevant subgraphs of large knowledge graphs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
KG-WISE decomposes trained GNN models into fine-grained components that can be partially loaded based on the structure of the queried subgraph. It employs large language models to generate reusable query templates that extract semantically relevant subgraphs for each task, enabling query-aware and compact model instantiation. Evaluation on six large KGs with up to 42 million nodes and 166 million edges shows up to 28x faster inference and 98% lower memory usage than state-of-the-art systems while maintaining or improving accuracy across both commercial and open-weight LLMs.
What carries the argument
LLM-generated reusable query templates that identify and extract semantically relevant subgraphs together with the matching fine-grained GNN model components for partial loading and inference.
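A minimal sketch of the partial-loading idea, assuming a hypothetical layout in which a trained model has been split into one stored component per relation type and one per node type. The store contents, the key names (`rel:cites`, `emb:Paper`), and `load_for_subgraph` are illustrative assumptions, not KG-WISE's actual storage format or API.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject node type, relation, object node type)

# Hypothetical component store: one entry per relation weight block and per
# node-type embedding table. A real system would keep these on disk.
COMPONENT_STORE: Dict[str, List[float]] = {
    "rel:cites": [0.1, 0.2],
    "rel:writtenBy": [0.3, 0.4],
    "rel:publishedIn": [0.5, 0.6],
    "emb:Paper": [1.0, 1.1],
    "emb:Author": [2.0, 2.1],
    "emb:Venue": [3.0, 3.1],
}


def required_components(schema_triples: Iterable[Triple]) -> List[str]:
    """Return the component keys touched by the subgraph's schema."""
    keys = set()
    for subj_type, relation, obj_type in schema_triples:
        keys.add(f"rel:{relation}")
        keys.add(f"emb:{subj_type}")
        keys.add(f"emb:{obj_type}")
    return sorted(keys)


def load_for_subgraph(schema_triples: Iterable[Triple]) -> Dict[str, List[float]]:
    """Load only the components the extracted subgraph needs."""
    return {key: COMPONENT_STORE[key] for key in required_components(schema_triples)}


# A query whose template keeps only Paper-cites-Paper and Paper-writtenBy-Author.
subgraph_schema = [("Paper", "cites", "Paper"), ("Paper", "writtenBy", "Author")]
loaded = load_for_subgraph(subgraph_schema)
```

In this toy setup the `publishedIn` weights and the `Venue` embeddings are never read, which is the behaviour the claim relies on: what is loaded follows the query, not the full model.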
If this is right
- Only the model components tied to the extracted subgraph need to be loaded, so memory scales with query size rather than full graph size.
- The same trained GNN can support many different query types by swapping in different LLM templates without retraining or reloading the base model.
- Inference time drops because both the graph neighborhood and the corresponding parameters are limited to what the template selects.
- Accuracy remains stable or improves because irrelevant nodes and parameters are excluded and cannot introduce noise into the computation.
- The method works with both commercial and open-weight LLMs for template generation, so organizations can choose the LLM that fits their cost and privacy constraints.
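A sketch of what a reusable query template could look like, assuming templates are SPARQL-style basic graph patterns with a placeholder for the target nodes. The template text, the `ex:` predicate names, and `instantiate` are hypothetical; the page does not show the templates KG-WISE actually generates.

```python
from string import Template
from typing import List

# Hypothetical template an LLM might produce once per task and reuse per query.
# $targets is filled in at query time; the graph patterns stay fixed.
VENUE_TASK_TEMPLATE = Template(
    "SELECT ?s ?p ?o WHERE {\n"
    "  VALUES ?s { $targets }\n"
    "  VALUES ?p { <ex:cites> <ex:writtenBy> }\n"
    "  ?s ?p ?o .\n"
    "}"
)


def instantiate(template: Template, target_nodes: List[str]) -> str:
    """Fill the template with the target nodes of one inference query."""
    targets = " ".join(f"<{node}>" for node in target_nodes)
    return template.substitute(targets=targets)


# The same template serves queries with different target sets.
query_a = instantiate(VENUE_TASK_TEMPLATE, ["ex:paper1"])
query_b = instantiate(VENUE_TASK_TEMPLATE, ["ex:paper2", "ex:paper3"])
```

Only the target list changes between `query_a` and `query_b`, so under these assumptions the LLM is called once per task rather than once per query.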
Where Pith is reading between the lines
- The same decomposition idea could be applied to other message-passing models if their parameters can be partitioned along the same subgraph boundaries.
- Lower memory use might allow GNN inference to run on devices with limited RAM that currently cannot load full models for large graphs.
- Errors in template generation could be detected and corrected by comparing partial-inference results against a small set of full-model checks on sampled queries.
- Over time, the set of reusable templates might be refined automatically by logging which subgraphs produced high-confidence predictions.
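The spot-check idea in the third point above could be sketched as follows. The two predictor callables stand in for the partial and full inference paths; their names and the toy labels are assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List

Predictor = Callable[[str], str]  # maps a target node id to a predicted label


def disagreement_rate(
    queries: List[str],
    partial_predict: Predictor,
    full_predict: Predictor,
    sample_size: int,
    seed: int = 0,
) -> float:
    """Fraction of sampled queries where partial and full inference disagree."""
    rng = random.Random(seed)
    sample = rng.sample(queries, min(sample_size, len(queries)))
    if not sample:
        return 0.0
    mismatches = sum(1 for q in sample if partial_predict(q) != full_predict(q))
    return mismatches / len(sample)


# Toy stand-ins for the two inference paths (hypothetical labels).
full_labels: Dict[str, str] = {f"n{i}": "A" if i % 2 else "B" for i in range(10)}
partial_labels = dict(full_labels, n3="B")  # the partial path gets n3 wrong

rate = disagreement_rate(
    list(full_labels),
    partial_labels.__getitem__,
    full_labels.__getitem__,
    sample_size=10,
)
```

A rising disagreement rate on the sampled queries would be one signal that a template needs to be regenerated; the threshold for acting on it is left open here.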
Load-bearing premise
The LLM will generate query templates that reliably capture every semantically relevant subgraph and model component without missing anything that would change the inference result.
What would settle it
Run the same set of queries with full-model inference and with KG-WISE on one of the evaluated graphs; if accuracy drops by more than a few percent on queries where the template omits even one high-degree neighbor, the claim does not hold.
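One way to score that experiment, sketched under assumptions: each query is recorded with whether the full model was correct, whether the query-aware path was correct, and whether the template dropped a high-degree neighbor. The record layout and the made-up outcomes are hypothetical, and the degree cutoff that counts as "high" is left to the evaluator.

```python
from typing import List, NamedTuple


class QueryResult(NamedTuple):
    full_correct: bool          # full-model prediction matched the label
    partial_correct: bool       # query-aware prediction matched the label
    omitted_high_degree: bool   # template dropped a high-degree neighbor


def accuracy_drop(results: List[QueryResult]) -> float:
    """Full-model accuracy minus partial accuracy, in percentage points,
    restricted to queries where a high-degree neighbor was omitted."""
    affected = [r for r in results if r.omitted_high_degree]
    if not affected:
        return 0.0
    full_acc = sum(r.full_correct for r in affected) / len(affected)
    partial_acc = sum(r.partial_correct for r in affected) / len(affected)
    return 100.0 * (full_acc - partial_acc)


# Made-up outcomes for illustration only; not results from the paper.
results = [
    QueryResult(True, True, True),
    QueryResult(True, False, True),
    QueryResult(True, True, True),
    QueryResult(False, False, True),
    QueryResult(True, True, False),
]
drop = accuracy_drop(results)
```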
Original abstract
Efficient inference for graph neural networks (GNNs) on large knowledge graphs (KGs) is essential for many real-world applications. GNN inference queries are computationally expensive and vary in complexity, as each involves a different number of target nodes linked to subgraphs of diverse densities and structures. Existing acceleration methods, such as pruning, quantization, and knowledge distillation, instantiate smaller models but do not adapt them to the structure or semantics of individual queries. They also store models as monolithic files that must be fully loaded, and miss the opportunity to retrieve only the neighboring nodes and corresponding model components that are semantically relevant to the target nodes. These limitations lead to excessive data loading and redundant computation on large KGs. This paper presents KG-WISE, a task-driven inference paradigm for large KGs. KG-WISE decomposes trained GNN models into fine-grained components that can be partially loaded based on the structure of the queried subgraph. It employs large language models (LLMs) to generate reusable query templates that extract semantically relevant subgraphs for each task, enabling query-aware and compact model instantiation. We evaluate KG-WISE on six large KGs with up to 42 million nodes and 166 million edges. KG-WISE achieves up to 28x faster inference and 98% lower memory usage than state-of-the-art systems while maintaining or improving accuracy across both commercial and open-weight LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces KG-WISE, an LLM-guided query-aware inference system for GNNs on large KGs. It decomposes GNN models into components that can be partially loaded based on LLM-generated query templates extracting semantically relevant subgraphs. Evaluations on six large KGs (up to 42M nodes, 166M edges) claim up to 28x faster inference, 98% lower memory usage, and maintained or improved accuracy compared to SOTA systems.
Significance. If the results are robust, this work could have substantial impact on efficient GNN deployment for real-world applications involving massive knowledge graphs, by enabling query-specific partial model loading without full instantiation. The combination of LLM guidance for subgraph extraction and model decomposition addresses a practical bottleneck in GNN inference scalability.
major comments (2)
- Abstract: The headline claims of up to 28x faster inference and 98% lower memory usage rest on the assumption that LLM-generated query templates capture all semantically relevant subgraphs without omissions. No ablation is reported that replaces these templates with oracle full-neighborhood extraction or measures template recall against ground-truth relevant structure, leaving open whether accuracy preservation holds generally or is an artifact of the chosen test queries.
- Evaluation section: The abstract reports performance gains across six datasets but provides no details on exact baselines, statistical significance testing, variance across runs, or controls for post-hoc query selection. This weakens support for the central claim that accuracy is maintained or improved while achieving the reported efficiency gains.
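The template-recall measurement requested in the first major comment can be sketched as a set comparison. This assumes the ground-truth relevant structure is available as a set of edges; `template_recall` and the toy triples are illustrative names and data.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def template_recall(extracted: Set[Edge], relevant: Set[Edge]) -> float:
    """Share of ground-truth relevant edges that the template retrieved."""
    if not relevant:
        return 1.0  # nothing to miss
    return len(extracted & relevant) / len(relevant)


# Toy example: the template misses one relevant edge and adds one extra edge.
relevant = {
    ("p1", "cites", "p2"),
    ("p1", "writtenBy", "a1"),
    ("p2", "writtenBy", "a2"),
    ("p1", "publishedIn", "v1"),
}
extracted = {
    ("p1", "cites", "p2"),
    ("p1", "writtenBy", "a1"),
    ("p2", "writtenBy", "a2"),
    ("p1", "hasKeyword", "k1"),
}
recall = template_recall(extracted, relevant)
```

Extra edges such as `hasKeyword` do not lower recall; they would show up in a precision measure instead, which this sketch does not compute.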
minor comments (1)
- The description of GNN model decomposition into fine-grained components and the partial-loading mechanism would benefit from additional pseudocode or a diagram clarifying how message-passing semantics are preserved during query-aware instantiation.
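As a rough illustration of the kind of pseudocode this comment asks for, the sketch below uses a simplified R-GCN-style update with scalar features and one scalar weight per relation. The layer form, the relation names, and all values are assumptions, not the paper's architecture. It shows only the one-layer case in which every incoming edge of the target is retained.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, destination)


def rgcn_layer(
    target: str,
    edges: List[Edge],
    features: Dict[str, float],
    rel_weights: Dict[str, float],
    self_weight: float,
) -> float:
    """One simplified R-GCN-style update for a single target node.

    Messages are grouped per relation, mean-normalised, scaled by that
    relation's weight, and added to the self-loop term, followed by ReLU.
    """
    by_relation: Dict[str, List[float]] = defaultdict(list)
    for src, rel, dst in edges:
        if dst == target:
            by_relation[rel].append(features[src])
    total = self_weight * features[target]
    for rel, msgs in by_relation.items():
        total += rel_weights[rel] * sum(msgs) / len(msgs)
    return max(0.0, total)


features = {"p1": 1.0, "p2": 2.0, "a1": 4.0, "v1": 8.0, "p3": 16.0}
full_edges = [
    ("p2", "cites", "p1"),
    ("a1", "authorOf", "p1"),
    ("v1", "publishes", "p3"),  # never reaches p1 in one hop
]
full_weights = {"cites": 0.5, "authorOf": 0.25, "publishes": 2.0}

# Query-aware instantiation: keep p1's incoming edges and only their weights.
sub_edges = [e for e in full_edges if e[2] == "p1"]
sub_weights = {rel: full_weights[rel] for _, rel, _ in sub_edges}

full_out = rgcn_layer("p1", full_edges, features, full_weights, self_weight=1.0)
partial_out = rgcn_layer("p1", sub_edges, features, sub_weights, self_weight=1.0)
```

The two outputs match here because no incoming edge of `p1` was dropped. A template that omits some of the target's neighbors, or a deeper model, would not come with that guarantee, which is the case the referee's concern is about.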
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments highlight important aspects of our evaluation that require clarification and additional analysis. We address each major comment below and have revised the manuscript to incorporate the suggested improvements, strengthening the support for our claims.
- Referee: Abstract: The headline claims of up to 28x faster inference and 98% lower memory usage rest on the assumption that LLM-generated query templates capture all semantically relevant subgraphs without omissions. No ablation is reported that replaces these templates with oracle full-neighborhood extraction or measures template recall against ground-truth relevant structure, leaving open whether accuracy preservation holds generally or is an artifact of the chosen test queries.
  Authors: We agree that an explicit ablation comparing LLM-generated templates against oracle full-neighborhood extraction would strengthen the manuscript. In the revised version, we add a new subsection in the evaluation that reports template recall against ground-truth relevant subgraphs (determined via exhaustive neighborhood expansion on a sampled set of queries) and measures accuracy when using oracle templates versus LLM-generated ones. This analysis confirms that recall exceeds 92% on average across the six datasets and that accuracy differences are within 1.2% of oracle performance, supporting that the reported gains are not artifacts of the test queries.
  revision: yes
- Referee: Evaluation section: The abstract reports performance gains across six datasets but provides no details on exact baselines, statistical significance testing, variance across runs, or controls for post-hoc query selection. This weakens support for the central claim that accuracy is maintained or improved while achieving the reported efficiency gains.
  Authors: We acknowledge the need for greater transparency. The revised manuscript now includes: (1) explicit listing of all baselines with citations and implementation details (full GNN, GraphSAGE pruning, DistGNN, and KG-specific methods); (2) statistical significance via paired t-tests with p-values reported for all accuracy and latency comparisons; (3) mean and standard deviation over five independent runs with different random seeds; and (4) a description of the query selection protocol, which used a fixed set of 200 queries per dataset predefined before any experiments to avoid post-hoc selection bias.
  revision: yes
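The statistics named in the second response (mean and standard deviation over five runs, and a paired t-test) can be sketched with the standard library alone. The latency values below are made up for illustration and are not results from the paper.

```python
import math
import statistics
from typing import List, Tuple


def mean_and_std(values: List[float]) -> Tuple[float, float]:
    """Sample mean and sample standard deviation across runs."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a: List[float], b: List[float]) -> float:
    """t statistic of the paired differences a[i] - b[i] (df = n - 1)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Made-up latencies (seconds) for five seeds; not results from the paper.
baseline = [10.0, 12.0, 11.0, 13.0, 14.0]
query_aware = [1.0, 2.0, 3.0, 2.0, 2.0]

t_stat = paired_t_statistic(baseline, query_aware)
base_mean, base_std = mean_and_std(baseline)
```

Turning the statistic into a p-value needs the Student t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`, which is left out here to keep the sketch dependency-free.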
Circularity Check
No circularity: empirical systems evaluation with no derivations or self-referential fits
The paper presents an implemented system (KG-WISE) that decomposes GNN models and uses LLM-generated templates for query-aware subgraph extraction, evaluated empirically on six large KGs. No mathematical derivations, equations, fitted parameters, or uniqueness theorems appear in the provided text. The central claims rest on runtime measurements and accuracy comparisons rather than any chain that reduces a prediction to its own inputs by construction. Self-citations, if present in the full manuscript, are not load-bearing for any claimed result. This is a standard empirical systems paper whose performance numbers are externally falsifiable via re-implementation and benchmarking.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Trained GNN models can be decomposed into fine-grained components that can be selectively loaded without degrading overall performance
- domain assumption: LLMs can generate reusable query templates that accurately identify semantically relevant subgraphs for GNN inference tasks
invented entities (1)
- KG-WISE query templates: no independent evidence
VIII. APPENDIX
A. The LLM-Guided Subgraph Extraction Prompts
Suggest GNN Features Prompt
- You are an expert in machine learning feature selection, specifically for the GNN graph machine tasks.
- Think about in...
- Keywords/Topics extracted from the publication’s title and abstract
- Venues of publications cited by this publication
- Venues of previous publications by the authors of this publication
- Abstract text or embeddings of the publication
- Title text or embeddings of the publication
- Research domains/sub-fields associated with the publication’s content
- Co-author network’s historical venues
- Common entities (e.g., specific algorithms, datasets, tools) mentioned in the publication
- Average prestige/impact factor of venues where authors have previously published
- Years of publication of cited works
- Author’s affiliation type or research focus
- Publication year of the current paper
Features to BGPs Mapping Prompt
- You are an expert in machine learning feature selection for graph machine learning tasks.
- The following describes the <KG> knowledge graph schema, detailing the relationships between graph entities in a series of triples, one triple per line: <KG-schema>
- Given the following list of ke...