arxiv: 2605.02452 · v1 · submitted 2026-05-04 · 💻 cs.AI

Recognition: 1 theorem link

Position: How can Graphs Help Large Language Models?

Xiyuan Wang, Yi Hu, Yanbo Wang, Chuan Shi, Muhan Zhang


Pith reviewed 2026-05-08 18:42 UTC · model grok-4.3

classification 💻 cs.AI

keywords: graphs, large language models, hallucinations, prompting, knowledge graphs, structured data, reasoning, graph neural networks


The pith

Graphs can reduce hallucinations in large language models by serving as current knowledge sources and strengthen reasoning through structured prompting methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper poses the reverse of the common question about LLMs aiding graph tasks and instead examines how graphs can support LLMs. It identifies three concrete mechanisms: graphs as dynamic knowledge bases that supply fresh facts to curb factual errors, graph-organized prompting that guides step-by-step thinking, and direct incorporation of graph structures that lets models handle relational information more naturally. These approaches matter because LLMs frequently produce plausible but incorrect outputs and struggle with data that has explicit connections rather than linear text. The authors also sketch future model designs that might use graphs for sparsity and memory organization.
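
To make the first mechanism concrete, here is a minimal sketch of graph-backed retrieval, assuming a toy in-memory triple store. The entities, relation names, and the helpers `build_index`, `retrieve_facts`, and `grounded_prompt` are all hypothetical illustrations and do not come from the paper. The point is only that editing the graph changes what the model is shown, with no retraining step.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as (subject, relation, object) triples.
# All entities and facts are placeholders for illustration only.
TRIPLES = [
    ("AcmeDB", "latest_version", "4.2"),
    ("AcmeDB", "maintained_by", "Acme Labs"),
    ("Acme Labs", "headquartered_in", "Oslo"),
]


def build_index(triples):
    """Index triples by subject so facts about an entity can be looked up."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def retrieve_facts(index, entity, hops=1):
    """Collect triples reachable from `entity` within `hops` steps (breadth-first)."""
    facts, frontier, seen = [], [entity], {entity}
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            # .get avoids creating empty entries in the defaultdict
            for relation, obj in index.get(node, []):
                facts.append((node, relation, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return facts


def grounded_prompt(question, facts):
    """Prepend retrieved facts to the question so the model can answer from them."""
    lines = [f"- {s} {r.replace('_', ' ')} {o}" for s, r, o in facts]
    return "Known facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


index = build_index(TRIPLES)
# Updating the graph changes later prompts without retraining any model.
index["AcmeDB"] = [("latest_version", "4.3"), ("maintained_by", "Acme Labs")]
facts = retrieve_facts(index, "AcmeDB", hops=2)
prompt = grounded_prompt("Where is the maintainer of AcmeDB based?", facts)
print(prompt)
```

The two-hop traversal is what separates this from plain keyword lookup: the answer depends on a fact attached to a neighbouring entity rather than to the entity named in the question. A real system would also need entity linking and ranking of retrieved facts, both omitted here.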

Core claim

Graphs help large language models by acting as up-to-date knowledge sources that reduce hallucinations, by enabling prompting techniques such as Chain-of-Thought, Tree-of-Thought, and Graph-of-Thought that improve reasoning, and by allowing integration of graph structures that extends LLM applicability to structured domains including e-commerce, code, and relational databases.

What carries the argument

The three perspectives of graph assistance: knowledge sourcing to combat hallucinations, graph-based prompting for reasoning, and structural integration for relational data.
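
One way to read the second perspective is that chain, tree, and graph prompting differ mainly in the shape of the dependency structure between intermediate thoughts. The sketch below illustrates that reading under simplifying assumptions: structures are acyclic, and the thought labels and the functions `classify` and `topological_order` are hypothetical names, not an API from any of the cited methods.

```python
# Each structure maps a thought id to the ids of the thoughts it builds on.
# Thought names are hypothetical labels, not content from the paper.
chain = {"t1": [], "t2": ["t1"], "t3": ["t2"]}
tree = {"root": [], "a": ["root"], "b": ["root"], "a1": ["a"], "b1": ["b"]}
graph = {"root": [], "a": ["root"], "b": ["root"], "merged": ["a", "b"]}


def classify(parents):
    """Label an acyclic thought structure as 'chain', 'tree', or 'graph'."""
    children = {node: 0 for node in parents}
    for node, preds in parents.items():
        if len(preds) > 1:
            return "graph"  # a thought aggregates several earlier thoughts
        for p in preds:
            children[p] += 1
    if all(count <= 1 for count in children.values()):
        return "chain"  # no branching and no merging
    return "tree"  # branching, but every thought has at most one parent


def topological_order(parents):
    """Order thoughts so each one comes after the thoughts it depends on."""
    order, placed = [], set()
    while len(order) < len(parents):
        ready = [n for n in parents
                 if n not in placed and all(p in placed for p in parents[n])]
        if not ready:
            raise ValueError("cycle detected: thoughts cannot be scheduled")
        for n in ready:
            order.append(n)
            placed.add(n)
    return order


for name, structure in [("chain", chain), ("tree", tree), ("graph", graph)]:
    print(name, classify(structure), topological_order(structure))
```

The ordering function shows why the graph case needs more bookkeeping than a chain: the `merged` thought can only be generated once both of its parent thoughts exist, so a controller has to track dependencies rather than simply append to a transcript.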

If this is right

LLMs can maintain accuracy on time-sensitive facts without full retraining.
Reasoning tasks that involve branching or relational paths become more tractable.
Models gain native support for domains that rely on tables, graphs, or code dependencies.
Future LLM designs may adopt sparse graph connections to lower compute needs.
Memory systems modeled on brain-like graph structures could emerge.
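
The compute argument behind sparse graph connections can be illustrated with a small counting exercise. This is only a sketch under stated assumptions: the page does not say how such an architecture would be built, so the ring-shaped sparsity pattern and the helper names `graph_attention_mask` and `allowed_pairs` are hypothetical. The idea shown is that restricting attention to graph neighbours reduces the number of token pairs that must be scored.

```python
def graph_attention_mask(num_tokens, edges):
    """Return mask[i][j] == True when token i may attend to token j."""
    # Every token may always attend to itself.
    mask = [[i == j for j in range(num_tokens)] for i in range(num_tokens)]
    for i, j in edges:
        mask[i][j] = True
        mask[j][i] = True  # edges are treated as undirected in this sketch
    return mask


def allowed_pairs(mask):
    """Count the (query, key) pairs that the mask permits."""
    return sum(sum(row) for row in mask)


# Hypothetical sparsity pattern: a ring over 8 tokens.
n = 8
ring_edges = [(i, (i + 1) % n) for i in range(n)]
mask = graph_attention_mask(n, ring_edges)

sparse_pairs = allowed_pairs(mask)  # 8 self pairs + 16 directed neighbour pairs
dense_pairs = n * n                 # full attention scores every pair
print(sparse_pairs, dense_pairs)
```

With a fixed number of neighbours per token, the permitted pairs grow linearly in sequence length while dense attention grows quadratically. Whether the resulting model keeps its accuracy is an empirical question this sketch does not address.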

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid LLM-graph systems could set new standards for reliability in knowledge-intensive applications.
Benchmarks that jointly test textual fluency and structural consistency may become necessary.
Graph integration might allow smaller models to match larger ones on tasks that benefit from explicit relations.

Load-bearing premise

The assumption that adding graph components will produce clear performance gains in LLMs without creating new engineering burdens or unforeseen limitations.

What would settle it

A controlled test on factual question-answering benchmarks that shows no measurable drop in hallucination rate when LLMs are given access to an up-to-date graph knowledge source versus text-only retrieval.
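
The comparison described above could be scored along the following lines. This is a minimal sketch, assuming each answer has already been judged as hallucinated or not. The label lists are made-up placeholders and not results from any experiment, and the function names are hypothetical.

```python
import math


def hallucination_rate(labels):
    """Fraction of answers judged hallucinated (True means hallucinated)."""
    return sum(labels) / len(labels)


def two_proportion_z(labels_a, labels_b):
    """z statistic for the difference in hallucination rate between two conditions."""
    n_a, n_b = len(labels_a), len(labels_b)
    p_a, p_b = hallucination_rate(labels_a), hallucination_rate(labels_b)
    pooled = (sum(labels_a) + sum(labels_b)) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se if se > 0 else 0.0


# Placeholder judgments for illustration only; NOT measured data.
text_only = [True] * 30 + [False] * 70      # text-only retrieval condition
graph_backed = [True] * 28 + [False] * 72   # graph knowledge source condition

rate_gap = hallucination_rate(text_only) - hallucination_rate(graph_backed)
z = two_proportion_z(text_only, graph_backed)
# |z| below roughly 1.96 means the gap is not significant at the 5% level.
print(f"rate gap = {rate_gap:.3f}, z = {z:.2f}")
```

If both conditions answer the same questions, a paired test such as McNemar's would be the more appropriate choice. The unpaired z statistic is used here only to keep the sketch short.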

Original abstract

With the rapid advancement of large language models (LLMs), classic graph learning tasks have greatly benefited from LLMs, including improved encoding of textual features, more efficient construction of graphs from text, and enhanced reasoning over knowledge graphs. In this paper, we ask a complementary question: How can graphs help LLMs? We address this question from three perspectives: 1) graphs provide an up-to-date knowledge source that helps reduce LLM hallucinations, 2) graph-based prompting techniques, such as Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT), enhance LLM reasoning capabilities, and 3) integrating graphs into LLMs improves their understanding of structured data, expanding their applicability to domains such as e-commerce, code, and relational databases (RDBs). We further outlook some future directions including designing sparse LLM architectures based on graphs and brain-inspired memory systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript is a position paper posing the complementary question of how graphs can help large language models (LLMs), in contrast to the more common use of LLMs to aid graph tasks. It addresses this via three forward-looking perspectives: (1) graphs as an up-to-date knowledge source to reduce hallucinations, (2) graph-based prompting techniques (e.g., Chain-of-Thought, Tree-of-Thought, Graph-of-Thought) to enhance reasoning, and (3) integration of graphs into LLMs to improve structured data understanding and expand applicability to domains like e-commerce, code, and relational databases. The paper concludes with an outlook on future directions including sparse graph-based LLM architectures and brain-inspired memory systems.

Significance. If the perspectives hold and are pursued in follow-on work, the paper could help steer research toward hybrid graph-LLM systems that address key LLM limitations in knowledge freshness and structured reasoning. Its primary contribution is the clear framing of a complementary research agenda rather than any new empirical results or formal derivations; this framing itself has value in highlighting underexplored synergies.

minor comments (3)

[Abstract] The abstract introduces 'Graph-of-Thought (GoT)' without a brief definition or citation, which reduces accessibility for readers new to the prompting literature.
[Perspectives] In the discussion of the three perspectives, the mechanisms (e.g., how graphs are dynamically updated or retrieved to mitigate hallucinations) are described at a high level only; adding one concrete illustrative example per perspective would strengthen the exposition without altering the position-paper nature.
[Outlook] The future-directions paragraph lists sparse architectures and brain-inspired memory but does not identify any concrete open challenges or evaluation metrics that would help readers design follow-up experiments.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our position paper and the recommendation for minor revision. We are pleased that the complementary framing of how graphs can aid LLMs is viewed as a valuable contribution to steering future research on hybrid systems.

Circularity Check

0 steps flagged

No significant circularity; position paper with no derivations

full rationale

This is a position paper that poses a complementary question and addresses it through three discursive perspectives plus an outlook on future directions. It contains no equations, formal derivations, fitted parameters, or performance claims that could reduce to self-referential inputs. The perspectives are framed as potential benefits drawn from external concepts rather than proven results, and no load-bearing self-citations or uniqueness theorems are invoked to force conclusions. The manuscript is self-contained against external benchmarks as a forward-looking discussion without any circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The position rests on domain assumptions about the effectiveness of graph-LLM synergies that are not tested or derived within the paper itself.

axioms (1)

domain assumption: Graphs can serve as reliable, up-to-date knowledge sources that reduce LLM hallucinations when integrated appropriately.
Invoked in the first perspective without supporting evidence or mechanisms detailed in the abstract.

pith-pipeline@v0.9.0 · 5458 in / 1128 out tokens · 62268 ms · 2026-05-08T18:42:54.474729+00:00

